SAP Predictive Analytics Developer Guide
2018-04-12
1 Overview [page 6]
10 Appendix [page 235]
10.1 Data Type Mapping [page 235]
10.2 File Format Specifications [page 238]
10.3 Language ISO Codes [page 242]
The SAP Predictive Analytics Developer Guide describes how to integrate Predictive Analytics functions into
your environments and products.
● Your role when using Predictive Analytics for OEM: What You Can Do as an Administrator or Developer [page 10]
● The main concepts and contents of the Automated Analytics API: Main Concepts [page 14]
● A sample scenario that takes you through the complete process of how to create and use a model: Common Operation Workflow [page 40]
● How to integrate Predictive Analytics for OEM for in-process or client/server use: Integration [page 57]
● The set of input and output parameters that can be found in the parameter tree of each object: Model Parameter Reference [page 91]
● How to write scripts for data-mining tasks using the proprietary KxShell tool: KxShell: SAP Predictive Analytics Command Line Interpreter [page 212]
● How to access data using the Data Access API: Integrating with the Data Access API [page 227]
Audience
This guide is intended for integrators familiar with the use of Software Development Kits and Application
Programming Interfaces.
Version Changes
A section about user roles has been added, see What You
Can Do as an Administrator or Developer [page 10].
API References
SAP Predictive Analytics API is available in different integration schemes (Java, C++, CORBA, and so on). The objects and calls manipulated in the API have the same names in every scheme, and there are only minor differences from one scheme to another. Find the following Automated Analytics API reference on the SAP Help Portal.
Reference Description
Data Access: The C API to define a specific data access to SAP Predictive Analytics
The library to be implemented by the integrator is specified through the KxDataAccess.h header file, which
describes all the API calls to implement when creating a new UserStore. Sample files of a UserStore code are
also available. Find these files on the SAP Help Portal.
Learn More
Go one step further in exploring the Python API by reviewing the following tutorials on the SAP Community:
This guide presents the latest release of Predictive Analytics for OEM, which corresponds to version 3.3 of the Automated Analytics API.
What's New:
● Support of a Python scheme for the Automated Analytics API. See Implementations [page 28].
● Python sample scripts are available in the installation directory. See Sample Scripts [page 54].
Predictive Analytics for OEM provides advanced data analysis functions that can be embedded into third-party
products and environments.
This is not a standalone statistical environment. Its components can be integrated into full-blown
environments such as SAS or SPSS.
An engineering product
It integrates state-of-the-art algorithms, with their associated engineering heuristics and a scientific methodology describing the usage constraints. Selected algorithms must be able to give a-priori estimations, such as the estimated memory usage and the estimated time to completion, and a way to assess the validity of their results.
In particular, the product must be able to check that it is used in a proper environment and to warn users in an understandable form about any violation of usage constraints. It allows non-statisticians to quickly apply the provided algorithms to proprietary data.
Capabilities
Training, running, deploying and maintaining predictive models are its main capabilities. It relies on the
Automated Analytics API that allows you to develop and perform the following types of predictive analysis:
● Clustering analysis
● Classification and regression analysis
● Time series analysis
● Social network analysis
A scientific laboratory
Instead of exposing all the possible tuning parameters to the end user, the product can use proven heuristics associated with the mathematical foundations to allow a fast design of operational solutions and to reduce the number of tuning parameters.
A complete data analysis environment
SAP Predictive Analytics for OEM does not offer extended data pre-processing and data visualization facilities. There are many such tools on the market, and the most complex or domain-specific pre-processing should be done within the data sources.
A vector for scientific spread
Most of the internal technology remains the sole property of SAP and is subject to patent.
The concepts listed in this topic are the main concepts that you manipulate when working with the APIs. They are divided into two categories:
A model is defined by a list of datasets and a list of protocols. Models and datasets can be saved to and restored from stores.
You will also find descriptions of two useful interfaces: Context and Factory.
Related Information
4.1.1 Model
Models are responsible for managing state transitions of the data analysis tasks and for connecting the
components used in this data analysis, such as the data sources and the transforms. Models are the only entity
that can be saved and restored between working sessions.
● Regression
● Classification
State Diagram
1. Initialize and set the context and some characteristics such as names, parameters, protocols, datasets,
and stores. Parameter changes are validated through the validateParameter method. Every model can
show a parameter view of its own datasets and protocols under the sections 'Datasets' and 'Protocols' of
the parameter tree.
2. Check consistency between the components such as the datasets and the transforms. To do so you need
to proceed as follows:
1. Check that all datasets can be opened and that the transforms are initialized.
2. Check or enforce that all datasets have the same variable descriptions.
3. Ask the transforms to do the following:
○ To check the required datasets are present.
○ To check the compatibility of the variable types and, if necessary, install some extra transforms.
○ To set the normalization type and reject policy.
○ To set the unknown reject policy.
○ Stop the process to change some characteristics or to compute some intermediate results.
○ Resume/Abort the execution. The learn/adapt/apply phase can lead to several internal phases
as decided by the transform.
3. Adapt/Learn/Apply the service to/of the dataset. The send method runs the process directly and the post
method creates an internal thread to return immediately to the frontend.
4. Extract/Interpret the adaptation/learning/application results through the examination of the result
parameters.
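As an illustration, these four steps map onto a handful of API calls. The following sketch uses the Java common interface presented later in this guide; sendMode and postMode are the calls described in the tutorial sections, but the mode constant name is an assumption:
// 1. Initialize: create the model and declare its datasets and parameters.
// 2./3. Check consistency and learn: sendMode blocks until the phase completes,
//       whereas postMode returns immediately and runs the phase in its own thread.
mModel.sendMode(KxenClassFactory.KXEN_LEARN, ""); // constant name is an assumption
// 4. Extract results from the read-only parameters under the 'Results' section.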
Related Information
4.1.1.1 Regression
This predictive model builds a mapping to associate a value to some input variables (presented as a Case). The
Training data set must provide examples. An example is made of a case containing the input variables
associated with one target (dependent) value. Cases can be made of variables of different types: continuous,
nominal or ordinal.
The target variable must be coded as a number, which implies it is a continuous type. When the target variable is discrete (which means that the task is actually a classification forced into a regression), the model will code this target. This coding is generally done by associating '1' (for example) with each case having a specific discrete value and '-1' with the others.
There are two main types of transforms used in regression: some are purely numeric algorithms, which means
that they only accept numbers as inputs. For these algorithms, nominal variables must be coded with numbers.
The other type is symbolic and only accepts symbols as inputs. For these algorithms, continuous variables
must be coded into ranges.
Depending on the transform, the model will apply some intermediate transforms to adapt the variable coding to the final algorithm, or intermediate transforms that derive extra variables from the input variables to add more information to the model. For example, a date variable will be re-encoded into multiple variables that describe the day of the year or the day of the week.
Regression model results are made of some statistics on the difference between the target values and the
generated values on the training set.
4.1.1.2 Classification
This predictive model builds a mapping to associate a class to some input variables. The training data set must
provide examples. An example is made of a case containing the input variables associated with one target
(dependent) value. Cases can be made of variables of different types: continuous, nominal or ordinal.
The target variable must be a discrete symbol representing the class. When the desired variable is continuous
(which means that the task is actually a regression), the model will code the target. This code is generally
obtained by associating one symbol to the target values above the median and another symbol for the target
values lower than the median.
Again, there are two main types of transforms used in classification: some are purely numeric algorithms,
which means that they only accept numbers as inputs. For these algorithms, nominal variables must be coded
with numbers. The other type is symbolic and only accepts symbols as inputs. For these algorithms,
continuous variables must be coded into ranges.
Depending on the transform, the model will apply some intermediate transforms in order to adapt the variables
coding to the final algorithm.
Classification model results are computed using misclassification costs, and lift curves on the training set. In
the present release, classification can only be made between two classes.
4.1.1.3 Segmentation
Segmentation models group the cases contained in a training data set. This clustering generally relies on a notion of distance between cases. Most of the time, the system tries to represent the data set through specific cases (that could be synthetic) called prototypes. It is used to understand some underlying coherent behaviors in the events.
4.1.1.4 Forecasting
Forecasting models are special cases of regression. Specialization is done through specific pre-processing of time-varying values. Such values are called signals, and extracting the periodicity of these signals allows narrowing the search space used by the transforms and gives better results than pure regression techniques.
The purpose of data representation is almost always linked to data compression: finding a way (axes) to represent the cases with the minimum loss of information while still allowing a certain task to be performed. Most of the time, this data compression can only be done if the user knows how data will be used after compression: for example, an attribute that could be regarded as noise for one task could be very important for another one.
● Synthetic information that helps you understand a set of events. For example, some basic statistics such
as the mean and standard deviation of variables. This information is stored into the transform itself as a
result.
● Information saved as new items or numbers for each event or case into databases or files. For example, a score associated with a credit risk for a particular customer.
Transforms hold the actual statistical or machine learning algorithms and must go through a learning phase to
be ready for use. This learning phase is done at the model level and used for the estimation of parameters and
the computation of some results or descriptive information.
Note
Although frontend users cannot initiate learning or adaptation phases, for specific uses it is possible to test
a transform and store the results in a temporary space. This particular mechanism saves intermediate
results. However, users do not need to know anything about the implementation details of the transform
algorithms.
Transforms must be customized using parameters. However, all components are designed to keep the number
of user-defined parameters as small as possible. The frontend is only allowed to edit the names and the
parameters. The models actually control the transforms.
Internal parameters that are computed during 'Learn' or adapted during 'Adapt' can be accessed by the
frontend through the parameters under the Results directory. Results are read-only.
When the frontend wants to stop the running computation, it calls the Model::stopRequest method. This sends the appropriate event. It is up to the computation thread to process the stop request at some breakable points.
State Diagram
The following state diagram shows the possible transition paths from the 'Created' to the 'Ready' state.
The classification/regression engine builds models implementing a mapping between a set of descriptive
variables (inputs of the model) and a target variable (output of the model). It belongs to the family of the so-
called regression algorithms. This is a proprietary algorithm that is our own derivation of a principle described
by V. Vapnik called "Structural Risk Minimization". The first quality of the models built with SAP Predictive
Analytics classification/regression engine is their robustness, which makes them very useful for very noisy data
sets. The second quality of these models is that they are based on polynomial functions, which are well known
mathematical objects. Furthermore, it is very easy to interpret the order 1 polynomial coefficients into variable
contributions. This allows end-users to make fruitful variable selection based on the robust information
extracted from the data.
The learning phase is very fast: only tens of seconds for a problem on 50,000 cases described with 15
attributes.
The data encoding engine is one of the cornerstones of the application's preprocessing utilities. Its purpose is to find a robust encoding of discrete variables into ordinal numbers that can be injected into numerical algorithms such as neural networks and the classification/regression engine.
The data encoding engine encodes each possible value of a discrete variable with a number related to its rank in terms of target mean. Its only originality is that it eliminates poorly represented values.
4.1.3 Protocol
Protocols are responsible for holding together a chain of interfaces containing the algorithms, named
transforms, from the raw data to their encoding. The protocol passes through the final transforms in charge of
information extraction and generation.
You can choose the variable role through protocols. The four roles are:
Role Characteristics
weight: The variable indicates the relative weight of a case versus the others. This mechanism allows giving more stress to some specific cases in a database.
Strange Values
Information extraction can manage the problem of processing strange values, which are either missing or out
of range. This problem is solved through a set of policies that are implemented by the protocols during the
check phase.
Each protocol is characterized by a policy that can take one of the three following values:
● Transform deals
The protocol asks its transforms to agree together on how to process strange values. This is the default mode.
● skip
The protocol forces all transforms to use a skip action, meaning that the cases with strange values will be discarded during both training and application.
● Warning
This level is associated with warnings that are generated for every strange value. If this level is higher than
the context level, the user will not see any warnings.
Note
You can change these three parameters between training and application phases.
4.1.4 Context
The Context interface provides a way for the user to integrate models within in-house environments or
applications. It is basically a handle to four callback procedures that are called when events to be displayed to
the user occur.
'userMessage' is the method called by any component that wants to inform the front end of some event. Messages are built from a keyword and some arguments. A topic manager links this keyword with a message template, thus allowing very easy internationalization or customization of the messages. Messages are associated with a level. Only messages with a level lower than the context level are displayed to the user (level 0 means an error).
When a component decides it needs confirmation from the user, it calls the userConfirm method. This
displays a prompt to the user and asks for confirmation. It returns a Boolean value to the caller component.
Sometimes, especially when restoring a model from a store, a component will request a value from the user frontend (logins and passwords, for example). It does so by using the userAskOne method.
Finally, when a running component hits a request for stop, it calls the stopCallback method, which asks the user for the next operation: abort or resume.
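To make this concrete, here is a minimal sketch of a Context adapter. IKxenContext is the interface named later in this guide, but the method signatures below are assumptions, not the exact API:
public class ConsoleContext implements IKxenContext {
    // Called by any component that wants to inform the front end of an event.
    public void userMessage(int iLevel, String iMessage) {
        if (iLevel == 0) System.err.println("ERROR: " + iMessage);
        else System.out.println(iMessage);
    }
    // Called when a component needs a confirmation; returns a Boolean to the caller.
    public boolean userConfirm(String iQuestion) {
        return true; // non-interactive sketch: always confirm
    }
    // Called when a component needs a value (login, password...) from the user.
    public String userAskOne(String iPrompt) {
        return ""; // non-interactive sketch: return an empty value
    }
    // Called when a running component hits a stop request: abort or resume.
    public boolean stopCallback() {
        return false; // hypothetical convention: false means resume
    }
}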
4.1.5 Factory
The Factory implementation and cardinality depend on the interface framework (C++ or CORBA). Its role is to allow the creation of components by user frontends.
Besides the component creation role, the Factory collects information about the host machine such as
duration of basic operations (add, multiply operators...) and available memory.
Stores federate several physical spaces and help frontend designers show available stores, spaces, and models. A store can be viewed as a file directory, a database, or an Excel workbook.
Models can be saved into specific spaces contained in stores, allowing one to view the models saved into a store and to restore them.
● Its name
● Its class name
● Its version
● The date of its last save
● The space name of its last save
Open allows accessing data spaces in a given location with a user and a password that can be empty strings. When a store is open, it can be asked for its subspaces. There is one specific subspace called 'KxAdmin' which holds the descriptions of the models saved in this store.
4.1.7 Space
Spaces are responsible for preparing the environment to fetch the data from their actual sources, and for describing the variables composing a case. The actual fetch is made through the creation of case iterators.
Note
A space is generally associated with a model through a role name. This association between a name and
space is called a dataset.
● Name
● Description
● Storage type (number, string, and date)
● Value type (discrete, continuous, ordinal, date)
● Level indicating if the variable is considered as a key that could be used to retrieve this sample in the data
sources
● Level indicating if the variable corresponds to an order of the cases
● Group indicating if the variable must be considered with others
All the other operations are done internally within the components and are just shown here to present the interactions between the spaces and models or protocols. Most of the operations are about accessing or guessing variable descriptions or statistics. They mainly consist of opening a new space within an open store and, via the begin method, creating case iterators that are used by all subsequent protocol stacks to read or write data within a space.
A space can only be opened in 'read' or 'write' mode, not both. To perform both, the user has to open two spaces, one in 'read' mode and the other in 'write' mode.
State Diagram
Case iterators are responsible for reading cases from opened spaces and are a classical design pattern used in all state-of-the-art environments.
The main operations consist of advancing case iterators to fetch or write the next line, and accessing cells within an iterator (get or set values). Cells are accessed according to the storage types defined in the description, and cells can have empty values.
● The Control API allows you to create, estimate, and validate descriptive or predictive models.
● The Data Access API allows you to extract information from data sources of different formats such as text
files, or relational databases tables. It has been specifically designed to minimize memory consumption
and can be used on large databases and data warehouses.
The purpose of the application components is to provide executable modules that embed the objects described in this overview. See the two following topics to view two class diagrams that illustrate the main concepts of the API.
Related Information
The following class diagram describes the inner structure of the Control API.
The following class diagram describes the inner structure of the Data Access API.
4.3 Implementations
You can use the Automated Analytics API through different schemes. All the API calls have the same parameters and the same semantics in all integration schemes.
Scheme Description
Java Native Interface (JNI): This scheme uses a JNI layer (this option may not be available on Linux platforms). Two dynamic libraries are used (a Java Native wrapper library and the standard C++ library).
CORBA: The CORBA object model, which you can use with a wide range of programming languages such as C++, Java, or Python. In this case, the API appears as a standalone executable, which is accessed through the CORBA communication layer.
JNI and CORBA Common Interface: Provided on top of the JNI and CORBA Java implementations to facilitate the switch from one environment to another and to ease the integration process.
C++ and CORBA Common Interface: Provided on top of the C++ native and CORBA C++ implementations. This scheme provides a more natural C++ API (errors reported through exceptions, functions returning values instead of using output arguments), and the same types are used in the in-process and client/server integrations. It thus allows switching easily from in-process to client/server integration (only a few lines may differ).
Python: The Python language directly, through a SWIG layer. It relies on a Python module and on two dynamic libraries, that is, a SWIG layer and the SAP Predictive Analytics C++ library.
The following table summarizes the different schemes and the different integration paths.
Some of the design choices of the components were made because these components are designed and used
in distributed computing environments such as CORBA. This section presents some of these choices.
One of the major things that could strike people familiar with distributed computing designs is that not all
objects can be created remotely, even if most of the objects can be accessed through remote interfaces.
For example, frontends cannot create data spaces and transforms; they must go through a model to do so. This is because data spaces are class factories for case iterators and, for obvious performance reasons, these iterators must belong to the same memory space as the transforms.
So we chose to force transforms and data spaces to be created in the same memory space. We could have made one the class factory of the other, but there was no philosophical reason to choose one over the other, so we decided to use a higher abstraction level: this was the birth of the model. Furthermore, models allow having a single entry point to save and restore full protocols, which is very handy.
The following table presents the accessibility of the objects in the distributed environment CORBA:
4.4.1 Internationalization
As mentioned earlier, the kernel uses complex mathematical functions that can encounter problems deep in their execution. Front-end applications must be informed of the many events that occur during fairly long processes. These events come from a large variety of problems. This is why a resource mechanism, which can be easily customized with a text editor, has been integrated within the components.
The core software uses an internal messaging service based on these resources. This system allows, in a
distributed environment, several front-ends (with several languages) to communicate with a single modeling
server.
All training scenarios are based on using a training dataset in order to adapt the values of the transform
parameters to a particular task. It is the responsibility of the external user of the software to prepare this
dataset before entering this scenario.
Preparation here only means collecting the data into a single file, table, view, or SQL statement compatible with an ODBC driver. The data can come as a single dataset called the 'training' dataset, or as several datasets called 'estimation', 'validation', and optionally 'test' datasets. In this step the user declares the directory in which the files are stored or the ODBC source in which the table will be accessed.
Accessing Metadata
Each variable or column of a training dataset must be described in terms of storage (string, number, date, datetime, angle) and value type (nominal, ordinal, continuous, or textual).
A variable can be taken as an identifier and used later by the system to synchronize the newly generated values at the proper place. The API provides a facility to guess this information directly from the data, through the metadata of an SQL statement or a scan of the first 50 lines of the file, except if a KxDesc_"dataset name" file exists in the same directory.
Managing Variables
The user can exclude some variables from the study; excluded variables are not taken as inputs of the model. The user also chooses the target variable: either a continuous variable, in which case the problem solved is a regression problem, or a Boolean variable, in which case the problem is a two-class classification problem.
This is done through a generic structure called Parameter. The design of this hierarchical structure is close to the Microsoft Windows registry. It is a very versatile structure reachable by various text-based or graphical user front ends.
As an example, for a classification model only one parameter is needed, which is the order of the polynomial
models created. For data encoding, the only important parameter is the profit strategy, used only for
classification problems to associate a profit to each of the two possible classes and thus compare some
models, based on their relative profit.
The model checks variable descriptions and their compatibility with the chosen transform to be sure that the
training is valid. Internally, this creates a CaseIterator that scans through the data space to pass information
to the transforms. Memory management for this iteration is handled by a run time class called Stack.
CaseIterator and Stack are volatile objects that cannot be seen from the external world. Missing and out of
range values appearing in the cases that are processed by this stack are automatically taken care of.
In this scenario the user can either start from a model generated in the previous scenario or reload a previously
trained model.
The user applies the model on new data. This requires that the user specify a new space in which input variable values are stored. It is very important that this space has the same structure as the one used for training.
In this scenario, the user asks the model to be applied on a single case that is filled, variable by variable, from a
user interface.
This simulation capability allows the user to test the results of the application of the model. When the problem
is a regression problem, the user can request an error bar on top of the actual estimation. When the problem is
a classification problem, the user can request a probability instead of a score.
Here are some key elements to the functional specification of the application.
User front ends can specify internally created components through the generic parameter interface. This is
available for Models and Stores on the one hand, and Spaces, Protocols, and Transforms on the other hand
(dark gray in the previous schema).
To take advantage of the international messaging service offered by the application, all end user applications
have to define a specific adapter called Context. The context is a call back handler that allows the components
to inform the user front end of progress.
SAP Predictive Analytics provides data source classes to access text files, and ODBC compliant data sources
for Windows and Linux environments (light blue in the previous schema). Extra data source types can be
added, depending on the user demands, such as DBMSCopy, Excel and OLE DB (yellow in the previous
schema).
This section presents the functions to be called in the API for the main modeling functionalities. Java Samples
[../../EXE/Clients/Samples/java] described in this tutorial are also available.
5.1 Import
The following table presents the object definitions to import in your development environment for the different
schemes:
Scheme Import
C++
#include "Kxen_CPP.h"
Java JNI
import com.kxen.KxModelJni.*;
Java CORBA
import com.kxen.KxModel.*;
Python
import aalib
5.2 Configuration
You configure the SAP Predictive Analytics kernel to perform the following tasks:
You can configure the SAP Predictive Analytics kernel by setting key/value pairs through one of the following
API calls:
A configuration file is a text file that contains a set of key/value pairs. Each line contains a key, a tabulation, and
the associated value. You must make sure to keep the tabulation while editing the file to avoid loading issues.
You can also specify the configuration in an ODBC table if ODBC is available on your system.
Note
● Standalone KxCORBA and KxShell executables load their own configuration file at startup
(KxCORBA.cfg and KxShell.cfg). The files are located in the same directory as the executables.
● The configuration files of the standard distribution load the license file provided by SAP Predictive
Analytics that is also a configuration file.
Example
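For instance, a minimal sketch that loads an additional configuration file at startup (the file name is illustrative; loadAdditionnalConfig is the call shown later in this section):
loadAdditionnalConfig("KxShell.cfg");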
You can set the key/value pairs directly instead of loading the configuration from a text file.
Call the setConfiguration function for each key/value pair you want to set.
Example
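A minimal sketch using two keys from the table below; the receiver object of setConfiguration depends on the integration scheme and is left implicit here, and the directory value is illustrative:
setConfiguration("TempDirectory", "C:\\Temp\\AAWork"); // illustrative path
setConfiguration("FileStoreRoot", "UserDir");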
The following table details the different configuration keys supported by SAP Predictive Analytics components.
Key Description
FileStoreRoot: Adds a new entry in the list of file "Root stores". File Root stores is the list of stores retrieved by SAP Predictive Analytics when one asks for the list of available stores (store.lsDirGet) at the top position ("", the blank string).
TempDirectory: Sets the directory used by certain components for temporary internal storage.
UserStore: Declares a new possible Store class, using the Data Access API. This allows declaring that a dynamic library is available on the system to access external data. The value associated with this key should be a couple of strings, separated by a ':' character:
CustomerId: Sets the CustomerId license key. The string value associated with this key is the encrypted key provided by SAP Predictive Analytics through the license file. This entry is generally set by the license file (which is in turn set by a Config entry).
EngineMode: Sets the EngineMode license key. The string value associated with this key is the encrypted key provided by SAP Predictive Analytics through the license file. This entry is generally set by the license file (which is in turn set by a Config entry).
Note
Make sure that files are referenced with relative paths in the global configuration file as shown below.
loadAdditionnalConfig("KJWizard.cfg");
Note
Make sure that the file name is passed as argument, not its absolute path. You can also specify this
additional file relatively to the store name.
Example
Key=Value
FileStoreRoot=UserDir
FileStoreRoot=../../../Samples
FileStoreRoot=DefaultRoots
LogConf=logconf.txt
MessageDirectory=../../../Resources/Messages
KTCStore.StoreClass=Kxen.FileStore
KTCStore.StoreOpen=../../../Resources/KTCData
#uncomment to define your own ktc store
#UserDefinedKTCStore.StoreClass=Kxen.FileStore
#UserDefinedKTCStore.StoreOpen=KTC_Test_Data\Test_Rules
KxDesc=KxDesc
KxAdmin=KxAdmin
# Comment to activate the Explain feature
DataAccessExplanation.*.Activated=false
SKDXml=../../../Resources/BusinessObjects_KCDefinitions_dfo.xml
Config=$UserDir/.SAP_AA_License.cfg
Config=../../../../License.cfg
Config=../../../KxStatTr.cfg
Config=../../../DataCacheManager.cfg
Config=../../../FastWrite.cfg
Config=../../../SparkConnector/Spark.cfg
Note
If the configuration file cannot be found, the error 2147024894 E_SYSTEM__NOENT occurs when calling
loadAdditionnalConfig. Check the path.
Once the license is loaded, you may get the error KXEN_E_ABSTRACTTRANSFORM_BADLICENSE if the key
code defined in License.cfg is not correct or if BusinessObjects_KCDefinitions_dfo.xml is not
reachable.
In this section you can find code snippets of the KxTutorial.java sample script that describe the workflow
of commonly used operations.
Samples are provided in Java language using the Java common interface layer, which allows you to switch
easily from CORBA (client/server mode) to JNI (standalone application).
Example
First, the client application must initialize the SSL Layer. Then call the getFactory function to get the Class
Factory.
mFactory = KxenClassFactory.getJNIFactory(CONFIG_DIR,CONFIG_FILE);
CORBA:
Authenticated Server:
You are required to load a license file in order to build a model. There are three ways to load a license file:
A model is the holder of the different Transforms and holds the whole process chain from raw data to the final
outputs.
mModel = mFactory.createModel("Kxen.SimpleModel");
Note
The string Kxen.SimpleModel is a keyword to one possible model definition. Currently, only this type of
model is supported.
Once a model is created, it does not include any Transform. You can add a SAP Predictive Analytics Modeler -
Regression/Classification (K2R) engine in it, by doing one of the following:
Such a simple model is able to perform general regression or classification tasks. However, to build a regression or a classification model, the training dataset must be encoded. SAP Predictive Analytics provides a component (Consistent Coder) to encode your training dataset. This component can be added explicitly or implicitly to your learning process, as sketched below.
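For illustration, one plausible way to chain the coder and the engine is sketched below. The pushTransformInProtocol call and the transform identifiers are assumptions based on the Kxen naming used elsewhere in this guide:
// Hypothetical calls: push the Consistent Coder, then the regression engine,
// into the model's default protocol.
mModel.pushTransformInProtocol("Default", "Kxen.ConsistentCoder");
mModel.pushTransformInProtocol("Default", "Kxen.RobustRegression");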
You define datasets to be used with the current model. A dataset is a physical data support, like a text file or an
ODBC table, associated to the model through a role, which defines how the model is going to use the data.
● A store, for example the directory or the ODBC source where the space can be found
● A space name, for example the file name, the table name, or a SQL statement
● A role, for example "Training", "Estimation" or "ApplyOut"
/** Default value for the kind of store used for reading data */
private static String sStoreClass = "Kxen.FileStore";
/** Default value for the name of the store to open, in this case the path
    of the current directory. */
private static String sStoreName = ".";
Note
There is no user or password used here because the directory where the data is stored is a file store, hence the empty double quotes.
/** Default value for the Name of the data file to open as a space */
private static String sDataFile = "Census01.csv";
3. Read or guess the data descriptions by calling one of the following functions:
○ Load them:
/** Default value for the description file to use for the data file. */
private static String sDescriptionFile = "Desc_Census01.csv";
mModel.readSpaceDescription("Training", sDescriptionFile,
lStoreIndex);
○ Guess them:
mModel.guessSpaceDescription("Training");
The variables are created in the protocol so that they can be accessed for parameter modifications. By default,
the current algorithm will select the last compliant variable as target.
Parameters are trees of key/values pairs. Before running the model, you can tune the following parameters:
1. Call the getParameter function on the object with an empty string ("") to retrieve its parameter tree.
2. Set the correct values to the corresponding node in the tree.
3. Validate or commit the changes made to the parameter tree.
Example
Set the CutTrainingPolicy of the model to the value random and the role of the variable class to
target through the following script:
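A minimal sketch of such a script; the exact path of the CutTrainingPolicy key is an assumption, while the variable-role path follows the pattern shown below for the age variable:
mModel.changeParameter("Parameters/CutTrainingPolicy", "random"); // path is an assumption
mModel.changeParameter("Protocols/Default/Variables/class", "target");
mModel.validateParameter(); // commit the changes; exact signature is an assumption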
Note
getParameter and validateParameter are expensive function calls, as they convert an object's internal state into a tree of parameters or commit the parameter tree back into the internal state. You can group the changeParameter calls for each object in order to limit the number of such calls. See the documentation Components Parameters for an in-depth description of these parameters.
A role is defined for each variable. The following table describes the available roles and their corresponding
code:
Role Code
Note
● skip for all variables which have a KeyLevel different from 0 (key of your dataset)
● target for the last compliant variable if the current algorithm requires a target
Example
For example, to exclude the age variable, call the changeParameter function and set the variable age to
skip as follows:
mModel.changeParameter("Protocols/Default/Variables/age", "skip");
mModel.getParameter("")
mModel.getParameter("Protocols/Default/Variables")
gKxenModel.ParaModel.getSubEntries "Value", lstrTemp1, lstrTemp2
Note
Value can be replaced to obtain all roles, storages and so on.
Option Description
The postMode function is an asynchronous call: it returns immediately, so the caller must wait while the model state is 'running'.
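A hedged polling sketch, assuming the phase was launched with postMode; the state parameter path and the read call are assumptions, and InterruptedException handling is omitted:
// Poll until the model leaves the 'running' state.
String lState;
do {
    Thread.sleep(500); // keep the caller (for example, a GUI) reactive between polls
    lState = mModel.getNameValue("Infos/State"); // hypothetical state path
} while ("running".equals(lState));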
Note
The loop above corresponds to the one implemented by the KxShell command waitRunning.
In this case, the learning process will be fired in a separate thread. This can be used to keep a Graphical User
Interface active while learning the model. For example, in Visual Basic environment a call to DoEvents will keep
the GUI reactive. Also, retrieving information and error messages can be done here. For more information, see
Message Management [page 52].
Once the model has been generated, you need to retrieve its results.
Use the getSubValue function to get a value (iElement) from the parameter tree. It returns a double value; if any error occurs when converting to double, it returns -1.
Example
Call the getKiKr function to retrieve the statistic values for Prediction Confidence "KR" and Predictive
Power "KI":
1. Load variable names and role from the model using the getParameter function.
2. Get roles as an array of name/role using the getSubEntries and getValue functions.
Example
All the output results are stored in the resulting parameter tree. You can retrieve output values computed by
the model through the parameter objects, by using the getParameter and getNameValue functions. In the
Java Wizard interface, all the displayed graphs are simple plots of values found in the model parameter tree
after the training phase. You can also apply the model.
Apply the transformation built during the training phase on new data and produce some expected output by
proceeding as follows:
a. Set a new input dataset, with a role ApplyIn.
b. Set a new output data, created by the model, with a role ApplyOut.
c. Send an Apply message to the model with the sendMode function.
The data files in the following sketch are located in the same store as the training data; it illustrates this procedure:
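A hedged sketch of these three steps; the dataset-definition call, the file names, and the mode constant are assumptions, while sendMode is the function named above:
// a. and b. Declare the apply input and output datasets (call name is an assumption).
mModel.newDataSet("ApplyIn", "Census02.csv", lStoreIndex); // hypothetical input file
mModel.newDataSet("ApplyOut", "Census02_scored.csv", lStoreIndex); // created by the model
// c. Run the apply phase.
mModel.sendMode(KxenClassFactory.KXEN_APPLY, ""); // constant name is an assumption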
Once you create and validate a model on the basis of KI/KR (Prediction Confidence/Predictive Power)
indicators, you can save it.
mModel.setName(sModelName);
Here, for simplicity, the same store is used to save the model in. A model can equally be saved in a text file or in an ODBC table.
Note
A commitModel call is also available to be able to save a new version of the same model in the same
Space (file or table).
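A minimal save sketch; saveModelD is an assumption based on the restoreModelD and eraseModelD naming used at the store level:
lStore.saveModelD(mModel, ""); // hypothetical call and signature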
To restore the latest version of a saved model:
mModel = lStore.restoreLastModelD(sModelName);
Alternatively, instead of restoring the latest version, you can also load a specific version of a model:
int lModelVersion = 1;
mModel = lStore.restoreModelD(sModelName, lModelVersion);
Note
When the model is loaded, free some memory to avoid any performance drop. To do so, delete the store:
lStore.release();
At that stage, the model is fully restored and ready for use. It can be applied to a new dataset or some
parameters can be retrieved to display some curves, indicators and so on. For more information, see Displaying
the Results
When you release a model from the memory, you free all the corresponding resources on the server (for
example in CORBA framework).
mModel.release();
Note
It also releases recursively all the objects created through the model, such as Transforms, Spaces, Parameters, and so on.
To remove a model from the disk storage, call the eraseModelD function at the store level.
SAP Predictive Analytics for OEM may send messages to your integration environment.
The type of message is indicated by a message level as described in the following table:
● Push
You implement a class that inherits from IKxenContext and pass such a "context" object to the Automated Analytics API objects used. To do this, you call the setContext function that is available from models and stores. All messages from these objects are forwarded to the context object in the integration environment.
● Pull
You generally need to filter and dispatch the messages according to the level of interaction needed.
Progress report messages are sent to the integration environment as regular string messages.
Where:
Example
This example illustrates the parsing of the message string before print:
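The original snippet is not reproduced here; a plausible pull-mode sketch is shown below. getMessage is the call named later in this section, and the keyword-plus-arguments layout follows the Context description earlier in this guide:
String lMessage = mModel.getMessage(); // pull the next pending message
if (lMessage != null) {
    int lSep = lMessage.indexOf(' ');
    String lKeyword = (lSep < 0) ? lMessage : lMessage.substring(0, lSep);
    String lArgs = (lSep < 0) ? "" : lMessage.substring(lSep + 1);
    System.out.println("[" + lKeyword + "] " + lArgs); // print keyword and arguments
}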
The message translation is done by SAP Predictive Analytics for OEM and depends on the language requested
by the application.
● Push
In the setContext call, when you give the context object to the Automated Analytics API
● Pull
In the getMessage call, when you get back each message.
Kx<Module>_<languageCode>.umsg
For example, the distribution provides the following language files: KxTool_us.umsg, KxTool_fr.umsg, and
KxObject_us.umsg.
These files must be loaded in the system at the start up of the application. This is already done for server
processes, such as the SAP Predictive Analytics CORBA server. For in-process integration (JNI, C++), you load
them through a configuration file that lists the available message files for translation. You load the configuration
file with the loadAdditionnalConfig function.
SAP Predictive Analytics provides Java and Python sample scripts to test the Automated Analytics API.
5.5.1 Prerequisites
● You must install a valid license in the following location of your system:
○ In the C:\Program Files\SAP Predictive Analytics\ folder on Microsoft Windows
○ In the folder where you have decompressed the KXENAF archive file on Linux.
● You must set up J2SDK 1.6 on your system and declare the JAVA_HOME environment variable.
● You must set the PATH environment variable to the following value:
○ %JAVA_HOME%/bin on Microsoft Windows
○ $JAVA_HOME/bin on Linux
Use the following sample scripts to run the sample scenario described in this guide.
File Description
KxTutorial.java The complete script described in Common Operation Workflow [page 40].
KxContext.java An object used to retrieve SAP Predictive Analytics messages. See Message Management [page 52].
KxTutorialAdvanced.java An advanced scenario that scores each line of a data file (simulation).
The following batch files are located in the C:\Program Files\SAP Predictive Analytics
\Predictive Analytics\...\EXE\Clients\Samples\java\script, where they must be executed.
1. Run the prepare.bat batch file to retrieve the files needed to run the sample scenario, such as
configuration files and datasets.
These files are located in the C:\Program Files\SAP Predictive Analytics\Predictive
Analytics\...\EXE\Clients\Samples\java\script directory.
2. Run the compile.bat batch file to compile the Java sample scripts.
These files are located in the C:\Program Files\SAP Predictive Analytics\Predictive
Analytics\...\EXE\Clients\Samples\java\src\ directory.
3. Finally, run the run.bat batch file to execute the scenario.
The batch files are located in the KXROOT/SamplesSrc/java/script directory, where they must be
executed.
1. Run the prepare.sh batch file to retrieve the files needed to run the sample scenario, such as
configuration files and datasets.
These files are located in the KXROOT/SamplesSrc/java/script directory.
2. Run the compile.sh batch file to compile the Java sample scripts.
These files are located in the KXROOT/SamplesSrc/java/src directory.
3. Finally, run the run.sh batch file to execute the scenario.
File Description
classification.py A script that demonstrates the classification model scenario with the Python
scheme of the Automated Analytics API.
clustering.py A script that demonstrates the clustering model scenario with the Python
scheme of the Automated Analytics API.
Make sure you have installed Python 3.5 on your system. Possible distributions are CPython, Anaconda, or
WinPython.
The installation of Python 3.5 automatically declares the PYTHONHOME environment variable and updates the
PATH environment variable with the %PYTHONHOME% value.
1. Run the setvars.bat batch file to configure the PATH and PYTHONPATH environment variables.
The file is located in the C:\Program Files\SAP Predictive Analytics\OEM\EXE\Clients
\Python35 directory.
○ PATH is updated with the directory of the Automated Analytics C++ dynamic libraries.
○ PYTHONPATH is updated with the directory of the aalib.py file.
2. Open the sample script for editing and modify the AA_DIRECTORY value with the Automated Analytics
installation directory, for example C:\Program Files\SAP Predictive Analytics\OEM\ or C:
\Program Files\SAP Predictive Analytics\Desktop\Automated.
The sample scripts are located in the C:\Program Files\SAP Predictive Analytics\OEM\EXE
\Clients\Samples\Python directory.
3. Run the sample script with the Python executable python.exe either on a command line or within a
Python notebook, Jupyter for example.
Make sure you have installed Python 3.5 on your system. Possible distributions are CPython or Anaconda.
The installation of Python 3.5 automatically declares the PYTHONHOME environment variable and updates the
PATH environment variable with the $PYTHONHOME value.
1. Source the setvars.sh batch file to configure the PATH and LD_LIBRARY_PATH environment variables.
The file is located in the Python35 directory of the Automated Analytics installation.
○ PATH is updated with the directory of the Automated Analytics C++ dynamic libraries.
○ LD_LIBRARY_PATH is updated with the directory of the aalib.py file.
2. Open the sample script for editing and modify the AA_DIRECTORY value with the Automated Analytics
installation directory, for example /opt/AutomatedAnalyticsOem_X86-64-redhat-
Linux-2.6.18-8.El5smp_v3.3.
The sample scripts are located in the Samples/Src/Python directory of the Automated Analytics
installation.
3. Run the sample script with the Python executable python either on a command line or within a Python
notebook, Jupyter for example.
The way to deploy and distribute SAP Predictive Analytics for OEM within your application depends on the kind
of integration. The components and files that must be embedded in your software are reviewed in the next
sections according to the different integration schemes.
In an "in-process" integration, the embedding software includes SAP Predictive Analytics for OEM within the
same process and memory space.
C++ Integration
You install the following components and files to use the C++ native implementation of the SAP Predictive
Analytics kernel.
Required? Components
Yes, this is the minimum configuration for using the basic functionality:
● The SAP Predictive Analytics C++ library
● Some resource files and message translations for the supported languages. The location of the messages should be specified to the kernel using a configuration key or file.
● For Linux, ODBC libraries, as provided by SAP Predictive Analytics
No, it depends on your modeling or data access needs:
● SAP Predictive Analytics Advanced Access provides access to external data files such as SAS data files or SPSS files. It is implemented as an external data access plug-in to SAP Predictive Analytics.
● A Japanese library to process the Japanese language with the Text Coding feature. A set of external libraries must be used (specific stemming process).
● Teradata FastLoad to improve the read/write performance of Teradata connections. It is implemented as an external data access plug-in to SAP Predictive Analytics. Teradata FastLoad must be installed on the target machine.
Note
● These components are activated by some additional configuration keys. You can reuse existing distributed SAP Predictive Analytics configuration files.
● Some of these components may not be available for all platforms supported by the SAP Predictive Analytics kernel.
The following table gives the name of the files associated with each resource presented above according to the
platform. Files between parentheses may not be available on all platforms.
On Microsoft Windows: statrn64.dll, iconv.dll, libxml2.dll, libmecab.dll
On Linux: libst.so, (libst.so.11), (libstodbc.so.11), (libiconv_st.so*)
Note
● On Microsoft Windows, the DLL files can be found for example in the EXE\Clients\CPP directory.
● On Linux, the libraries are located in SAP Predictive Analytics libs directory. The libraries needed at
runtime must also be included in the shared library search path (environment variable
LD_LIBRARY_PATH).
Java Integration
Java integration is done through Java Native Interface (JNI). To deploy SAP Predictive Analytics with a Java
application using JNI, you install the components and files needed for the C++ integration and the following JNI
resource files:
● KxCommonInterf.jar
● KxJniInterf.jar
● KxJni.jar
● KxUtils.jar
● KxenJni3.dll on Microsoft Windows, libKxenJni3.so on Linux.
Python Implementation
Python integration is done through a SWIG wrapper. To deploy SAP Predictive Analytics with a Python script,
you install the components and files needed for the C++ integration and the following Python integration layer:
In the following case of a client/server integration, your client application communicates with an SAP Predictive
Analytics Authenticated Server. It is assumed that the server standard installer is used to configure the server.
Java Integration
The CommonInterf layer can be derived for CORBA communication. You must deploy the following JAR files
with the application:
● KxCommonInterf.jar
● KxAuthentInterf.jar
● KxAuthent.jar
● KxUtils.jar
SAP Predictive Analytics does not provide specific wrappers for other languages. You must use a CORBA client
implementation layer and integrate the CORBA description of the SAP Predictive Analytics Server used with
these layers.
For example, using Python and a CORBA implementation (for example omniORBPy), you need to import
KxAuthServer.idl into the development environment.
Segmented modeling means adding a filter to a data space so only rows matching the filter will feed the
Automated Analytics modeling engine.
For example:
Age>10
Default Filtering
Default filtering (or abstract filtering) is used by the engine when the data storage has no native filtering
capability (File, SAS File, etc.):
For the filter age>10 AND Class<>empty, the following process is applied:
Optimized filtering is used by the engine when the data storage has a native filtering capability (DBMS):
● The filter is translated into an SQL expression and added to a WHERE clause.
● Only rows matching the filter are returned by the DBMS, so all received rows go straight to the modeling layer.
For example, the previous examples are translated into SQL as:
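For instance, the filter age>10 AND Class<>empty used above would plausibly produce a clause like the following (identifier quoting and casing depend on the database):
WHERE (age > 10) AND (Class IS NOT NULL)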
Logical Operators
The operators AND, OR are available. These operators can have any number of operands (two or more), not only two. The classic short-circuit optimization is used, which means that the evaluation stops as soon as the result is known (at the first false operand for AND, at the first true operand for OR).
Comparison Operators
The operators <, <=, > and >= are available. Using an empty value as the value to test is checked and forbidden.
The operators = and <> are available. Empty values are allowed and are translated as IS NULL and IS NOT NULL.
Remarks
The filter evaluation is done after the mapping process, meaning that the variable names are the logical variable names and not the physical fields.
The Data Cache is compatible with filters; only the filtered values are stored in the cache.
The KxIndex variable cannot be used in a filter (it may work with a file but will trigger an error with an ODBC source). This is not checked at this time.
The values of KxIndex in a filtered space are generated after the filtering: the standard sequence 1,2,3... is visible (and not 1,45,74... depending on the KxIndex values coming from the non-filtered space).
Syntax of a filter
<FilterCondition> := <LogicalOperator>{<SimpleFilter>} | <LogicalOperator><FilterCondition>
<SimpleFilter> := <Operator><Variable><Value> |
<Operator> := Equal|NotEqual|Greater|GreaterEqual|Less|LessEqual
<LogicalOperator> := And|Or
In the previous BNF, each bold word is a new parameter name or subtree name.
For example:
Parameters/FilterCondition/Operator "And"
                          /SimpleFilter1/Operator "Greater"
                          /SimpleFilter1/Variable "Age"
                          /SimpleFilter1/Value "10"
                          /SimpleFilter2/Operator "NotEqual"
                          /SimpleFilter2/Variable "Class"
                          /SimpleFilter2/Value ""
st.newSpace "adult01_500.csv" sp
sp.getParameter ""
sp.bindParameter "Parameters/FilterCondition" AndFilter
AndFilter.insert "SimpleFilter1" SimpleFilter
delete SimpleFilter
AndFilter.insert "SimpleFilter2" SimpleFilter
delete SimpleFilter
delete AndFilter
sp.validateParameter
sp.getParameter ""
sp.changeParameter "Parameters/FilterCondition/Operator" "And"
sp.changeParameter "Parameters/FilterCondition/SimpleFilter1/Operator" "Greater"
sp.changeParameter "Parameters/FilterCondition/SimpleFilter1/Variable" "age"
sp.changeParameter "Parameters/FilterCondition/SimpleFilter1/Value" "10"
Parameters/FilterCondition/Operator "Or"
                          /SimpleFilter1/Operator "Less"
                          /SimpleFilter1/Variable "Age"
                          /SimpleFilter1/Value "10"
                          /FilterCondition2/Operator "And"
                          /FilterCondition2/SimpleFilter1/Operator "Greater"
                          /FilterCondition2/SimpleFilter1/Variable "Age"
                          /FilterCondition2/SimpleFilter1/Value "10"
                          /FilterCondition2/SimpleFilter2/Operator "NotEqual"
                          /FilterCondition2/SimpleFilter2/Variable "Class"
                          /FilterCondition2/SimpleFilter2/Value ""
st.newSpace "adult01_500.csv" sp
sp.getParameter ""
sp.bindParameter "Parameters/FilterCondition" OrFilter
OrFilter.insert "SimpleFilter1" SimpleFilter
delete SimpleFilter
OrFilter.insert "FilterCondition2" FilterCondition2
FilterCondition2.insert "SimpleFilter1" SimpleFilterLevel2
delete SimpleFilterLevel2
FilterCondition2.insert "SimpleFilter2" SimpleFilterLevel2
delete SimpleFilterLevel2
delete FilterCondition2
delete OrFilter
sp.validateParameter
sp.getParameter ""
sp.changeParameter "Parameters/FilterCondition/Operator" "Or"
sp.changeParameter "Parameters/FilterCondition/SimpleFilter1/Operator" "Less"
sp.changeParameter "Parameters/FilterCondition/SimpleFilter1/Variable" "age"
sp.changeParameter "Parameters/FilterCondition/SimpleFilter1/Value" "10"
sp.changeParameter "Parameters/FilterCondition/FilterCondition2/Operator" "And"
sp.changeParameter "Parameters/FilterCondition/FilterCondition2/SimpleFilter1/
Operator" "Greater"
sp.changeParameter "Parameters/FilterCondition/FilterCondition2/SimpleFilter1/ /
Variable" "Age"
sp.changeParameter "Parameters/FilterCondition/FilterCondition2/SimpleFilter1/
Value" "10"
sp.changeParameter "Parameters/FilterCondition/FilterCondition2/SimpleFilter2/
Operator" "NotEqual"
sp.changeParameter "Parameters/FilterCondition/FilterCondition2/SimpleFilter2/ /
Variable" "Class"
sp.changeParameter "Parameters/FilterCondition/FilterCondition2/SimpleFilter2/
Value" ""
sp.validateParameter
< Less
<= LessEqual
> Greater
>= GreaterEqual
= Equal
<> NotEqual
Special Case
An elementary filter like age>10 must be expressed as an AND operator with only one operand.
Example
Age>10
Parameters/FilterCondition/Operator "And"
                          /SimpleFilter1/Operator "Greater"
                          /SimpleFilter1/Variable "Age"
                          /SimpleFilter1/Value "10"
Links to information about the new features and documentation changes for Integrating Generated Codes.
● Codes for SAP HANA, Hive, Spark, and Vora are now supported. See About Code Generation [page 66] and Other SQL Codes [page 85].
● MySQL, WX2, and Neoview are no longer supported. See About Code Generation [page 66] and Other SQL Codes [page 85].
● Sybase has been added to the list of available UDFs. See SQL UDF [page 86].
● The table listing the differences of results has been updated. See Available Implementations of Code Generation [page 68].
Code generation is a component that exports regression and segmentation models in different programming languages. The generated code enables you to apply models from outside of the application. It reproduces the operations made by the application when encoding data and creating classification, regression, clustering, or recommendation models. Not all code types are available; availability depends on the model definition.
The following table details the variables used in the API call.

Variable   Description
iType      Key code of the generated language. For the list of available key codes, see the table below.

Key Code   Generated Language
AWK        AWK
CPP        C++
JAVA       Java
JSON       JSON
PMML3.2    PMML 3.2
SAS        SAS
HIVE       Hive
SPARK      Spark

Note
For SAS code, use the call generateCode2 directly to set the application dataset and the key of the application dataset.

Note
For SQL code, use the call generateCode2 directly to set the application dataset and the key of the application dataset.
Scores obtained by using generated codes should be the same as those obtained with the application.
However, slight differences may exist, mainly due to precision issues in computation.
Caution
● Only C++ and Java codes can work with composite variables.
● All generated codes, except for PMML 3.2 and AWK, can work with dateparts. A datepart is a piece of information automatically extracted from an input date or datetime variable. Note that the code generator does not support date or datetime variables that are not split into dateparts. This means that if the final equation contains a date or datetime variable that is not split into dateparts, the application cannot generate an export of your model.
AWK ! ! ! !
C ++ ++ ++ ++
CCL ++ !! !! ++
CPP ++ ++ ++ ++
DB2V9 ++ !! !! ++
DB2UDF ++ !! !! ++
HANA ++ !! !! ++
HANA UDF ++ !! !! ++
Hive ++ !! !! ++
JAVA ++ ++ ++ ++
ORACLE ++ !! !! ++
Oracle UDF ++ !! !! ++
PMML 3.2 ++ !! ++ ++
PostgreSQL ++ !! !! ++
SAS ++ ++ ++ ++
Spark ++ !! !! ++
SQLDB2 ++ !! !! ++
SQLNetezza ++ !! !! ++
SQLServer ++ !! !! ++
SQLServerUDF ++ !! !! ++
SQLTeradata ++ !! !! ++
SQLVertica ++ !! !! ++
SybaseIQ ++ !! !! ++
SybaseIQUDF ++ !! !! ++
Vora ++ !! !! ++
Caption

Symbol  Meaning
++      The syntax is correct and results are the same as the ones obtained with Automated Analytics engines (see note 1).

Note
1. Results may be slightly different due to precision issues. Since each variable introduces a delta, the more variables the model contains, the more the results can differ.
2. Database types without right trim (RTrim) consider as distinct two categories whose names differ only by a trailing whitespace.
The generated AWK code allows applying a model on flat files with a simple script. There is no need to compile; just run it through a single command line as follows:

awk -F"," -f myawkmodel.awk myinputdata > output.csv
6.4.2 C Code
iValues: an array of strings (char*) containing the input values for the current record. These values must be in the exact same order as in the training dataset (target and skipped variables included, even if empty).
This function writes the model output in the argument file as follows, where n is the number of targets. For Classification/Regression, the output is the classification score when dealing with a binary target and the predicted value when dealing with a continuous target; for Clustering, the output is the cluster index.
Note
When generating probability in classification mode for Classification/Regression, the output appears as
shown below:
Example
On the Windows platform, using the Microsoft C compiler, the following command should compile the sources:
cl /o model.exe
-DKX_GENERATEDCFILENAME=\"myModel.c\"
-DKX_MODELFUNCTIONNAME=mymodel_apply
-DKX_FIELDSEPARATOR=","
-DKX_NB_FIELDS=300 main.c
This command generates the model.exe file. To run this executable file to score a flat dataset, use the
following syntax:
where:
Note
myinputdata must have its columns in the exact same order as the training dataset - target and skipped
variables included.
Continuous Computation Language (CCL) is an event processing language of SAP HANA Smart Data Streaming (SDS). CCL is based on Structured Query Language (SQL) and adapted for stream processing. Generating CCL code allows embedding the prediction code associated with a model in a Smart Data Streaming project.
The code generation module of Automated Analytics generates one or more output stream definitions that compute scores and other prediction values. Those stream definitions can then be included in a streaming project. Your Automated Analytics model must be trained using a dataset containing a subset of your streaming events. This dataset must have variable names exactly matching those of the input stream that will be used for scoring.
The following table details the parameters needed to generate CCL code.
In some situations where additional information beyond the score is requested, the CCL code generator may
produce more than one stream. In that case, cascading streams are produced to build the required prediction
information. The name of the final output stream is deduced from the name of the generated file by removing the file extension.
Only the key information is added by default to the output stream along with the prediction information. If you
want to keep all input fields and not only the key, you can set the global parameter named
CodeGenerator.CCL.AddAllInputStreamInResult in the Automated Analytics configuration file.
Example
Add the following line to the KJWizard.cfg file or the KxCORBA.cfg file, depending on the component you
have installed.
CodeGenerator.CCL.AddAllInputStreamInResult=true
The framework is close to the Java code generator framework as both frameworks use vectors and maps to
manage models.
The C++ code generator framework is based on several interfaces and classes:
The C++ code generator produces a single file containing the model class definition and its implementation.
Each file contains only a single class defining the apply function of a model. In addition, the name of the model is equal to the class name (this name is created by the application). Each generated model is registered in the model manager (this process is automatic; see the static declaration at the end of this example).
Example
Sample Code
...
// definition of all categories
...
class cKxMyModel : public virtual KxCppRTModel
{
public :
cKxMyModel();
cKxMyModel(KxCppRTModelManager& iModelManager);
// Other function definitions...
This section details the behavior and usage of each class in the C++ code generator framework.
6.4.4.2.1 KxCppRTModel
● String getModelName(): this method returns the name of the model as a string
● apply(KxCppRTCase, KxCppRTCase): this method applies the model on a data row called a case. It takes an input object providing the input variable values and fills an output object containing the result values. Both input and output use an object representing a set of values, described further below.
● StringVector getInputVariables(): this method specifies the input variables the generated model
needs. It returns a vector of needed input variable names.
● StringVector getOutputVariables(): this method specifies the output variables generated by the
model. It returns a vector of output variable names.
class KxCppRTModel
{
public:
    virtual ~KxCppRTModel() {};
    virtual const KxSTL::string& getModelName() const = 0;
    virtual const KxSTL::vector<KxSTL::string>& getModelInputVariables() const = 0;
    virtual const KxSTL::vector<KxSTL::string>& getModelOutputVariables() const = 0;
    // Applies the model on an input case and fills the output case.
    // (The last two members are reconstructed from the method list above;
    // the exact signatures may differ in the shipped header.)
    virtual void apply(KxCppRTCase const& iInput, KxCppRTCase& oOutput) const = 0;
};
The KxCppRTCase interface allows feeding models with values. It provides services allowing the model to access values by rank (generated codes use the variable rank internally), and allows the external environment to set values using input variable names.
The KxCppRTCase interface is implemented by the integrator to connect physical data values to the model class instances.
class KxCppRTCase
{
public:
    virtual ~KxCppRTCase() {};
    virtual void setValue(KxSTL::string const& iName, KxCppRTValue const& iValue) = 0;
    virtual const KxCppRTValue& getValue(int i) const = 0;
    virtual const KxCppRTValue& getValueFromName(KxSTL::string const& iName) const = 0;
};
6.4.4.2.3 KxCppRTModelManager
The framework uses a model manager providing model registering facilities. It associates each model with a
name.
struct sPrivateData;
class KxCppRTModelManager
{
public:
    ~KxCppRTModelManager();
    static KxCppRTModelManager& instance();
    void registerModel(KxSTL::string const& iModelName, KxCppRTModel* iModelPtr) {
        mModelFactory[iModelName] = iModelPtr;
    }
    static const KxCppRTModel& getKxModel(KxSTL::string const& iModelName) {
        return instance().getModel(iModelName);
    }
    KxSTL::vector<KxSTL::string> getListModel() {
        ...
    }
private:
    KxCppRTModelManager() {}
    const KxCppRTModel& getModel(KxSTL::string iModelName) {...}
    // Model registry; declaration reconstructed from registerModel above.
    KxSTL::map<KxSTL::string, KxCppRTModel*> mModelFactory;
};
6.4.4.2.4 KxCppRTValue
struct sValueData;
class KxCppRTValue
{
public:
KxCppRTValue(KxCppRTValue const& iOther);
KxCppRTValue();
KxCppRTValue(KxSTL::string const& iValue);
KxCppRTValue(const char* iValue);
~KxCppRTValue();
KxSTL::string const& getValue() const;
KxCppRTValue& operator=(KxCppRTValue const& iOther);
private:
struct sValueData* mValueData;
};
struct sValueData {
KxSTL::string mValue;
sValueData() {}
sValueData(KxSTL::string const& iValue) : mValue(iValue) {}
sValueData(const char* iValue) : mValue(iValue) {}
sValueData(sValueData const& iOther) : mValue(iOther.mValue) {}
};
The provided main sample program will create an instance of the generated model and feed it with the proper
cases.
Sample Code
#include "StringUtilities.h"
#include "KxCppRTModelManager.h"
#include "SampleMappedCase.cpp"
int main( int argc, char ** argv )
{
FILE* lInFile = NULL;
FILE* lOutFile = stdout;
...
lInFile = fopen(..., "r");
lModelName = ...;
lOutFile = fopen(..., "w");
...
// return model called CPPModel
const KxCppRTModel& lModel = KxCppRTModelManager::getKxModel(lModelName);
// return the variable names used
KxSTL::vector<KxSTL::string> lInputNames = lModel.getModelInputVariables();
In this sample implementation, generated codes are located in a stand-alone dynamic library (DLL). To add a new model runtime to this DLL, update the dll_X86-WIN32.mak makefile: replace the value of the MODEL_OBJECTS variable with the list of generated models.
The generated HTML code is an HTML page that can be viewed in any JavaScript-compliant Web browser. The user has to fill in each variable value and click the target link at the bottom to access the model output.
where model.java is the generated Java code. This generates a file named model.class containing Java bytecode.
Then, to use the model, the KxJRT.IKxJModelInputWithNames interface must be implemented. This object
is passed as an argument to the IKxJModel.apply() method and defines how to retrieve the input data.
● java.lang.String[] getVariables()
returns the variable names used in the data source the model is to be applied on.
● boolean isEmpty( int iVarIdx, java.lang.String iMissingString )
returns whether the current value of the variable with index iVarIdx in the array returned by
getVariables is the empty value or not.
Other methods (floatValue(), intValue(), ..., dayOfWeek()) that convert a value into a correct data
type are also available. These methods are described in the IKxJModelInput interface.
Example
Considering an object DataProvider able to provide variable values, the code could look as follows:
...
import KxJRT.*;
...
class KxJModelInput extends DataProvider implements IKxJModelInputWithNames {
...
// IKxJModelInputWithNames interface
String[] getVariables() {
// use DataProvider to get the list of available variables
...
}
The previous class reads a flat file and converts string values into the desired type. For a concrete implementation of this class, refer to KxJRT.KxFileReaderWithNames for example code.
...
import KxJRT.*;
...
// String containing the model class name. For example,
// if model.java was generated, then lModelName is "model".
java.lang.String lModelName;
...
// ask the factory to instantiate the class.
IKxJModel mKxJModel = KxJModelFactory.getKxJModel( lModelName );
...
// instantiate IKxJModelInputWithNames
KxJModelInput lInput = new KxJModelInput( ... );
...
// instantiate KxJRT.KxJModelInputMapper (see KxJRT.Mapper class)
KxJModelInputMapper mMapper = new KxJModelInputMapper( lInput, mKxJModel );
...
// instantiate a data source (it may be already done via lInput)
DataProvider mDataProvider = new DataProvider( ... );
...
// main loop that reads each data row and applies the model on it
while( mDataProvider.hasMoreRows() ) {
Object[] lResults = mKxJModel.apply( mMapper );
// store or print the results somewhere.
// here print results on standard output
for( int i=0; i<lResults.length; i++ ) {
System.out.print( lResults[i].toString() );
if( i+1<lResults.length ) System.out.print(",");
}
System.out.println();
}
...
The above code should be compiled with KxJRT.jar on the classpath, and executed with both KxJRT.jar and the directory containing the compiled model on the classpath.
Note
See KxJRT.KxJApplyOnFile for sample code.
"Usage: [-nonames] [-separator <sep>] [-out <file>] -model <model> -in <file>
Note
Setting -nonames implies that the input file has the same structure as the dataset used for training.
Example
To apply the model SampleModel.java on SampleDataset.csv (a comma-separated values file) and store the results in results.csv:

javac -classpath KxJRT.jar SampleModel.java
java -jar KxJRT.jar -model SampleModel -in SampleDataset.csv -out results.csv
The help is obtained by typing this command: java -jar KxJRT.jar -usage.
Predictive Model Markup Language (PMML) is an XML markup language used to describe statistical and data
mining models. It is published by the Data Mining Group (http://www.dmg.org/ ). PMML is supported by the products listed on this Web page: http://www.dmg.org/products.html .
This section explains how to generate scores in a database with PMML and DB2 IM Scoring V7.1 from a model.
During the DB2 installation, use the same logon and password for the DB2 instance user as for your Windows
account. Otherwise, the scripts will be unable to automatically stop and start your DB2 instance when required.
Tip
To launch a DB2 command, use the DOS command C:\>db2cmd.
To launch the script.sql SQL script in the DB2 environment (in a DB2 CLP DOS window), use the DB2 command C:\>db2 -stf script.sql.
The Oracle8i 8.1.7 database version is required to work with DB2 IM Scoring.
Tip
To launch an Oracle command, use the DOS command C:\>sqlplus user/password@connectionstring.
To launch the script.sql SQL script in the Oracle environment, use the Oracle command SQL> @script.sql.
Caution
Before installing DB2 IM Scoring, create a specific user (into DB2 or Oracle) to store all IM Scoring tables.
Download Program Temporary Fixes (PTFs) from IBM Web site: http://www-3.ibm.com/software/data/
iminer/scoring/downloads.html .
1. Launch the DB2 command in the IM Scoring bin directory: idmEnableDB.bat DatabaseName fenced
2. Set the database parameters to increase memory management:
a. Launch the following DOS script:
b. Be sure that your DB2 instance has been restarted by checking the DB2 instance UDF_MEM_SIZE
property. This parameter must be set to 60000.
1. Use the idm_setup.bat DOS script with the SYS Oracle user (default password: change_on_install).
2. In the IM Scoring samples\Oracle directory, use the idm_create_demotab.sql Oracle SQL script to create the tables containing PMML models.
Once the PMML file has been generated, you have to load it into the database. This insert is made with an SQL query and an IM Scoring function.
In IM Scoring, a regression PMML model is inserted into the REGRESSIONMODELS table with the DM_impRegFileE function.
DB2 IM Scoring creates its own data types and functions (User Defined Functions) on this schema. The DM_impRegFileE function from the IDMMX schema is used to transform a PMML file into the DM_REGRESSIONMODEL DB2 IM Scoring data type.
Sample Code
connect to DATABASE;
insert into IDMMX.REGRESSIONMODELS
values (
'KXENModel',
IDMMX.DM_impRegFileE('C:\directory\model.pmml','Windows-1252')
);
Finally, when the PMML model is in the database, you can apply the model on a table and generate scores. Since we are working in a database with SQL queries, this apply can be done with an UPDATE or INSERT query.
In DB2 IM Scoring, the functions DM_getPredValue, DM_applyRegModel, and DM_applData are used to apply a regression model.
The script below is a sample DB2 SQL query that creates a result table SCORE_RESULT and applies the model on the TABLE_TO_SCORE table. The original model was generated with the TABLE_TO_SCORE table. Scores are generated in the SCORE_RESULT table with two columns: the CLASS column to be predicted and the score computed by the model.
connect to DATABASE;
create table SCORE_RESULT (
CLASS INTEGER,
SCORE DOUBLE
);
insert into SCORE_RESULT
select
t.CLASS,
IDMMX.DM_getPredValue(
IDMMX.DM_applyRegModel(r.model,
IDMMX.DM_applData (
IDMMX.DM_applData (
IDMMX.DM_applData (
IDMMX.DM_applData (
IDMMX.DM_applData (
IDMMX.DM_applData (
IDMMX.DM_applData (
IDMMX.DM_applData (
IDMMX.DM_applData (
IDMMX.DM_applData (
IDMMX.DM_applData (
IDMMX.DM_applData (
IDMMX.DM_applData (
IDMMX.DM_applData (
'age', t.AGE),
'workclass', t.WORKCLASS),
'fnlwgt', t.FNLWGT),
'education', t.EDUCATION),
'education-num', t.EDUCATIONNUM),
'marital-status', t.MARITALSTATUS),
'occupation', t.OCCUPATION),
'relationship', t.RELATIONSHIP),
'race', t.RACE),
'sex', t.SEX),
'capital-gain', t.CAPITALGAIN),
'capital-loss', t.CAPITALLOSS),
'hours-per-week', t.HOURSPERWEEK),
'native-country', t.NATIVECOUNTRY)
)
)
from IDMMX.REGRESSIONMODELS r, TABLE_TO_SCORE t
where r.MODELNAME = 'KXENModel';
-- The last five columns and the FROM/WHERE clauses were truncated in the
-- source and are reconstructed here from the surrounding examples; adjust
-- them to your actual table and model names.
data in_dataset;
set out_dataset;

Note
The parameter &Key must be replaced by the ID of the dataset to have an ID with a score.
The SQL code is database-dependent and some context-dependent variables have to be set in the SQL query
before applying:
Note
In an automatic process, the search-and-replace job can be done with a Perl or AWK script. A special SQL variant for the MySQL database has been released because of the symbol surrounding variable names (` instead of the standard ").
The generated ANSI SQL code is compliant and should work on most databases that are not yet directly supported by the application.
SAP Automated Analytics allows users to set the following parameters for code generation:
Separator [SQL]: allows customizing the SQL separator between two SQL queries. By default, this
parameter is set to GO.
This separator parameter is not applicable to Hive, Spark, and Vora.
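As an illustrative KxShell sketch for the Separator parameter (assuming model is a handle to a trained model obtained earlier in the session, and using the Parameters/CodeGeneration path documented in the Model Parameter Reference below; the value ";" is just an example):

model.changeParameter "Parameters/CodeGeneration/Separator" ";"
model.validateParameter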
The generated SQL code creates a SQL User Defined Function (UDF) computing a score from model
parameters.
The syntax for creating a UDF and describing its parameters depends on the DBMS. SQL UDF code generators are available for the following DBMS:
● DB2
● Oracle
● SAP HANA
● SQLServer 2000
● Sybase
The file generated by the code generator contains all necessary SQL instructions to install the UDF.
You must use the standard SQL front-end of the DBMS to execute the generated file.
The following table details the standard SQL front-end for each DBMS:
● Oracle: SQLPlus
Caution
Since the generated file can contain the instruction to drop the UDF before re-creating it, it is normal for the SQL front-end to signal an error the first time, when the UDF does not exist yet.
Except for the UDF header, the generated SQL code does not use special features of the DBMS or ANSI mode setup.
The current user must have the rights to drop/create a UDF. Check with your DBA.
When installed, the UDF extends SQL exactly as a standard SQL function.
SELECT
ClassPredictedByKXAF(age, workclass, fnlwgt, education, educationnum,
maritalstatus, occupation, relationship, race, sex, capitalgain,
capitalloss, hoursperweek, nativecountry)
FROM Adult
As a convenience, all generated UDF code files include a comment with a typical usage of the current UDF.
The application allows users to set the following parameter for code generation:
SmartHeaderDeclaration [UDF]: this parameter allows excluding from the generated code all the non-contributive variables (variables with a contribution of 0). The default value is true, so that the application generates a UDF declaration with only useful variables. In some cases, this can significantly reduce the size of the generated code. Changing the value of this parameter has no effect on the final results.
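A minimal KxShell sketch (assuming model is a model handle; the parameter lives under Parameters/CodeGeneration, as documented in the Model Parameter Reference below):

model.changeParameter "Parameters/CodeGeneration/SmartHeaderDeclaration" "false"
model.validateParameter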
Each DBMS has different options for fine-tuning the UDF.
● ORACLE
○ PARALLEL_ENABLE: self-explanatory.
○ DETERMINISTIC: the SQL code does not use or change external values. Each UDF call with the same actual parameters gives the same result, which allows Oracle to cache the results of previous calls.
● DB2
○ DETERMINISTIC: the SQL code does not use or change external values. Each UDF call with the same actual parameters gives the same result, which allows DB2 to cache the results of previous calls.
○ NO EXTERNAL ACTION: no external DBMS resource (file, lock, and so on) is used or changed.
○ CONTAINS SQL: indicates that the code does not read or modify other SQL data.
● SybaseIQ
○ DETERMINISTIC: the SQL code does not use or change external values. Each UDF call with the same actual parameters gives the same result, which allows SybaseIQ to cache the results of previous calls.
● Other UDFs
○ No specific option is used.
The application generates a specific Teradata User Defined Function. The scoring code is written in C (following Teradata coding conventions) and the UDF itself is described in SQL.
Caution
The Teradata C UDF is only available on Teradata V2R5.1.
Considering a model with n targets, the code generator generates n+2 files:
● The Teradata C code: the file name is given by the user with the extension .teraudf. This file contains all
the C code needed to build the UDF for all targets.
● The SQL creation and description of the UDF: the file name is the same as the one given to the C code file
with a .bteq extension. This file contains all SQL wrappers needed to describe the UDF from a SQL
perspective.
● One C file per target (that is, n files) whose name is <Name><TargetName>.c, where <Name> is the name given by the user. These files are only created for technical reasons linked to the limitations of the Teradata compilation environment.
Example
If the user has chosen the name MyUDF for a model with two targets, class and sex, four files will be generated: MyUDF.teraudf, MyUDF.bteq, MyUDFclass.c, and MyUDFsex.c.
You must use Teradata BTEQ tool to execute the .bteq file. In the BTEQ tool, enter:
● .login <node>/<login>
● .run file=<UDF Name>.bteq
Caution
Since the generated file contains the instruction to drop the UDF before re-creating it, it is normal that BTEQ signals an error the first time, when the UDF does not exist yet.
The current user must have the actual rights to drop/create a UDF. Check with your DBA.
When installed, the UDF extends SQL exactly as a standard SQL function.
SELECT
ClassKXAFPredicted(age, workclass, fnlwgt, education, educationnum,
maritalstatus, occupation, relationship, race, sex, capitalgain,
capitalloss, hoursperweek, nativecountry)
FROM Adult
As a convenience, the .bteq file includes a comment with a typical use of the UDF.
This comment contains useful information, for example the application version and the generation date.
To see this comment in any Teradata request tool, type: Comment on function <UDF Name>.
You can also display a full description of the UDF and its parameters by typing Show function <UDF Name> in any Teradata request tool.
Since parameter types (INTEGER, FLOAT, and so on) are explicitly described in the UDF, you must call the function with correctly typed parameters. If the actual parameter types do not match, the DBMS displays the error: <UDF name> not found. To solve this issue, use standard SQL CAST operators.
For the same reason, passing an explicit NULL value as an actual parameter also requires a CAST operator. Because NULL is a special SQL value with no type, the parameter type will never match.
Example
If the first parameter of the UDF is described as a FLOAT, the TheUDFName( NULL, ... ) call must be replaced by TheUDFName( CAST( NULL AS FLOAT ), ... ).
The test consisted of applying a model on an existing dataset (update mode) in three different ways:
6.4.11 VB Code
These variables must be passed in the exact order specified by the following function generated at the beginning of the script:
This section presents the model parameters as they are displayed in the parameter tree.
Syntax
Path: <Model_Name>
In SAP Predictive Analytics, a model is more than an algorithm. It contains all the information needed to process the data, from the original format in which it is stored in the source database up to the modeling results. This is why a model is described by the data sets it has been trained on and by the protocol (a chain of transforms) it uses. All this information is saved in tabular form when a model is saved.
Models are adaptive: they must be generated, or trained (the terms "estimated" and "adapted" are also used), on a training data set before they can be used, or applied, on new data sets. As with all objects defined in the SAP Predictive Analytics architecture, a model is described by parameters, which are detailed in the following sections.
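As a minimal KxShell sketch (assuming model is a handle to a trained model obtained earlier in the session, and that models expose the same getParameter call shown for spaces), the parameter tree can be browsed as follows:

model.getParameter ""
model.getParameter "Parameters"
model.getParameter "Infos"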
Syntax
Path: Parameters
CutTrainingPolicy
Indicates the way a training dataset is cut into three subsets (estimation, validation, and test) when needed. The impact of each of these strategies can be finely tuned through the parameters of the training dataset.
Values:
● random with no test (default except for Time Series)
● sequential with no test (default for Time Series)
● random
● periodic
● sequential
● periodic with test at end
● random with test at end
● periodic with no test
CodingStrategy
Version of the model strategy used to build the model.
Value: by default, it is the version number of SAP Predictive Analytics used to generate the model.
Syntax
Path: Parameters/AutoSave
This folder contains the necessary information to automatically save the model at the end of the learning
phase.
AutosaveEnabled
Boolean value that indicates whether the model must be automatically saved at the end of the learning phase.
Values:
● False (default): the model will not be saved
● True: the model will be automatically saved

AutosaveStoreIdx
Integer that represents the return value of the openStore function used when saving the model to indicate the store location.
Values:
● 0 (default)
● Any unsigned integer
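A minimal KxShell sketch (assuming model is a model handle and that a store has already been opened, so that index 0 is valid):

model.changeParameter "Parameters/AutoSave/AutosaveEnabled" "true"
model.changeParameter "Parameters/AutoSave/AutosaveStoreIdx" "0"
model.validateParameter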
Syntax
Path: Parameters/CodeGeneration
This folder contains the parameters used to generate the code corresponding to the model.
NbLineForScoreCard
Number of colors used to display the result lines in the score card.
Values:
● 2 (default)
● Any integer

Separator
Allows you to customize the SQL separator between two SQL queries.
Values:
● GO (default)
● Any user-defined string

SmartHeaderDeclaration
Allows you to exclude from the generated code all the non-contributive variables (variables with a contribution of 0). In some cases, this can significantly reduce the size of the generated code. Changing the value of this parameter has no effect on the final results.
Values:
● true (default): all non-contributive variables are excluded from the generated code
● false: all variables are included in the generated code
7.2 Infos
Syntax
Path: Infos
Additional information is stored in this folder after the model has been generated, that is, when the
Parameters/State value is ready. All these parameters are read-only.
Author
Name of the user who generated the model.
Value: by default, it is the name of the user logged on the machine.

ApplyTime
Duration of the latest application of the model (in seconds). This duration is updated each time a model receives one of the following commands using the sendMode call: Kxen_apply, Kxen_filterOutlier, Kxen_test.
Values:
● 0 (default): this value is used when the model has never been applied
● Positive integer

BuildDate
GMT time at the end of the generation process, in the format YYYY-MM-DD HH:MM:SS.
Values:
● blank (default)
● Any date

KxenVersion
Version of SAP Predictive Analytics used to generate the model.
Value: any SAP Predictive Analytics version number (for example 2.1.0).

Model32Bits
Boolean value that indicates if the model has been generated on a 32-bit architecture.
Values:
● true
● false

FilterConditionString
Definition of the filter applied on the training dataset, if there is one.
Values:
● blank (default)
● Logical expression

LastApplyStatus
The status of the last application (or test) task on the model. It is empty if the model has never been applied.
Values:
● Success: the task has been successfully completed with no warning.
● Failure: errors have prevented the task from completing successfully.
● SuccessWithWarnings: the task has been successfully executed but some warnings have been encountered (for example conversion problems).
● Aborted: the user has canceled the process.
Syntax
Path: Infos/ClassName
This parameter allows you to assign a class to the model, making it easier to sort your models and find them (for example when using Model Manager). You can use the project name or the type of campaign (churn, up-sale, and so on); see the sketch after the value list below.
Default
If the value of this parameter is not set by the user, the value set in the Default subparameter is used. The Default parameter is automatically filled by the system and depends on the type of model.
Values: any user-defined string, or one of the system values:
● Kxen.Classification (classification model - nominal target)
● Kxen.Regression (regression model - continuous target)
● Kxen.Segmentation (segmentation model - SQL mode)
● Kxen.Clustering (clustering model - no SQL mode)
● Kxen.TimeSeries (time series model)
● Kxen.AssociationRules (association rules model)
● Kxen.Social (social network analysis model)
● Kxen.Recommendation (recommendation model)
● Kxen.SimpleModel (multi-target models, any other model)
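A minimal KxShell sketch for ClassName (assuming model is a model handle; the class name "Churn" is purely illustrative):

model.changeParameter "Infos/ClassName" "Churn"
model.validateParameter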
7.3 Protocols
Syntax
Path: Protocols
A protocol is a stack of transforms applied on the data as it is stored in the databases or files.
Each transform in this stack generates information of a higher and higher abstraction level, more and more related to business questions. A good way of picturing a protocol is to think of communication protocol stacks, which are in charge of transporting information from one point to another using more and more complex structures. A protocol is referred to in the model through its name ('Default' for SAP Predictive Analytics).
A protocol contains information about the chain of transforms processing the data, and about all variables used or produced by these transforms. Furthermore, the high-level strategies driving the transforms' behavior are also defined in the protocol. 'Kxen.SimpleModel' uses only one protocol (whose default name is 'Default'), but, as an example, multi-class models use several protocols (each dedicated to a two-class problem).
Syntax
Path: Protocols/Default/Variables
This folder contains all information on the variables used in the modeling process. This means not only the variables that are stored in the spaces (files or DBMS tables), but also the variables created by data preparation transforms within protocols.
Variables are defined by their name, some high-level descriptions, and their role with respect to the protocol. They contain information about statistics on each useful dataset. Variables can be accessed either through the protocol objects or from the dataset objects.
Syntax
Path: Protocols/Default/Variables/<Variable_Name>
The parameter tree devoted to a variable can be very large, because it contains information about the statistics collected on this variable. Parameters are organized in several groups for clarity of presentation.
Basic Description

KeyLevel (read-only when the model is in ready state, integer)
A number different from 0 indicates that this variable can be considered as a part of the identifier of each line.

OrderLevel (read-only when the model is in ready state, integer)
A number different from 0 indicates that this variable can be used to sort the cases in a natural order.

MissingString (read-only when the model is in ready state, string)
When a value equal to the string specified here is found in the input space, the variable is considered as missing. This allows coping with cases where specific codes are used to represent missing values (such as 9999 for a four-digit unknown year, for example).

Group (read-only when the model is in ready state, string)
Used to identify the group of the variable. This notion can be used to optimize internal computation: information is not searched in crossings of variables that belong to the same group. For example, when color information is already encoded into three disjunctive columns in a dataset (one column for green, the second for red, the third for blue), if an event is green, it is useless to search for information in objects that could be both green and blue.
Advanced Description
The second group collects information about some elements that can be refined by the advanced user.

UserPriority (read-only when the model is in ready state, number)
This parameter can be used by the components in internal computations.

UserEnableCompress (read-only when the model is in ready state, Boolean)
When set to false, allows the user to deactivate the target-based optimal grouping performed by K2C on this single variable.

UseNaturalEncoding (read-only when the model is in ready state, Boolean)
The Natural Encoding Mode has been added for the variables. In this mode, only the original version of the variable is used by SAP Predictive Analytics. Encoded versions are disabled and exclusion criteria are relaxed for original versions.
Caution
This will generally lead to non-stable data representation and coding.

UserModulus (read-only when the model is in ready state)
Allows the user to enforce that the bands of continuous variables are a modulus of the given value. For example, this allows the user to enforce that bands are always multiples of 1000 when dealing with monetary values.

InSpaceName (read-write, string)
Another name of the variable that can be used when getting results from an input data set. The user can change the actual column from which the values are read by changing the InSpaceName of the variable. If not explicitly changed, the technical column name to read from is the name of the variable.
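A minimal KxShell sketch (assuming model is a model handle and that the variable age must be read from a column named age_2017 in the apply dataset; both names are illustrative):

model.changeParameter "Protocols/Default/Variables/age/InSpaceName" "age_2017"
model.validateParameter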
The final group is the entry point to get information about variable statistics on the different datasets.

Monotonicity (read-write, Boolean)
Indicates that the variable is monotonic with respect to the line number.

StatForCode (read-only, string)
Name of the data set used to encode the data.

SpaceOrigin (read-only, Boolean)
true if the variable has been found in the input space (training for example), and false otherwise.

Statistics (read-write, folder)
Folder in which the statistics on all data sets are stored.
Syntax
Path: Protocols/Default/Variables/<Variable_Name>/Statistics/<Dataset_Name>
Statistics are collected for each dataset. Each dataset is referenced through its name. In this folder, you find as
many directories as there are valid datasets defined for the model. None of these computed elements can be
changed by the user.
NbUnknown
Number of observations where this variable is not known on the dataset.

NbOutOfRange
Number of observations where the variable is out of range (outside the data dictionary built for nominal and ordinal variables, and outside the [min, max] range for ordinal and continuous variables).

IsMerged
Used to reload some models that were saved with a lack of precision.

ProbaDeviation
Probability that there is a significant deviation in the distribution of categories (or bands in the case of continuous variables) between this data set and the data set referenced by the parameter StatForCode. This probability is computed through a Chi-square test.

Targets
The new architecture of components allows for multiple targets. This folder holds information specific to each target, comprising the groups of the original categories and the computed values such as KI and KR. See Targets [page 103].

Categories
Subtree collecting statistics of each category of nominal or ordinal variables. It also corresponds to some binning of continuous variables when this is required by the following transforms. See Categories [page 104].
7.3.1.2.1 Targets
Syntax
Path: Protocols/Default/Variables/<Variable_Name>/Statistics/<Dataset_Name>/
Targets/<Target_Name>
This folder holds information that is specific to each possible target. You find a subfolder for each target
variable.
AUC
Area Under the ROC Curve of the variable with respect to this target variable.

GINI
GINI index of the variable with respect to this target variable.

GroupProbaDeviation
Probability that there is a significant deviation in the distribution of the category groups, created with respect to the target, between this data set and the data set referenced by the parameter StatForCode.

Groups
Statistics of the groups that have been created for the target variable. The grouping is different for each target variable, as the grouping strategy depends on the target variable itself. Groups are also made for continuous variables. The information under the Groups folder is equivalent to the information on the original categories of the variables. For each group, you will find a subtree under the group label. The subtree is exactly equivalent to the one found under CategoryName (see below).

OrderForTarget
Default (or "natural") order of categories that gives the best KI (Predictive Power) for the variable with respect to the target. This parameter is especially useful for continuous and ordinal variables to express a non-linear relationship with the target, if the value is "profit" or "basic profit". Possible values:
● Increase
● Decrease
● Profit
● Basic Profit

CurvePoints
This parameter corresponds to the points allowing to draw the curve used to encode a continuous variable with respect to the target. Each point is given through [X, Y] coordinates.

BarCurvePoints
Only when the variable is an estimator of another variable (for example, rr_<target>). It corresponds to the points that allow drawing the piecewise linear interpolation curve giving the error bar in relation to the score. These values can be used to plot an expected error bar around the predicted value.

ProbabilityCurvePoints
Only when the variable is an estimator of a nominal variable (for example rr_<target> when target is nominal). It corresponds to the points that allow drawing the piecewise linear interpolation curve giving the probability in relation to the score.
7.3.1.2.2 Categories
Syntax
Path: Protocols/Default/Variables/<Variable_Name>/Statistics/<Dataset_Name>/
Categories/<Category_Name>
Code
Code of the category, randomly assigned in the order of category appearance in the dataset.

Count
Number of cases where this category has been encountered on the dataset.

Frequency
Category frequency (ratio of the cases with this category to the number of cases where this variable is known). This is only found in the parameter tree when presented to the user (not in the saved version of the models).

ProbaDeviation
Probability that the category is significantly different from its distribution in StatForCode.

SegmentMean
Mean of the variable for this segment when this variable is continuous.

TargetMean
Mean of the target for cases belonging to this category when the target is continuous.

SmoothedTargetMean
TargetMean corrected to obtain a smoothed cycle. This component is only usable for a model generated by SAP Predictive Analytics Modeler - Time Series.

TargetVariance
Variance of the target for cases belonging to this category when the target is continuous.

UserProfit
Profit that can be associated with this category when the variable is the target variable. This user profit is then used when asking the model for a profit curve with user profits.

Targets
Subtree collecting information about how the (several) targets are distributed for cases with this category in the data set. Of course, this is only available when the variable is not a target. For each <Target_Variable_Name>, the subtree contains:

NormalProfit
Coding number associated with this category and used by SAP Predictive Analytics Modeler - Regression/Classification to translate the category name into a number when appropriate.

GroupCode
Technical detail reserved for internal use. This parameter links towards the code of the associated group, if any.

ProbaDeviation
Probability that the category is significantly different from its distribution in StatForCode, taking the distribution over the groups into account for the target under consideration.

<Target_Category_Name>
● Count: number of times this category of the target has been encountered on the data set for cases with this category.
Concept hierarchies allow the mining of knowledge at multiple levels of abstraction. A concept hierarchy defines a sequence of mappings from low-level concepts to higher-level ones and, more generally, relates concepts along a single dimension (or variable, or column). In general, concept hierarchies describe several levels of abstraction:
● A data structure provided by system users generally reflects a grouping used to store information in large-scale databases or OLAP systems.
● A data structure provided by domain experts generally reflects background knowledge (a dimension
location aggregates cities in states, then aggregated in countries).
● A data structure provided by knowledge engineers generally reflects a grouping strategy to improve the robustness of internal representations (a very infrequent category is aggregated with a larger one).
● A data structure can be automatically generated by SAP Predictive Analytics components or an external
tool (some tools generate ranges for continuous variables).
A data structure can be used by the components to build higher levels of abstraction. A data structure
represents the first level of aggregation of concept hierarchies. Data structure elements depend upon the
variable type (nominal, ordinal, or continuous).
● Nominal Variables
The data structure is described by groups of categories. Each group is designated by a name for the user and the list of possible values belonging to this group. In version 2.1 of the components, this list must be given in extension (all values must be listed), but the possibility to use regular expressions will be added in the future.
The entry point in the parameter tree is called NominalGroups.
● Ordinal Variables
The data structure is described by ranges of values (called bands). A band is defined by a name for the user, a
minimum value, and a maximum value. These two values are assumed to belong to the range. They can be either numbers or strings (for which the alphabetical order is used as the sort function). The entry point
in the parameter tree is called OrdinalBands.
● Continuous Variables
The data structure is described by ranges of values (called bands). A band is defined by a name for the user, a
minimum value, a flag indicating if the minimum value belongs to the range or not (open or closed boundary), a
maximum value, and a flag indicating if the maximum value belongs to the range or not (open or closed
boundary). The data structure for continuous variables is checked after the user has described it from a
parameter tree. The system automatically checks if the bands given by the user overlap (in which case it
outputs an error message), or if there is a 'hole' between successive bands (in which case the system
completes with the needed segment). The entry point in the parameter tree is called ContinuousBands.
Syntax
Path: Protocols/Default/Parameters
StrangeValueLevel (read-write)
Used to send warning messages to the client context when strange values are encountered by one transform in the protocol.
Values:
● 10 (default)
● Positive integer

VariableCountInSpace (read-only)
Number of variables in the source data sets.
Values:
● 0 (default)
● Positive integer

WeightIndex (read-only)
Index of the weight variable in the source data sets. A WeightIndex of -1 means that there is no weight column defined in the source data set.
Values:
● -1 (default)
● Integer

StrangeValuePolicy (read-write; read-only when the model is in ready state)
The strange value policy is used when the system checks the compatibility of the transforms and the available data. This "check" phase takes place before model generation (or training). After the model has been generated, this value is no longer used, because it is converted into a set of actions to take place for each variable contained in the protocols.
Values:
● Dialog (default)
● skip

CopyTarget
Forces the copy of the target into the apply output during an apply process (see the sketch after this list).
Values:
● true (default)
● false

SoftMode
Enables a mode where all the involved transforms produce a default model when no model is produced by the standard learning process. In this mode, when no model is found, SAP Predictive Analytics Modeler - Regression/Classification produces a constant model and SAP Predictive Analytics Modeler - Time Series produces a Lag1 model. No effect on other transforms. This mode is disabled by default.
Values:
● false (default)
● true
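A minimal KxShell sketch (assuming model is a model handle), setting two of these parameters before training:

model.changeParameter "Protocols/Default/Parameters/CopyTarget" "false"
model.changeParameter "Protocols/Default/Parameters/SoftMode" "true"
model.validateParameter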
7.3.3 Transforms
Syntax
Path: Protocols/Default/Transforms/
Transforms hold the actual statistical or machine learning algorithms. All transforms must go through a learning phase before they are ready for use. This learning phase is used for the estimation of parameters and the computation of some results or descriptive information. The transforms held by the protocols are the data processing units.
While most transform parameters are specific to the kind of transform used (SAP Predictive Analytics Modeler - Data Encoding, SAP Predictive Analytics Modeler - Regression/Classification, ... see below), some are common to all transform types.
Syntax
Path: Protocols/Default/Transforms/<Transform_Name>/Infos
Additional information is stored in this folder after the model has been generated, that is, when the
Parameters/State value is ready. All these parameters are read-only.
LearnTime (read-only)
The time (in seconds) that this transform took to process the learn request.
Syntax
Path: Protocols/Default/Transforms/<Transform_Name>/Parameters
The following parameters can be found in the Parameters folder of each Transform:
VariablePrefix (read-only when the transform is in ready state)
Indicates to the user the prefix used on the target.
Values:
● rr (default)
● User-defined string
Syntax
Path: Protocols/Default/Transforms/<Transform_Name>
Parameter Description
Extensions Folders under which an integrator can add specific information. Any information
stored in these folders will be saved in the model.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression
SAP Predictive Analytics Modeler - Regression/Classification builds models implementing a mapping between
a set of descriptive attributes (model inputs) and a target attribute (model output). It belongs to the regression
algorithms family building predictive models.
It also allows specifying a weighting factor for each training case in order to adapt the cost function to the user requirements. The output model can be analyzed in terms of attribute contributions weighing the relative importance of the inputs, and is characterized by two indicators: the predictive power (KI) and the prediction confidence (KR).
7.3.3.2.1 Thresholds
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Thresholds/
This folder contains one subfolder for each discrete target variable and defines for each the threshold used for
the classification decision.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Thresholds/<Target_Name>
Syntax
Path Protocols/Default/Transforms/Kxen.RobustRegression/Thresholds/<Target_Name>/
Threshold
7.3.3.2.2 SelectionProcess
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/SelectionProcess
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/SelectionProcess/
Iterations
This folder contains all the iterations made during the selection process.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/SelectionProcess/
Iterations/<Iteration_Number>
Number of the iteration of the variable selection process. This folder contains the following parameters:

Ki
Predictive Power (KI) obtained on the validation dataset with the current iteration. The predictive power is the quality indicator of the generated models. This indicator corresponds to the proportion of information contained in the target variable that the explanatory variables are able to explain.

Kr
Prediction Confidence (KR) obtained on the estimation dataset with the current iteration.

KiE
Predictive Power (KI) obtained on the estimation dataset with the current iteration.

L1
L1 obtained with the current iteration. L1 is the residual mean (the mean of the absolute value of the difference between the predicted value and the actual value); also known as City Block or Manhattan distance.

L2
L2 obtained with the current iteration. L2 is the square root of the mean of square residuals.

LInf
LInf obtained with the current iteration. LInf is the maximum residual absolute value.

Chosen
This parameter indicates whether the current iteration is the one selected by the variable selection process.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/SelectionProcess/
Iterations/LastUsedIterations
This folder contains the list of the input dataset variables with the number of the last iteration in which each
was used.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters
Order
The degree of the polynomial model. This parameter is set by the user before the learning phase and cannot be changed later on for a given model.
Values:
● 1 (default)
● A positive integer

Strategy
Allows the user to specify the type of strategy used to build the regression model.
Values:
● WithoutPostProcessing (default)
● WithOriginalTargetEncoding
● WithUniformTargetEncoding

EncodingStrategy
Controls the way the inputs are encoded. By default the encoding is piece-wise linear. When in risk mode, SAP Predictive Analytics uses a step-wise linear encoding.
Values:
● PieceWizeEncoding (default)
● StepWizeEncoding

MaximumKeptCorrelations
Allows the user to set the maximum number of displayed correlations. This parameter accepts only an unsigned integer.
Values:
● 1024 (default)
● Integer

LowerBound
Allows the user to set the threshold defining whether a correlation has to be displayed or not.
Values:
● 0.5 (default)
● Real number

ContinuousEncode
Allows the user to deactivate the encoding of the continuous variables. This is given to compare results between version 1 and version 2 and allows a more precise control for advanced users.
Values:
● true (default)
● false

ExtraMode
A special flag that drives the kind of outputs that the classification/regression engine will generate. Depending on its value, the outputs generation will be done either in the expert mode or the assisted mode. The former allows selectively choosing the outputs and the latter will generate consistent and system-defined outputs. Note that when switching from the expert mode to the assisted one, the user choices are discarded and replaced by those implied by the specified extra mode value (see the sketch after this parameter list).
Assisted modes / generated outputs:
● No Extra (default): key + predicted value + score
● Min Extra: No Extra + probabilities + error bars
● IndividualContributions: No Extra + variable individual contributions
● Decision: No Extra + decision
● Quantiles: No Extra + approximated quantiles
Expert mode:
● Advanced Apply Settings (set in ApplySettings)

PutTargetNameInIndivContrib
Used to guarantee backward compatibility with versions prior to 2.1.1. It prevents generating the name of the target in the individual contribution column names.
Values:
● true (default)
● false
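A minimal KxShell sketch for ExtraMode (assuming model is a handle to a classification model):

model.changeParameter "Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ExtraMode" "Decision"
model.validateParameter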
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/
IDBScoreDevConfig
This folder contains the parameter allowing you to generate SQL code that computes score deviations for the model. This code is executed at the end of the in-database application process.

Apply
This parameter allows you to activate or deactivate the IDBScoreDevConfig feature.
Values:
● true (default): activated
● false: not activated
7.3.3.2.3.2 GainChartConfig
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/GainChartConfig
This folder contains the parameters allowing you to compute the gain chart. The gain chart allows you to rank your data in order of descending scores and split it into exact quantiles (decile, vingtile, percentile).
The gain chart can be computed when training a model or when applying it. Two different folders contain the
gain chart parameters depending on the task to perform on the model:
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/
GainChartConfig/Learn
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/
GainChartConfig/Apply
NbQuantiles
This parameter allows you to set the number of quantiles you want to compute for the gain chart (see the sketch after this list).
Values:
● 10 (default)
● Positive integer

ValueVariables
This folder allows you to set the list of the variables for which aggregated values must be computed for each quantile.
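A minimal KxShell sketch (assuming model is a model handle), requesting 20 quantiles for the gain chart computed at apply time:

model.changeParameter "Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/GainChartConfig/Apply/NbQuantiles" "20"
model.validateParameter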
7.3.3.2.3.3 VariableExclusionSettings
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/
VariableExclusionSettings
ExcludeSmallKR
This parameter allows you to indicate whether variables with a low prediction confidence (KR) must be excluded from the modeling.
Values:
● system (default): the value (true/false) is automatically selected by SAP Predictive Analytics
● true: the variables will be excluded
● false: the variables will not be excluded

ExcludeSmallKIAddKR
This parameter allows you to indicate whether variables for which the sum of the predictive power and the prediction confidence (KI+KR) is too low must be excluded from the modeling.
Values:
● system (default): the value (true/false) is automatically selected by SAP Predictive Analytics
● true: the variables will be excluded
● false: the variables will not be excluded
7.3.3.2.3.4 VariableSelection
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/
VariableSelection
DumpIntermediateSteps
Allows the user to specify whether all intermediate iterations are saved in the parameter tree.
Values:
● true
● false (default)

StopCriteria
Contains the settings of the stop criteria to use for the variable selection process.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/
VariableSelection/SelectionMode
Mode
Allows the user to specify the type of automatic variable selection process.
Values:
● ContributionBased: for each iteration, the variables that contain the least information are skipped. Used with the parameter PercentageContrib.
● VariableBased: for each iteration, a specified number of variables is skipped. Used with the parameter NbVariableRemovedByStep.

NbVariableRemovedByStep
Allows the user to specify, when the automatic variable selection is in VariableBased mode, the number of variables skipped at each iteration (see the sketch after this list).
Values:
● 1 (default)
● Integer

PercentageContrib
Allows the user to specify the percentage amount of information to keep.
Values:
● 0.95 (default)
● Real number
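A minimal KxShell sketch (assuming model is a model handle), configuring a VariableBased selection that removes two variables per iteration:

model.changeParameter "Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/VariableSelection/SelectionMode/Mode" "VariableBased"
model.changeParameter "Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/VariableSelection/SelectionMode/NbVariableRemovedByStep" "2"
model.validateParameter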
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/
VariableSelection/StopCriteria
QualityCriteria
Allows the user to set the type of quality criteria to be used for the automatic variable selection.
Values:
● None: no quality criteria is set
● KiKr (default): the quality criteria is based on the sum of the predictive power (KI) and the prediction confidence (KR)
● Ki: the quality criteria is based on the predictive power (KI)
● Kr: the quality criteria is based on the prediction confidence (KR)

MaxNbIterations
Allows the user to stop the automatic variable selection process when the number of iterations exceeds this value.
Values:
● -1 (default): no limit
● Integer

MinNbOfFinalVariables
Allows the user to fix the minimum number of variables kept in the final model.
Values:
● 1 (default)
● Integer

MaxNbOfFinalVariables
Allows the user to fix the maximum number of variables to keep in the final model.
Values:
● -1 (default): all variables
● Positive integer

QualityBar
Allows the user to specify the quality loss allowed by iteration.
Values:
● 0.01 (default)
● Real number

ExactNumberOfVariables
Allows the user to force the final number of variables to be equal to MinNbOfFinalVariables.
Values:
● true
● false (default)

SelectBestIteration
Allows you to select which model of the variable selection process will be used. Usually the best model is the one before last; however, the quality of the last model can be sufficient for your needs and you may want to use it instead.
Values:
● true (default): the best model will be selected
● false: the last model will be selected

FastVariableUpperBoundSelection
Allows you to define the strategy to use when you have set the parameter MaxNbOfFinalVariables. Two strategies are available:
Values:
● true (default)
● false
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/RiskMode
Allows the user to activate the risk mode for a classification model. It allows advanced users to ask a classification model to translate its internal equation, obtained with no constraints, into a specified range of scores associated with a specific initial score. When this mode is activated, the different types of encoding that are used internally for continuous and ordinal variables are merged into a single representation, allowing a simpler view of the model internal equations. To use this mode, you need to choose a range of scores associated with probabilities. Available only for classification models, that is, models with a nominal target.
Values:
● true
● false (default)

Parameters

RiskScore
Allows the user to specify a low probability that will be associated with a low score.
Values:
● 615 (default)
● Real number

GBO
Allows the user to specify a high probability that will be associated with a high score.
Values:
● 9 (default)
● Real number
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/RiskMode/
RiskFitting
Description
    This folder contains the parameters allowing the user to control the way risk
    score fitting is performed, that is, how SAP Predictive Analytics fits its own
    scores to the risk scores. The fitting is performed on the score interval
    [Quantile(MinCumulatedFrequency) ; Quantile(1.0 - MinCumulatedFrequency)].
Values
    ● Frequency_Based (default)
    ● PDO_Based
UseWeights
    Indicates whether to use the score bin frequencies as weights.
    Values: true (default) | false
7.3.3.2.3.6 DecisionTree
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/DecisionTree
Description
    This folder allows writing a request on the decision tree and obtaining the
    results. Available only after model training, when DecisionTreeMode is set to
    true.
Values
    ● true
    ● false (default)
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/DecisionTree/
DimensionOrder
Contains one folder per target. Each folder contains the list of the 5 most contributive variables.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/DecisionTree/
DimensionOrder/Request
Allows creating a request to obtain a leaf or a level of the decision tree. The request is processed after
a validateParameter call when the ProcessRequest value is true.
ProcessRequest
    Indicates whether the request must be executed.
    Values:
    ● true (default): the request will be executed.
    ● false: the request will not be executed.
Datasets
    Lists the data sets on which the request will be made.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/DecisionTree/
DimensionOrder/Result
Contains the result of the request. It is displayed after a getParameter call when ProcessRequest is set
to true.
Parameter Description
SingleResult, ExpandResult
    Subparameters:
    ● Count: population of the current node
    ● TargetMean: mean of the target in the current node
    ● Weight: weight of the current node
    ● Variance: variance of the current node
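The request cycle described above (set ProcessRequest, trigger the processing with validateParameter, then
read the Result folder after getParameter) can be illustrated with the following Python sketch. The
StubModel class below only mimics the flow; the real validateParameter and getParameter calls belong to
the model object obtained from your Automated Analytics binding.

    # Hypothetical sketch of the request/validate/read cycle.
    DT = "Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/DecisionTree"

    class StubModel:
        def __init__(self):
            self.tree = {f"{DT}/DimensionOrder/Request/ProcessRequest": "true"}
        def validateParameter(self):
            # In the real engine this executes the pending decision-tree request.
            self.tree[f"{DT}/DimensionOrder/Result/SingleResult/Count"] = 1234
        def getParameter(self, path):
            return self.tree.get(path)

    model = StubModel()
    model.validateParameter()
    print(model.getParameter(f"{DT}/DimensionOrder/Result/SingleResult/Count"))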
7.3.3.2.3.7 ApplySettings
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings
This parameter allows you to set the advanced application settings, that is, to select and fine-tune the outputs
that SAP Predictive Analytics will generate. These outputs belong to one of the two following groups:
supervised (target-dependent) or unsupervised (non target-dependent).
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/
Supervised
This is the section for defining target-dependent outputs for classification/regression models. Each of
its subsections corresponds to a target and is laid out depending on the variable type.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/
Supervised/<Target_Name>
Contribution
    Specifies whether contributions of variables should be produced.
    Values:
    ● None (default): the flag is deactivated.
    ● All: all the contributions will be produced.
    ● Individual: specific variables of interest can be selected (inserted as
      sub-nodes).
Inputs
    Allows transferring apply-in input variables to apply-out variables.
    Note: only the variables used in the model can be transferred.
    Values:
    ● None: the flag is deactivated.
    ● All: all the variables will be used in the apply-in data set.
    ● Individual (default): specific variables of interest can be selected
      (inserted as sub-nodes).
PredictedQuantile
    Indicates whether the quantile associated with the score value should be
    produced. This flag is activated by providing a quantile level greater than 0.
    Values: true | false (default)
PredictedCategoryConfidence
    Allows generating in the output file the confidence (also known as the error
    bar) corresponding to each data set line for the different categories of the
    target variable.
    Values:
    ● none: the flag is deactivated.
    ● all: the confidence is generated for all categories.
    ● individual: the confidence will be generated for selected (inserted as
      sub-nodes) categories only.
PredictedCategoryProbabilities
    Allows generating in the output file the probability for one or more target
    variable categories, that is, for each observation the probability of the
    target variable value to be the selected category.
    Values:
    ● none: the flag is deactivated.
    ● all: the probability is generated for each category.
    ● individual: the probability will be generated for selected (inserted as
      sub-nodes) categories only.
PredictedCategoryScores
    Allows generating in the output file the score for one or more target variable
    categories.
    Values:
    ● none: the flag is deactivated and no score will be generated.
    ● all: all the scores will be produced.
    ● individual: the score will be generated for selected (inserted as
      sub-nodes) categories only.
PredictedRankCategories
    Allows generating in the output file the best decisions.
    Values:
    ● none: the flag is deactivated.
    ● all: all the decisions (ranked by their associated score value) will be
      produced.
    ● individual: the specified count of best decisions is generated. This count
      has to be provided (inserted as a sub-node).
PredictedRankProbabilities
    Allows generating in the output file the best probabilities for one or more
    target variable categories.
    Values:
    ● none: the flag is deactivated and no probability will be generated.
    ● all: all the probabilities will be produced.
    ● individual: the probability will be generated for selected (inserted as
      sub-nodes) categories only.
PredictedRankScores
    Allows generating in the output file the best score(s) for each observation.
    Values:
    ● none: the flag is deactivated.
    ● all: all the scores will be produced.
    ● individual: the requested number of 'best scores' is generated. This value
      has to be provided (inserted as a sub-node).
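As an illustration of these flags, the Python sketch below asks for all variable contributions and the two
best ranked scores for a target named "class". This is a hypothetical example: the dict stands in for the
parameter tree handle, "class" is an example target name, and the NbScores sub-node name is an assumption
standing in for the inserted count sub-node mentioned above.

    # Hypothetical sketch of supervised apply settings.
    APPLY = ("Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/"
             "ApplySettings/Supervised/class")

    apply_settings = {
        f"{APPLY}/Contribution": "All",
        f"{APPLY}/PredictedRankScores": "individual",
        # 'individual' takes its count as an inserted sub-node; the sub-node
        # name used here is hypothetical.
        f"{APPLY}/PredictedRankScores/NbScores": "2",
    }
    for path, value in apply_settings.items():
        print(path, "=", value)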
UnSupervised
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/
UnSupervised
This folder contains only information that does not depend on a target, for example constants such as the
training date, the application date, or the model version.
Default
    Subfolder containing the settings that do not depend on the target. This
    folder always exists and cannot be changed.
DatasetId
    This parameter allows you to add a column containing the name of the data set
    each line belongs to. The possible output values in this column are:
    Estimation, Validation, Test, ApplyIn.
    Values:
    ● False (default): the column is not added.
    ● True: the column is added to the output data set.
Inputs
    Allows you to add input variables to the output data set. When the value is
    set to individual, this parameter becomes a folder containing the variables
    selected by the user.
    Values:
    ● none (default): no input variable is added to the output data set.
    ● all: all input variables are added to the output data set.
    ● individual: only the input variables selected by the user will be added to
      the output data set.
Weight
    If a weight variable has been defined during the variable selection of the
    model, this parameter allows you to add it to the output data set.
    Values:
    ● false (default): the weight variable will not be added to the output data
      set.
    ● true: the weight variable will be added to the output data set.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/
UnSupervised/Default/Constants
This folder contains the constants related to the model. Its value is always set to export.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/
UnSupervised/Default/Constants/<Constant_Name>
This folder is named after the constant to be generated in the output data set. The constant name can be
one of the following:
● Apply Date (date when the model was applied)
● Build Date (date when the model was created)
● Model Name
● Model Version
● User-defined string (you can define your own constants, for example the name of the person who created
  the model, or the project it belongs to). The name cannot be the same as the name of an existing
  variable of the reference data set. If the name is the same as an already existing user-defined
  constant, the new constant replaces the previous one.
Values:
● export: the constant will appear in the output data set.
● skip (default): the constant will not appear in the data set.
It contains all the settings needed to define a variable in the output data set.
OutVarName
    Name of the constant in the output data set. By default the value is the name
    of the constant, but it can be modified, for example to fit database
    restrictions.
Value
    Value of the constant that will appear in the output data set. Depends on the
    value of the storage parameter of the constant.
KeyLevel
    This parameter allows you to indicate whether the current constant is a key
    variable or identifier for the record. You can declare multiple keys; they
    will be built according to the indicated order (1-2-3-...).
    Values:
    ● 0 (default): the variable is not an identifier.
    ● 1: primary identifier.
    ● 2: secondary identifier.
Storage
    This parameter allows you to set which kind of values are stored in this
    variable.
    Values:
    ● Number: "computable" numbers (be careful: a telephone number or an account
      number should not be considered numbers)
    ● Integer: integer numbers
    ● String: character strings
    ● Datetime: date and time stamps
    ● Date: dates
Origin
    This parameter indicates the origin of the constant.
    Values:
    ● BuiltIn: automatically generated by SAP Predictive Analytics
    ● UserDefined: created by the user
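The following Python sketch assembles a user-defined constant as described above: a constant named
"Analyst" exported to the output data set with a string value. The constant name and values are
illustrative, and the dict again stands in for the parameter tree handle of your binding.

    # Hypothetical sketch of a user-defined constant.
    CONST = ("Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/"
             "ApplySettings/UnSupervised/Default/Constants/Analyst")

    constant = {
        CONST: "export",                        # export | skip (default)
        f"{CONST}/OutVarName": "ANALYST_NAME",  # renamed to fit database restrictions
        f"{CONST}/Value": "J. Doe",
        f"{CONST}/KeyLevel": "0",               # not an identifier
        f"{CONST}/Storage": "String",
        f"{CONST}/Origin": "UserDefined",
    }
    for path, value in constant.items():
        print(path, "=", value)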
7.3.3.2.4 Results
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results
This section describes the results for regression and classification models. Under this folder, there is a sub-
folder for each target variable. SAP Predictive Analytics Modeler can handle several targets at the same time.
All the parameters dealt with in this section are read-only once the model has been generated.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>
<Target_Name> corresponds to the name of the target for which the results are listed. There can be more
than one target in the same model.
NbInput
    The number of explanatory (input) variables of the regression engine. It
    corresponds to the sum of the values of the parameters NbOrgInput and
    NbContInput.
    Value: integer
MaximumKeptCorrelations
    Links to the tree parameter MaximumKeptCorrelations in the Parameters section.
    Value: real value equal to the value of Parameters/MaximumKeptCorrelations
LowerBound
    Links to the tree parameter LowerBound.
    Value: same value as Parameters/LowerBound
7.3.3.2.4.1 DataSets
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
DataSets
This folder contains performance indicators for each data set that has been evaluated by the model.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
DataSets/<Dataset_Name>
Dataset_Name is the name of the data set that has been evaluated by the model. The name of the data set
can be one of the following:
● Estimation
● Validation
● Test
● ApplyIn
Parameters
Parameter Description
L1
    The residual mean (the mean of the absolute value of the difference between
    the predicted value and the actual value); also known as the City Block or
    Manhattan distance.
ErrorMean
    The mean of the error, that is, of the difference between predicted values and
    actual values.
R2
    The R2 is computed as the squared correlation between the target and the model
    output (prefixed by rr_).
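These indicators follow directly from their definitions, as the short Python illustration below shows. It
mirrors the definitions on a toy data set and is not the engine's implementation.

    # Minimal illustration of L1, ErrorMean and R2 from their definitions above.
    predicted = [1.2, 0.7, 2.4, 1.9]
    actual    = [1.0, 1.0, 2.0, 2.0]

    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]
    l1 = sum(abs(e) for e in errors) / n       # residual mean (Manhattan distance)
    error_mean = sum(errors) / n               # mean of the error

    # R2 as the squared correlation between target and model output.
    mp, ma = sum(predicted) / n, sum(actual) / n
    cov  = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual))
    varp = sum((p - mp) ** 2 for p in predicted)
    vara = sum((a - ma) ** 2 for a in actual)
    r2 = cov * cov / (varp * vara)
    print(l1, error_mean, r2)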
7.3.3.2.4.2 GainChartResults
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
GainChartResults
This directory contains information about the gain chart computed for each action made on the model
(training or application).
Parameter Description
Learn
    This directory contains information about the gain charts computed for the
    trained model.
Transversal
    This directory contains information about the gain charts computed for the
    applied model.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
GainChartResults/<Model_Type>/Quantiles
This folder contains a folder for each quantile computed for the current gain chart. <Model_Type> is either
Learn or Transversal.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
GainChartResults/<Model_Type>/Quantiles/<Quantile_Index>
This folder contains the metrics computed for the current quantile. <Quantile_Index> is the number of the
quantile. Quantile 1 is the one containing the highest number of positive observations.
Parameter Description
MinScore
    All observations contained in the current quantile have a score equal to or
    above this value.
MaxScore
    All observations contained in the current quantile have a score equal to or
    below this value.
Predicted
    Number of positive observations predicted by the model in the current
    quantile.
Values
    This folder contains the variables for which the value has been aggregated.
    These variables are defined in the parameter Protocols/Default/Transforms/
    Kxen.RobustRegression/Parameters/GainChartConfig/<Step Name>/ValueVariables.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
GainChartResults/<Model_Type>/Quantiles/<Quantile_Index>/Values/<Variable_Name>/
Value
<Variable_Name> is the name of the variable whose value is aggregated. The Value parameter is the
aggregated value of the current variable for the current quantile.
7.3.3.2.4.3 Coefficients
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
Coefficients
Directory where the actual polynomial coefficients are stored. Coefficients can be used to determine how the
system is using the extended variables. This directory contains one sub-directory for each of the extended
variables.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
Coefficients/<Variable_Name>
<Variable_Name> corresponds to the extended variable for which the coefficients are listed.
Parameter Description
Weight
    Extended variable weight. This weight is the coefficient associated with the
    normalized extended variable, divided by the sum of all the coefficients.
Contrib
    Absolute value of the coefficient associated with the normalized extended
    variable, divided by the sum of the absolute values of all the normalized
    extended variables' coefficients.
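The Weight and Contrib normalizations above can be illustrated with a few lines of Python; the coefficient
values are made up, and the sketch assumes the coefficient sum is non-zero.

    # Minimal illustration of the Weight and Contrib definitions.
    coefficients = {"age": 0.8, "capital_gain": -0.3, "education": 0.5}

    total = sum(coefficients.values())                 # assumed non-zero here
    total_abs = sum(abs(c) for c in coefficients.values())

    weights  = {v: c / total for v, c in coefficients.items()}
    contribs = {v: abs(c) / total_abs for v, c in coefficients.items()}
    print(weights)
    print(contribs)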
7.3.3.2.4.4 SmartCoefficients
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
SmartCoefficients
Directory where the smart coefficients are stored. Smart coefficients are another view of the coefficients
in which the redundancy between the variables is removed. When two variables are very correlated, the
robust system K2R will almost equalize the contributions of the two variables; the smart coefficients view
puts almost all the contribution on the most contributive variable (called the leader variable) among the
very correlated ones, and translates each remaining variable into the difference between the leader
variable and that variable. Smart coefficients can be used to perform variable selection.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
SmartCoefficients/<Variable_Name>
<Variable_Name> corresponds to the extended variable for which the smart coefficients are listed.
Parameter Description
Weight
    Extended variable weight. This weight is the coefficient associated with the
    normalized extended variable, divided by the sum of all the coefficients.
Contrib
    Absolute value of the coefficient associated with the normalized extended
    variable, divided by the sum of the absolute values of all the normalized
    extended variables' coefficients.
7.3.3.2.4.5 MaxCoefficients
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
MaxCoefficients
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
MaxCoefficients/<Variable_Name>
<Variable_Name> corresponds to the extended variable for which the max coefficients are listed.
Parameter Description
Weight
    Extended variable weight. This weight is the coefficient associated with the
    normalized extended variable, divided by the sum of all the coefficients.
Contrib
    Absolute value of the coefficient associated with the normalized extended
    variable, divided by the sum of the absolute values of all the normalized
    extended variables' coefficients.
7.3.3.2.4.6 Rule
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/Rule
This folder contains information about SAP Predictive Analytics rule mode.
Parameter Description
Slope The slope of the linear transform to go from basic score to user-defined score.
Intercept The intercept of the linear transform to go from basic score to user-defined score.
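The linear transform implied by Slope and Intercept is a one-liner; the sketch below uses made-up values
for both parameters.

    # The linear transform from basic score to user-defined score.
    slope, intercept = 28.85, 487.12   # example values, not defaults

    def user_defined_score(basic_score: float) -> float:
        return slope * basic_score + intercept

    print(user_defined_score(0.0), user_defined_score(1.0))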
7.3.3.2.4.7 Correlations
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
Correlations
This directory contains the correlations observed in the model between different variables (for example,
correlation between age and marital-status).
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
Correlations/<i>
<i> is the index of the correlation used to identify the various correlations found in the model. This index does
not imply any order in the correlations.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
Correlations/<i>/Details/<i>
7.3.3.2.4.8 AutoCorrelations
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/Results/<Target_Name>/
AutoCorrelations
This directory contains the auto-correlations observed in the model. These are correlations between a variable
and its encoded form (for example, correlation between age and c_age). Given here for its descriptive value.
<i> is the index of the correlation used to identify the various correlations found in the model. This index does
not imply any order in the correlations.
7.3.3.2.5 AdditionalResults
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/AdditionalResults/
This folder provides additional information about the results of the modeling.
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/AdditionalResults/
VariableExclusionInfo
This folder provides the list of excluded variables as well as the reason why each variable was excluded by SAP
Predictive Analytics depending on the target variable. These are mainly data quality issues.
Parameter Description
AllTargets
    This folder contains the variables that have been excluded from the model with
    respect to all targets. These are mainly constant variables. The reason why
    each variable has been excluded is indicated as a subparameter of the
    variable.
TargetSpecific
    This folder contains the list of excluded variables with respect to the given
    target and the reason why they have been excluded.
Reason
Syntax
Path: Protocols/Default/Transforms/Kxen.RobustRegression/AdditionalResults/
VariableExclusionInfo/AllTargets/<Variable_Name>/Reason
Description Values
For TargetSpecific:
7.3.3.3 Kxen.SmartSegmenter
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter
SAP Predictive Analytics Modeler - Segmentation/Clustering (formerly known as K2S) builds models
implementing a mapping between a set of descriptive attributes (model inputs) and the id (model output) of
one of several clusters computed by the system. It belongs to the family of clustering algorithms, which
build descriptive models. The goal of these models is to gather similar data in the same cluster. The
question of similarity is discussed below.
The current version of SAP Predictive Analytics Modeler - Segmentation/Clustering uses a K-Means engine to
compute the cluster index output variable. K-Means is a method for finding clusters of similar individuals
within a population. The K-Means method proceeds as follows: starting from an initial position of would-be
cluster centers, the method associates each individual with the closest center, which leads to a first
definition of the clusters. The positions of the centers are then adjusted to the true central positions
within the clusters. The new positions are then used to recompute the closest individuals, and the process
is restarted. This is repeated iteratively until the centers land in a stable position. In practice, the
process converges very quickly to a stable configuration.
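A bare-bones K-Means loop in Python illustrates the iterations just described. This is a one-dimensional
toy sketch; the product's engine works in the encoded space with its own distance, which this sketch does
not reproduce.

    # Toy K-Means: assignment step, then update step, repeated.
    import random

    def kmeans(points, k, iterations=20):
        centers = random.sample(points, k)
        for _ in range(iterations):
            # Assignment step: attach each individual to its closest center.
            clusters = [[] for _ in range(k)]
            for p in points:
                i = min(range(k), key=lambda c: abs(p - centers[c]))
                clusters[i].append(p)
            # Update step: move each center to the mean of its cluster.
            centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        return centers

    print(kmeans([1.0, 1.1, 0.9, 5.0, 5.2, 4.8], k=2))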
The distance used to determine the closest center is the L-infinite distance in the encoded space
generated by SAP Predictive Analytics Modeler - Data Encoding. Hence the segmentation process is
explicitly supervised, which makes SAP Predictive Analytics Modeler - Segmentation/Clustering a unique
clustering algorithm: any distance-based clustering process is supervised, most of the time without even
mentioning it! Indeed, the encoding phase of the process entirely determines the resulting segmentation,
since this is the phase that decides what is far and what is close, that is, what is similar to what.
When dealing with non-continuous
As for SAP Predictive Analytics Modeler - Regression/Classification, the target is any variable relevant
to the user's business: for example the purchase amount for a customer, the answer to a marketing
campaign, the fact that an individual churned in the last 2 months, and so on.
SAP Predictive Analytics Modeler - Segmentation/Clustering is now able to output an SQL formula of the
cluster. For example, a cluster may be defined as "age <= 35 AND marital-status in [ 'Divorced' ]". This
has several advantages:
● the textual SQL formula may be very easy and natural to interpret if it is not too complex.
● the clustering process is easier to integrate into an operational environment.
In order to debrief an SAP Predictive Analytics - Segmentation/Clustering model, several statistics are
provided beside the SQL expressions when available:
The model can also be analyzed in terms of two indicators concerning the generated cluster Id variable:
The model learning phase takes one minute for a problem on 50,000 cases described with 13 attributes on a
regular PC (64-128 MB). SAP Predictive Analytics Modeler - Segmentation/Clustering processes data with 4
sweeps on the estimation set and one sweep on the entire data set. The behavior of SAP Predictive
Analytics Modeler - Segmentation/Clustering is almost linear in the number of lines.
7.3.3.3.1 Parameters
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters
This section describes the parameters of the SAP Predictive Analytics Modeler - Segmentation/Clustering
component that can be found under the 'Parameters' section of the component.
NbClusters
    Read-write; read-only when the transform is in ready state.
    The number of clusters for the model. This parameter is set by the user before
    the learning phase and cannot be changed later on for a given model.
    Values: positive integer; default value: 10
Supervised
    Read-only. An informative item taking the value true when the clustering has
    been built using a target variable, false otherwise.
    Values: true | false
MultiEnginesModeForApply
BenchesMode
RunnerConfig (folder)
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/Distance
SystemDetermined
    Lets the system determine the best distance to be used according to the model
    build settings. The current policy is to use LInf either in unsupervised mode
    or when the cluster SQL expressions have been asked for, and L2 otherwise.
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/EncodingStrategy
Uniform
    Each variable segment is coded in the range [-1;+1] so that the distribution
    of the variables is uniform.
TargetMean
    Each value of a continuous input variable is replaced by the mean of the
    target for the segment the value belongs to. Each category of a nominal input
    variable is replaced by the mean of the target for this category. In the case
    of a nominal target variable, the mean of the target corresponds to the
    percentage of positive cases of the target variable for the input variable
    category.
SystemDetermined
    Lets the system select the best encoding according to the model parameters.
    The TargetMean encoding is used for supervised models; otherwise, variables
    are encoded using the Unsupervised scheme.
This option encodes the categories of the variable in the range [0,1], where 0 corresponds to the minimum
value of the variable and 1 corresponds to the maximum value.
This option encodes the categories of the variable in the range [-1,1], where -1 corresponds to the
minimum value of the variable and 1 corresponds to the maximum value.
This option performs a normalization based on the variable mean and standard deviation. It is computed
using the following formula: value' = (value - mean) / standard deviation.
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/ExtraMode
No Extra (default)
    Generates no extra output; this is the default behavior (provides only the
    cluster ID of the current input data).
For K2R
    Additionally generates a disjunctive coding of the cluster ID. This may be
    used to launch SAP Predictive Analytics Modeler - Regression/Classification
    with as many targets as clusters.
For K2R copy data
    Same as above, and additionally copies the input data set. This is to be used
    with small data sets.
Target Mean
    Additionally generates the target mean value of the cluster ID.
Advanced Apply Settings
    Expert mode allowing the user to selectively choose the outputs of interest.
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/CrossStats
Values Description
disabled
    Disables cross statistics. This may be useful when there are many input
    variables: it speeds SAP Predictive Analytics Modeler - Segmentation/
    Clustering up and reduces the memory load as well as the size of the saved
    model. As a drawback, cluster debriefing in the JNI/CORBA client is no longer
    available. KI, KR and basic cluster statistics are still available.
7.3.3.3.1.1 EnginesConfiguration
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/
EnginesConfiguration/ClustersCountRangePolicy
Specifies how clusters count range should be configured for each target.
Shared
    All the targets use the shared clusters count specification (default).
Custom
    Each target specifies its own clusters count range, the initial values being
    taken from the shared configuration section.
OverwritableCustom
    This mode differs from the previous one in that the specific configuration is
    overwritten by the shared one any time the latter changes (provided that both
    are not altered at the same time).
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/
EnginesConfiguration/Kxen.SharedEngineConfiguration/ClustersCountSpec
Parameter Description
Enumeration Reserved for future use. It will allow specifying clusters count as a set of custom
values instead of a range.
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/
EnginesConfiguration/ByTargets
<target_name> | <clusterId>/ClustersCountSpec
    This is where the clusters count range can be customized on a per-target
    basis, provided that the clusters count range policy has been set either to
    Custom or OverwritableCustom. It can be a Min/Max range or an enumeration.
    Currently only the Min/Max range is available.
    Values:
    ● Min: lower clusters number
    ● Max: upper clusters number
    ● Enumeration: reserved for future use. It will allow specifying clusters
      count as a set of custom values instead of a range.
<target_name> | <clusterId>/Engines/<Engine_index>
    This section is read-only in the current implementation and will be opened in
    the future to allow some fine tunings, such as the distance or encoding
    strategy to be used by a given engine.
    Values: not applicable
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/ApplySettings
Common ApplySettings
The table below describes the Apply Settings which are common to both supervised and unsupervised modes.
DisjunctiveTopRankNodeId
    Read-write. Allows adding to the output file the disjunctive coding of the
    clusters.
    Values: true | false (default value)
PredictedNodeIdDistances
    Read-write. Allows adding to the output file the distance of each observation
    to the various clusters.
    Values:
    ● None (default value): no distance will be produced.
    ● All: the distance from each of the clusters will be produced.
    ● Individual: a set of clusters of interest can be selected (inserted).
PredictedNodeIdProbabilities
    Read-write. Allows adding to the output file the probability of each
    observation to belong to the various clusters.
    Values:
    ● None (default value): no probability will be produced.
    ● All: the probability for each of the clusters will be produced.
    ● Individual: a set of clusters of interest can be selected (inserted).
PredictedRankDistances
    Read-write. Allows adding to the output file the distances of each observation
    from the nearest clusters.
    Values:
    ● None (default value): no distance will be produced.
    ● All: the distance from each of the clusters will be produced.
    ● Individual: a set of clusters of interest can be selected (inserted).
PredictedRankNodeId
    Read-write. Allows adding to the output file the indices of the nearest
    clusters for each observation.
    Values:
    ● None: no cluster index will be produced.
    ● All: all the indices will be generated, sorted (increasingly) according to
      their associated distance.
    ● Individual (default value): the requested number of nearest cluster indices
      is produced. This value has to be inserted.
PredictedRankNodeName
    Read-write. Allows adding to the output file the names (instead of the
    indices) of the nearest clusters for each observation.
    Values:
    ● None (default value): no cluster name will be produced.
    ● All: all the cluster names will be generated, sorted (increasingly)
      according to their associated distance.
    ● Individual: the requested number of nearest cluster names is produced. This
      value has to be inserted.
PredictedRankProbabilities
    Read-write. Allows adding to the output file the probabilities that the
    observation belongs to each of the nearest clusters.
    Values:
    ● None (default value): the option is deactivated.
    ● All: all the probabilities will be generated, from the highest to the
      lowest.
    ● Individual: the requested number of highest probabilities is produced. This
      value has to be inserted.
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/ApplySettings/
UnSupervised/Default
Inputs
    Read-write. Allows you to add to the output file one or more input variables
    from the data set.
    Value: user-dependent
Weight
    Read-write. Allows you to add to the output file the weight variable if it had
    been set during the variable selection of the model.
    Value: user-dependent
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/ApplySettings/
Supervised
<Target_Name>
    Read-only. It extends the common set of flags.
    TargetMean (read-write): a boolean value which allows adding to the output
    file the mean of the target for the cluster containing the observation. false
    is the default value.
<Target_Name>
    Read-only. It extends the common set of flags.
    ● OutlierFlag: [Not Implemented]
    ● OutputCosts: [Not Implemented]
    ● ProfitMatrix: [Not Implemented]
    ● RankType: [Not Implemented]
    ● TargetMean (read-write): a boolean value which allows adding to the output
      file the proportion of the least frequent category of the target variable
      (key category) in the cluster containing the current observation.
7.3.3.3.2 Results
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/Results/
<Target_Name>
This section describes the results of SAP Predictive Analytics Modeler - Segmentation/Clustering. Under
this folder there is a subfolder for each target variable. SAP Predictive Analytics Modeler -
Segmentation/Clustering can indeed run several models simultaneously, that is, one per target variable. In
multi-engines mode this folder holds the results for the winning engine only. Results for all the engines
can then be found in the AdvancedResults folder. All parameters are in read-only mode.
TargetEstimator
    The name of the variable used for the cluster index prediction (also known as
    the target estimator in supervised mode).
Clusters (folder)
    A folder where all the information about clusters may be found. It contains
    one sub-folder for each cluster.
Metrics (folder)
    Contains the quality metrics for the found segmentation.
Overlapp (only in SQL mode)
    The off-diagonal percentage of the confusion matrix between covers. More
    information on covers is available under the 'Clusters' parameters, below.
GazFrequency (only in SQL mode)
    The percentage of input data that is not assigned to a cluster.
Index
    The engine index within the set of engines associated with this target.
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/Results/
<Target_Name>/Clusters
Coordinates (folder)
    The folder where the coordinates of clusters are stored. This folder contains
    one sub-folder per dimension.
TargetMean
    The mean value of the target for data assigned to the cluster. In the
    classification case (binary target), the target mean is the percentage of
    <label> in the cluster, where <label> is the least frequent category of the
    encoded target.
Cover (only in SQL mode)
    A folder where part of the cluster SQL formula is stored. A cover is an SQL
    formula made of a conjunction (that is, a series of 'AND') of basic SQL
    statements such as "variable i in [ 'value1', 'value2' ]" for nominal
    variables and "variable i < value1 AND variable i > value2" for continuous
    variables.
ANDNOT (only in SQL mode)
    A folder that contains the rest of the information necessary to build the
    cluster SQL formula. The folder contains the ids of the covers to subtract
    from this cover to obtain the cluster formula, as shown in the following
    example: say 2 is the current cover index and { 3, 4 } is the set of indices
    found in the ANDNOT folder. The cluster expression is then built as:
    Cluster = Cover2 AND NOT (Cover3 OR Cover4)
            = Cover2 AND NOT Cover3 AND NOT Cover4
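The combination of a cover with its ANDNOT ids is simple string assembly, as the Python sketch below
shows; the cover expressions are made up, and the folder contents are represented as plain dicts.

    # Building the cluster SQL expression from a cover and its ANDNOT ids,
    # following the Cover2 / {3, 4} example above.
    covers = {
        2: "age <= 35 AND marital_status IN ('Divorced')",
        3: "capital_gain > 5000",
        4: "education IN ('Masters', 'Doctorate')",
    }
    andnot = {2: [3, 4]}

    def cluster_formula(cover_id):
        parts = [f"({covers[cover_id]})"]
        parts += [f"NOT ({covers[i]})" for i in andnot.get(cover_id, [])]
        return " AND ".join(parts)

    print(cluster_formula(2))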
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/Results/
<Target_Name>/Clusters/Coordinates
KL
    The Kullback-Leibler divergence between the cluster distribution for this
    dimension (that is, this input variable) and the population distribution. This
    distance is used in the JNI/CORBA interface to debrief clusters. Given a
    cluster and a dimension, it is computed as:
    KL = Σ over categories c of p_cluster(c) × log( p_cluster(c) / p_population(c) )
    where p_cluster is the distribution of the dimension within the cluster and
    p_population its distribution over the whole population.
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/Results/
<Target_Name>/Clusters/Cover
Parameter Description
Syntax
Path: Protocols/Default/Transforms/Kxen.SmartSegmenter/Parameters/Results/
<Target_Name>/Clusters/Cover/Operator
Parameter Description
Values
    (when the variable is nominal/ordinal) The set of categories of data assigned
    to the cluster.
Min
    (when the variable is continuous) The minimum of the values for data assigned
    to the cluster.
MinEqual
    (when the variable is continuous) A boolean that specifies whether the Min
    value is included in the range.
Max
    (when the variable is continuous) The maximum of the values for data assigned
    to the cluster.
MaxEqual
    (when the variable is continuous) A boolean that specifies whether the Max
    value is included in the range.
Advanced Results
This is the location for all the (intermediate) results when the multi-engines mode is enabled. The
results for the winning engine are additionally saved in the Results folder. Under this folder there is a
sub-folder for each target variable, containing in its turn a sub-folder for each engine. Each of these
second-level sub-directories has the same layout as the Results folder.
7.3.3.4 Kxen.ConsistentCoder
Syntax
Path: Protocols/Default/Transforms/Kxen.ConsistentCoder
SAP Predictive Analytics Modeler - Data Encoding is a data preparation transform that builds a consistent
(robust) coding scheme for any attribute belonging to a training data set containing a business question
(a specific target variable to analyze). Each possible value of a nominal attribute is:
The attributes may also be called variables, whereas their possible values are sometimes referred to as
categories.
SAP Predictive Analytics Modeler - Data Encoding brings intelligence to any OLAP system (IOLAP™) through a
ranking of the variables based on their robust information to explain a business question. SAP Predictive
Analytics Modeler - Data Encoding processes both the estimation and validation sets in a single pass. SAP
Predictive Analytics Modeler - Data Encoding finds a robust (consistent) encoding for nominal variables so
that they can be used with numerical algorithms. SAP Predictive Analytics has refined techniques that have
been used for years in this field. The strength of SAP Predictive Analytics Modeler - Data Encoding lies
in the fact that it can
SAP Predictive Analytics Modeler - Data Encoding belongs more to the second category. When dealing with a
nominal variable, it first computes the statistics associated with each category of this nominal
attribute.
For example: for a nominal variable called 'Color' with three possible categories 'Red', 'Blue' and
'Green', SAP Predictive Analytics Modeler - Data Encoding first computes the average of the target for
each of these categories. When the target is a continuous variable, the average of the target for each
category is a straightforward computation. When the target is a nominal variable, the user can associate a
cost (profit) with the different classes.
The nominal variable coding is based on the target average for each category.
Let's go back to our 'Color' example, and let's assume that the target average for each color is given by
the following table.
Category    Target average
Red         0.75
Green       0.35
Blue        0.50
Then, the simplest coding scheme is to encode each category with its target average. But this technique
has some drawbacks.
● First, you can lose some information when two categories have the same (or very close) target average.
● Second, for the target average to have any meaning at all, you must have enough cases for the category.
To counter the first issue, SAP Predictive Analytics Modeler - Data Encoding allows the user to code the
category not directly with the target average but with the rank of the sorted target averages, as shown in
the next table:
Category    Target average    Rank
Red         0.75              3
Green       0.35              1
Blue        0.50              2
In this scheme, each category is coded as the rank. To counter the second issue, SAP Predictive Analytics
Modeler - Data Encoding automatically searches for the minimum number of cases needed to keep a category.
All categories that are not represented enough in the database are associated with a miscellaneous class
(whose default name is 'KxOther'). This search is done using the SRM (Structural Risk Minimization)
principle.
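The two encodings and the 'KxOther' grouping can be illustrated with the Python sketch below. The data and
the minimum count are made up, and the engine's actual threshold search uses the SRM principle, whereas
here the minimum count is simply given.

    # Target-average coding, rank coding, and grouping of rare categories.
    data = [("Red", 1), ("Red", 1), ("Red", 0), ("Red", 1),
            ("Blue", 1), ("Blue", 0), ("Green", 0), ("Mauve", 1)]
    min_count = 2  # categories seen fewer times are sent to 'KxOther'

    counts, sums = {}, {}
    for category, target in data:
        if sum(c == category for c, _ in data) < min_count:
            category = "KxOther"
        counts[category] = counts.get(category, 0) + 1
        sums[category] = sums.get(category, 0) + target

    averages = {c: sums[c] / counts[c] for c in counts}
    ranks = {c: r + 1 for r, (c, _) in
             enumerate(sorted(averages.items(), key=lambda kv: kv[1]))}
    print(averages)  # target average per category
    print(ranks)     # rank coding of the sorted target averages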
Syntax
Path: Protocols/Default/Transforms/Kxen.ConsistentCoder/Parameters
This section describes the SAP Predictive Analytics Modeler - Data Encoding component parameters, which
can be found under the 'Parameters' section of the component.
Compress
    Read-write; read-only when the transform is in ready state.
    This variable can be used to deactivate the compression of SAP Predictive
    Analytics Modeler - Data Encoding, by setting its value to false.
    Values: true [default] | false
ExtraMode
    Read-write; read-only when the transform is in ready state.
    This parameter belongs to the set ['No Extra', 'K2R Coding', 'K2S Coding'].
    This value is used when SAP Predictive Analytics Modeler - Data Encoding is
    used alone in a model. This mode can be used to store back the results of the
    coding into a space.
    Values:
    ● No Extra [default]
    ● K2R Coding
    ● K2S Coding
    ● K2S Unsupervised
    ● KSVM Coding
7.3.3.4.2 Results
Most of the information generated by SAP Predictive Analytics Modeler - Data Encoding is stored in the
'Statistics' section of the original nominal variables of the current protocol. SAP Predictive Analytics
Modeler - Data Encoding changes the original categories by introducing the 'KxOther' category if needed.
After SAP Predictive Analytics Modeler - Data Encoding has been trained, the original dictionaries are
smaller.
7.3.3.5 Kxen.SocialNetwork
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork
SAP Predictive Analytics Social is an automatic data preparation transform that extracts and uses implicit
structural relational information stored in different kinds of data sets, and thus improves model decisions and
prediction capacities. The user configures the loading module and graph filters. The configuration is made
7.3.3.5.1 Parameters
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters
7.3.3.5.1.1 LoadSettings
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings
The LoadSettings folder contains all the parameters used to load the graphs.
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings/
GraphOutputDir
Used for debugging purposes. It indicates the path where generated graphs will be dumped in DOT format. If
left empty, nothing is done. The default value is empty.
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings/
NodeSettings_<Repository>
It contains the parameters for providing descriptive attributes and identifiers conversion.
IndirectionTableNodeColumnName
    Indicates the column in the node data set that contains the entity identifier,
    such as the customer identifier.
NodeProperties
    The user can use this parameter to specify the variables to be used to
    decorate the node; the NodeProperties parameter changes to a directory
    containing the variables inserted by the user.
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings/
GraphFilters
The GraphFilters sub-folder is used to insert parameter templates for the graphs created from a single
table containing the relational data. Every inserted template contains parameters for the graph concerned
(type, column to filter, etc.):
GraphType
    Directed, Undirected, or Bipartite.
NodeSet
    Can be used to force the graph to store its nodes in a specific node
    repository.
    Values:
    ● First: source node
    ● Second: target node
1. FilterMaximum/FilterMinimum: cannot be set if FilterValues is set. Used to filter values between two
   bounds.
2. FilterValues/FilterNotValues: can be used to filter on a precise list of discrete values (include or
   exclude).
7.3.3.5.1.1.1 PostProcessing
Syntax
Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings/
PostProcessing
Since version 6.0.0, it is possible to set a priority order for all graph post-processings. For instance,
it can be important to force the mega-hub filter to be launched before the community detection. The
priority can be set using the priorityLevel parameter common to all post-processings.
1. Mega-hub filtering
2. Bipartite graph projection
3. Community detection
4. Node pairing
All post-processing parameters must be specified after the graph specifications and before the model
learning stage.
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings/
PostProcessing/Node1Column
The name of the column containing the first node identifier (input node if directed graph is built).
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings/
PostProcessing/Node2Column
The name of the column containing the second node identifier (output node if directed graph is built).
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings/
PostProcessing/DateColumn
The name of the column containing the date of the events (optional, may be left blank).
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings/
PostProcessing/KSN.BipartiteGraphProjection/<Graph_Name>
Bipartite projection is used to transform a bipartite graph into a unipartite graph in which nodes from
the same population are connected if they share a certain number of relevant neighbors. There is one
folder for each projected graph.
Parameter Description
Values:
● First
● Second
Values:
● Support
● Jaccard
● Independence Probability
Values:
● true
● false (default)
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings/
PostProcessing/KSN.MegaHubFilter/<Graph_Name>
Mega-hub filtering can be activated for a given graph to filter highly connected nodes. This folder
contains one sub-folder for each graph for which the mega-hub filtering has been activated. The parameters
corresponding to each graph are stored in these sub-folders.
Parameter Description
MethodId Values:
Default value: 4
UserThreshold
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings/
PostProcessing/KSN.CommunityDetection/<Graph_Name>
The Community Detection algorithm can be activated for a given graph. This folder contains one sub-folder for
each graph for which the community detection has been activated.
Parameter Description
EpsilonValue The stop condition for the algorithm. The detection stops if
there is no modularity gain bigger than EpsilonValue.
Default value: 0
Values:
● true
● false (default)
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings/
PostProcessing/KSN.NodePairing/Pairing<n>
Parameter Description
Default value: 4
Values:
● true
● false (default)
Values:
● true
● false (default)
Values:
● Ratio (default)
● Count
● Jaccard
● Independence Ratio
● Confidence
● Clustering
7.3.3.5.1.1.2 FromModelLoadSettings
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/LoadSettings/
FromModelLoadSettings
This folder contains all the parameters required to specify models, including saved models to be imported.
7.3.3.5.1.2 GraphApplySettings
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/ApplySettings/
GraphApplySettings/<Graph_Name>
There is one folder for each graph generated. The names of the community graphs created for a graph defined
by the user are suffixed _cm_lvl_<level_number>.
CommunityCount
    This folder allows the user to insert all the discrete variables for which
    they want to compute the count in the community. This generates as many
    columns as there are categories in the target variable.
CommunityLinks
    Lists all the links inside the community of the input node. This can be used
    to extract the subgraph of nodes and links inside a community.
CommunityMean
    Allows the user to insert all the continuous variables (age, capital_gain…)
    for which they want to compute the mean on the community.
CommunityNode
    Only for community graphs. Gives a node ID inside the input community (used
    for internal display).
CommunityOffNetRatio
CommunityRatio
    Allows the user to insert all the discrete variables for which they want to
    compute the ratio of each of the categories in the community. This generates
    as many columns as there are categories in the target variable.
Values:
Values:
● true (default)
● false
CommunityVariance
    This folder allows the user to insert all the continuous variables (age,
    capital_gain…) for which they want to compute the variance on the community.
Count
    This folder allows the user to insert all the discrete variables for which
    they want to compute the counts in the first circle (set of direct neighbors).
    This generates as many columns as there are categories in the target variable,
    containing the count for this category.
Values:
● true (default)
● false
Values:
● true
● false (default)
Values:
InfluenceReach
    This folder can be used to insert a template parameter in the Social
    ApplySettings Influence Reach node. The label name used for the inserted
    template has no importance.
Mean
    Allows the user to insert all the continuous variables (age, capital_gain…)
    for which they want to compute the mean on the first circle.
Mode
    Allows the user to insert all the discrete variables for which they want to
    compute the modes in the first circle. This generates as many columns as the
    number of categories in the target variable.
OffnetRatio
Profile
    Same as the Mode parameter but using ratios instead of counts.
Ratio
    This folder allows the user to insert all the discrete variables for which
    they want to compute the ratio of each of the categories in the first circle
    (set of direct neighbors). This generates as many columns as there are
    categories in the target variable, containing the ratio for this category.
Recommendation
    This folder can be used to insert a template parameter in the Social
    ApplySettings Recommendation node. This must be done in the bipartite graph
    apply settings.
SPActivation
    Spreading activation, or graph diffusion, is used to spread a value into the
    graph. The diffusion is initiated by labeling all the nodes with weights or
    "activation" and then iteratively propagating or "spreading" that activation
    out to other nodes linked to the source nodes. The result of the apply is the
    weight of each node after the diffusion process.
Triangle
    Computes the number of triangles the input node is a part of. If two neighbors
    of a node are themselves connected, they form a triangle.
Values:
● true
● false
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/ApplySettings/
GraphApplySettings/<Graph_Name>/InfluenceReach/<Label>
Parameter Description
VarName
    Name of the variable containing the activation date information. The variable
    must have a date description.
StartDate
    Start of the time frame in which to observe the activation diffusion. Nodes
    that do not match the time frame are not taken into account. Can be left
    blank.
EndDate
    End of the time frame in which to observe the activation diffusion. Nodes that
    do not match the time frame are not taken into account. Can be left blank.
DeltaMin
    Minimum delta of time between two nodes to observe the activation diffusion.
    Can be left blank.
ActivatedOnly
    If activated, the cascade size is only computed if the node itself is
    activated.
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/ApplySettings/
GraphApplySettings/<Graph_Name>/Recommendation
Parameter Description
FilterPurchased
    If set to true, excludes already purchased items from the recommendation set,
    that is, the items the user is connected to in the bipartite graph.
    Values:
    ● true
    ● false
Value:
Values:
● Support
● Confidence
● KI
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Parameters/ApplySettings/
GraphApplySettings/<Graph_Name>/SPActivation
SpreadingFactor
    Decay factor of the diffusion: the part of the score that will be propagated.
    Values:
    ● 0.75 (default)
    ● any float value in [0 ; 1]
Values:
● 10 (default)
● any integer > 1
ActivationThreshold
    Nodes with weights above this threshold will be considered as activated. Only
    activated nodes will spread weights to their neighbors.
    Values:
    ● 0.1 (default)
    ● any float value in [0 ; 1]
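The mechanism behind these parameters can be illustrated with a small Python sketch over an adjacency
list: at each iteration, nodes whose weight exceeds the activation threshold propagate a decayed share of
their weight to their neighbors. The graph, weights and constants are made up, and the sketch illustrates
the mechanism only, not the engine's exact update rule.

    # Toy spreading-activation pass over an adjacency list.
    graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
    weights = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}
    spreading_factor, activation_threshold, iterations = 0.75, 0.1, 10

    for _ in range(iterations):
        incoming = {n: 0.0 for n in graph}
        for node, w in weights.items():
            if w > activation_threshold:          # only activated nodes spread
                share = spreading_factor * w / len(graph[node])
                for neighbor in graph[node]:
                    incoming[neighbor] += share
        weights = {n: weights[n] + incoming[n] for n in graph}

    print(weights)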
7.3.3.5.2 Results
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Results
Parameter Description
NodesInFirstRepo
    Number of unique nodes (from all graphs) stored in the first node repository.
NodesInSecondRepo
    Number of unique nodes (from all graphs) stored in the second node repository
    (for bipartite graphs).
NodesAttributes (folder)
DeriveFrom
    This parameter is available for a graph derived from a bipartite graph or for
    a node pairing graph. It indicates whether the graph is the result of a
    post-processing operation on another graph.
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Results/AvailableGraphs
Nodeset
    Indicates in which node repository the nodes are stored. For bipartite graphs,
    it indicates the node repository of the first population.
    Values: First | Second
In/OutPowerLawExponent
    Indicates the power-law exponent of the node degree distribution.
Syntax
Path: Protocols/Default/Transforms/Kxen.SocialNetwork/Results/AvailableGraphs/
MegaHubs
Parameter Description
Modularity
    Value of the final modularity, a goodness indicator of the community
    partition.
Intra/Inter-LinksMedian
    Median value of the intra/inter links distribution over all the nodes.
SumOfWeights
    Weighted sum of all the links in the graph. This value is needed for several
    community detection algorithms and for modularity optimization.
CommunitySizeDistribution
    Stores a list of couples of integers that describe the distribution of
    community sizes in the given graph.
7.3.3.6 Kxen.DateCoder
Syntax
Path: Protocols/Default/Transforms/Kxen.DateCoder
Date Coder is an automatic data preparation transform that extracts date information from a date or datetime
input variable. This component is automatically inserted by SAP Predictive Analytics Modeler - Data Encoding if
one of the input variables is a date or a datetime variable.
This section of the parameter tree contains all the date and datetime variables used by the Date Coder
component.
Day of week
    The day of the week according to the ISO convention: Monday=0 and Sunday=6.
    Generated variable: <OriginalVariableName>_DoW
Month of quarter
    The month of the quarter:
    ● January, April, July and October = 1
    ● February, May, August and November = 2
    ● March, June, September and December = 3
    Generated variable: <OriginalVariableName>_MoQ
On top of the 7 dateparts above, datetime variables are broken down into 3 more dateparts:
Hour
    Generated variable: <OriginalVariableName>_H
Minute
    Generated variable: <OriginalVariableName>_Mi
Second
    Generated variable: <OriginalVariableName>_S
The generated variables are stored in the parameter tree under Protocols > Default > Variables.
Note: all generated variables are ordinal except for 'DayOfYear' and 'µseconds', which are continuous.
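Some of these dateparts can be reproduced with the Python standard library, which makes the conventions
above easy to check: weekday() already follows the Monday=0 … Sunday=6 convention, and the month of
quarter follows from simple arithmetic. This is an illustration of the datepart definitions, not the Date
Coder itself.

    from datetime import datetime

    d = datetime(2018, 4, 12, 9, 30, 15)
    dateparts = {
        "DoW": d.weekday(),              # Monday=0 ... Sunday=6
        "MoQ": (d.month - 1) % 3 + 1,    # January, April, July, October = 1 ...
        "H": d.hour,
        "Mi": d.minute,
        "S": d.second,
    }
    print(dateparts)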
7.3.3.7 Kxen.AssociationRules
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules
SAP Predictive Analytics Modeler - Association Rules generates association rules. Association rules provide
clear and useful results, especially for market basket analysis. They bring to light the relations between
products or services and immediately suggest appropriate actions. Association rules are used in exploring
categorical data, also called items.
7.3.3.7.1 Parameters
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules/Parameters
This section describes the parameters of SAP Predictive Analytics Modeler - Association Rules that can be
found under the 'Parameters' section of the component.
ExtraMode
    Read-write. A special flag allowing to set the type of outputs that
    Association Rules will generate.
    Values:
    ● No Extra
    ● Optimized by KI
    ● Optimized by Confidence
    ● Full Description
    ● Full Description and Optimized by Confidence
    ● Full Description and Optimized by KI
DateColumnName
    Sequence mode only. The column in which the date is stored.
SequencesMode
    Sequence mode only. A flag specifying whether the Sequence mode of Association
    Rules is activated.
    Values:
    ● true (or 1): the Sequence mode is activated.
    ● false (or 0): the Sequence mode is deactivated.
References (folder)
    Used to set the information relative to the Reference data source.
    TIDColumnName: indicates the name of the reference key variable.
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules/Parameters/ExtraMode
Values Description
No Extra
    Generates the basic outputs, that is, the session key, the ID of the rule used
    to find the consequent, and the consequent itself.
Optimized by KI
    Generates the basic outputs. If more than one rule gives the same consequent
    for a session, the rule presenting the best KI is selected.
Optimized by Confidence
    Generates the basic outputs. If more than one rule gives the same consequent
    for a session, the rule presenting the best Confidence is selected.
Full Description
    Generates the extended outputs, that is, the session key, the rule ID, the
    consequent, the antecedent, the KI, the confidence and the rule support.
Full Description and Optimized by Confidence
    Generates the extended outputs. If more than one rule gives the same
    consequent for a session, the rule presenting the best Confidence is selected.
Full Description and Optimized by KI
    Generates the extended outputs. If more than one rule gives the same
    consequent for a session, the rule presenting the best KI is selected.
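The 'Optimized by …' behavior amounts to keeping, per consequent, the rule with the best value of the
chosen indicator. The Python sketch below illustrates this selection on made-up rules; it is not the
engine's implementation.

    # Keep the best rule per consequent according to KI or Confidence.
    rules = [
        {"id": 1, "consequent": "beer", "ki": 0.42, "confidence": 0.80},
        {"id": 2, "consequent": "beer", "ki": 0.55, "confidence": 0.70},
        {"id": 3, "consequent": "chips", "ki": 0.30, "confidence": 0.90},
    ]

    def optimize(rules, indicator):
        best = {}
        for rule in rules:
            key = rule["consequent"]
            if key not in best or rule[indicator] > best[key][indicator]:
                best[key] = rule
        return list(best.values())

    print(optimize(rules, "ki"))          # 'Optimized by KI' keeps rule 2 for 'beer'
    print(optimize(rules, "confidence"))  # 'Optimized by Confidence' keeps rule 1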
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules/Parameters/Transactions
Parameter Description
LogSpaceName
    Indicates the SAP Predictive Analytics role set for the Transactions space
    name.
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules/Parameters/
ARulesEngineParameters
Association Rules has two usages in SAP Predictive Analytics. The first one is as a standalone module used
to detect interactions between items that are associated with a common entity: it is used for example to
detect
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules/Parameters/
ARulesEngineParameters/FPV
MinimumConfidence
    Gives the minimum threshold for the confidence of a rule.
    Values:
    ● 0.5 (default value)
    ● required value between 0 and 1
ChunkSize
    Sets the size of the chunks (in number of sessions) used by FPV to import and
    generate the rules. With a value equal to 0, the chunk strategy is not used
    and ALL the sessions are imported before generating the rules.
    Values:
    ● 0 (default value)
    ● Required value >=
AutomaticParameterSettings
    Used in the Data Quality mode of KAR in order to infer what is usually given
    by the user, such as the maximum length and the default support and
    confidence. The maximum length is computed with respect to the number of
    columns in order to fight combinatorial explosion.
    Values:
    ● true: the Data Quality mode is selected.
    ● false: the Default Use mode is selected.
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules/Parameters/
ARulesEngineParameters/RulesGenerationFilters
This folder contains the ConsequentFilters folder, which contains the following parameters:
Parameter Description
IncludedList All rules with a consequent belonging to the list of values inserted below this parameter will be generated. When this list is empty, all rules are generated (with respect to the ExcludedList).
ExcludedList All rules with a consequent not specified in the list of values below this parameter will be generated. When this list is empty, all rules are generated (with respect to the IncludedList).
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules/Parameters/
ARulesEngineParameters/ApplyActivationOptions
Parameter Description
ActivatedConsequentsList To be filled by the user with items. All rules having one of these items as consequent will be kept to generate recommendations.
7.3.3.7.2 Results
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules/Results
This section describes the results of SAP Predictive Analytics Modeler - Association Rules. They can be found under the 'Results' section of the component.
ItemCategories (folder) The folder where the global information of the Item variable is stored.
ARulesEngineResults (folder) The folder where all the information about the rules found is stored:
● NumberOfItemSetsGenerated
● NumberOfFrequentItemSets
● NumberOfFrequentItems
● NumberOfItems
● NumberOfRules
● FillingRatio
● NumberOfTransactions
● Rules (folder)
TransactionStats (folder) Contains all the statistics relative to the number of transactions by session:
● Mean
● StDev
● Min
● Max
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules/Results/ItemCategories
● Value
● Storage
● KeyLevel
● OrderLevel
● MissingString
● Group
● Description
● ConstantForMissing (folder)
● SpaceName (folder)
● InSpaceName (folder)
● IsVirtualKey: true / false
● UserPriority
● DescriptionSource
● UserEnableKxOther: true / false
● UserEnableCompress: true / false
● NominalGroups
● Monotonicity
● UserModulus
● StatForCode
● EstimatorOf
● ClusterOf (folder)
● TargetKey
● SpaceOrigin: true / false
● Translations
● NativeInformation (folder)
● Statistics (folder)
● Extensions
● BasedOn
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules/Results/
ARulesEngineResults
NumberOfItemSetsGenerated The total number of itemsets created during the learning phase. An integer value.
NumberOfFrequentItemSets The number of frequent itemsets, that is the number of itemsets whose support is superior to the minimum support set by the user. An integer value.
NumberOfFrequentItems The number of frequent items, that is the number of items whose support is superior to the minimum support set by the user. An integer value.
NumberOfItems The number of items found in the transaction data set. An integer value.
Rules (folder) The folder where all the rules generated are stored:
● Antecedent (folder)
● Consequent (folder)
● Confidence
● KI
● Lift
● AntecedentSupport
● ConfidenceSupport
● RuleSupport
● SequenceSupportPct
● SequenceSupportRatio
● SequenceConfidence
● SequenceKI
● SequenceLift
● DurationMin
● DurationMax
● DurationMean
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules/Results/
ARulesEngineResults/Rules
Antecedent (folder) The folder where the names of the items composing the antecedent are stored.
Consequent (folder) The folder where the name of the item composing the consequent is stored.
Activated ● true ● false
AntecedentSupportPct
ConsequentSupportPct
RuleSupportPct
SequenceSupportPct [Sequence mode only] Indicates the relative sequence support of the rule. A real value between 0 and 1.
SequenceConfidence [Sequence mode only] Indicates the rule confidence in the sequence mode. A real value between 0 and 1.
SequenceKI [Sequence mode only] Indicates the rule KI in the sequence mode. A real value between -1 and 1.
SequenceLift [Sequence mode only] Indicates the rule Lift in the sequence mode. A value strictly greater than 0.
DurationMin [Sequence mode only] Indicates the minimum amount of time observed between an antecedent and its consequent. A value expressed in seconds if the date is in a date or datetime format.
DurationMax [Sequence mode only] Indicates the maximum amount of time observed between an antecedent and its consequent. A value expressed in seconds if the date is in a date or datetime format.
DurationMean [Sequence mode only] Indicates the average amount of time observed between an antecedent and its consequent. A value expressed in seconds if the date is in a date or datetime format.
Syntax
Path: Protocols/Default/Transforms/Kxen.AssociationRules/Results/TransactionStats
Parameter Description
7.3.3.8 Kxen.EventLog
Syntax
Path: Protocols/Default/Transforms/Kxen.EventLog
The purpose of SAP Predictive Analytics Explorer - Event Logging is to build a mineable representation of events history. It is not a data mining algorithm but a data preparation transform. All algorithms performing regression, classification, or segmentation only work on a fixed number of columns, but sometimes a customer can be associated with a list of events (the purchase history, for example) whose size differs for every customer, so this list of events must somehow be translated into a fixed number of columns. These types of operations are called pivoting in data mining because they translate information contained in the same column of different lines into different columns on a single line (for a given customer identifier, for example). SAP Predictive Analytics Explorer - Event Logging and SAP Predictive Analytics Explorer - Sequence Coding belong to this type of transformation.
SAP Predictive Analytics Explorer - Event Logging can be used to represent the history of a customer, or a log history of defects associated with a machine in a network. This component merges static information (coming from a static table) and dynamic information (coming from a log table). The user must have these two tables before using the component. The table containing static information is generally called the "reference" table, and it is associated in the models with the classical data set names or roles such as Training, Estimation, Validation, Test, or ApplyIn. The table containing the log of events (sometimes called the "transactions" table) is associated with a name beginning with "Events".
SAP Predictive Analytics Explorer - Event Logging is said to build coarse-grained representations, as it summarizes the events on different periods of interest; this is done with some information loss. A good example of using SAP Predictive Analytics Explorer - Event Logging is when trying to represent RFA (Recency-Frequency-Amount) views of a customer based on their purchase history.
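As a minimal KxShell sketch (file names are hypothetical), the reference table is declared with the usual Training role and the log table with a role name starting with "Events":
createModel Kxen.SimpleModel elModel
elModel.openNewStore Kxen.FileStore .
elModel.newDataSet Training reference.csv
elModel.newDataSet Events transactions.csv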
7.3.3.8.1 Parameters
Syntax
Path: Protocols/Default/Transforms/Kxen.EventLog/Parameters
This section describes the parameters of SAP Predictive Analytics Explorer - Event Logging that can be found under the 'Parameters' section. All the parameters are read-only when the transform is in a ready state.
Reference (folder)
ExtraMode Indicates the mode in which SAP Predictive Analytics Explorer - Event Logging will be applying the aggregations; changing this will influence the output produced by the component.
● No Extra (default value): produces all aggregates, and also outputs its input variables.
● Output Only Aggregates: outputs only the variables generated by SAP Predictive Analytics Explorer - Event Logging.
Syntax
Path: Protocols/Default/Transforms/Kxen.EventLog/Parameters/Reference
Parameter Description
IdColumnName The name of the column containing the identifier of the main
object (customer ID, machine ID, session ID) in the reference
table. A proper value is mandatory.
DateColumnName [read-write]: when not empty, this is the name of the column where the component can find a reference date. The reference date is used to compute aggregates on periods starting at the reference date. In this case the reference date can be different for every line. If this value is left empty, then it is assumed that the user will specify a proper RefDefaultDate, which is defaulted to 1-01-01 00:00:00.
DefaultDate [read-write]: when not empty, this reference date is used either when there is no reference date column or when this value is missing. Note that if the user specifies both a ReferenceDateColumnName and a RefDefaultDate, only the latter is taken into consideration.
Syntax
Path: Protocols/Default/Transforms/Kxen.EventLog/Parameters/Representation
Parameter Description
Sum Indicates the sum for the selected variable during the defined period.
Average Indicates the average for the selected variable during the defined period.
Min Indicates the min for the selected variable during the defined period.
Max Indicates the max for the selected variable during the defined period.
● Delta calculates the difference between the values of two consecutive periods for all the periods.
● PercentIncrease calculates the difference in percentage between the values of two consecutive periods
for all the periods.
● Accumulation calculates the current total accumulation for each period.
● BackAccumulation calculates the current total accumulation for each period calculated backwards.
● GlobalSum calculates the sum of all periods values.
Syntax
Path: Protocols/Default/Transforms/Kxen.EventLog/Parameters/Transactions
EventSpaceName [read-write] The name that allows this SAP Predictive Analytics Explorer - Event Logging to find back information on the transactions. Different names are allowed for this role because the user can stack several SAP Predictive Analytics Explorer - Event Logging transforms in a single protocol, each dealing with one specific transaction file. It is then up to the user to specify several event space names, such as Events HotLine, Events Products, and so on. All these names must start with "Events". A character string; 'Events' is the default value. Note - a proper value is mandatory. Since it is defaulted to 'Events', this default cannot be used if two SAP Predictive Analytics Explorer - Event Logging transforms are inserted in the same model.
IdColumnName [read-write] Indicates the name of the column containing the identifier of the main object (customer ID, machine ID, session ID, and so on) in the log (transaction) table. This is used to join the event log with the proper case. A character string. Note - a proper value is mandatory.
DateColumnName [read-write] Indicates the name of the column containing the date of the event (or transaction) in the events (or transactions) table. Together with the reference date, it is used to determine which period each event will be aggregated into. A character string.
EventColumnName [read-write] Some transactions can be associated with an event code (such as buy, test, receive_mailing, answer_mailing for a customer history, or the type of failure for a machine on a network), and the user can force the aggregates only on a sub-list of accepted event codes. The default value is left empty, which means that no filter is applied. When this value is not empty, it should be used in conjunction with AcceptedEvent.
AcceptedEvent This list allows filtering some events (only transactions with events present in the listed codes will be kept). The value is left empty by default, which means that all events are kept.
DismissedEvent This list allows filtering some events (only transactions with events present in the listed codes will be ignored). The value is left empty by default, which means that all events are kept.
7.3.3.8.2 Results
There are no results provided by SAP Predictive Analytics Explorer - Event Logging. The component only outputs new variables with their statistics on all data sets. These variables can be found in the Variables section of the parameter tree.
Different variables are created depending on the selected operators and meta operators. Different elements
are used to build the variables names:
<prefix>_<Meta>_<Operator>_<Variable>_P1<->P2
...
<prefix>_<Meta>_<Operator>_<Variable>_Pn-1<->Pn
<prefix>_<Meta>_<Operator>_<Variable>_P0+P1+P2
...
<prefix>_<Meta>_<Operator>_<Variable>_P0+P1+...+Pn
<prefix>_<Meta>_<Operator>_<Variable>_P1+...+Pn
...
<prefix>_<Meta>_<Operator>_<Variable>_Pn-1+Pn
7.3.3.9 Kxen.SequenceCoder
Syntax
Path: Protocols/Default/Transforms/Kxen.SequenceCoder
The purpose of SAP Predictive Analytics Explorer - Sequence Coding is to build a mineable representation of events history. It is not a data mining algorithm but a data preparation transform. All algorithms performing regression, classification, or segmentation only work on a fixed number of columns, but sometimes a customer can be associated with a list of events (the purchase history, for example) whose size differs for every customer, so this list of events (or transactions) must somehow be translated into a fixed number of columns. These types of operations are called pivoting in data mining because they translate information contained in the same column of different lines into different columns on a single line (for a given customer identifier, for example). SAP Predictive Analytics Explorer - Event Logging and SAP Predictive Analytics Explorer - Sequence Coding belong to this type of transformation.
SAP Predictive Analytics Explorer - Sequence Coding is said to build fine-grained representations as it
summarizes the count of different events or even the transitions between different events for a given reference
object. A good usage example of SAP Predictive Analytics Explorer - Sequence Coding is when trying to
represent web log sessions. The reference table contains information about the sessions, and the transactions
table contains the click-stream. SAP Predictive Analytics Explorer - Sequence Coding is able to represent each
session as the transitions between possible pages (or meta-information about the pages).
7.3.3.9.1 Parameters
Syntax
Path: Protocols/Default/Transforms/Kxen.SequenceCoder/Parameters
This section describes the parameters of the Kxen.SequenceCoder (KSC) component, which can be found under the Parameters section of the component.
Syntax
Path: Protocols/Default/Transforms/Kxen.SequenceCoder/Parameters/Transactions
Parameter Description
Syntax
Path: Protocols/Default/Transforms/Kxen.SequenceCoder/Parameters/Reference
This is the folder for information related to the sequences (static or reference table for sequence) dataset.
Parameter Description
Syntax
Path: Protocols/Default/Transforms/Kxen.SequenceCoder/Parameters/Representation
This folder contains all the information needed to specify what type of encoding of sequences is chosen by the
user.
Parameter Description
InternalStorage Contains the placeholder that activates the component's internal storage to avoid large memory consumption. The default value is "Memory".
Syntax
Path: Protocols/Default/Transforms/Kxen.SequenceCoder/Parameters/Representation/
Operations
Parameter Description
7.3.3.9.2 Results
There is no "real results" provided by SAP Predictive Analytics Explorer - Sequence Coding. SAP Predictive
Analytics Explorer - Sequence Coding only outputs new variables with their statistics on all data sets.These
variables can be found in the Variables section of the parameter tree.
As far as integration is concerned, there is a little trick to know. When integrating a SAP Predictive Analytics
Explorer - Sequence Coding transform into a protocol, it is indeed useful to ask this transform to compute the
output columns. The catch is that, in order to do this, SAP Predictive Analytics Explorer - Sequence Coding
must make a first pass on the transaction table to find what the valid states and transitions are between states.
This is done through a checkMode call to the model. The generated output variables (for example, the output
columns) are provided below.
7.3.3.10 Kxen.TimeSeries
Syntax
Path: Protocols/Default/Transforms/Kxen.TimeSeries
SAP Predictive Analytics Modeler – Time Series lets you build predictive models from data representing time
series. Thanks to SAP Predictive Analytics Modeler – Time Series models, you can:
● Identify and understand the nature of the phenomenon represented by sequences of measures, that is,
time series.
● Forecast the evolution of time series in the short and medium term, that is, to predict their future values.
A time series can be broken down into the following components:
● The trend. The trend represents the evolution of a time series over the period analyzed. The trend is represented either by a function of time or by signal differentiating, which is calculated in SAP Predictive Analytics Modeler – Time Series using the principle that a value can be predicted well enough based on the previous known value. Calculating the trend makes it possible to build a stationary representation of the time series (that is, the time series does not increase or decrease any more). This stationary representation is essential for the analysis of the three other components.
● The cycles. The cyclicity describes the recurrence of a variation in the signal. It is important to distinguish calendar time from natural time. These two time representations are often out of phase. The former - which is referred to as seasonality - represents dates (day, month, year, and so on), while the latter - which is referred to as periodicity - represents a continuous time (1, 2, 3, and so on).
● The fluctuations. Fluctuations represent disturbances that affect a time series. In other words, a time series does not only depend on external factors but also on its last states (memory phenomena). We try to explain part of the fluctuations by modeling them on past values of the time series (ARMA or GARCH models).
● The residue. The information residue is the information that is not relevant to explain the target variable. As such, predictive models generated by SAP Predictive Analytics Modeler – Time Series are characterized only by the three components: trend, cycles, and fluctuations.
Another important part of Time Series modeling is making forecasts. An SAP Predictive Analytics Modeler – Time Series model will use its own prediction in order to predict the next value.
7.3.3.10.1 Parameters
Syntax
Path: Protocols/Default/Transforms/Kxen.TimeSeries/Parameters
This section describes the parameters of SAP Predictive Analytics Modeler – Time Series, which can be found under the 'Parameters' section of the component.
AutoFeedCount The number of steps in the future for which the model will be optimized (Learning Horizon). An integer value.
MaxCyclics The maximal number of cycles analyzed by SAP Predictive Analytics Modeler – Time Series. 450 is the maximal number of cycles that the component is able to analyze. During the learning phase, this number may be reduced to half of the estimation data set. An integer value. 450 is the default value.
MaxLags The number of lagged variables, equal to a quarter of the estimation set size with no maximum value. An integer value. Note - The fluctuations step can be skipped by setting this parameter to 0.
LastRowWithForecastingInformation Saves the index of the last line of the file. This parameter is required if you want to use extra predictable inputs. An integer value.
ModelsGenerationOption Controls how the models are generated internally by SAP Predictive Analytics Modeler – Time Series.
● Default
● "Only Based on Extra Predictables"
● "Disable the Polynomial Trend"
● Customized
CustomizedModelGeneration (folder) Used when the model generation is customized. It contains a boolean entry for each model. Not applicable.
VariableSelection (folder) This parameter groups some controls for the variable selection feature. When a variable selection is used, an automatic selection process is performed on trends or AR models during the competition, and the result is kept only if it improves the final model.
● PercentageContrib
● ActivateForEXPTrends
● ActivateForAutoRegressiveModels
ProcessOutliers Activates an outliers processing strategy. Some extreme values are avoided when estimating the time-based trends, leading to a more robust trend estimation.
● true (default value)
● false
ForcePositiveForecast Activates a mode where the negative forecasts are ignored (replaced by zero). This is useful when the user knows that the nature of the signal is positive (number of items in stock, amounts, number of passengers, and so on).
● true
● false (default value)
AutoFeedCountApplied (Apply Horizon) The number of steps in the future on which the model will be applied. This parameter may be different from the learning horizon (AutoFeedCount). This will generate as many forecasts. An integer value. Note - If this parameter is not set, it is equal to the learning horizon.
ForecastsConnection Gives the format of the forecasts in the output of SAP Predictive Analytics Modeler – Time Series.
● true (default value): the forecasts are transposed at the end of the kts_1 variable with the corresponding dates.
● false: the forecasts stay in the last line of the file.
ExtraMode A special flag that allows setting the type of outputs that SAP Predictive Analytics Modeler – Time Series will generate.
● No Extra
● Forecasts and Error Bars
● Signal Components
● Component Residues
ErrorBarsConfidence Used to control the degree of confidence requested to compute the confidence interval for each horizon forecast. A percentage value. 95% is the default value.
DateColumnName The column in which the date is stored. This parameter is set by the user and cannot be changed later for a given model. A character string.
DateNeedSetKeyLevel Used internally when the key level is not set. Not applicable.
PredictableExtras The folder that contains all the exogenous variables whose future values are known (like a variable describing "the first Friday of the month", or an "isAWorkingDay" variable describing working days). User-dependent.
UnPredictableExtras The folder that contains all the exogenous variables whose future values are not known (like variables describing "monthly benefits" or "oil crisis"). User-dependent.
Syntax
Path: Protocols/Default/Transforms/Kxen.TimeSeries/Parameters/
ModelsGenerationOption
Parameter Description
"Only Based on Extra Restricts the models to those using extra predictable variables.
Predictables"
"Disable the Polynomial Generates all the models but those containing a polynomial trend.
Trend"
Customized Gives the possibility to enable/disable any model generated by SAP Predictive
Analytics Modeler – Time Series. A boolean flag is associated to each trend/
cycle/fluctuation model type. These flags are detailed in the CustomizedModel
Generation parameter below.
Syntax
Path: Protocols/Default/Transforms/Kxen.TimeSeries/Parameters/
CustomizedModelGeneration
The following options are relevant only if the ModelsGenerationOption parameter is set to Customized.
Parameter Description
Lag1 Controls the Lag1 trend (the previous value of the signal).
LinearPlusExtraPredictors Controls the Time and ExtraPredictors trend (linear regression on the time and extra-predictable variables).
PolynomialPlusExtraPredictors Controls the polynomial in time and linear in ExtraPredictors trend (polynomial regression on the time and linear in extra-predictable variables).
For example, to disable the linear trend, you need to set CustomizedModelGeneration/Linear to false.
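In KxShell, continuing the sketch above, this would look like the following (ModelsGenerationOption must first be set to Customized):
myKTSTransform.changeParameter Parameters/ModelsGenerationOption Customized
myKTSTransform.changeParameter Parameters/CustomizedModelGeneration/Linear false
myKTSTransform.validateParameter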
Syntax
Path: Protocols/Default/Transforms/Kxen.TimeSeries/Parameters/VariableSelection
PercentageContrib The percentage of contributions that are kept in the automatic selection process. 95% is the default value.
ActivateForEXPTrends When set to true, performs a variable selection on all extra-predictable-based trends. User variables are kept only if they have sufficient contributions in the trend regression. ● true (default value) ● false
Syntax
Path: Protocols/Default/Transforms/Kxen.TimeSeries/Parameters/ExtraMode
Parameter Description
No Extra Generates basic outputs, that is the target variable and its predicted values.
Forecasts and Error Bars Generates the same as above, with the confidence interval for each learning horizon.
Signal Components Generates basic outputs plus the signal components (trend, cycles, and so on). Predicted values correspond to the sum of all of these signal components.
Component Residues Generates the same outputs as Signal Components plus the residues' values for each predicted signal component.
7.3.3.10.2 Results
Syntax
Path: Protocols/Default/Transforms/Kxen.TimeSeries/Results
This section describes the results of SAP Predictive Analytics Modeler – Time Series.
Variables (folder) The folder where all variables used to build the model are stored. Each variable appears as a <Variable_Name> folder containing two subfolders:
● Variable (folder)
● KTS_Specifics (folder)
Model (folder) The folder describing all the components used by the generated model. The specified value is the name of the model created by SAP Predictive Analytics Modeler – Time Series. Note - This variable can be found in the 'Variables' folder under the 'Results' section. If this section does not exist, it means that no model has been found.
● MaximalHorizon
● Trend (folder)
● Fluctuations
● Outliers (folder)
Perfs (folder) The folder where all performance indicators on all data sets for all forecasts are stored (details on these indicators are given previously). These performances were computed between the signal and all autofeed variables. For a detailed explanation, see the KTS_Specifics Perfs values.
TimeAmplitude Set only for a datetime or date variable. This describes the granularity of the amplitude between the first and the last date of the estimation data set (hour, day, month, year).
● hourAmplitude
● dayAmplitude
● monthAmplitude
● yearAmplitude
TimeGranularity Set only for a datetime or date variable. This describes the average granularity between the dates of the Estimation data set (second, minute, hour, day, week, month, year).
● secondAmplitude
● minuteAmplitude
● hourAmplitude
● dayAmplitude
● weekAmplitude
● monthAmplitude
● yearAmplitude
IsDeltaTimeConstant A boolean value that indicates if the variation between each date is constant.
● true
● false
DeltaTime The mean difference between two consecutive times observed on the estimated dataset. A real value.
NbInit The number of lines read in the file before learning or applying. This index is used to initialize model variables. Not applicable (it is an internal parameter).
Parameters Description
Estimation/MAPE MAPE indicator value for each horizon in the training dataset
Validation/MAPE MAPE indicator value for each horizon in the validation dataset
Syntax
Path: Protocols/Default/Transforms/Kxen.TimeSeries/Results/Variables
The Variables folder is the folder where all variables used to build the model are stored. Each variable appears
as a folder containing two subfolders. The table below describes the two subfolders available.
Variable (folder) The folder for global information related to internal SAP Predictive Analytics Modeler – Time Series variables. All SAP Predictive Analytics regular variable parameters, together with some specific ones (see the KTS_Specifics parameter, detailed below).
KTS_Specifics (folder) The folder for specific information for each variable built by SAP Predictive Analytics Modeler – Time Series. The value specified is the type of the variable generated by the component. Variable-dependent.
Syntax
Path: Protocols/Default/Transforms/Kxen.TimeSeries/Results/Model
MaximalHorizon The maximal reliable horizon for the final SAP Predictive Analytics Modeler – Time Series model. This horizon may be lower than the horizon requested by the user. An integer value.
Trend (folder) The name of the trend used by the model. Model-dependent.
Cycles Contains all the periodic and seasonal variables, separated by commas, used by KTS. For each cyclic, seasonal, or extra-predictable variable in the cyclic component, the duration (when relevant) is given under Cycles/CycleName/DurationInSeconds. A list of cyclic components.
Fluctuations Describes the auto-regressive process used by the model. This process is noted "AR", followed by its order in parentheses, for example [AR(37)].
Outliers (folder) Provides the outliers for the current model. For each data set and each outlier, the date, signal, and model values are provided.
Note
One or more of the previous elements (Trend, Cycles, Fluctuations) may not exist. In this case, it means that
the related component has not been detected by the model.
Outliers
For each outlier, the following three pieces of information are provided:
Parameter Values
7.3.3.10.3 Infos
Syntax
Path: Protocols/Default/Transforms/Kxen.TimeSeries/Infos
This folder contains the LearnTime parameter, that is the time (in seconds) needed for the model learning.
7.3.3.11 Kxen.TextCoder
Syntax
Path: Protocols/Default/Transforms/Kxen.TextCoder
SAP Predictive Analytics Explorer - Text Coding automatically handles the transformation from unstructured data to structured data, going through a process involving "stop word" removal, merging sequences of words declared as "concepts", translating each word into its root through "stemming" rules, and merging synonyms. SAP Predictive Analytics Explorer - Text Coding allows text fields to be used "as is" in classification, regression, and clustering tasks. It comes packaged with rules for several languages, such as French, German, English, and Spanish, and can be easily extended to other languages.
SAP Predictive Analytics Explorer - Text Coding improves the quality of predictive models by taking advantage
of previously unused text attributes. For example, messages, emails sent to a support line, marketing survey
results, or call center chats can be used to enhance the results of models for cross-sell or attrition.
7.3.3.11.1 Parameters
Syntax
Path: Protocols/Default/Transforms/Kxen.TextCoder/Parameters
This section describes the parameters of SAP Predictive Analytics Explorer - Text Coding that can be found under the 'Parameters' section of the component.
LanguageDetectionEnabled A Boolean value that indicates whether automatic language detection is enabled.
● False: the user-defined language will be used.
● True: default value.
Syntax
Path: Protocols/Default/Transforms/Kxen.TextCoder/Parameters/ExtraMode
A special flag that allows setting the type of outputs that SAP Predictive Analytics Explorer - Text Coding will
generate during an apply.
Parameter Description
Vectorization Generates all the columns provided in the original data set and, for each textual field:
● a column for each root identified by the model. If the root represented by the column is present in the record, the value is set to 1, else it is set to 0.
● one column providing the number of elements recognized by SAP Predictive Analytics Explorer - Text Coding in the record.
● one column providing the number of distinct roots found in the record.
Language Detection Generates, for each textual field, a column indicating the language recognized by SAP Predictive Analytics Explorer - Text Coding for this record. The value can be the ISO language code, or the empty value if no language is recognized.
Generate Only Roots Generates the following columns for each textual field:
● one column for each root identified by the model. If the root represented by the column is present in the record, the value is set to 1, else it is set to 0.
● one column providing the number of elements recognized by SAP Predictive Analytics Explorer - Text Coding in the record.
● one column providing the number of distinct roots found in the record.
Syntax
Path: Protocols/Default/Transforms/Kxen.TextCoder/Parameters/ProcessingOptions
StopListenabled A Boolean value that indicates whether the stop list will be used or not. ● true (default value) ● false
Stemmingenabled A Boolean value that indicates whether the stemming rules will be used or not. ● true (default value) ● false
ConceptListenabled A Boolean value that indicates whether the concept list will be used or not. ● true ● false (default value)
Synonymyenabled A Boolean value that indicates whether the synonymy will be used or not. ● true ● false (default value)
DebugMode A Boolean value that indicates whether the debug mode is activated. ● true ● false (default value)
VolatileStopList A parameter used to define a user's list of stop words. The default value is empty.
Syntax
Path: Protocols/Default/Transforms/Kxen.TextCoder/Parameters/RootSelection
RankingStrategy Allows you to select the ranking strategy for the root selection, that is, to select which roots to keep in the dictionary.
● Frequency (default value)
● shannonEntropy
● kullbackInformation
● mutualInformation
● chiSquare
● informationGains
NbRootGenerated Indicates the maximum number of roots generated by SAP Predictive Analytics Explorer - Text Coding. 1000: default value.
MaxThreshold Indicates a threshold in percentage. If a root appears in more than the indicated percentage of all textual fields, it will be eliminated. 100: default value.
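A hedged KxShell sketch of tuning the root selection, assuming a model named tcModel built as in the previous examples:
tcModel.addTransformInProtocol Default Kxen.TextCoder myTCTransform
myTCTransform.getParameter ""
myTCTransform.changeParameter Parameters/RootSelection/RankingStrategy mutualInformation
myTCTransform.changeParameter Parameters/RootSelection/NbRootGenerated 500
myTCTransform.validateParameter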
Syntax
Path: Protocols/Default/Transforms/Kxen.TextCoder/Parameters/EncodingStrategy
Each root is converted into a variable and when the root appears in a text, its presence can be encoded with
one of the strategies listed.
boolean Specifies whether the word is present or not. 1: the word is present.
termFrequencyInverseDocumentFrequency Stands for the apparition frequency of the root in the current text divided by the apparition frequency of the root in the whole set of texts. An integer value.
7.4 DataSets
Syntax
Path: DataSets
A data space is an ordered list of cases (or events). It can be viewed as a file in a folder, a table in a database, a
SELECT statement (using SQL), or an Excel worksheet. A data space is generally associated with a model
through a role name: we call this association a 'dataset'. A classical example is when the transform must be
trained (you may prefer the term 'estimated') on a set of examples: in this case the set of examples used to
estimate the transform parameters will be known to the model as the "Estimation" dataset. Data spaces belong to stores.
7.4.1 Parameters
Syntax
Path: DataSets/<Dataset_Name>/Parameters
MappingReportSuccessUILevel [read-write] This parameter allows tuning the quantity of information displayed to the user when a successful mapping is done.
● NoReport (default value): it does not provide any information.
● SmallReport: it displays only a short report on important information (for example, the number of unmapped columns).
● FullReport: it displays all detailed information (for example, the full list of columns not mapped).
MappingReportFailureUILevel [read-write] This parameter allows tuning the quantity of information displayed to the user when a mapping has failed.
● NoReport: it does not give any information.
● SmallReport (default value): it displays only a short report on important information related to the failure (for example, the number of mandatory variables not mapped).
● FullReport: it displays all detailed information (for example, the full list of variables not mapped).
MappingReportParameterLevel [read-write] For each mapping, whatever its result, a report is stored in the parameters tree, which the user can programmatically investigate. As previously, the level of information stored in the parameters tree can be tuned with this parameter.
● NoReport: it does not give any information.
● SmallReport: it displays only a short report on important information related to the failure (for example, the number of mandatory variables not mapped).
● FullReport (default value): it displays all detailed information (for example, the full list of variables not mapped).
Explain [read-only] If the parameter is set to true, then when submitting an SQL request, instead of returning the resulting values, the DBMS returns a specific result set describing step by step how the SQL request will be executed and how much time each step will take. ● true ● false (default value)
FilterConditionString [read-only] A character string featuring a filter condition, for example 'where'.
Connector [read-write] Indicates whether or not the space is a data manipulation. ● true: the space is the root of the data manipulation structure. ● false
SkippedRows [Read-write; read-only for all input spaces except ApplyIn when the model is in a ready state; not available for output spaces] The number of rows that the system must skip before actually reading data. This is filled in when the user specifies a periodic cut strategy for a model. An integer value.
LastRow [Same access as SkippedRows] The last valid row that the system will take as part of the data set. This is filled in when the user specifies a periodic cut strategy for a model. An integer value.
ModuloMin [Same access as SkippedRows] Used for the periodic cut training strategy and filled by the system. An integer value.
ModuloMax [Same access as SkippedRows] Used for the periodic cut training strategy and filled by the system. An integer value.
ModuloPeriod [Same access as SkippedRows] Used for the periodic cut training strategy and filled by the system. An integer value.
RandomMin [Same access as SkippedRows] Used for the random cut training strategy and filled by the system. An integer value.
RandomMax [Read-only for all input spaces except ApplyIn when the model is in a ready state; not available for output spaces] Used for the random cut training strategy and filled by the system. An integer value.
RandomSeed [Read-only for all input spaces except ApplyIn when the model is in a ready state; not available for output spaces] Used for the random cut training strategy and filled by the user. An integer value.
HeaderLines [Read-write; read-only for all input spaces except ApplyIn when the model is in a ready state; not available for output spaces] The number of lines that the system must skip before actually reading the header line. If this parameter is set to 0, then the data file begins with the column names. Note - this parameter is used only if the ForceHeaderLine parameter is set to true.
RowsForGuess [Read-write] The number of rows that SAP Predictive Analytics will read in order to analyze the actual data and guess the value of variables (nominal, ordinal, continuous) and, depending on the kind of data access, their storage (integer, number, string, date, datetime). An integer value.
GuessDescriptionUsesConnectorInfo [Read-write] A boolean flag used by connectors to give priority to or ignore the user-defined information (field storage, value type, description, …) in the guess description process.
● true: the information coming from the user will be preferred and used.
● false: the guess description will always follow the full process (which involves the analysis of the first N data rows).
Open [Read-only] The name of the space (that can be used to open the physical space containing data) or the full SQL string when the space is a data manipulation. A character string.
Specification [Read-only] Either the name of the data manipulation being used (when working with a data manipulation) or the value of the Open parameter (see above). A character string.
Store [Read-only] The open string used to open the store to which the space belongs. A store can be either a folder or an ODBC source. A character string.
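These parameters can be read and tuned from a KxShell script; a minimal sketch, assuming a model named census1 with a Training dataset already defined, and assuming the model handle exposes the same getParameter/changeParameter/validateParameter calls shown for transforms in the KxShell chapter:
census1.getParameter "DataSets/Training/Parameters"
census1.changeParameter DataSets/Training/Parameters/RowsForGuess 500
census1.validateParameter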
7.4.1.1 MappingResults
MappingOK Specifies if the mapping is successful or not. ● true: the mapping is successful. ● false: the mapping failed.
MandatoryVariables Refers to the list of variables that must be successfully mapped. This set of variables depends on the data set and on the operation requested. For example, for a regression, a training dataset must have its target mapped, but for an ApplyIn dataset, the target is not mandatory. List of the variables separated by a comma.
OptionalVariables Refers to the list of variables that need not be mapped. List of the variables separated by a comma.
FieldsForMap The list of technical column names that SAP Predictive Analytics has found in the data set. List of the fields separated by a comma.
NbMandatoryVariablesNotMapped Refers to the number of mandatory variables not mapped. An integer value. Any value other than 0 indicates a mapping error.
MandatoryVariablesNotMapped Refers to the list of mandatory variables not mapped. List of the variables separated by a comma.
VariablesNonCompatible Refers to the list of variables that matched but needed a conversion depending on the CheckPolicy. List of the variables separated by a comma.
VariablesMappedWithForbiddenConversion Refers to the list of variables that matched but needed a forbidden conversion. List of the variables separated by a comma.
VariablesMapped Refers to the list of variables successfully mapped, whatever their mandatory status. List of the variables separated by a comma.
VariablesAutoName Refers to the list of variables which have been automatically matched by SAP Predictive Analytics without any user action. List of the variables separated by a comma.
VariablesMappedUserName Refers to the list of variables which have been explicitly matched by the user (with the use of the InSpaceName or SpaceName mechanisms). List of the variables separated by a comma.
VariablesMappedWithConversion Refers to the list of variables which have been matched but needed a type conversion. List of the variables separated by a comma.
FieldsMultiUsed Refers to the list of technical column names which have been used several times in the current mapping. A technical column name can be used only once. List of the fields separated by a comma.
D Stands for "date". It refers to the date format used within this file. The value is of the form XXX[:Z], where XXX is a group of 3 letters (among 'Y', 'M' and 'D') indicating in which order Year, Month, and Day are represented, and Z is a symbol giving the separator used between each of these.
Possible paths to the Parameters of the data set (according to the type of dataset):
● DataSets/Estimation/Parameters
● DataSets/Training/Parameters
● DataSets/Validation/Parameters
QuotingPolicy The quoting policy of the field contents.
● never: no matter what the content of the fields is, it is never quoted.
● ifNeeded: the field content is quoted only when it contains a space or a special character.
● always: the field content is always quoted.
Note
Many parameters are available to tune the parameters for ODBC Spaces. See ODBC Fine Tuning
documentation.
7.5 Plan
Syntax
Path: Plan
The Plan groups together all parameters involved when performing In-Database Apply (IDBA). IDBA is an optimized scoring mode.
Syntax
Path: Plan/Conditions
All following parameters must be true to fully perform the in-database-apply process.
OnODBCStore [Read-only] true: the ApplyIn store and ApplyOut store are of an ODBC type.
OnSameDataBase [Read-only] true: the ApplyIn space and ApplyOut space are on the same ODBC source.
OnDifferentTable [Read-only] true: the ApplyIn space and ApplyOut space are different.
KMXLicenseAvailable [Read-only] true: a scorer license is valid for the ODBC source.
KMXDefinedForTransformChain [Read-only] true: all transforms in the current model can be exported with the in-database-apply process.
PrimaryKeyDefinedForApplyIn [Read-only] true: the current model has a physical primary key.
LastTransformCompliant [Read-only] true: the tuning parameter of the last transform for the current model is exported with the in-database-apply process.
Syntax
Path: Plan/Options
NbColumnByUpdate [Read-only] [n] The value of this parameter defines the number of elements that are updated per pass.
Syntax
Path: Plan/Steps
[UPDATE]
Syntax
Path: Plan/Results
7.6 External Executables
An external executable lets you define a list of programs (executables or scripts) in a configuration file. These
executables can be run from a client on a server.
See the SAP Predictive Analytics Administrator Guide for more information.
7.6.1 ExternalExecutableAvailable
Syntax
Path: ExternalExecutableAvailable
This parameter contains the list of available external executables. Each external executable is defined by a label
and a description.
This parameter contains the name of the external executable as defined by the key
ExternalExecutable.Name in the configuration file.
Parameter Description
Label This parameter contains the label of the current external ex
ecutable. The label is in the current language if it has been
translated in the configuration file.
Syntax
Path: ExternalExecutableAvailable/ExternalExecutableName
This node appears after the node ExternalExecutableName has been set and a command
validateParameter has been performed. It contains all information about the current script.
7.6.2 External_Executable_Name
Syntax
Path: <External_Executable_Name>
This node appears after the node ExternalExecutableName has been set and a command
validateParameter has been performed. It contains all information about the current script.
ExternalExecutableName This parameter contains the name of the file that will be executed. Its value corresponds to the value of the key ExternalExecutable.<External Executable Id>.ExternalExecutableName defined in the configuration file.
Description This parameter contains the description of the current external executable. Its value corresponds to the value of the key ExternalExecutable.<External Executable Id>.Description defined in the configuration file.
Label This parameter contains an identifier that will be displayed by the SAP Predictive Analytics Modeler. It is generally a word used to identify the external executable. Its value corresponds to the value of the key ExternalExecutable.<External Executable Id>.Label defined in the configuration file.
DefaultOutput This parameter is a Boolean that indicates whether the output is in the standard stream (in which case it will be displayed in the SAP Predictive Analytics Modeler). Its value corresponds to the value of the key ExternalExecutable.<External Executable Id>.DefaultOutput defined in the configuration file.
FormatOutput This parameter is a string that indicates the format of the output. If its value is set to txt or html, the output will be displayed by the SAP Predictive Analytics Modeler. Its value corresponds to the value of the key ExternalExecutable.<External Executable Id>.DefaultOutputFormat defined in the configuration file. Possible values:
● txt
● html
● User value
IsScript This parameter is a Boolean that indicates if the current external executable is a script or an executable. Its value corresponds to the value of the key ExternalExecutable.<External Executable Id>.isScript defined in the configuration file.
NbArgument This parameter is an integer that indicates the number of arguments required by the external executable. Its value corresponds to the value of the key ExternalExecutable.<External Executable Id>.NbArgument defined in the configuration file.
Syntax
Path: <External_Executable_Name>/Arguments/Argument_<n>
This parameter contains the information on the nth parameter of the current external executable.
Label The name of the nth parameter of the current external executable. Its value corresponds to the value of the key ExternalExecutable.<External Executable Id>.Argument.<id>.Label defined in the configuration file.
DefaultValue The default value of the nth parameter of the current external executable. Its value corresponds to the value of the key ExternalExecutable.<External Executable Id>.Argument.<id>.DefaultValue defined in the configuration file.
Description The description of the nth parameter of the current external executable. Its value corresponds to the value of the key ExternalExecutable.<External Executable Id>.Argument.<n>.Description defined in the configuration file.
ArgumentType The type of the nth parameter of the current external executable. Its value corresponds to the value of the key ExternalExecutable.<External Executable Id>.Argument.<n>.Type defined in the configuration file. Possible values:
● ExistingStore
● ExistingSpace
● Store
● Space
● Bool
● Index
● Number
● Integer
● Double
● String
NbAllowedValue This node contains the number of allowed values. If the value is set to 0, any value of the correct type will be accepted. This number also represents the number of nodes under the node AllowedValue.
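Putting the keys of this section together, a configuration file entry might look like the following sketch. The key names are taken from this section; the key=value syntax, the identifier "1", and all values are assumptions for illustration:
ExternalExecutable.1.ExternalExecutableName=backup.sh
ExternalExecutable.1.Label=Backup
ExternalExecutable.1.Description=Archives the model files
ExternalExecutable.1.isScript=true
ExternalExecutable.1.NbArgument=1
ExternalExecutable.1.Argument.1.Label=TargetFolder
ExternalExecutable.1.Argument.1.Type=ExistingStore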
Syntax
Path: ScriptInformation
ScriptLauncher The command used to launch the external executable when it is a script. The default values are the following:
● Linux: /bin/sh
● Windows: c:\windows\system32\cmd.exe
ScriptLauncherOption This parameter contains the option used to launch the external executable when it is a script. The default values are the following:
● Linux: -c
● Windows: /C
ScriptExtension This parameter contains the extension that will be added at the end of the name of the script if needed. The default values are the following:
● Linux: .sh
● Windows: .bat
Note
The syntax of the command is:
Syntax
Path: ExternalExecutableName
This parameter is used to specify the name of the script that you want to execute. The value is one of the values listed in the node ExternalExecutableAvailable.
Syntax
Path: UseDefaultPath
8 KxShell: SAP Predictive Analytics Command Line Interpreter
Learn how to write scripts for the most common data-mining tasks with the scripting tool, KxShell.
Find the KxShell reference guide in the Automated Analytics API Reference on the SAP Help Portal.
Related Information
8.1 Overview
This document explains how to write scripts for the most common data-mining tasks with the scripting tool,
KxShell.
KxShell is distributed with its source code as an example of how to use the C++ library directly in a C++
program.
A KxShell script can be executed in three ways:
● Running the command kxshell.exe script.txt, where script.txt is a text file containing the script.
● Launching KxShell and typing the command read script.txt, where script.txt is a text file containing the script.
● Launching KxShell and typing the commands interactively; the commands can, for example, be copied from a document and pasted into the KxShell console.
The KxShell can be used to automate data-mining tasks because it does not require interaction. For example, a
program can automatically generate a script file named script.txt and launch the external command
kxshell.exe script.txt.
Every command/instruction executed is terminated with an 'OK' or 'not OK' status. For example:
Store.openStore "C:\"
This command returns an 'OK' status because the instruction was executed successfully.
Store.openStore "DoesNotExist"
This command displays an error because the instruction did not complete successfully.
A script is a sequence of instructions:
Inst1..
Inst2..
Inst3..
Typing the command kxshell.exe script.txt executes the sequence of instructions contained in the script.txt script.
If an error is generated by a given instruction in the script, the execution of the script stops.
A modifier ('-' or '+') can be used at the beginning of each instruction/line of the script.
● If the command is preceded by '-' and an error occurs, the error is ignored, the execution of the script goes on, and the next instruction is executed. For example, before applying a model, it is required to delete an output table if it exists:
-store.dropSpace "MyOutputTable"
model.apply
If the MyOutputTable table does not exist, the error is ignored and the model.apply instruction is
executed.
● If the command is preceded by '+', the execution of the script goes on only if the current instruction fails. The '+' prefix is mostly used by developers, but it may be useful in some cases. For example, it may be required to check that a table does not exist: an attempt to read the table is performed, and this attempt must fail. Otherwise, it means that the table exists.
+store.readSpace "MyTestTable"
model.apply
If the MyTestTable table does not exist, the first instruction causes an error. Then the model.apply
instruction is executed.
If the MyTestTable table exists, the first instruction completes successfully, which stops the execution of the script.
The scripts of the first section will show how to use KxShell in a regression/classification task context, while the
scripts of the second section focus on clustering tasks. All the scripts use the data set known as Adult Census,
which is distributed with the software along with its description file. When files are used, it is assumed that
Census01.csv and desc_Census01.csv are in the current working directory; the models and output files
are also saved in the current directory. When an ODBC source is used, it is assumed that an ODBC connection
with the name Database is set up for user UserName/password.
This script trains a regression model on a training data set. It uses files as the data source. The script executes
the steps listed below:
This script has exactly the same behavior as the previous one, except for the fact that it uses an ODBC source.
Now take a look at the way the data set is cut into estimation, validation and test data. There are two
possibilities:
● specify three files that will be used as Estimation, Validation and Test respectively,
● define a single file with the three roles and choose between the periodic and random cutting strategies.
The default cutting strategy is random, which means that each record will be chosen to be part of the
estimation, validation or test set depending on the value of a random number calculated for that record.
This script will change the cutting strategy to the periodic method. In this example, out of 10 lines, 7 lines will be
used for Estimation, the 8th and 9th for Validation and the 10th for Test.
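A hedged sketch of such a script, using the Modulo parameters documented in the DataSets section; the exact way of activating the periodic policy may differ, so treat this as an illustration rather than the definitive method:
census1.changeParameter DataSets/Estimation/Parameters/ModuloMin 0
census1.changeParameter DataSets/Estimation/Parameters/ModuloMax 6
census1.changeParameter DataSets/Estimation/Parameters/ModuloPeriod 10
census1.changeParameter DataSets/Validation/Parameters/ModuloMin 7
census1.changeParameter DataSets/Validation/Parameters/ModuloMax 8
census1.changeParameter DataSets/Validation/Parameters/ModuloPeriod 10
census1.changeParameter DataSets/Test/Parameters/ModuloMin 9
census1.changeParameter DataSets/Test/Parameters/ModuloMax 9
census1.changeParameter DataSets/Test/Parameters/ModuloPeriod 10
census1.validateParameter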
Some variables are defined as nominal, which means that their values represent categories. In that case the
coding engine (SAP Predictive Analytics Modeler - Data Encoding) analyzes how relevant the categories are
and 'compresses' them by grouping together categories that have the same behavior regarding the target and
by creating a 'KxOther' category for unimportant categories. This script shows how to disable this feature of
the Data Encoder.
1. Create a new model named census2 and define a training data set with its description:
createModel Kxen.SimpleModel census2
census2.openNewStore Kxen.FileStore .
census2.newDataSet Training Adult01.csv
census2.readSpaceDescription Training desc_Adult01.csv
2. Add the encoding transform to the protocol with a specific name to be able to change its parameters. The syntax of the addTransformInProtocol function allows giving a symbolic name to a transform, as shown below:
census2.addTransformInProtocol Default Kxen.ConsistentCoder myK2CTransform
census2.addTransformInProtocol Default Kxen.RobustRegression
3. Load the parameters of the Kxen.ConsistentCoder transform:
myK2CTransform.getParameter ""
4. Modify the compression parameter called Compress, located in the Parameters branch of its sub tree:
myK2CTransform.changeParameter Parameters/Compress false
5. Validate the change by entering the command:
myK2CTransform.validateParameter
6. The rest of the script does not change:
census2.sendMode learn
census2.saveModel Models.txt "This is a model for which compression has been disabled." census2
7. To complete the script, add either delete census2 to free the memory and be able to create new
models, or quit.
This script shows how to exclude a variable and set the target variable. In a model, each variable must have one of the following roles: input, skip, target, or weight. In order to exclude a variable, its role must be set to skip. To select the target, the variable role must be set to target. By default, all variables have the input role except for the last one, which has the target role.
Note - Remember that the target has to be numeric and either nominal with exactly two values (binary), or
continuous.
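A sketch of such role changes, using the same parameter-change pattern as the compression example above; the parameter paths under Protocols/Default/Variables and the variable names fnlwgt and class are assumptions based on the Adult Census data set:
census2.getParameter ""
census2.changeParameter Protocols/Default/Variables/fnlwgt/Role skip
census2.changeParameter Protocols/Default/Variables/class/Role target
census2.validateParameter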
We now show how to apply a model, either one just created or one restored from disk, to a new data set. (For
this tutorial, we use the same data set, but the method is identical for another file.)
Since we already showed how to create a new model, we are going to try to restore one of the models we have
created.
This script will do exactly the same thing as the previous one, except that the data will come from an ODBC
source. The scoring file will be a table in the database as well.
1. Create a store variable of the type Kxen.ODBCStore in order to be able to open a table:
createStore Kxen.ODBCStore myStore
2. Open the database:
myStore.openStore Database UserName password
3. Assuming that a model named census1_odbc has been saved in Database, load it into the
variable model:
myStore.restoreModelD census1_odbc 1 model
4. Open the store containing the application data set:
model.openNewStore Kxen.ODBCStore Database UserName password
5. Define the application input data set:
model.newDataSet ApplyIn Census01.csv
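The remaining steps, defining the output data set and running the model in apply mode, could look like the following sketch; the output table name ApplyResults is illustrative, and the apply mode mirrors the learn mode shown earlier:
model.newDataSet ApplyOut ApplyResults
model.sendMode apply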
8.3 Segmentation
This script trains a clustering model on a training data set. The first steps are the same as for a regression/
classification model: create a model (still of type Kxen.SimpleModel) and define a training data set or three
data sets (estimation, validation and test sets).
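A minimal sketch of such a clustering script, reusing the file-based commands shown for regression; the transform class name Kxen.SmartSegmenter is an assumption to check against the class list of your installation:
createModel Kxen.SimpleModel cluster1
cluster1.openNewStore Kxen.FileStore .
cluster1.newDataSet Training Census01.csv
cluster1.readSpaceDescription Training desc_Census01.csv
cluster1.addTransformInProtocol Default Kxen.SmartSegmenter
cluster1.sendMode learn
cluster1.saveModel Models.txt "Clustering model trained on Census01" cluster1
delete cluster1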
This script trains the same clustering model, but using an ODBC source and saving the model in the database.
For a clustering problem, the basic approach is to create a clustering model using SAP Predictive Analytics
Modeler - Segmentation/Clustering. Nevertheless, the available definitions of the clusters may not be entirely
satisfactory, and the clusters may be difficult to understand.
An interesting additional approach is to run a classification model (with SAP Predictive Analytics
Modeler - Regression/Classification) afterwards on the data, with the cluster of interest as the target; the
classification model then provides an interpretable characterization of that cluster.
The following section describes the general methodology to be used and illustrates it below with the Adult
Census data:
1. First, you need to build the clustering model using SAP Predictive Analytics Modeler - Segmentation/
Clustering and apply it to your data in order to have a table with the cluster number for each record.
2. Then you need to join your data and this table. Assuming that your data set is stored in a table named
Dataset with a key field named ID, and that the result of the model application has been saved in the
table named Clustering, the following SQL statement (where the 10th cluster is used as an example) can be
used:
SELECT Dataset.*, target = CASE kc_clusterId WHEN 10 THEN 1 ELSE 0 END FROM Dataset, Clustering
WHERE Dataset.ID = Clustering.ID
Note - The KeyLevel of the ID field must be set to 1 in the description file of the data set so that SAP
Predictive Analytics can use it as the key in the output table. Otherwise, a field called KxIndex is created
and used to reference the records.
3. Finally, you have to create a new description file for this view, which is the same as for your data set, except
that it has an additional line for the variable 'target' that you created in the SELECT statement (name:
'target', storage: 'number', value: 'nominal').
4. You are then ready to run a classification model on the view created with the SQL statement, and see how
the regression engine characterizes it.
The following script is the application of this methodology to the Adult Census data to characterize the 10th
cluster.
● An ID field has been added to the Adult01 table, which has been renamed Adult01_ID.
● In the desc_Adult01 description table, the KeyLevel of the ID field has been set to 1 and the table has
been renamed desc_Adult01_ID.
● A description for the SQL SELECT statement that will be used has been prepared and named
desc_Cluster. It is the same as in point 2, except that it has an additional line for the target variable.
● The database supports the CASE statement.
Note - Unlike the other ODBC scripts in this document, this one has not been tested with the
Access ODBC driver, because Access does not support the CASE SQL function.
This script shows how to build a clustering model, apply it to a data set (either the same one or a new one),
and automatically characterize one of the clusters using SAP Predictive Analytics Modeler - Regression/
Classification.
This script is the equivalent of the scoring script presented for regression/classification. Its goal is to load a
model and apply it to a new data set (in practice the same one) and ask the model to determine to which
cluster each record belongs.
This script is the equivalent of the previous one for ODBC. The goal of this script is to open the model saved
with the second script of this section (Basic Script Using an ODBC Source) and apply it to build a table
containing the cluster numbers for each record.
8.4 KxCORBAShell
In addition to the standard KxShell interpreter, another interpreter is included to run KxShell scripts
in a client/server environment.
During SAP Predictive Analytics installation, a sub folder named KxCORBAShell is created at the same level as
KxShell.
KxCORBAShell options
KxCORBAShell uses the same options and the same syntax as KxShell. However, some additional options are
provided to specify the connection to the remote server:
● -ORBInitRef
This option can be used to specify the physical server (<RemoteHostName>) and the default port
(<RemotePort>) used by SAP Predictive Analytics Server.
Syntax: -ORBInitRef NameService=corbaname::<RemoteHostName>:<RemotePort>
Example: -ORBInitRef NameService=corbaname::kxserv:12345
● -ServiceName
This option can be used to specify the logical service name used by the SAP Predictive Analytics server, if it
has been changed at installation time by the administrator (this is only needed if several SAP Predictive
Analytics Servers have to be started on the same physical machine).
Syntax: -ServiceName <RemoteServiceName>
Example: -ServiceName FactoryEntries3
● -authenticated
This option must be specified if the SAP Predictive Analytics Server is in "Authenticated mode", which
means that a proper authentication is required to connect to the server.
Syntax: -authenticated
● -user
This option must be used with an authenticated server to provide the user name used to connect to the
server. The user policy depends on the actual SAP Predictive Analytics installation; most of the time, it
must be a valid user name for the server's operating system.
Copy KxCORBAShell.sh (or KxCORBAShell.bat on Windows), then update the copy to reflect your installation's
RemoteServerName, RemoteServerPort, and ServiceName, as illustrated below.
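Combining the options described above, a typical invocation could look like the following sketch; the script file name myScript.kxs is illustrative, and the exact way to pass a script follows the standard KxShell options:
KxCORBAShell -ORBInitRef NameService=corbaname::kxserv:12345 -ServiceName FactoryEntries3 -authenticated -user UserName myScript.kxs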
Learn how to use the Data Access API, which is the way for integrators and OEMs to extend how SAP Predictive
Analytics accesses external data.
Some integration or operational environments have proprietary data storage. For example, presentation tools
use their own internal layer to access data on many platforms and OLAP tools have their own internal way of
storing their data. In such cases, it can be useful to provide integrators with a solution to connect SAP
Predictive Analytics to their internal storage. This requires specifying a data access API that should be
implemented by the integrators. It can be useful for programming an additional data driver.
Note
Integrators must implement such extensions in C. This language is used for stability reasons, because C++
name mangling is not yet very stable in many environments. Even written in C, the functions defined in the
API can be viewed as methods defined for three classes: store, space and case iterator.
In SAP Predictive Analytics, data access is done through an abstraction layer that is decomposed under the
main classes of Store, Space, and Case Iterator.
These classes allow SAP Predictive Analytics to access data from sources that can be text files with separators,
or tables or SQL select statements accessible through an ODBC driver. To allow integrators to define their
own data access functions, a third, user-defined set of these classes (store, space, and case iterator) is
available. These classes alone are not enough to run the data access: C++ wrappers are used to call functions
written in a dynamic loadable library. The internal architecture allows you to create several data access types.
Note
The functions used to perform these initial operations cannot be described in this document, as they
depend on each integrator's environment. If some of these initializations must be done, the integration
environment must load the library and initialize it itself, because SAP Predictive Analytics would load it
without running the proper initialization.
Step 1: Install the dynamic loadable library so that the run-time of SAP Predictive Analytics components can
load it (in most environments, libraries are looked for in a set of predefined locations). See Library
Installation [page 233].
Step 2: Declare the new user class name associated with its library in the configuration file, along with its
configuration options. See Declaration of a New User Class Name [page 233].
This section presents the minimum implementation required to perform the first integration tests. These
minimum requirements are decomposed for store, space, and case iterator. The section also shows how a first
running implementation can be further refined.
9.3.1 Space
● Space_Cst
This function returns a handle (implemented as a void*) on a newly created space. A classical
implementation could return a memory pointer to a C++ class kept inside the memory space managed by
the library written by the integrator.
● Space_Dst
This function deletes a space and the memory location associated with this space. This function does not
have to close the space. SAP Predictive Analytics components will close the space before (see below).
● Space_Open
This function opens a space within a store at a given location. For a directory store, iOpen would be the
name of a file in the directory; for an ODBC source, it would be the name of a view or table, or a complete
select statement within the specified ODBC source. In a specific implementation, iOpen can be any logical
name that allows the system to run a query to extract data from, or put it back into, the user's internal
storage. The string iMode can be either "r" for a space opened in read mode, or "w" for a space opened in
write mode.
● Space_Close
This function closes a previously opened space.
Note
The proposed level of the API does not make any assumption about where the actual connection to the data
source is performed. Take the analogy of an ODBC connection, and assume that we want to implement a
user data access through ODBC by developing a user data access library. It is up to the user design, taking
concurrent data access into consideration, to actually open an ODBC connection at the store level (when
performing Store_Open), at the space level (when performing Space_Open), or simply at the case iterator
level (when performing the begin call).
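As an illustration, a minimal space implementation could look like the following C sketch. The exact signatures, handle types, and return codes are defined in KxDataAccess.h; the ones below are assumptions reconstructed from the descriptions above.
#include <stdlib.h>
#include <string.h>

/* Return codes as described in this section; actual values come from KxDataAccess.h. */
#define KXDA_OK   0
#define KXDA_FAIL 1

/* Internal state for one space, kept behind the void* handle. */
typedef struct {
    char* mName;      /* logical name passed to Space_Open */
    int   mWriteMode; /* 1 when opened with "w", 0 when opened with "r" */
} MySpace;

void* Space_Cst(void) {
    /* Return a handle on a newly created, not yet opened, space. */
    return calloc(1, sizeof(MySpace));
}

void Space_Dst(void* iSpace) {
    /* Delete the space; the components have already closed it. */
    MySpace* lSpace = (MySpace*)iSpace;
    free(lSpace->mName);
    free(lSpace);
}

int Space_Open(void* iSpace, const char* iOpen, const char* iMode) {
    /* Open the space at the logical location iOpen, in read or write mode. */
    MySpace* lSpace = (MySpace*)iSpace;
    lSpace->mName = (char*)malloc(strlen(iOpen) + 1);
    if (lSpace->mName == NULL)
        return KXDA_FAIL;
    strcpy(lSpace->mName, iOpen);
    lSpace->mWriteMode = (iMode != NULL && iMode[0] == 'w');
    return KXDA_OK;
}

int Space_Close(void* iSpace) {
    /* Nothing to release here beyond what Space_Dst frees in this stub. */
    (void)iSpace;
    return KXDA_OK;
}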
9.3.2 Store
● Store_Cst
This function returns a handle (implemented as a void*) on a newly created store. A classical
implementation could return a memory pointer to a C++ class kept inside the memory space managed by
the library written by the integrator.
● Store_Dst
This function deletes a store and the memory location associated with this store. This function does not
have to close the store. SAP Predictive Analytics components will close the store before.
● Store_Open
This function opens a store at a given location. For a directory, iOpen would be the path name of the
directory; for an ODBC source, it would be the ODBC source logical name as seen on the machine running
the SAP Predictive Analytics components. In some cases, the environment can provide a user name and a
password to check access rights. The result of this operation is either KXDA_OK on success, or KXDA_FAIL
on failure.
● Store_Close
This function closes a previously opened store. When used from SAP Predictive Analytics components, all
spaces opened from this store have been previously closed, so the implementation of this function does
not have to check for opened spaces within that store.
Sometimes, the integration environment does not have, in its original design, an object corresponding to the
notion of store. In this case, the integrator can create an empty C++ class with no associated methods: any
call to Store_Open with any iOpen returns success, and the whole implementation focuses on the notions of
space and case iterator. Having several stores is only important when the integrator wants to save models
within its own internal storage. In the SAP Predictive Analytics design, any store can (and should) contain a
specific space called "KxAdmin" that holds information about the actual locations where models are stored
within this store; it is then important that the SAP Predictive Analytics components can retrieve the model
descriptions using this name (KxAdmin can be reconfigured with another name through the configuration
file). This minimum implementation does not allow a graphical interface to present to the user the names of
the stores that can be opened. Most graphical user interfaces create an empty store and ask to open it with
an iOpen equal to the empty string ("") in order to get the list of possible stores. An advanced
implementation can support this feature, but it is not required at the beginning. A sketch of such a trivial
store stub is shown below.
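Following the empty-store approach described above, a trivial store implementation could look like this C sketch, reusing the includes and the assumed KXDA_* codes from the previous sketch; the user and password arguments of Store_Open are also an assumption based on the description above.
/* Trivial store: every open succeeds; the real logic lives in spaces. */
void* Store_Cst(void) {
    /* No state is needed; return any non-NULL handle. */
    return malloc(1);
}

void Store_Dst(void* iStore) {
    /* Delete the store; the components have already closed it. */
    free(iStore);
}

int Store_Open(void* iStore, const char* iOpen,
               const char* iUser, const char* iPassword) {
    /* Accept any location; access rights are not checked in this stub. */
    (void)iStore; (void)iOpen; (void)iUser; (void)iPassword;
    return KXDA_OK;
}

int Store_Close(void* iStore) {
    /* All spaces opened from this store are already closed at this point. */
    (void)iStore;
    return KXDA_OK;
}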
9.3.3 Case Iterator
● CaseIter_Cst
This function returns a handle (implemented as a void*) on a newly created case iterator. A classical
implementation could return a memory pointer to a C++ class kept inside the memory space managed by
the library written by the integrator.
● CaseIter_Dst
This function deletes a case iterator and the memory location associated with this case iterator.
Some types of user-defined spaces know the storage of each column (variable, dimension). When this is the
case, SAP Predictive Analytics components can ask the space for this description instead of using the default
algorithm, which forces a case iterator of cells of strings and derives this information from the first 100 lines
of the space.
Example
Available Stores
When the user-defined DLL is used from a graphical interface, you can provide the list of available stores that
the application can open.
In this case, the user must provide a function call that returns the number of available stores and their
descriptions. To initiate this process, user interfaces always ask for a store with an empty open string, and
then ask for the available stores in this stub store in order to enumerate all the possible stores. When stores
have a hierarchical structure, this mechanism can be used to browse all available stores.
Available Spaces
When the user-defined DLL is used from a graphical interface, it can be useful to provide the list of available
spaces that the application can open from a given store. In this case, the user must provide a function that
returns the number of available spaces and their descriptions.
When users want to use their own internal storage to save and restore models, they have to provide a certain
number of functions. First of all, it should be possible to open a space in write mode (to save the models) and
to erase lines within a space whose key fields are equal to given values.
The following table presents some of the compilers required to generate dynamic loadable libraries for different
platforms. The required compilers are C compilers, as the C language has been chosen for stability reasons.
Platform Compiler
Win64 CL
Linux gcc
This section presents what to do in order for SAP Predictive Analytics to find a dynamic loadable library within
different environments.
The DLL file should be installed where the application can find it. Typically, the OS searches a path variable
for such a DLL, as well as the directory where the application executable is located.
lFactory.setConfiguration("UserStore", "MyAccess:XXXX");
This call tries to load the corresponding dynamic library, but the loading of such a library is done in an OS-
dependent way. The following conventions are used:
Note
The Search Variable is the variable used by the OS to locate dynamic libraries.
On Linux systems, the components first try to load XXXX.so, then libXXXX.so, so either of the two names is valid.
This section presents what you must do to declare a new user-defined data access class within SAP Predictive
Analytics components.
The user-defined data access dynamic library must be "declared" to the SAP Predictive Analytics components
environment. To do this, you must add a configuration entry, either in the configuration file loaded by the
executable (for example, KxShell.cfg or KxCORBA.cfg), or using the setConfiguration call.
Key Value
MessageFile ../KxCORBA/KxTool_us.umsg
MessageFile ../KxCORBA/KxTool_fr.umsg
KxDesc KxDesc
KxAdmin KxAdmin
UserStore MySpecialStore:MyLib
UserStoreOption.MySpecialStore.MultiLanguage true
Once the configuration file is written, there are two ways of loading it. A default configuration file is always
loaded at init time from the location where the executable code is present. The user can then force
supplementary configuration files to be loaded.
This configuration entry's key should be "UserStore", and the value should be a string composed of two fields
separated by a ':' character:
● The first field is the symbolic name that will be attached in the SAP Predictive Analytics components to
this class of store, for example MySpecialStore. This name is used, for example, in the class Factory, in
createInstance, and by the model's function openNewStore.
● The second field is the actual name of the library, without any extension or system-specific prefix: for
example MyLib, but not MyLib.dll nor libMyLib.dll.
Of course, you can have several such entries in a configuration file, or call the setConfiguration function
several times, as sketched below.
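For example, the UserStore declaration from the configuration table above could also be made programmatically; this sketch assumes the lFactory variable shown in the earlier setConfiguration example:
lFactory.setConfiguration("UserStore", "MySpecialStore:MyLib");
lFactory.setConfiguration("UserStoreOption.MySpecialStore.MultiLanguage", "true");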
Note
If the dynamic library cannot be loaded when the configuration entry is set, the entry is currently silently
ignored. Calls to the getClassInfo function will not report any information on such a store, and calls to
createInstance with this class name will fail.
Optionally, a configuration option can be added to describe the processing of charsets by the dynamic library.
The key name is UserStore.<My_Dynamic_Library>.MultiLanguage. Possible values are:
● False: all strings returned or consumed by the dynamic library are encoded using the current OS's native
charset. In this case, the kernel applies its own UTF-8 encoding/decoding. This is the default value of the
option.
● True: all strings returned or consumed by the dynamic library are already encoded in UTF-8, avoiding an
encoding/decoding step in the kernel.
Example
UserStore.SasWindows7.MultiLanguage=true
The data types of the Automated Analytics API are based on the CORBA scheme, which makes them language-
independent. However, each type is mapped to a real type in each scheme. The following table shows the API
data types with their corresponding language data types.
Note
Using the Java Common Interface layer or the Python scheme, objects are now returned without holder
types. See the Java Common Interface API documentation for more information.
API Type C++ Type Java CORBA Type JNI Type Python Type
Example
C++
Example
C++ Common Interface
or
Example
Java/CORBA
import org.omg.CORBA.StringHolder;
import org.omg.CORBA.IntHolder;
import org.omg.CORBA.BooleanHolder;
import com.kxen.KxModel.*;
void printParameter(IKxenClassFactory iFactory,
                    IKxenModel iModel,
                    String iParamPath) {
    IKxenParameterHolder lParam = new IKxenParameterHolder();
    int hr = iModel.getParameter(iParamPath, lParam);
    myProcessKxenResult(hr, "getParameter");
    // result containers
    StringHolder lName = new StringHolder();
    StringHolder lValue = new StringHolder();
    BooleanHolder lReadOnly = new BooleanHolder();
    hr = lParam.value.getNameValue(lName, lValue, lReadOnly);
    myProcessResult(hr, "getNameValue");
    System.out.println("[" + lName.value + "] = [" + lValue.value + "]\n");
    iFactory.deleteInstance(lParam.value);
}
Example
Java/JNI
import com.kxen.KxModelJni.*;
void printParameter(IKxenClassFactory iFactory,
                    IKxenModel iModel,
                    String iParamPath) {
    IKxenParameterHolder lParam = new IKxenParameterHolder();
    int hr = iModel.getParameter(iParamPath, lParam);
    myProcessKxenResult(hr, "getParameter");
    // result containers
    StringHolder lName = new StringHolder();
    StringHolder lValue = new StringHolder();
    BooleanHolder lReadOnly = new BooleanHolder();
    hr = lParam.value.getNameValue(lName, lValue, lReadOnly);
    myProcessResult(hr, "getNameValue");
    System.out.println("[" + lName.value + "] = [" + lValue.value + "]\n");
    iFactory.deleteInstance(lParam.value);
}
Example
Java Common Interface
import com.kxen.CommonInterf.*;
import com.kxen.CommonInterf.KxenParameter.NameValue;
void printParameter(KxenModel iModel, String iParamPath) {
    KxenParameter lParam = iModel.getParameter(iParamPath);
    NameValue lNameValue = lParam.getNameValue();
    System.out.println("[" + lNameValue.getName()
        + "] = [" + lNameValue.getValue() + "]\n");
    lParam.release();
}
Example
Python
import aalib
def printParameter(model, paramPath):
    param = model.getParameter(paramPath)
    name_value = param.getNameValue()
    print("[%s] = [%s]" % (name_value.name, name_value.value))
This section describes the format of the internal files used in the modeling phases (space description, model
space, and model description space). You can use it to set up the database to support SAP Predictive
Analytics features.
There are several files or tables that are directly read by the Predictive Analytics for OEM components.
Everything in the components has been designed to be saved and restored in column-table formats. Three
specific space formats are predefined; they are described below.
This format is used to save a variable description. A description contains the information needed to describe,
from a meta-model point of view, the variables contained in a data set. These spaces are generally saved
under the name KxDesc. This default name can be overridden through a loadConfigurationFile call on a store.
Each line of the KxDesc file contains information about one variable.
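As a purely hypothetical illustration using the fields mentioned in this guide (name, storage, value, and KeyLevel), the description lines for a key field and a nominal variable could carry information such as:
name: 'ID', storage: 'integer', value: 'continuous', KeyLevel: 1
name: 'class', storage: 'string', value: 'nominal', KeyLevel: 0
The exact column set, order, and separators are defined by the KxDesc format specification.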
This format is used to save the descriptions of models saved into a single store (a store is either an ODBC
source or a file directory). These descriptions contain all the information a user could need to restore a
previously saved model from this store.
One model can be associated with several lines (one for each saved version).
This format is used to save models. To perform this operation, models are first converted into their
parameter hierarchy counterpart and then saved into the flat parameter file described here. This basic
architecture would allow individual transforms to be saved into separate spaces, but this is not exposed to
users, in order to ease the process of rebuilding a model (and its internal dependencies) from different
sources. Note that this gives a self-contained view of the model.
Each line of this space contains information about one parameter in the parameter hierarchy of one model.
Note
A parameter is defined, in this release, by its model name, its model version, and its ID. The read-only
flag is not saved in the model space, as each model class can rebuild this information internally.
ab Abkhazian
aa Afar
af Afrikaans
ak Akan
sq Albanian
am Amharic
ar Arabic
an Aragonese
hy Armenian
as Assamese
av Avaric
ay Aymara
az Azerbaijani
bm Bambara
ba Bashkir
eu Basque
be Belarusian
bn Bengali
bh Bihari
bi Bislama
bs Bosnian
br Breton
bg Bulgarian
my Burmese
ca Catalan; Valencian
ch Chamorro
ce Chechen
zh Chinese
cv Chuvash
kw Cornish
co Corsican
cr Cree
hr Croatian
cs Czech
dv Divehi
nl Dutch; Flemish
dz Dzongkha
en English
eo Esperanto
et Estonian
ee Ewe
fo Faroese
fj Fijian
fi Finnish
fr French
fy Frisian
ff Fulah
gl Gallegan
lg Ganda
ka Georgian
de German
gn Guarani
gu Gujarati
ha Hausa
he Hebrew
hz Herero
hi Hindi
hu Hungarian
is Icelandic
io Ido
ig Igbo
id Indonesian
ie Interlingue
iu Inuktitut
ik Inupiaq
ga Irish
it Italian
ja Japanese
kl Kalaallisut; Greenlandic
kn Kannada
kr Kanuri
ks Kashmiri
kk Kazakh
km Khmer
ki Kikuyu; Gikuyu
rw Kinyarwanda
ky Kirghiz
kv Komi
kg Kongo
ko Korean
kj Kuanyama; Kwanyama
ku Kurdish
la Latin
lv Latvian
ln Lingala
lt Lithuanian
lu Luba-Katanga
lb Luxembourgish; Letzeburgesch
mk Macedonian
mg Malagasy
ms Malay
ml Malayalam
mt Maltese
gv Manx
mi Maori
mr Marathi
mh Marshallese
mo Moldavian
mn Mongolian
na Nauru
nv Navajo; Navaho
ng Ndonga
ne Nepali
se Northern Sami
no Norwegian
oj Ojibwa
or Oriya
om Oromo
os Ossetian; Ossetic
pi Pali
pa Panjabi; Punjabi
fa Persian
pl Polish
pt Portuguese
ps Pushto
qu Quechua
rm Raeto-Romance
ro Romanian
rn Rundi
ru Russian
sm Samoan
sg Sango
sa Sanskrit
sc Sardinian
sr Serbian
sn Shona
ii Sichuan Yi
sd Sindhi
si Sinhalese
sk Slovak
so Somali
st Sotho, Southern
es Spanish; Castilian
su Sundanese
sw Swahili
ss Swati
sv Swedish
tl Tagalog
ty Tahitian
tg Tajik
ta Tamil
tt Tatar
te Telugu
th Thai
bo Tibetan
ti Tigrinya
ts Tsonga
tn Tswana
tr Turkish
tk Turkmen
tw Twi
ug Uighur
uk Ukrainian
ur Urdu
uz Uzbek
vi Vietnamese
vo Volapük
wa Walloon
cy Welsh
wo Wolof
xh Xhosa
yi Yiddish
yo Yoruba
za Zhuang; Chuang
zu Zulu