AugmentedAnalytics B

6/30/2021
SAP Analytics Cloud Help

Generated on: 2021-06-30 10:44:18 GMT+0000
SAP Analytics Cloud | Q2 2021 (2021.7)
PUBLIC
Original content: https://help.sap.com/viewer/00f68c2e08b941f081002fd3691d86a7/release/en-US
Warning
This document has been generated from the SAP Help Portal and is an incomplete version of the official SAP product
documentation. The information included in custom documentation may not re ect the arrangement of topics in the SAP Help
Portal, and may be missing important aspects and/or correlations to other topics. For this reason, it is not for productive use.
For more information, please visit the https://help.sap.com/viewer/disclaimer.
This is custom documentation. For more information, please visit the SAP Help Portal 1
6/30/2021
Augmented Analytics
Augmented Analytics comprises a set of SAP Analytics Cloud features that enhance the analytics process using machine
learning.
The Augmented Analytics features include Smart Insights, Search to Insight, Smart Discovery, and Smart Predict.
Smart Insights
Smart Insights allows you to quickly develop a clear understanding of complex aspects of your business data, by letting you see
more information about a particular data point in your visualization or table, as well as about a variance on your acquired data.
Search to Insight
Search to Insight is a natural language query interface used to query data.
Smart Discovery
By running machine learning algorithms, Smart Discovery uncovers new or unknown relationships between columns within your
dataset to help you understand the main business drivers behind your core KPIs.
Smart Predict
Smart Predict helps you answer business questions that need predictions or predictive forecasts to plan for the future business
evolution: It automatically learns from your historical data, and nds the best relationships or patterns of behavior to easily
generate predictions for future events, values, and trends. Additionally, you get easy to understand KPIs and visualizations that
help you evaluate the predictions accuracy. You can then leverage those predictions and predictive forecasts with con dence to
augment your planning model and stories.
Related Information
Smart Insights
Search to Insight
Smart Discovery
Smart Predict – Using Predictive Scenarios
Smart Predict – Using Predictive Scenarios
A Predictive Scenario helps address a business question requiring predictions. It is a workspace, where you create and compare
predictive models to nd the best one to bring the best predictions to address the business question.
The following types of predictive scenarios are available. You choose the one that best ts your business question.
Predictive
scenario Answers this type of business question...
Classi cation What is the likehood that a future event occurs? This event is observed at an individual level (customer, asset, product,
...) and at a certain horizon (in the year, before the end of week, in the month after a customer contact, ...) .
 Example
Who is likely to buy or not buy your new product? Which client is or isn't a candidate for churn?
6/30/2021
Predictive
scenario Answers this type of business question...
Regression What could be the prediction of a business value, taking into account the context of its occurrence?
 Example
What will be the revenue generated by a product line, based on planned transport charges and tax duties?
Time Series What are the future values of a business value over time, at a certain granularity/place?
 Example
How much ice cream will I daily sell over the next 12 months? I have my historical daily sales information, but I'd like
to be able to include other factors such as vacation months, and the seasons.
You can create one or several predictive models within a predictive scenario. Each predictive model produces intuitive
visualizations of the results making it easy to interpret its ndings. Once you have compared the key quality indicators for
different models, you choose the one that provides the best answers to your business question, so you can apply this predictive
model to new data sources for predictions.
 Restriction
To verify that Smart Predict is available in your SAP Analytics Cloud system, check out the SAP Note 2661746 .
Restrictions
The following restrictions currently apply to Smart Predict.
General Restrictions While Using Smart Predict
Restriction on Information on Restrictions
Predictive scenario migration - When predictive scenarios are moved from the Browse Predictive Scenario page to the Files
creation dates not kept area, the individual creation dates are not kept. The predictive scenario creation date that is
displayed after the move to Files is the timestamp for the move operation.
Availability of Smart Predict Smart Predict is available in most regions and for most tenant types.
For more details on exceptions and general availability, refer to the SAP Note 2661746 .
6/30/2021
Data Sources (acquired datasets and You can create predictive scenarios on datasets that use the following data sources:
planning models)
local le (.txt, .csv, .xlsx)
 Note
Files with extension .xls are not supported.
SAP Business ByDesign Analytics
SAP Cloud for Customer
SAP Cloud for Customer Analytics
SAP HANA
SAP Integrated Business Planning
SAP S/4HANA
SAP SuccessFactors
SAP Qualtrics
OData Services
SQL Databases
SAP BW
 Note
We recommend you to upgrade your SAP Analytics Cloud version to 1.0.43 to have
drop parent hierarchy nodes functionality. Although you can import dataset with a
lower C4AAgent version, hierarchy selection will be disabled and a corresponding
message will be shown in query builder.
Google Drive
SAP Cloud Platform Open Connectors
6/30/2021
Dataset - Storage formats

The following data types are currently supported:
Date and Date & Time, in the following formats:
YYYY-MM-DD
YYYY/MM/DD
YYYY/MM-DD
YYYY-MM/DD
YYYYMMDD
YYYY-MM-DD hh:mm:ss
Where YYYY stands for the year, MM for the month, and DD for the day of the
month, hh stands for hours from 0 to 24, mm stands for minutes from 0 to 59,
and ss stands for seconds from 0 to 59.
 Example
January 25, 2018 will take one of the following supported formats:
2018-01-25
2018/01/25
2018/01-25
2018-01/25
20180125
Numbers (any number with decimal point)
Integers (any number without decimal point)
The column name restrictions are the same as the SAP HANA ones. If some characters
are not supported, the column name is automatically converted to a supported name.
The original name is kept as a column description in the metadata.
UTF-8 encoding is supported.
 Note
Time variables are currently not supported by Smart Predict. If your dataset (acquired or
live dataset) contains a column that contains only time variables, this column won't be
included in the training process.
Dataset Maximum Sizes and Limits See System Sizing, Tuning, and Limits
Time Series Predictive Scenario

The Date and Date & Time formats that should be used in your dataset are the
following:
YYYY-MM-DD
YYYY/MM/DD
YYYY/MM-DD
YYYY-MM/DD
YYYYMMDD
YYYY-MM-DD hh:mm:ss
6/30/2021
 Note
While you can use this format in both live and acquired datasets, the
seconds (ss) won't be taken into account during the training of your
predictive models.
Where YYYY stands for years, MM stands for months, DD stands for day of the month,
hh stands for hours from 0 to 24, mm stands for minutes from 1 to 59, and ss stands
for seconds from 0 to 59.
 Note
Regardless of the date granularity you choose in your time series predictive
scenarios with a dataset as your data source, every date format has to include
years, months and days. This means that even if you just want a quarterly or
monthly forecast, the date format in your dataset still needs to include days.
If your data source is a planning model, you can use the YYYY-MM date format.
 Example
Let's say you use the YYYY-MM-DD date format, you can create time series
predictive scenarios where the date granularity can be:
Year expressed as YYYY-01-01 where YYYY is variable (moving year).
Quarter or Month expressed as YYYY-MM-01 where YYYY-MM is variable

(moving month).
Weekly data in the date format YYYY-MM-DD taking for instance the 1st day
of the week as the characters DD (moving week).
Day (calendar dates) expressed as YYYY-MM-DD where YYYY-MM-DD is

variable (moving day).
Smart Predict expects a date per each period to learn on: if you want to forecast your
monthly sales, you provide a date per month representing the value of the
corresponding month.
In a time series predictive scenario, you can de ne entities, each generating its speci c
predictive model simultaneously.
For example, if you de ne a column with countries as an entity, Smart Predict will
generate as many predictive models as there are countries in your data source.
The following limits are recommended when using a time series forecast model:
Number of forecasts (independent of the number of entities): 500 maximum
Number of entities: 1000 maximum
If your predictive model is con gured for a number of forecasts and/or entities beyond
the recommended maximum limits, it is likely to create performance issues that can
impact other users on the same SAP Analytics Cloud tenant. In the user interface, the
maximum number of forecasts that can be set is restricted to 500.
 Tip
When deciding how much historical data to use, please use the following as a
recommendation:
For a granularity of day, use 3 years of historical data
For a granularity of hour, use 3 months of historical data
6/30/2021
Time Series Forecasts Smart Predict time series forecasts don't persist the settings for Number Formatting selected
by the user in the User Preferences section of SAP Analytics Cloud Pro le Settings.
Classi cation Predictive Scenario In a classi cation predictive scenario, the target can only be a binary column that only takes
two values, for example, true or false, yes or no, male or female, 0 or 1. For this type of scenario,
Smart Predict considers that the positive target value, or positive target category of this
column, is the least frequently occurring value in the training dataset. However, to make sure
your trained predictive model is reliable, you need to make sure that you have a minimum
representation in your training dataset. For example, if your dataset contains very few failures,
your predictive model won’t be able to predict the under-represented category failures.
Training a Predictive Model Smart Predict currently excludes the following columns when training your predictive model:
Columns identi ed as having the data type <Time>.
 Note
Date & Time is supported by Smart Predict.
Columns with the data type <Textual>.
For more information, see Variables in Smart Predict
Restrictions Using Live Dataset With Smart Predict

Please note the following restrictions when using live datasets with your predictive models:
Restrictions
on Information on Restrictions
SAP HANA You should not allow the creation of live datasets on top of SAP HANA SQL Views using row-level security (see Structure of
SQL Views SQL-Based Analytic Privileges). In Smart Predict you access the dataset using the SAP HANA technical user con gured at
using row- the data repository level, and not using the SAP Analytics Cloud user pro le. This could result in a security issue as all
level SAP Analytics Cloud users would get access to the data accessible by the SAP HANA technical user. For more information,
security see Con guring a SAP HANA technical User in the On-Premise SAP HANA System.
Number of There is a limit of 1000 columns when using live datasets with predictive models.
columns for
live
datasets
Live Data You can create predictive scenarios on live datasets in the following on-premise SAP HANA systems:
Sources
SAP HANA 1.0 SPSP12 rev 122.04 and upwards
SAP HANA 2.0 SPS00, SPS01, and SPS02
SAP HANA 2.0 SPS03 and upwards
Note: Cloud deployments of SAP HANA systems are currently not supported.
Privileges A maximum of 4000 tables/SQL views are displayed for creating a live dataset through browsing. It is recommended that
for a SAP the SELECT privileges for a SAP HANA technical user are limited to only tables/SQL views required for the predictive
HANA models. For more information, see Con guring a SAP HANA technical User in the On-Premise SAP HANA System
technical
user
BI story You can't directly create a BI Story on top of a live dataset whether or not this live dataset was created with Smart Predict.
For more information, refer to Creating Calculation Views to Consume Live Output Datasets
6/30/2021
Restrictions
on Information on Restrictions
Train and Train or Apply operations using live datasets that last longer than 8 hours, don't complete.
Apply steps
with live
datasets
Date Format For live datasets, the following default SAP HANA date formats are supported:
DATE
SECONDDATE
TIMESTAMP
For more information, see Datetime Data Type.
Restrictions Using Planning Model as Data Source for Smart Predict
Restrictions on Information on Restrictions
Type of predictive models For Smart Predict - Predictive Planning, i.e. the integration of SAP
Analytics Cloud Smart Predict with SAP Analytics Cloud planning
models, only time series forecasting is supported.
Input planning models

Type: Smart Predict - Predictive Planning only supports
standalone planning models (both new model types and
classic account models). All data sources that can be
leveraged by SAP Analytics Cloud planning models are
automatically supported, with the exception of SAP BPC.
Neither live nor acquired SAP Business Planning and
Consolidation (SAP BPC) planning models are supported.
Version: You can use public or private versions. The input

version must be a public version, not in edit mode, or a
private version. You have at least a read access to it.
 Note
Smart Predict doesn't support predictive
forecasting on calculated measures, including
currency conversion measures, when your
planning model is a new model type.
For classic account models, there are speci cities while

using public or private versions when currency
conversion is enabled. For more information, see How
does Smart Predict Support Currencies De ned in
Planning Model?.
Prompt: Planning models with prompts are supported. If

prompts are present, default prompt values (set by the
user or SAP Analytics Cloud defaults) are used to query the
data. There is no way to set or change values.
6/30/2021
Entities (crossing of multiple dimensions)

You cannot select more than 5 dimensions or attributes to
For more information on entities, see also How Can You Get Distinct
create a time series predictive model generating predictive
Predictive Forecasts per Entities For Your Planning Model?
forecasts per entities.
The maximum number of entities supported is 1000.
Attributes and Hierarchies:
You can select the attributes (custom properties)

that form part of level-based hierarchies or that can
be freely de ned.
You can select system properties like the currency

for instance.
The levels can be selected indirectly by selecting

the custom properties that form part of the level-
based hierarchy, whereas for parent child
hierarchies, you need to create them as Smart
Predict can generate one segment for each leaf
only.
 Tip
It's possible to add custom properties to group
members in custom ways: you can use this
mechanism to keep the number of entities under
1000 and perform an intermediate forecasting
approach where predictive forecast is run on
intermediate nodes: For nodes above, predictive
forecasts will be spread and for nodes below,
they will be summed.
One entity can combine one or multiple dimension

members.
6/30/2021
Predictive Goal
These are Smart Predict - Predictive Planning settings
available:
Signal: a valid signal is a numeric value that is data

entry enabled. A numeric value which involves
formulas, or with aggregation type LABEL or NONE
is not a valid signal. Supported numeric values are
leaf members in the account dimension hierarchy,
with no formula, or a parent numeric value with
aggregation type SUM, or no aggregation de ned
(defaults to SUM) provided that none of its
descendants is a member which involves a formula,
or with aggregation type LABEL or NONE.
Smart Predict doesn't support calculated measures

when using a planning model, even if an inverse
formula is provided. For more information on
inverse formulas, you can refer to the chapter
Inverse Formulas.
Time granularity: indicates the time granularity

inherited from the planning model data source. If
the time granularity de ned in the planning model
date dimension is different from the time
granularity of the data, the time granularity of the
data is used when creating predictive forecasts. For
more detail you can refer to the section below
called Time Aggregation.
Date: the date dimension used to create the time

series predictive model.
 Note
Only the Date dimension is used as as
in uencer. All other dimensions, attributes or
measures are ignored when you select Train &
Forecast.
Dimensions or Attributes: you can combine

multiple dimensions and attributes to use as
source for the entities to get distinct predictive
forecasts per entities.
Number of Forecasts: number of forecast values you want

to get. The number of historical data points in the planning
model conditions the number of con dent predictive
forecasts you can get. The current ratio is 5 to 1. For
example, if you want one con dent forecast, you need at
least ve historical data points in your planning model. For
more information, see How Many Forecasts can be
Requested?.
6/30/2021
Outputs
Output versions:
You can write-back the predictions only on private

versions of planning models. The private versions
have to be created beforehand in stories. For more
information, see About Version Management.
You cannot write-back the predictive forecasts in a

dataset.
Time aggregation Time granularity: The time series predictive model is trained and
applied based on the level of time granularity available in the
planning model data source.
When creating a planning model, the time granularity of the date

dimension can be either Year, Quarter, Month or Day.
So, as a simple example, if the planning model's lowest level of

time granularity is monthly, then Smart Predict creates monthly
predictive forecasts.
 Example
The granularity of the date dimension of your planning
model is de ned as monthly.
You have data for the months from January 2016 to

December 2020 - 60 months.
You ask Smart Predict for 12 forecasts.
In this case, Smart Predict generates forecasts from January

2021 to December 2021.
In a more complex scenario, the time granularity de ned in the

planning model date dimension can be different from the time
granularity of the data. A planning model could have a daily time
granularity in the date dimension, but the data could be stored
every month, or every week. In this case, Smart Predict - Predictive
Planning gives priority to the time granularity of the data.
 Example
You have a planning model with daily granularity in the
date dimension – from January 1st 2016 to December
31st 2021.
The data is stored at a monthly level, so you have one

row of data for January 2016, one row for February 2016
etc.
If you ask Smart Predict for 12 forecasts, they are

generated for January 2021 to December 2021, not for
January 1st 2021 to January 12th 2021.
 Note
To learn more about date dimension, you can refer to the
chapter called About Dimensions and Measures.
6/30/2021
Publishing to PAi It's not currently possible to publish a predictive model created
from a planning model data source to a PAi application.
Spreading The spreading policy is the default policy available in the planning
model data source. It depends on how the dimension is used in the
model:
Date Dimension: no spreading occurs on date dimension as

predictive forecasts are generated using data at the lowest
level of granularity for the date dimension.
Dimension used as part of the entity de nition: no

spreading occurs.
Dimension attribute used as part of the entity de nition:

spreading occurs no matter if this attribute is used as a
level-based hierarchy or not.
Dimension not used as part of the entity de nition:

spreading occurs. If the target cell (in the horizon) is
unbooked, the value is allocated to the Unassigned
member of this dimension. Whereas if the target cell is
booked, the value will be split according to existing
weights.
For example, if your planning model data source has de ned a

spreading and if you have run the predictive forecasts on a parent
node (for example, <All Regions>), results are automatically
spread across all levels below (for example, <North America>,
<EMEA>, and then <all countries> below, then <all cities>, etc..).
For more information, you can refer to Spreading a Value Entering
Values in a Table and Disaggregation of Values during Data Entry.
Related Information
Setting up Live SAP HANA Data Access for Smart Predict
Understanding the Basic Concepts Used in Smart Predict
Planning Model as Data Source
Roles and Permissions for Predictive Scenarios

Predictive Content Creator and Predictive Admin are the roles in the application that include permissions for Predictive
Scenarios.
The following tables detail what you can do in Predictive Scenarios, and which roles and permissions you need to perform the
action.
To quickly access the table you need :
Connections
Users
Roles and Permissions
Predictive Models
Predictive Scenarios
6/30/2021
Data Sources
Data Repository
Consume the Generated Dataset that contains the predictions.
 Note
If you are an admin, you can create custom roles. For more information, see Creating Custom Roles and Standard
Application Roles.
Connections
What you can do Predictive Content Creator Predictive Admin
View available connections x x
Create new connections x
Update existing connections x
Delete a connection x
For more information, see Connections.
Users
View existing users x x
Create new users x
Update user information x
Delete a user x
For more information, see Users.
Roles and Permissions
View user role and permissions x x
Create/Assign roles to user x
Update user roles x
Delete user roles x
For more information, see Roles.
Predictive Models
6/30/2021
View existing predictive models x x
Create a new predictive model x x
Update a predictive model x x
Delete a predictive model x x
Train or retrain a predictive model x x
Apply a predictive model x x
Duplicate a predictive model x x
Publish a predictive model to a Predictive x

Analytics integrator (PAi) application
For more information, see Creating a Predictive Model.
Predictive Scenarios
View existing predictive scenarios x x
Create a new predictive scenario x x
Update an existing predictive scenario x x
Delete a predictive scenario x x
For more information, see Creating a Predictive Scenario.
Data Sources
Dataset
 Note
These roles and permissions apply to both live and acquired datasets.
View existing datasets x x
Create a new dataset x x
Update an existing dataset (only for live x x

datasets)
Delete a dataset x x
For more information, see About Datasets and Dataset Types.
Planning Models
 Note
6/30/2021
You must have the relevant license to create planning models. For more information, see Features by License Type for
Planning Models.
What you can do Predictive Content Predictive Admin Planning Modeler

Creator
View and select existing planning models. x x
 Note
You need read access to the planning model. You can
create private versions once you have read access.
Write-back the predictive forecasts in planning models. x x
 Note
This applies to the private version of the planning model
only. You must have write access to the private version
related to the planning model. You can always write to
private versions you own, but you need to be granted
write access for the versions you don't own (shared
versions).
Create planning models. x
 Note
To publish a private version of your planning model with Smart Predict forecasts, you need Maintain permissions at a global
level, and at the level of the speci c model. Maintain permissions aren't enabled by default on the Predictive Content
Creator and Predictive Admin roles. Planning Professional Admin , Planning Professional Modeler , and Planning Standard
Reporter are the roles in the application that include Maintain permissions.
Data Repositories
View existing data repositories (in dataset x x

User Interface)*
Create a new data repository (in the x

Administration User Interface)*
Update an existing data repository (in the x

Administration User Interface)
Delete a data respository (in the x

Administration User Interface)
* This means the Predictive Content Creator can effectively view the existing data repositories when creating a live dataset but
this does not mean the Predictive Content Creator can view the data repositories in the Administration User Interface.
** The Predictive Content Creator cannot access the Administration User Interface and therefore can't effectively see the data
repositories.
For more information, see Adding and Con guring the Data Repository in SAP Analytics Cloud.
Consume the Generated Dataset That Contains the Predictions

6/30/2021
To consume a generated dataset in a story or model, please refer to the following chapter:
Using the Generated Dataset in SAP Analytics Cloud
To consume predictions saved in a planning model version, you can create story tables or charts on your planning model,
selecting the private version used for predictive model application. For more information on planning in tables, and creating a
chart, please refer to the following chapters:
Planning in Tables
Creating a Chart
For required permissions on datasets, models, and stories please refer to the following chapters:
Permissions
Standard Application Roles
Creating Custom Roles
 Note
The BI Admin and Planning Admin roles include all Predictive Admin permissions by default.
Related Information
Standard Application Roles
Security
Model and Version Security
Permissions
Sharing Private Versions
Features by License Type for Planning Models
Understanding the Basic Concepts Used in Smart Predict

Here are some explanations of terms that you'll encounter when you create a predictive scenario:
Predictive Scenario: A workspace where you create and compare predictive models to nd the one that provides the
best insights to help solve a business question requiring predictions. Currently, you can choose between 3 different
types: classi cation, regression, and time series forecasting.
Predictive Model: The result found by Smart Predict after exploring relationships in your data using SAP automated
machine learning. Each predictive model produces visualizations and performance indicators based on certain
requirements that you have set, so you can understand and evaluate the accuracy of the predictive results. You'll
probably want to experiment a bit with different predictive models, varying the input data, or the training settings, until
you are satis ed with the accuracy and relevance of the results.
Data source: The form and origin of the data that you'll use to create a predictive model. This could be a dataset in a
database or a planning model in an SAP Analytics Cloud story.
Target or Signal: The variable that you want to explain or predict values for. Depending on your data source, this could be
the column or dimension that you're interested in knowing about. Target is used for classi cation and regression models,
Signal for time series forecasting models.
 Note
6/30/2021
In the Smart Predict documentation the term variable is used to mean either column or dimension. However, in the
user interface and messages, you'll see the speci c term for the data source being used: columns in datasets and
dimensions in planning model versions.
Entity: Only used in time series forecasting predictive scenarios. You can split up a population into distinct sections called
entities. A predictive model is created for each entity allowing you to get more accurate forecasts aligned with the
entity's particular characteristics.
In uencers: The variables that have an in uence on the target or signal. By default the predictive model considers all the
columns or dimensions as in uencers, and during training, will only retain the signi cant ones. You can chose to exclude
in uencers that you consider not worth including in the training. This is useful when dealing with large data sources.
Training: The process that uses SAP automated machine learning to explore relationships in your data and nd the best
combinations. The result is a formula, your predictive model, that can be applied to new data to obtain predictions.
Related Information
What Type of Predictive Scenario Do You Need?
Starting with a Predictive Model
Variables in Smart Predict
Training a Predictive Model
Partition Strategy
A Predictive Scenario is a precon gured workspace that you use to create predictive models and reports to address a business
question requiring the prediction of future events or trends. You choose the one that is relevant to the type of predictive
insights you are looking for.
 Restriction
Smart Predict is not available in all regions or for all tenant types.
For more information, refer to the SAP Note 2661746 .
You can nd examples on how to create and use the predictive scenarios available in Smart Predict in our playlist on YouTube
or looking at the individual videos:
Predictive Link to videos Description

Scenario Type
Overview on Basic Concept in Smart Predict: The Explore the different predictive scenarios currently available in Smart
Predictive Predictive Scenario Predict: Classi cation, Regression and Time Series Predictive
Scenarios Scenarios.
available
Classi cation Smart Predict: Finding the best Using a simple scenario, you will create a Classi cation Predictive
classi cation predictive model Scenario. You will create a rst predictive model and check the accuracy
of your predictions. Then, you will duplicate your predictive model and
will train it using another dataset. You will nally compare the 2
predictive models to nd the best one.
Classi cation Smart Predict: Understanding the confusion In the video Smart Predict: Finding the best classi cation predictive
matrix model, you have built a predictive model to answer your business issue.
Now, you will interpret the results of the confusion matrix for this
business case and understand how to set a threshold (or cut off point) to
best categorize your target group.
6/30/2021
Predictive Link to videos Description

Scenario Type
Classi cation Smart Predict: Using the pro t simulation In the previous videos, you have created a predictive model to answer
your business issue and have interpreted the confusion matrix and set a
threshold. Now, you will use the pro t simulation of Smart Predict, to
calculate the return on investment using this predictive model, and set
the ideal threshold that allows you to optimize your pro t.
Classi cation Smart Predict: Applying a classi cation In a previous video, we created a classi cation predictive model to
predictive model identify which customers to contact with a marketing campaign. Now,
we’ll use this predictive model on actual customer data to create the
output dataset containing the answer to this question.
Classi cation Smart Predict: Publishing a predictive Using a simple scenario (Predict if passengers are likely to cancel their
model using a PAi Connection ight booking), you will go through the different steps to publish a
predictive model to an S/4HANA system, using a Predictive Analytics
integrator connection.
Regression Smart Predict: Debrie ng a Regression Using a simple scenario we will create a Regression Predictive Scenario
Predictive Model and will go through the generated reports to evaluate the accuracy of the
predictive model.
Time Series Smart Predict: Creating a Time Series Using a simple scenario, you will create a Time Series Predictive
Forecasting Predictive Scenario Scenario. You will create a rst predictive model and go through the
different settings to be lled to create the scenario.
Time Series Smart Predict: Debrie ng a Time Series In the video Smart Predict: Creating a Time Series Predictive Scenario,
Forecasting Predictive Model you have created a Time Series Predictive Model in your Predictive
Scenario. Now you will go through the generated reports and evaluate the
accuracy of your predictive model.
Time Series Smart Predict: Creating a segmented time Using a simple scenario and the segmented variable of a Time Series
series predictive model predictive model, you will create several predictive models at once. You
will get accurate predictions per predictive model considering the
characteristic of each individual segment.
Starting with a Predictive Model

Once you've created a predictive scenario, you add its rst predictive model. You start by selecting a data source that contains
historical data, then specify the target or signal that you want to explain or predict the values for.
The predictive model is built using SAP automated machine learning algorithms that explore relationships in the data, and nd
the best combinations. This is called training the predictive model, and the result is the predictive model that can be applied to
new data to obtain predictions.
Each predictive model produces visualizations and KPIs that help you understand and evaluate the accuracy of a predictive
model.
Depending on your business question, you will probably want to experiment a bit with different predictive models, varying the
training data and settings to deliver a more accurate or relevant predictive output.
When you are con dent that you have a trained predictive model that generates results that satisfy your business question, you
can apply that predictive model to new data.
In the Smart Predict documentation the term variable is used to mean either column or dimension. However, in the user
interface and messages, you'll see the speci c term for the data source being used: columns in datasets and dimensions in
6/30/2021
planning model versions. Rows contain the observations for the variable. For example, in a database containing information
about your customers, the <name> and <address> of those customers are variables.
In a predictive scenario, variables have different roles that you assign when de ning the predictive goal and the training
requirements for a predictive model. For example, a variable can be a target or signal, another can be an identi er for an entity,
and others can be excluded from consideration by the predicitve model, perhaps because you consider them to have no
in uence on the target.
Depending on their values, variables can take the following properties:
Statistical Type: continuous, ordinal or nominal.
Data Type: for example, date, number, or string.
Related Information
Variable Statistical Types
Variable Data Types
Understanding Predictive Goal and Training Roles for Variables
Editing Column Details
De ne Settings and Train a Classi cation or Regression Predictive Model
De ne Settings and Train a Time Series Predictive Model
Variable Statistical Types
A variable can have several statistical types:
Type Description Example
Continuous Values are numerical, continuous, and The variable <salary> is both a numerical
sortable. They can be used to calculate variable, and a continuous variable. It may,
measures; for example, mean or variance. for example, take on the following values:
<$1,050>, <$1,700,> or <$1,750>.
During modeling, a continuous variable may
be grouped into signi cant discrete bins. The mean of these values may be
calculated.
Ordinal Values are discrete. They can be regrouped The variable <school grade> is an ordinal
into categories and are sortable. variable. Its values actually belong to
de nite categories and can be sorted. This
Ordinal variables may be:
variable can be:
Numerical: the values are numbers
numerical, if its values range
and they are ordered according to
between <0> and <20>,
the natural number system (0, 1, 2,
and so on). textual, if its values are A, B, C, D, E
and F.
Textual: the values are character
strings. They are ordered according
to alphabetic conventions.
6/30/2021
Type Description Example
Nominal Values are discrete. They can be regrouped The variable <zip code> is a nominal
into categories. variable. The set of values that this variable
may assume are clearly distinct, non-
 Caution ranked categories, although they happen to
be represented by numbers. For example:
Binary variables (variable with 2 distinct
<10111>, <20500> or <90210>.
values only) are considered as nominal
variables. They are the ones that can be The variable <eye color> is a nominal
used as target for classi cation variable. The set of values that this variable
predictive models may assume are clearly distinct, non-
ordered categories, and are represented by
character strings. For example: <blue>,
<brown>, <black>.
Textual A type of nominal variable containing For example the variable <Bluetooth
phrases, sentences, or complete texts. Headphones Customer Feedback> is a
 Note
Textual variables are used for text analyses. textual variable. The values for this variable
These variables are currently not can be <Durable cord, connect easy to
supported by Smart Predict, and are phone and plug.>, <Great t and great
therefore excluded from the training of a sound!> or <Great length and color. Super
predictive model. fast charging.>.
 Note
During training, the values of the categorical variables are regrouped into homogeneous categories. These categories are
then ordered as a function of their relative contribution with respect to the values of the target variable. For more
information, see Category In uence.
Variable Data Types
Variables can use the following data types:
String
Integer
Number
Boolean
Date
Date and Time
Time
Angle
 Note
Variables with Time (not timestamp) and Textual storage formats aren't currently supported by Smart Predict. These
variables won't be considered when training the predictive model.
Understanding Predictive Goal and Training Roles for Variables
A variable corresponds to a column in a dataset or a dimension in a planning model. The observations relating to each variable
correspond to the rows. Variables that have been speci ed as a target/signal, or an entity identi er, are not considered as
6/30/2021
in uencers. Unless you exclude certain in uencers, all other variables are treated as in uencers. The training retains the most
signi cant ones for the predictive model reports for debrie ng.
To build a predictive model, you de ne the following variable roles:
Role Description Example
Target or Signal The variable that you want to explain, or

 Example
predict the values for.
Classi cation predictive
scenario:
You want to predict if a customer

will answer to your mailing or
not. Your training data source
containing the customer
information contains the target
<responded to my mailing>.
This target may take the values
<Yes> or <No>. If the value
<Yes> is the least frequent
value, the application considers
that value to be the targeted
value for the target.
Regression predictive scenario:
You want to predict the number

of complaints that your
customer support will receive
this week. Your target is
<Number of customer
complaints> and it will take
<numerical> values.
Time series predictive scenario:
You want to forecast the product

sales for the next 6 months.
<Product sales> is your signal.
6/30/2021
Date The variable used for the date values. The date formats that should be used in
your dataset are the following:
 Note
This variable is mandatory for a time YYYY-MM-DD
series predictive scenario.
YYYY/MM/DD
YYYY/MM-DD
YYYY-MM/DD
YYYYMMDD
YYYY-MM-DD hh:mm:ss
Here, YYYY stands for the year, MM for the

month,DD for the day of the month, hh
stands for the hour, mm stands for the
minutes, and ss stands for the seconds.
 Note
Let's say you use the YYYY-MM-DD date
format, you can create Time Series
Predictive Scenarios where the date
granularity can be:
Year expressed as YYYY-01-01

where YYYY is variable (moving
year).
Month expressed as YYYY-MM-

01 where YYYY-MM is variable
(moving month).
Weekly data in the date format

YYYY-MM-DD taking for
instance the 1st day of the week
as the characters DD (moving
week).
Day (calendar dates) expressed

as YYYY-MM-DD where YYYY-
MM-DD is variable (moving
day).
Entity Optionally used in a time series predictive

 Example
scenario. It’s the identi er variable that you
want to use to split up the predictive model You want to forecast the energy
into entites, with each one producing its own consumption by industry sector for the
predictive model, so you get distinct next 6 months. Your signal value is
predictions for each entity. <Energy consumption> and your entity
The predictive model can then catch is <Industry sector>. You will get
behaviors that are speci c to a given entity, predictions and performance indicators
and so produce more accurate predictions. for each industry sector: commercial,
industrial, residential, transportation.
The entity can be a dimension in the data,
for example Region, Store, or Product
Family.
6/30/2021
In uencer The in uencers are variables that describe

 Example
your data and which serve to explain a
target. Unless excluded, all variables that Your company is marketing two products
aren't already selected as a target or signal, A and B.
or an entity identi er, are considered as
You have a database, which contains
in uencers, with only the most signi cant
references to:
ones being retained after training for
debrie ng. 1,500 of your customers. You
know which product, A or B,
During the predictive model creation, you
each customer has purchased.
can decide to exclude in uencers from the
training process, these are not taken into 10,000 prospects. You want to
account to compute the predictive model, know which product each
not included in the statistics for the customer is likely to purchase.
predictive model, not retrieved from the
data source, and not needed when you The variables <name>, <age>,
apply the predictive model to an <address>, and <socio-occupational
application data source. class> are your in uencers: they allow
you to generate a predictive model
capable of explaining and predicting the
 Remember
value of the target <product
You should exclude in uencers that are
purchased>.
directly related to the target, especially
variables that contain indirectly a target
variable. Statisticians call these
variables as "leakers" or "leak
variables". This will produce a wrong
predictive model with wrong
performance indicator unable to produce
prediction.
 Example
If a predictive model has the target
variable <has bought the product
Yes/No>, you should exclude the
in uencer <Billing amount> if it contains
the cost for the product.
 Tip
If there is a variable that is in uencing
the prediction at very high level then
there is a chance that it is a leak
variable.
Excluding in uencers that have no in uence

on the targets (for example <account
number>) can help speed up the training
process.
Editing Column Details

When you create a new predictive model, you can edit properties and specify descriptive information for a column, to ensure
that the columns are identi ed in the right category for the predictive model generation.
 Note
6/30/2021
any updates you make in the dataset are permanent: if you (or another user) reuse this dataset in another predictive model
or another predictive scenario, the changes will remain.
The following column properties can be updated:
Field Values
Description: Text eld

User speci ed column
name. This can be a more
relevant title as opposed to
the "Name" column that
contains the dataset column
name.
Storage:
String
Data type for the column.
Integer Note that a telephone or account numbers should not be considered numbers.
Number
Boolean
Date
Date and Time
Time
Angle
Type: Continuous: columns whose values are numerical, continuous, and sortable. They can be used to
Statistical data type calculate aggregations (such as min, median or max).
Nominal: columns that label data. They have no quantitative value (such as 1 and 2 to indicate male or
female).
Ordinal: discrete numeric column where the relative order is important.
Textual: text column containing phrases, sentences, or complete text.
 Tip
While creating a predictive model, if the column you want to select as target or entity (time series
forecasting) isn't available it is likely that this column wasn't assigned the right data type, you can
correct this here in Edit Column Details.
Missing: For example, if you enter #Empty, then any rows with no entries will receive #Empty as a value.
String speci ed here
replaces a missing value in
the column.
Key Your dataset needs to have at least one key column if you use a regression predictive model.
Specify one or multiple
unique identi ers for
observations in the dataset.
Data Sources in Smart Predict
In Smart Predict, you need to provide data so that your predictive model can be trained or applied to generate the predictions.
This data can be organized using different formats depending on the type of predictive model:
6/30/2021
Type of Predictive Model Type of Data Source Comments
Classi cation Dataset only It can be an acquired or a live dataset.
Regression Dataset only It can be an acquired or a live dataset.
Time Series
Dataset It can be an acquired or a live dataset.
Planning Model Not all types of planning model are supported. See Restrictions
Related Information
About Datasets and Dataset Types

SAP Analytics Cloud allows you to create planning models and run predictive time series forecasts on your data within a story
grid or table. For more information, refer to Predictive Time Series Forecasting.
With Smart Predict you can go one step further and generate forecasts per entities to get accurate business-oriented insights,
not only raw forecasts. You can directly use your planning model as the data source, no need to extract data in a dataset.
Smart Predict uses the data available in your planning model to create and train a predictive model. You can then analyze
predictive forecast accuracies across the combined dimension values and understand signal breakdown in details. Once you are
satis ed with the accuracy of your predictive model, you can generate the predictive forecasts: they are saving back directly in
the private version of your planning model. It’s then easy for you to augment your story with actual and predictive forecasts.
You can nd a detailed use case using a planning model as data source. Read this blog .
 Note
They are restrictions on using planning models as data sources. See Restrictions
When you create a predictive model, you initially specify a training data source, a target or signal variable, and then de ne
additional training settings. Training is a process that uses SAP automated machine learning algorithms to nd the best
relationships or patterns of behavior in the data source. The result is a predictive model that you can apply to a new data source
to predict with a probability what could be the value of the target or signal for each element of the data source.
While creating the predictive model, you've selected a training data source. As the values of the target variable are known in
this data souce, the data can be used to evaluate the accuracy of the predictive model's results. During the training process,
the data source is cut into sub-sets using a process called partition strategy, with a nal partition used to validate the predictive
model's performance, using a range of performance indicators and graphical tools. For more information regarding the partition
strategy, refer to the related link.
6/30/2021
The following graphics summarize what's happen when you click Train with a training dataset as data source:
The training progress is monitored in the predictive model list.
If the training is successful, the predictive model produces a range of performance indicators and graphical charts that
allow you to analyze the training results. Assessing the accuracy and robustness of the training is called debrie ng the
predictive model. You can nd more information on debrie ng clicking on the related link Look for the Best Predictive
Model at the end of this page.
If the training is not successful, use the Status Panel (click the icon Settings panel), to access detailed information
why warnings and errors were generated during the training process.
 Note
Smart Predict doesn't take into account neither the textual variables nor the time variables that are contained in the
columns of your training dataset. Therefore, these variables are excluded from the training of your Predictive Model. For
more information, see the Restrictions.
Once you are satis ed with the accuracy and robustness of your predictive model, you can apply it to a new data source for
predictive insights, ensuring that the new data source has the same information structure as the training data source. For more
information on the apply step, click the related link below Generating Your Predictions.
Related Information
Partition Strategy
Restrictions
Keeping Informed With The Status Panel
Looking for the Best Predictive Model
Generating Your Predictions
Partition Strategy
A partition strategy is a technique that decomposes a training data source into two distinct subsets:
6/30/2021
A training subset
A validation subset
The partition is performed as follows:
The row or dimension selection is random
The training subset contains 75% of the input rows or dimensions.
The validation contains 25% of the input rows or dimensions.
Thanks to this partition strategy, the application can cross-validate the predictive models generated to ensure the best
performance.
The following table de nes the roles of the two data subsets obtained using partition strategies.
The data source Is used to...
Training Generate different predictive models. The predictive models

generated at this stage are hypothetical.
Validation Select the best predictive model among those generated using the
training subset, which represents the best compromise between
perfect quality and perfect robustness.
 Note
For Time Series Forecast, the validation subset allows you to calculate the con dence interval (Error Min and Error Max) of
the predictions.
Related Information
How does Smart Predict Select the Best Predictive Model?

As the values of the target (or signal) variable are known in your training data source, the data can be used to evaluate the
accuracy of the predictive model's results. Thanks to the partition strategy, Smart Predict rst creates several predictive
models versions and then cross-validates the results of these generated predictive models in order to keep the one with the
best performance.
How is done the selection for a classi cation predictive model?

In the case of a classi cation model, Smart Predict bases the selection looking at the performance indicators Predictive Power
and Prediction Con dence. It selects the model with the best compromise between perfect quality and perfect robustness.
How is done the selection for a regression predictive model?

In the case of a regression model, Smart Predict looks at the performance indicators Root Mean Squared Error (RMSE) and
Prediction Con dence. It selects the model with the best compromise between these two performance indicators.
How is done the selection for a time series predictive model?
6/30/2021
The selection of the best time series predictive model is based on the horizon-wide MAE: The time series predictive model is
applied on the past observations found in the validation set. For each period, the predictive model calculates as many
forecasted values as requested by the analyst. This is called the horizon of forecasts. Each of those forecasted values is
compared to the corresponding actual one. Then, for each possible horizon, a per-horizon MAE can be calculated. The horizon-
wide MAE is the mean of all per-horizon MAE values.
Creating a Predictive Scenario

Create a predictive scenario and work with one or more predictive models.
Context
You create a predictive scenario that corresponds to the type of information you need to answer a business question. The
predictive scenario is where you will create and store one or several predictive models. Refer to Related Information for a full
description of the types of predictive scenarios available.
Procedure
1. Select Main Menu Create Predictive Scenario .
 Note
You can also create a predictive scenario from the Files page by clicking the Create icon in the My Files menu bar.
2. Select the type of predictive scenario that best ts your business question.
The New Predictive Scenario dialog appears. It lists les and folders in the Files repository. If you already have a user
folder, you can store your predictive scenario there, or you can create a new folder.
3. Browse to and select a folder for the predictive scenario, and then enter a meaningful and unique name.
4. Enter a description, if needed. It should describe the business intent of the predictive scenario; for example, what
problem is it trying to solve? Is its goal to predict who will buy a product? Or is it creating groups of customers?
5. Click OK.
Your predictive scenario is created and automatically opens the Settings pane. The pane contains the parameters and
options that you'll use to de ne its rst predictive model. The settings available depend on the type of predictive
scenario, and these are described in Related Information.
Related Information
De ning the Settings of a Classi cation or Regression Predictive Model
De ning the Settings of a Time Series Predictive Model Using a Dataset as Data Source
Creating a Predictive Model

Once you've created your predictive scenario, you can add your rst predictive model.
Depending on the type of predictive scenario you have created and the training data source type you will use to train your
predictive model, the predictive model settings can be different.
 Note
Don' forget to keep in mind the following when:
6/30/2021
Using an acquired dataset as data source: Don’t forget that an acquired dataset must be uploaded in SAP Analytics
Cloud, under the Files area, before you can use it.
Using a live dataset as data source: Before you start using live datasets as data source, you need to need to check
with your administrator that your live SAP HANA data repository works. For more information, see Setting up Live
SAP HANA Data Access for Smart Predict.
Using a planning model as data source (Time series predictive scenario only): Before you can use your planning model
as a data source, you must ensure that you have a least read access to it. There are also some speci cities when
currencies conversion is enabled. See How does Smart Predict Support Currencies De ned in Planning Model?.
 Note
Keep in mind that your training and application data source must come from the same data source location. You can't apply
a predictive model on a live dataset if it was trained with an acquired dataset, nor can you apply a predictive model on an
acquired dataset if it was trained using a live one.
Regarding planning model: the same planning model you use as a source will be the one where you save the predictive
forecasts back.
Click the image below to navigate
This image is interactive. Hover over each area for a description. Click highlighted areas for more information.
Related Information
About Preparing Datasets for Predictive Scenarios
De ning the Settings of a Classi cation or Regression Predictive Model
Before you train your classi cation or regression predictive model, you need to specify how you want your predictive model to
be trained through the Settings panel.
The following sections mirror the sections of the Settings pane you need to complete to create your predictive model.
6/30/2021
General
Settings Action Additional Information
Description Enter what your predictive model is trying For example, predict if a customer will
to do. churn or not.
Training Data Source Browse and select the data source that The data source can be an acquired dataset
contains your historical data. or a live dataset.
Edit Column Details Check and update if necessary the columns You might need to check the statistical type
contained in your data source. if you cannot select it as your target at next
step.
Predictive Goal
Target Select the column from your data source For a classi cation predictive model, the
that contains the information you want to target column must contain binary values
get predictions for. only (for example: yes or no).
For a regression predictive model, the
target column must contain numerical
values.
In uencers
Exclude as in uencer Select the in uencers that should not be All of the in uencers contained in your
taken into consideration by the predictive training data source can in luence more or
model. less the target.
Some in uencers might have too much

in uence on the target and should therefore
be excluded from the predictive model.
6/30/2021
Limit Number of In uencers During the training, Smart Predict chooses

an optimized number of in uencers to
include in your predictive model. This is
why the option Limit Number Of In uencer
is unchecked by default.
However, if you have a reason for overriding

the Smart Predict default setup, for
example if you want to focus only on a few
in uencers that have the most in uence on
the target, switch on the button, and set the
Maximum Number of In uencers to be
kept in the model.
 Example
Imagine that you want to launch a phone
survey. You decide to limit the survey up
to 3 questions. In this case, as you need
to focus on the questions that best
in uence the prediction, you check the
option Limit Number Of In uencers and
set Maximum Number of In uencers to
3
Click Train button. Thanks to the generated reports, you can analyze the predictive model performance and decide if you need
to further re ne your predictive model or if you can use it with con dence. For more information, see Looking for the Best
Predictive Model.
De ning the Settings of Time Series Predictive Model With a Planning Model as
Data Source
There are some settings to specify before you train your time series predictive model using a planning model as data source.
To de ne how you want your predictive model to be trained, use the Settings panel as described in the tables below.
For more information about what is currently supported in Smart Predict, see the section Restrictions Using Planning Model as
Data Source for Smart Predict in Restrictions.
General
Description Enter a description that explains what your For example, you might want to Forecast
predictive model is trying to do. product sales by city.
Times Series Data Source Browse and select the planning model you Smart Predict supports only standalone
want to use as a data source. planning models, (both new model types
and classic account models).
 Note
SAP Business Planning and
Consolidation (SAP BPC) planning
models are not supported whether these
are live or acquired.
6/30/2021
Version Browse and select the planning model The input version must be a public version,
version you want to use as data source. not in edit mode, or a private version. You
must have a least read access to it. There
are also some speci cities when currency
conversion is enabled. See How does Smart
Predict Support Currencies De ned in
Planning Model?.
Predictive Goal
Signal Select the numeric value containing the data Smart Predict doesn't support calculated
to be forecasted. measures when using a planning model,
even if an inverse formula is provided. For
 Note
more information on inverse formulas, you
If your signal is related to a can refer to the chapter Inverse Formulas.
currency, you also have the
For more information about using a planning
option to select Default
model as a data source, see the section
Currency or Local Currency.
Restrictions Using Planning Model as Data
The option you choose
Source for Smart Predict in Restrictions.
determines the currency used to
forecast and report on your For more information about currency
signal. support, see the chapter How does Smart
Predict Support Currencies De ned in
If your planning model is a new
Planning Model?.
model type structured with an
account dimension and one or To learn more about the different model
multiple measures, to de ne types, you can refer to the chapter called
your Signal, you must select a Getting Started with the New Model Type.
value for both, using the
Measure and Account selectors.
If your planning model is a new

model type or a classic account
model structured with measures
only, you're only prompted to
select a value for Measure.
Date The date dimension in the predictive model.
Time Granularity By default, this refers to the level of date If the lowest level of the date hierarchy in
granularity available in the planning model the planning model is daily, then Smart
data source. Predict will create daily predictive
forecasts.
Number Of Forecasts Select the number of predictive forecasts For more information, see How Many
you would like to get. Forecasts can be Requested?
6/30/2021
Entity Select up to ve dimensions or attributes This corresponds to de ning each entity

for which you want to get distinct forecasts. that you want to get forecasts for. The
This eld is optional. predictive model will capture speci c
behaviors for each entity and will generate
distinct predictive forecasts. For more
information, see How Can You Get Distinct
Predictive Forecasts per Entities For Your
Planning Model?
 Note
There are speci c restrictions on
entities. For more detailed information,
see the section Restrictions Using
Planning Model as Data Source for
Smart Predict in Restrictions.
Predictive Model Training
Train Using Select whether you want to train your It can be useful to de ne the range of
predictive model using all observations or a observations that will be used to train the
window of observation. predictive model. You may want to ignore
If you choose to use a window of very old observations or inappropriate
observations you'll need to specify the size observation to avoid that your predictive
of the window you want to use. model learns based on
obsolete/inappropriate behavior.
 Note  Example
If the range of predictive forecasts
For example, if you want to forecast
overlaps existing data in the private
travel costs for next year, you might
version, data will be overriden.
want to ignore a couple of months in
your past data where travel has been
frozen for budget reasons.
Until Select whether you want to train your If you select a custom observation date,
predictive model until the last observation make sure it stays within the time range
or another date of your choice. de ned in the data source planning model.
Force Positive Forecasts Switch the toggle on if you want to get This turns negative predictive forecasts to
positive forecasts only. zero. This can be useful when predictive
forecasts only make sense as positive
predictive forecasts. For example, if you
need to forecast the number of sales for
one of your main product by major cities for
a region. It makes no sense to get negative
values. Either you sell a number of products
or you sell none of them. Negative values
will be turned to 0.
Select the Train & Forecast button. Thanks to the generated reports, you can analyze the predictive model performance, and
decide if you need to further re ne your predictive model, or if you can use the predictive forecasts with con dence. For more
information, see Analyzing the Results of Your Time Series Predictive Model.
Related Information
6/30/2021
Getting Started with the New Model Type
How does Smart Predict Support Currencies De ned in Planning Model?

Smart Predict supports the currencies de ned in a planning model.
The planning model used as a data source might contain currencies. But how does Smart Predict support these currencies? It
depends on how they are con gured in your planning model data source, and on what type of planning model you are using - a
classic account model, or a new model type.
Classic account model: You can generate predictions in Smart Predict using the default or local currency to read and
write data, when your planning model is a classic account model, and currency conversion is enabled. Depending on your
selection, the report for all your model entities are consistently expressed in one default currency, or multiple local
currencies. For example, if the default currency is USD, you always see numbers expressed in USD in the Smart Predict
report. If there are multiple local currencies in your planning model, the numbers in the Smart Predict report re ect
these multiple local currencies.
When you select the Default Currency setting, you can only write to planning versions con gured to receive default
currencies.
When you select the Local Currency setting, you can only write to planning versions con gured to receive local
currencies.
 Note
To understand the currency displayed in the report, you can check the Smart Predict settings, and the currency
de nition in your planning model. The default currency is indicated in the planning model.
 Example
If you deal in Japanese yen, you would understand that your report is in Japanese yen, as your Smart Predict currency
setting is Local Currency.
New model type: Smart Predict doesn't support predictive forecasting on calculated measures, including currency
conversion measures, when your planning model is a new model type.
To learn more about the different model types, you can refer to the chapter called Getting Started with the New Model Type.
To learn more about how currencies can be set up in a planning model, see Planning on Data in Multiple Currencies.
For classic account models, Smart Predict can support currencies when they are set at the planning model level in the following
cases:
Planning Model Input Planning Model Output Planning Model

Settings Version Version Comment
Currency conversion A public version or a A private version.

 Note
disabled but a private version.
dimension with If the measure has a currency, the currency
currency is de ned. attribute should be added as an entity to get
distinct forecasts per currency to prevent mixed
aggregations.
6/30/2021
Planning Model Input Planning Model Output Planning Model

Settings Version Version Comment
Currency conversion A public version or a A private version created N/A

enabled. private version created with default currency.
with default currency or Private versions with local
local currency. currency or another
Private versions with currency are not available
another currency are not for selection.
available for selection.
A private version created
with a default currency or a
local currency.
Currency conversion A public version or a A private version. N/A

disabled, but a default private version.
currency is de ned.
 Caution
You must have at least read access to the input version, and written access to the output version (private only). Public
versions in edit mode are available for selection, but aren't supported.
Related Information
Attributes of an Account Dimension
Attributes of an Organization Dimension
Attributes of a Dimension
Planning on Data in Multiple Currencies
Setting Up Model Preferences
Displaying Currencies in Tables
Restrictions
Getting Started with the New Model Type
How Can You Get Distinct Predictive Forecasts per Entities For Your Planning
Model?
Thanks to Smart Predict, you can create distinct predictive forecasts per entities using your planning model as data source,
where the granularity of the predictive forecasts is determined by the aggregation level of the combined dimensions. But what
does it mean?
 Example
Let's take an example to better understand how it works: Imagine that you want to forecast your future sales by country and
by product.
To build a predictive model with distinct forecast per entities taking your planning model as data source, Smart Predict
needs to match the data contained in your planning model (actuals are used as historical data ) with the variable roles that
are mandatory to generate the predictive forecasts:
Roles in Smart Predict Correspondence in your planning model Correspondence with our example
Signal Measure that does not involve calculation It's the measure you want to forecast:
<Sales>
6/30/2021
Roles in Smart Predict Correspondence in your planning model Correspondence with our example
Date You can have several dates in a planning <Month>

model. You select one as main date to
consider as the date dimension
Entity It corresponds to dimensions or attributes <Country> dimension and <Product>

for which you want predictive forecasts. dimension
Before the training phase, the data can be represented as follows:
Once the training is done and the generated predictive forecasts are available, the data look like:
6/30/2021
When application is done, the predictive forecasts are added to your planning model. In our example, it means that the
generated forecasts for Sales will be added for June and July.
Reporting in SAP Analytics Cloud at a higher level
Even if the time series predictive model is trained and applied based on a lowest level of date granularity, you can still report
data at upper level in a story.
Going back to our example: The times series has been generated on months basis. You can report by aggregating the sales
by quarters or years (instead of month).
6/30/2021
Understanding Which Level of Predictive Forecasts You Need
Entities help you create your predictive forecasts at different levels, depending on your business needs. They can also help you
detect performance gaps in your predictive models in some cases.
Entities are subsets of your predictive forecasts that are calculated independently from a combination of one to ve
dimensions. Each entity can be seen as an individual predictive forecast. These individual predictive forecasts can be
agreggated at a higher level if needed.
The level you need for yourr predictive forecasts depends on the insights you want to get from your predictive forecasts and the
data available in your source planning model. You can create entities by combining up to 5 dimensions for varying levels of high-
detail predictive forecasts. You can also work without entities to keep your predictive forecasts high-level.
Let’s take a car sales scenario to explore in more detail how entities can help you nd the level of predictive forecasts you need.
Your company wants insights on future car sales. The company sells ve car brands across six countries. You have ve months of
data available in your source planning model, wand you want two months of predictive forecasts.
6/30/2021
Predictive forecasts using no entities

Predictive forecasts without entities can give you insights on general trends across your data source, such as future car sales
across all countries and brands.
In this case, you use car sales as your signal, Date as your date dimension and you train your predictive model.
You get a single predictive model that generates two car sales predictive forecasts, one for June and one for July.
6/30/2021
This high-level forecasting of the company’s future car sales is useful if you just need an overview without looking into subset-
speci c trends.
Predictive forecasts using entities with multiple dimensions

Predictive forecasts using entities with multiple dimensions can give you insights on trends within and between several subsets
of your data source such as car sales per brand and country.
In this case, you keep car sales as your signal and Date as your date dimension. You add Brand and Country dimensions as
entities and train your predictive model.
You get thirty predictive models that are calculated independently. The thirty predictive models generate sixty predictive
forecasts if all combinations have data available. This high-detail forecasting approach is useful if you need to focus on several
subsets such as how one brand performs in a given country or how one country performs by brand compared to other countries.
The sixty predictive forecasts can still be aggregated at a higher level by Country, Brand or Date if needed.
Predictive forecasts using entities with one dimension

Predictive forecasts using entities with one dimension can give you insights on trends within one subset of your data source
such as car sales per country
In this case, you keep car sales as your signal and Date as your date dimension. You add the Country dimension as entity and
train your predictive model.
6/30/2021
You get six predictive models (one per country) that are calculated independently. Twelve predictive forecasts are generated if
all combinations of Country and Date are possible.
These twelve predictive forecasts can also be aggregated at a higher level by Country or Date if needed. This mid-level
forecasting approach is useful when you need to focus on trends and relationships in one speci c subset such as how each
country will perform individually and compared to other countries.
Entities are a useful tool to tailor the level of your predictive forecasts to your different business needs. You may need to try a
few forecasting combinations to reach the level of accuracy you want.
 Note
Hierarchies are currently not supported as entities. For more information, see Restrictions.
De ning the Settings of a Time Series Predictive Model Using a Dataset as Data
Source
Before you train your Time Series predictive model using a dataset as data source, you need to specify how you want your
predictive model to be trained through the Settings panel.
The following sections mirror the sections of the Settings pane you need to complete to create your predictive model.
 Note
Don't forget that the date formats in your time series data source must be:
YYYY-MM-DD
YYYY/MM/DD
YYYY/MM-DD
YYYY-MM/DD
YYYYMMDD
6/30/2021
YYYY-MM-DD hh:mm:ss
where YYYY stands for the year, MM stands for the month, DD stands for the day of the month, hh stands for hour, mm
stands for minutes, and ss stands for seconds.
 Example
January 25, 2018 will take one of the following supported formats:
2018-01-25
2018/01/25
2018/01-25
2018-01/25
20180125
General
Description Enter what your predictive model is trying For example, forecast product sales by city.
to do.
Times Series Data Source Browse and select the data source that
contains your historical data.
Edit Column Details Check and update if necessary the columns You might need to check the statistical type
contained in your data source. if you cannot select it as your signal at next
step.
Predictive Goal
Signal Select the numeric column containing the

data you want to get predictive forecasts
for.
Date Select the column that contains the dates of

observations for the time series.
Number Of Forecasts Select the number of predictive forecasts See How Many Forecasts can be
you want. Requested?.
Entity Select up to ve columns from your data This corresponds to identify each entity
source for which you want to get distinct that you want to get predictive forecasts
forecasts. This eld is optional. for. The predictive model will capture
speci c behaviors for each entity and will
generate distinct predictive forecasts.
Predictive Model Training
6/30/2021
Train Using Select whether you want to train your It can be useful to de ne the range of
predictive model using all observations or a observations that will be used to train the
window of observation. predictive model. You may want to ignore
If you choose to use a window of very old observations or inappropriate
observations you'll need to specify the observation to avoid that your predictive
window size you want to use. model learns based on
obsolete/inappropriate behavior.
 Example
For example, if you want to forecast
travel costs for next year, you might
want to ignore a couple of months in
your past data where travel has been
frozen for budget reasons.
Until Select whether you want to train your Last Observation: Let the application use
predictive model until the last observation the last training reference date as a basis.
or another date of your choice.
User-De ned Date: You select a speci c
date (available in the dataset).
Force Positive Forecasts Switch the toggle on if you want to get This turns negative predictive forecasts to
positive forecasts only. zero. This can be useful when predictive
forecasts only make sense as positive
predictive forecasts. For example, if you
need to forecast the number of sales for
one of your main product by major cities for
a region. It makes no sense to get negative
values. Either you sell a number of products
or you sell none of them. Negative values
will be turned to 0.
Click Train & Forecast button. Thanks to the generated reports, you can analyze the predictive model performance and decide if
you need to further re ne your predictive model or if you can use the predictive forecasts with con dence. For more
information, see Analyzing the Results of Your Time Series Predictive Model.
How Many Forecasts can be Requested?

The eld Number of Forecasts allows you to select the number of predictive forecasts to generate.
You can set the number of predictive forecasts that corresponds to your business needs. However, this number can be set with
some limits:
In the case that your time series data source is a dataset that contains future values for in uencers, your number of
predictive forecasts needs to be inferior or equal to the number of future values observations you have in your data
source.
 Example
If you have future values for the next six months, then the number of predictive forecasts requested cannot exceed
six.
The number of predictive forecasts delivered with con dence intervals is determined as follows:
If the time series data source size is equal or fewer than 12 periods, it is treated as a small data source case and
by default the number of predictive forecasts with con dence intervals is set to 1.
6/30/2021
In the other cases, the number of predictive forecasts with con dence intervals is set to 1/5 of the time series
data source size.
 Example
If your time series data source contains 1,000 observations, Smart Predict can provide up to 200 predictive
forecasts with con dence intervals. If you ask for more than 200 predictive forecasts, the accuracy of the
forecasts starting from the 201st cannot be evaluated.
Related Information
Partition Strategy
De ning the Settings of a Time Series Predictive Model Using a Dataset as Data Source
How Adding In uencers to Your Dataset Can Potentially Increase the Accuracy of
Your Predictive Model?
Once you’ve trained your predictive model, the performance indicators can be too low to immediately consider the predictive
model accurate (see Predictive Power for a classi cation predictive model, Root Mean Square Error for a regression predictive
model or Horizon-Wide MAPE for a time series predictive model).
One way to increase your predictive model’s accuracy is to add in uencers to your dataset. These in uencers can then be used
by Smart Predict to improve its understanding of the relationships between your data.
 Note
In uencers are only available if your data source is a dataset.
Example
Your company noticed that the maintenance costs of their stores are getting too high. You need to analyze them to see where
to cut costs but also predict future maintenance costs better to avoid going over budget. You create your rst predictive
scenario with a Time Series predictive model to assess the maintenance costs per store. You choose the overall expenses as
signal, the date of these expenses as date variable and the store ID as entity.
You train your rst predictive model excluding the twenty-three possible in uencers.
The Horizon-Wide MAPE of your rst predictive model in the debrief is at 26.71%.
 Note
6/30/2021
You want the percentage of your predictive model’s Horizon-Wide MAPE to be as low as possible as it indicates the
percentage of error you can expect in your predictive forecasts.
You notice that some of the variables excluded as in uencers such as the number of Saturdays and Sundays have a direct
relation to the date dimension your used in your predictive model. You realize they impact the insights and could improve the
accuracy of your predictive forecasts if they were included as in uencers.
You create a second predictive model by duplicating your rst predictive model. However, this time you include all in uencers
and train your second model.
The Horizon-Wide MAPE of your second predictive model in the debrief drops to 20.77%.
Your predictive model gained 22% of accuracy by simply including variables as in uencers in your predictive model. You may
need to try a few in uencers combinations to reach the level of accuracy you want.
Related Information
In uencer Count
Root Mean Squared Error (RMSE)
Horizon-Wide MAPE
Predictive Power
Save and Train Your Predictive Model

Once you have set up the settings of your predictive model, you need to complete two additional steps before you can evaluate
its accuracy:
1. You save your predictive model to add it to the predictive model list of your predictive scenario. The predictive model is
saved with the status Not Trained.
 Note
The "Save" button is in the toolbar.
2. You train your predictive model. During training, Smart Predict explores and learns from relationships in the data source
to nd the best combination or patterns of behaviour and generates the predictive model. Its status is updated
immediately in the predictive model list as Trained.
6/30/2021
 Note
It can happen that the training fails. In this case, check the Status Panel clicking . You can also click on the Status
directly in the predictive model list.
Related Information
Keeping Informed With The Status Panel
Status of a Predictive Model
The Status Panel

You can keep informed about what is happening during the modeling process by regularly checking out the Status panel.
At the top of the Settings panel, click to access to information on the predictive model, and any errors that occurred
during the training step or apply step actions.
 Note
The Status panel stays empty until your predictive model is trained.
View Displays
Model Status The area where you can access the error messages. For example, if
the training failed, you can get some information on what went
wrong.
Detailed Logs Logs that display the details of each step of the process. In case of
a problem, it allows you to provide SAP support professionals with
information.
Status of a Predictive Model

Once your predictive model is saved, it is added to the list of predictive models contained in your predictive scenario with the
status Not Trained. From this area, you are able to monitor the status of your predictive model at each step. The predictive
model list is located at the bottom of the screen.
When you train or apply your predictive model, you can get the following types of information on the training or apply process:
Icon Status Means
6/30/2021
Icon Status Means
Warning The training/apply ended with warnings. It

means that the training/apply task could be
run: You might need to check the Status
panel for more information on these warning
and update your predictive model settings.
For the training, the application can display

a debrie ng. The status of the predictive
model will be Trained with Warning.
For the apply, the application could apply

the predictive model. However as some
warnings were found, the status is Applyed
with Warning. After 10 minutes, the status
will be reverted to the previous status
"trained with warning" or "trained".
Error The training/apply ended with errors. The

predictive model could not be applied or
trained. You might need to check the Status
panel for more information and update your
predictive model settings.
For the training, the application could not

trained the predictive model and no
debrie ng are available. The status of the
predictive model will be Train Failed.
For the apply, the application could not

apply the predictive model. The status of
the predictive model will be Apply Failed."
After 10 minutes, the status will be reverted
to the previous status Train Failed or Not
Trained.
Note that you can look for speci c

messages relating to errors in the Status
panel.
Trained The training/apply ended with success.

For the training, the application can display
a debrie ng. The status of the predictive
model will be Trained.
For the apply, the application could apply

the predictive model. After 10 minutes, the
status will be reverted to the previous
status Trained.
 Note
For Time series predictive models that are split up into entities, errors/warnings are displayed in the debrie ng reports
directly and exact error / warning is displayed per entity.
The Status panel is located at the top of the Settings panel, by clicking .
Looking for the Best Predictive Model
6/30/2021
You've trained your rst predictive model using your training data source and it's now time to evaluate if you can use it with
con dence to generate your predictions.
You can evaluate your predictive model's performance using a range of performance indicators. You can also add new predictive
models with different settings and compare their performances if you want to re ne your results.
What Can You Do in the Predictive Models List?

As soon as your predictive model is saved, it is added to its predictive model list at the bottom of the screen. You can view each
predictive model's status and perform the following actions from the list:
Opening a Predictive Model
Duplicating a Predictive Model
Deleting a Predictive Model
Adding a New Predictive Model to the Predictive Scenario
Checking the Status of a Predictive Model
Displaying the Assessing Your Predictive Model With the Performance Indicators
Opening a Predictive Model
Context
You want to open an existing predictive model.
Procedure
1. Open the predictive scenario that contains your predictive model.
2. Click the relevant predictive model to open from the Predictive Models list.
6/30/2021
Duplicating a Predictive Model

Duplicate a predictive model to create a new version in the same predictive scenario.
Context
You want to experiment with new settings starting from a predictive model version that already exists. You can duplicate the
predictive model's settings to create a new version of the current predictive model to:
Edit the Column Details of your dataset.
Add or remove excluded In uencers.
Set or update the number of in uencers for Classi cation and Regression predictive models.
Set new training and forecast settings for Time Series predictive models.
Procedure
1. Open the predictive scenario, which contains the predictive model you want to duplicate.
2. Open the Predictive Model list.
3. Click at the predictive model level you want to duplicate, and select Duplicate.
You create an exact copy of the original version of your predictive model.
 Note
The duplicated version created is always untrained.
Deleting a Predictive Model

You can delete one or several predictive models from any of your predictive scenarios.
Context
You want to delete a predictive model.
Procedure
1. Open the predictive scenario, which contains the predictive model you want to delete.
2. Open the Predictive Model list.
3. Click More () at the predictive model level you want to delete.
4. Select Delete.
Your predictive model no longer appears in the list.
Adding a New Predictive Model to the Predictive Scenario

You want to create a new predictive model from scratch in your predictive scenario.
Context
Your predictive scenario is created and you've already created at least a rst predictive model. Now you want to create a new
predictive model to test new training settings and compare the results.
6/30/2021
Procedure
1. Click the button Create Predictive Model.
2. De ne the predictive model settings.
3. Click the Train button.
The new predictive model is added to the predictive scenario and appears in the predictive model list. You can now easily
compare the existing predictive models to nd the one that best ts to your need.
Assessing Your Predictive Model With the Performance Indicators

Predictive model performance indicators help you assess the performance and robustness of your predictive model.
You can check the performance and robustness of your predictive model using several performance indicators. The indicators
available depend on the type of predictive model you have set up:
Type of Predictive Model Performance Indicator
Classi cation predictive model Predictive Power
Prediction Con dence
In uencer Count
Record Count
Classi cation Rate
Area Under the ROC Curve (AUC)
Gini Index
Regression predictive model Root Mean Squared Error (RMSE)
In uencer Count
Record Count
Error Mean
Error Standard Deviation
Maximum Error
Time Series predictive model Horizon-Wide MAPE
In uencer Count
Record Count
Predictive Power
Quality indicator of a classi cation predictive model.
The predictive power measures the ability of your predictive model to predict the values of the target variable using the
in uencers present in the training data source.
How to interpret the indicator?
6/30/2021
The predictive power indicator takes a value between 0% and 100%. This value should be as close as possible to 100%, without
being equal to 100%.
A predictive power of 1 is a hypothetically perfect predictive model, where the in uencers are capable of accounting for 100% of
information in the target variable. In practice, however, this is usually an indication that an in uencer, that is 100% correlated
with the target variable, was not excluded from the data source analyzed. A good practice would be to exclude this in uencer
when you de ne the settings of your predictive model.
A predictive power of 0 is a purely random predictive model with no predictive power.
 Tip
To improve the predictive power of a predictive model, try adding new in uencers to the training data source.
 Example
A predictive model with a predictive power of 79% is capable of accounting for 79% of the variation in the target variable
using the in uencers in the data source analyzed.
No exact threshold exists to separate a “good” predictive model from a “bad” predictive model in terms of predictive power as
this depends on your business case. The predictive model of a customer-based scenario can be considered "good" with a
predictive power of 40, while the predictive model of a nance-based scenario usually requires a predictive power above 70 to
be considered "good".

Robustness indicator of a classi cation or regression predictive model.
The prediction con dence indicates the capacity of the predictive model to achieve the same performance when it is applied to
a new data source, which has the same characteristics as the training data source. If the distribution of data is different
between the two data sources, the predictive model is no longer useful.

Prediction Con dence takes a value between 0% and 100%. This value should be as close as possible to 100%. To improve your
Prediction Con dence, you can add new rows to your data source, for example.
Calculation of Predictive Power and Prediction Con dence

Predictive Power and Prediction Con dence are the two main indicators of performance for your predictive model.
The following graph displays the predictive power and the prediction con dence calculation:
6/30/2021
Horizon-Wide MAPE
Error made of a time series forecasting predictive model.

The Horizon-Wide MAPE is the evaluation of the "error" made when using the predictive model to estimate the future values of
the signal, whatever the horizon. For each actual observed value, the predictive model calculates as many forecasted values as
requested by the analyst. This is called the horizon of forecasts. Each of those forecasted values is compared to the
corresponding actual ones. Then, for each possible horizon, a per-horizon MAPE can be calculated, which is the mean of the
absolute differences between actual and forecasted values, expressed as a percentage of actual values. The Horizon-Wide
MAPE is the mean of all per-horizon MAPE values that have been calculated..
 Example
A Horizon-Wide MAPE of 12% indicates that the error made when using a forecasted value will be of more or less 12%.
A Horizon-Wide MAPE of zero indicates a perfect predictive model.
The absolute value of the differences is taken into account to evaluate the average error.
Related Information
How Many Forecasts can be Requested?
In uencer Count
Number of in uencers used in the predictive model.
The In uencer Count indicates the number of in uencers used in the predictive model.
6/30/2021
 Tip
To improve the predictive power of your predictive model, you may want to add In uencers to the training data source.
Record Count
Number of rows processed.
Each row in the dataset represents an observation.
 Tip
To improve the prediction con dence of a predictive model, you may want to add new observation rows to the training data
source. In case of a classi cation predictive model, keep in mind that the number of rows of the class less represented must
ideally be higher than 1000.
Classi cation Rate

Ratio of correctly classi ed rows to the total number or rows.
For classi cation predictive models, it corresponds to the ratio of correctly classi ed rows to the total number of rows.
 Example
A classi cation rate of 0.82 means that 82% of the rows in the training dataset are correctly classi ed by the predictive
model.
 Note
The classi cation rate is not very well adapted to unbalanced cases, when the target category is not very frequent. For
example, if there is only 1% of the target category, it's very easy to have a very high classi cation rate. In such a case, check
the Predictive Power or the Area Under the ROC Curve (AUC).
Area Under the ROC Curve (AUC)

Rank-based measure of predictive model performance.
The Area Under the ROC Curve (AUC) is another way to measure predictive model performance. It calculates the area under
the Receiver Operating Characteristic (ROC) curve. The AUC is linked to Predictive Power (PP) according to the following
formula: PP = 2 * AUC - 1. For a simple scoring predictive model with a binary target, this represents the probability that a
randomly chosen signal observation will have a higher score than a randomly chosen negative observation (non-signal).
One of the interests of the Area Under the ROC Curve is its independence from the target distribution. For example, even if you
duplicate each positive observation twice by duplicating rows in the dataset, the AUC of the predictive model stays the same.
 Tip
AUC is a good measure for evaluating a binary classi cation system. It is useful for cases when the target category is not
very frequent, which is not the case of the Classi cation Rate.
 Example
6/30/2021
Below is an example of a ROC curve:
Sensitivity, which appears on the Y axis, is the proportion of CORRECTLY identi ed signals (true positives) found (out of all
true positives on the validation data source).
[1 – Speci city], which appears on the X axis, is the proportion of INCORRECT assignments to the signal class (false
positives) incurred (out of all false positives on the validation data source). (Speci city, as opposed to [1 – speci city], is the
proportion of CORRECT assignments to the class of NON-SIGNALS – true negatives.)
Error Mean
Mean of the differences between predictions and actual values.
The Error Mean or Standard Error of the Mean quanti es the precision of the predictive model's estimations. It's used to
determine how precisely the mean of the predictive model's predicted values estimates the population mean.

A mean of zero indicates that the mean of the predictive model is the same as the actual data. A mean value close to 0 is
better.
A negative mean value indicates that the predictive model always underestimates the values, and often generates values under
the actual values.
A high mean value indicates that the predictive model over-estimates the target values, and often generates values above the
actual values.
To improve the accuracy of your predictive model, you can bring additional in uencers that make the target clearer to the
training data source.
The Error Standard Deviation or Standard Deviation is a measure of variability that quanti es how much the errors vary from
one another.
6/30/2021

A low standard deviation is better because indicates that the data points tend to be close to the mean of the data source.
A high standard deviation indicates that the data points are spread out over a wider range of values.
To improve the accuracy of your predictive model, you can bring additional in uencers that make the target clearer to the
training data source.
Gini Index
Measure of predictive power based on the Lorenz curve.
The Gini index is a measure of the predictive power based on the Lorenz curve. It is proportionate to the area between the
random line and the predictive model curve. The Gini index is de ned as the area under the Lorenz curve. It is the area between
the ʻTrade-off’ curve and the obtained curve multiplied by 2.

The Gini index takes values from 0 to 1.
Value 0 corresponds to random predictive model, value 1 corresponds to ideal predictive model.
Maximum Error
Maximum absolute difference between predicted and actual values.
The Maximum Error is the highest value resulting from the calculation of the absolute differences between the predicted and
the actual values for each row of the data source.
6/30/2021

It's always positive and values closer to zero are better. It gives an idea of how far the prediction can be (at worst) from the
actual value.
Root Mean Squared Error (RMSE)

The Root Mean Squared Error (RMSE) is one of the two main performance indicators for a regression predictive model. It
measures the average difference between values predicted by a predictive model and the actual values. It provides an
estimation of how well the predictive model is able to predict the target value (accuracy).

The lower the value of the Root Mean Squared Error, the better the predictive model is. A perfect predictive model (a
hypothetic predictive model that would always predict the exact expected value) would have a Root Mean Squared Error value
of 0.
The Root Mean Squared Error has the advantage of representing the amount of error in the same unit as the predicted column
making it easy to interpret. If you are trying to predict an amount in dollars, then the Root Mean Squared Error can be
interpreted as the amount of error in dollars.
How to decrease the Root Mean Squared Error?

You can improve the Root Mean Squared Error by adding more in uencer in the training data source.
What is the formula used to calculate the Root Mean Squared Error?
The Root Mean Squared Error is calculated using the following formula:
where:
SSEw = Weighted Sum of Squares
W = Total weight of the population
N = Number of observations
wi = Weight of the i-th observation
ui = Error associated with the i-th observation
Other Interpretation
The Root Mean Squared Error can be interpreted as the standard deviation of the error (it's the square root of the error
variance).
Analyzing the Results of Your Classi cation Predictive Model
6/30/2021
Once you've trained your classi cation predictive model, you can analyze its performance to make sure it's as accurate as
possible.
Use the dropdown list to access and analyze the reports on in uencers and predictive model performance.
What do the values of the two Does the Which Which Is my model Can I see any What's next?
Cli main performance indicators target value in uencergroup of producing model errors?
ck mean? appear in s have thecategorieaccurate Is my You have two
th sufficient highest s has the predictions?predictive possibilities:
e Quickly check if your predictive
quantity in the impact onmost Can I model
ar model is accurate and robust, Your are
different data the in uence evaluate the producing
ea checking the global performance satis ed with
sources? target? on the costs/savin accurate
for indicators: your
target? gs using predictions?
m predictive
Get an overviewCheck how this model?
or Predictive Power is your model's
of the the top In the Use a large
e main measure of predictive frequency in performance
ve In uencerUse the panel of
inf model accuracy. It takes a each data after
in uencersContributiConfusion performance
or value between 0% and checking the
source of each impact on ons Matrix tab curves in the
m 100%. This value should be target class performance
the target. report, and assess Performance
ati as close as possible to indicators.
(positive or Only the analyze the predictive Curves tab, to
on. 100%, without being equal Then you can
negative) that top ve the model compare your
to 100% (100% would be a belongs to the contributinin uence use it: See
performance predictive
hypothetically perfect Generating
target variable. g of in detail, model to a
predictive model; 0% would Your
in uencersdifferent using random
be a random predictive It's usually Predictions.
are categorie standard predictive
model with no predictive recommended displayed s of an metrics such model and a You would like
power). To improve your that you have as a in uencer as speci city. hypothetical to see if you
Predictive Power, you can at least 1000 default. on the perfect can improve
add more in uencers, for records of the target: Use the predictive your
example. each class in For more Pro t model: predictive
your data informatio If Simulation model's
For more information, refer
source. Under n, refer to th tab and performance:
to Predictive Power.
this threshold, In uencer
6/30/2021
Prediction Con dence the validity of Contributi e estimate the Determi Duplic
indicates the capacity of the prediction ons. in expected ne the ate
your predictive model to con dence is no ue pro t, based percenta your
achieve the same degree of longer nc on costs and ge of the curren
accuracy when you apply it guaranteed. e pro ts populatio t
to a new data source, which val associated n to predic
has the same For more ue with the contact tive
characteristics as the information, is predicted to reach model
training data source. It takesrefer to Target po positive and a speci c and
a value between 0% and Statistics. siti actual percenta experi
100%. This value should be ve, positive ge of the ment
as close as possible to we targets. actual with
100%. To improve your ar positive updat
Prediction Con dence, you e For more target ed
can add new rows to your moinformation, with The settin
data source, for example. re refer to Detected gs. You
lik Confusion Target can
For more information, refer
ely Matrix, The Curve. then
to Prediction Con dence.
to Pro t compa
Check
ge Simulation. re the
how
 Note t two
much
Depending on your business "m versio
better
issue, you can look at the other ino ns and
your
provided performance indicators rit nd
predictiv
for the predictive model, and y the
e model
also review the pro le of the cla best
is than
detected curve. For more ss" one.
the
information, refer to Assessing . See
random
Your Predictive Model With the Duplic
If predictiv
Performance Indicators and The ating a
th e model
Detected Target Curve. Predic
e with The
tive
in Lift
Model.
ue Curve.
nc Updat
Check
e e the
how well
val settin
your
ue gs of
model
is your
discrimin
ne predic
ates, in
ga tive
terms of
tiv model
the
e and
compro
we retrain
mise
ar it. See
between
e De ne
sensitivit
les Settin
y and
s gs and
speci cit
lik Train a
y with
ely Classi
The
to catio
Sensitivit
ge n or
y Curve
t Regre
(ROC).
"m ssion
ino Predic
6/30/2021
rit Check tive
y the Model
cla values
ss" for [1- 
. Sensitivit Ca
y] or for uti
The Speci cit on
in uence y against You
of a the will
category populatio era
can be n with se
positive The the
or Lorenz pre
negative. Curves. vio
us
For more Understa
ver
informati nd how
sio
on, refer positive
n!
to and
Category negative
targets Delete
In uence,
are your
Grouped
distribut predic
Category
ed in tive
In uence
your model.
and
model See
Grouped
with The Deleti
Category
Density ng a
Statistics.
Curves. Predic
tive
Model
Target Statistics
Identify the categories in your target and their frequency in each data source.
Target Statistics are expressed in percentage and show how often each target class appears in the data source.
For each data source, the Target Statistics report lists the categories of the target variable. For each category, the table
indicates how often it appears compared to the other category. The target category (positive or negative target) is by default
the less frequent one.
In uencer Contributions
The In uencer Contributions show the relative importance of each in uencer used in the predictive model.
The In uencer Contributions view allows you to examine the in uence on the target of each in uencer used in the predictive
model.
The in uencers are sorted by decreasing importance.
The most contributive ones are those that best explain the target.
Only the contributive in uencers are displayed in the reports, the in uencer with no contribution are hidden. The sum of their
contributions equals 100%.
6/30/2021
 Note
The number of in uencers that are displayed, depends on the predictive model settings you have de ned at the creation
steps. Indeed, if you decided at the creation step to Limit Number Of In uencers to 3 for example, then you will get the
information on the 3 most important in uencers at a maximum.
Related Information
Category In uence
Category In uence
Analyze the in uence of different categories of an in uencer on the target.
Category in uence is an analysis of the in uence of different categories of an in uencer on the target, computed from basic
information:
The category frequency: percentage of observations within this category.
The difference between the percentage of positive cases in this category and the percentage of positive cases in the
whole population.
How to interpret the in uence?

The higher the absolute value of the in uence, the stronger the in uence of the category is: categories with values equal to 0 or
close to 0 are categories with no in uence on the target.
The in uence of a category can be positive or negative:
Categories with positive values are categories where observations are more likely to be in the positive category of the
target: The percentage of positive targets within this category is above the percentage of positive target in the whole
data source.
Categories with negative values are categories where the observations are more likely to be in the negative category of
the target: the percentage of positive target within this category is below the percentage of negative target in the whole
data source.
The in uence is computed for each category and provided by the engine using this formula:
In uence(C) = NP(C) * Frequency(C) / NC
where:
NC = Normalization Constant = (target_key_frequency) * (1 – target_key_frequency).
NP(C) = Pro t(most_frequent_target_category) * P(most_frequent_target_category|C) +

Pro t(least_frequent_target_category) * P(least_frequent_target_category|C).
Pro t(most_frequent_target_category) * proba(most_frequent_target_category) +

pro t(least_frequent_target_category) * proba(least_frequent_target_category) = 0.
Related Information
Grouped Category In uence
6/30/2021
Grouped Category Statistics
Grouped Category In uence

Identify which category group has the most in uence.
Grouped Category In uence shows groupings of categories of an in uencer, where all the categories in a group share the same
in uence on the target variable. You can quickly see which category group has the most in uence.
How to read the bar chart that is displayed as a default?

The X axis represents the in uence of the grouped categories on the target variable. The Y axis represents the in uencer of the
grouped categories.
The length and direction of a bar show whether the category has more or fewer observations that belong to the target
category:
A positive bar (in uence on target greater than 0) indicates that the category contains more observations belonging to
the target category than the mean (calculated on the entire validation data source).
<0> means that the category has no speci c in uence on the target.
A negative bar (in uence on target less than 0) indicates that the category contains fewer positives cases (%) than the
percentage of positive cases in the overall validation data source.
Grouped Category Statistics

Analyze how the categories in uence the target.
Grouped Category Statistics show the details of how the grouped categories in uence the target variable over the selected
data source.
As a default view, a scatter plot is displayed:
The X Axis displays the target mean:
For a nominal target, the target mean is the frequency of positive cases for the target variables contained in the
data source.
For a continuous target, the target mean is the average of the target variable for the category in the data source.
The Y Axis displays the frequency of the grouped category in the selected data source.
You can use the Validation and Prediction dropdown list to compare the results obtained by the predictive model (training
subset) and the one obtained on validation (validation subset).
Confusion Matrix
The Confusion Matrix, also known as an error matrix, is a table that shows the performance of a classi cation predictive model's
performance by comparing the predicted value of the target variable with its actual value.
Each column of the Confusion Matrix represents the observations in a predicted category, while each row represents the
observations in an actual class:
Total Predicted 1 (= Positive Targets Predicted) Predicted 0 (= Negative Targets Predicted)
Actual 1 (= Actual Positive Targets) Number of correctly predicted positive Number of actual positive targets that have
6/30/2021
targets (True Positive =TP) been predicted negative (False Negative =
FN)
Actual 0 (= Actual Negative Targets) Number of actual negative targets that have Number of correctly predicted negative
been predicted positive (False Positive = targets (True Negative = TN)
FP)
The observations are classed into positive and negative target categories:
Positive target (Predicted 1 and Actual 1): An observation that belongs to the population you want to target.
Negative target (Predicted 0 and Actual 0): An observation that does not belong to this target population.
The Confusion Matrix reports the number of false positive, false negative, true positive, and true negative targets. It is a good
estimator of the error that would occur when applying the predictive model on new data with similar characteristics.
By default, the Total Population is the number of records in the validation data source. This is a part of your training data source
that Smart Predict keeps separate from the training data, and uses to test the predictive model's performance.
The classi cation model allows you to sort the Total Population from the lowest to highest probability. To get the predicted
category, which is what you are interested in, you need to choose the threshold that determines who or what gets into that
category, and the others that don't make it. Sliding the threshold bar allows you to experiment with this number to see the
resulting Confusion Matrix for the population on which you want to apply your predictive model.
Contacted Population: You select the percentage of the population to target.
Detected Target: You select the percentage of positive targets you want to detect.
 Note
Refer to the section How is a Decision Made For a Classi cation Result? for information on how Smart Predict automatically
sets the threshold.
You can use the Confusion Matrix to do the following:
Get a detailed assessment of your predictive model's quality. This is because it takes into account a selected threshold
that transforms a range of probability scores into a predicted category. You can also use standard metrics such as
speci city. For more information, see the related link.
To estimate the expected pro t, based on costs and pro ts associated with the predicted positive and actual positive
targets. For more information, see the related link.
In some cases, assessing the predictive model quality based on the error matrix is more relevant than using metrics like the
classi cation rate.
 Example
In a business scenario where you want to detect fraudulent credit card transactions, the False Negative (FN) class can be a
better metric than the Classi cation rate. If your predictive model to detect the fraudulent transactions always predicts
"non-fraudulent", the Classi cation rate will be 99.9%.
The classi cation rate is excellent, but it isn’t a reliable metric to evaluate the real performance of your predictive model
because it gives misleading results. These results are usually due to an unbalanced data source, where there is a lot of
variation in the number of samples in different classes.
This performance issue will show up in the error matrix as a high False Negative (FN) class (number actual fraudulent
transactions detected as non-fraudulent by the predictive model)
6/30/2021
Related Information
How is a Decision Made For a Classi cation Result?
The Metrics
Example: Interpreting The Confusion Matrix
How is a Decision Made For a Classi cation Result?

A predictive model will predict a probability for a targeted value, but the decision as to whether or not the value belongs to one
category or another (0 or 1), is based on setting a threshold, so that anything above it is Yes, below is No. Smart Predict sets
this threshold automatically for you.
The automatically determined threshold is the point where you have the same % of positive observations for the applied data
source population as you do for the training data source population.
Above the threshold, observation is agged as positive.
Below the threshold, observation is agged as negative.
The Metrics
You can use the Confusion Matrix to compute metrics to associate with different needs.
Here's how to read the metrics.
Metrics De nitions Formula
Classi cation Rate Proportion of targets accurately classi ed (TP+TN)/N

by the preditive model when applied on the
validation data source.
Sensitivity Proportion of actual positive targets that TP/(TP+FN)

have been correctly predicted.
Speci city Proportion of actual negative targets that TN/(FP+TN)

have been correctly predicted.
Precision Proportion of predictive positive targets TP/(TP+FP)

that are actually positive targets.
F1 score Harmonic mean of Precision and Recall 2 / ((1/Precision) + (1/Sensitivity))

(Recall and Precision are evenly weighted).
Fall-out Proportion of negative targets that have FP/(FP+TN) or (100% - Speci city)
been incorrectly detected as positive.
De nition:
N = Number of observations
TP (True Positive) = Number of correctly predicted positive targets.
FN (False Negative) = Number of actual positive targets that have been predicted negative.
FP (False Positive) = Number of actual negative targets that have been predicted positive.
TN (True Negative) = Number of correctly predicted negative targets.
6/30/2021
Example: Interpreting The Confusion Matrix

This example of an Confusion Matrix is based on one speci c threshold and associated with a speci c percentage of the
population and a speci c percentage of attained positive target.
 Example
A company wants to do a marketing campaign. They would like to target the campaign to the customers that will answer
positively to the campaign and to avoid unecessary costs. They have built a model to classify the customers into two
categories:
Positive Targets (Predicted 1 and Actual 1): The customers will response positively to the campaign and need to be
contacted.
Negative Targets (Predicted 0 and Actual 0): The customers will response negatively to the campaign and don't need
to be contacted.
As a result the Confusion Matrix looks the following:
How to read the results?
By default, the application proposes to contact 24.1% of the population (see 1 on the graphic below).
 Note
The population is sorted in decreasing score order.
With this threshold:
You reach 68.4% of the actual positive cases (see 2).
 Note
Above this threshold, customers will not be targeted for marketing actions.
24.1% of the population (see 3) is considered as positive cases and is selected in the marketing campaign.
6/30/2021
The percentage of "True Positives" is 16.31% (see 4) whereas the percentage of actual positives is 23.86% (see
5).
What about the metrics? (see 6)
The Classi cation rate is 84.6%. This means that almost 85% of the customers will be correctly classi ed in the two
categories (answer positively/answer negatively to the campaign) when you will apply the validation data source.
The sensitivity is 68.35%. This means that almost 70% of customers that will answer positively to the campaign are
correctly predicted as positive targets. These customers will be selected for the campaign.
The Speci city is 89.77%. This means that almost 90% of customers that will answer negatively to the campaign are
correctly predicted as negative targets. These customers will not be contacted for the campaign.
The Precision is 67.67%. This means that almost 70% of customers predicted as customers that will answer positively
to the campaign are correctly classi ed. These customers will be part of the campaign.
The F1 Score is 0.68%.
The Fall-out is 10.23%. This means that almost 10% of the customers that will answer negatively to the campaign are
classi ed as positive targets and will be selected for the marketing campaign.
The Pro t Simulation

Associate a pro t/cost with the positive categories (observations that belong to the population you want to target) of the
Confusion Matrix.
You can visualize your pro t based on the selected threshold, or automatically select the threshold based on your pro t
parameters.
Set the threshold that determines which values are considered positive (see the relevant-related link) and provide the
following:
a Cost Per Predicted Positive: you de ne a cost per observations classi ed as positive by the Confusion Matrix. This
covers the costs both for True Positive Target (actual positive targets that have been predicted as positive) and False
Positive Target (actual negative targets that have been predicted positive).
a Pro t Per Actual Positive: you de ne a pro t per True Positive Target (targets correctly predicted as positive) identi ed
by the Confusion Matrix.
The Total Pro t table is updated accordingly to calculate your pro t/cost. You obtain an estimation of the gap between the gain
of the action based on a random selection (without any predictive model) and the gain based on the selection.
To see the threshold that will give you a maximum pro t for the pro t parameters you have set, click Maximize Pro t.
Example: Using the Pro t Simulation

This example of a Pro t Simulation is based on one speci c threshold and associated with a speci c percentage of the
population, a speci c percentage of attained positive target and a speci c cost.
Associate a cost/pro t
 Example
As an example to understand how the pro t simulation works, we will consider the same example as for the error matrix.
6/30/2021
In our Confusion Matrix example (see the relevant-related link for more information), we have decided on the following
threshold:
Contacted population (see 1 on the graphic below): 24.1% of the population.
Detected target (see 2): 68.4% of the actual positive cases.
The marketing department has estimated that the cost per contacted customers is 2€ and that the pro t per customers
that will really answer positively is 20€ (see 3).
The total pro t matrix is updated accordingly and displays the following results:
You obtain an estimation of the gap between the pro t of the action based on a random selection (without any predictive
model): 8, 314€ (see 4) and the pro t based on this selection: 34,634€ (see 5).
Using the Maximize Pro t
 Example
If with those unit cost/pro t (see 1 on the graphic below), you select the option Maximize Pro t, the matrix is updated as
follow:
6/30/2021
To maximize your pro t, the application recommends to target 50.5% of the population, not 24.1% (see 2). This would
represent 95.3% of the detected target (see 3). Using this proposed threshold, the pro t will be 44,114€ (see 4).
The Performance Curves

Evaluate the accuracy of your predictive model using the performance curves.
Use the Performance Curves tab to compare the performance of your predictive model to a random and a hypothetical perfect
predictive model.
For more information, see the related links.
Related Information
The Detected Target Curve
The Lift Curve
The Sensitivity Curve (ROC)
The Lorenz Curves
The Density Curves
The Detected Target Curve

Determine the percentage of the population to contact to reach a speci c percentage of the actual positive target.
The Detected Curve compares your predictive model to the ideal and random predictive models. It lets you determine the
percentage of the population to contact to reach a speci c percentage of the actual positive target.
 Example
A company wants to do a mailing campaign. They have built a predictive model to target to which customers to send the
campaign. The predictive model will classify the customers into two categories:
6/30/2021
Positive Targets: The customers will response to the campaign.
Negative Targets: The customers will not response to the campaign.
The predictive model debrief displays the following Detected Target curve:
You can determine that by selecting 30% of the total population:
With a random predictive model, you would reach 30% of the positive population (= population that will response to
the mailing).
With a perfect predictive model, you would reach 100% of the positive population (= population that will response to
the mailing).
With the Smart Predict predictive model (the validation curve), you would reach 78% of the positive population (=
population that will response to the mailing).
Related Information
In uencer Contributions
Debrie ng Classi cation Predictive Model Results
Target Statistics
The Lift Curve

Use the lift curve to see how much better your predictive model is than the random predictive model.
The lift is a measure or the effectiveness calculated as the ratio between the results obtained with and without a predictive
model. The lift curve evaluates predictive model performance in a portion of the population.
How to read the lift curve?

The X axis shows the percentage of the population and is ordered from highest probability to lowest probability.
The Y axis shows how much better your model is than the random predictive model.
Example of a Lift Curve

6/30/2021
 Example
A company wants to do a mailing campaign. They have built a predictive model to target to which customers to send the
campaign.
The predictive model will classify the customers into two categories:
Positive Targets: the customers will response to the campaign.
Negative Targets: the customers will not response to the campaign.
The predictive model debrief displays the following Lift Curve:
You can see that by selecting 20% of the total population:
You would reach 3.09 times more positive cases with your predictive model than with a random predictive model.
A perfect predictive model would reach 4.19 times more positive cases than the random predictive model.
The Sensitivity Curve (ROC)

See how your classi cation model handles the compromise between sensitivity and speci city.
This curve shows the True Positive rate against the False Positive rate as the detection threshold is varied:
The X Axis shows the [1-Speci city]. It represents the proportion of actual negative targets that have been predicted
positive (False Positive targets).
The Y Axis show the Sensitivity. It represents the proportion of actual positive targets that have been correctly
predicted (True Positive targets).
6/30/2021
Each point on the curve represents a Sensitivity/[1-Speci city] pair corresponding to a particular threshold, so the closer the
curve is to the upper left corner, the higher the overall accuracy of the predictive model.
 Example
Take the following Sensitivity Curve:
At 40% of False Positive targets (observations incorrectly assigned to the negative target) we see the following:
A random predictive model (that is, no predictive model) would classify 40% of the positive targets correctly as True
Positive.
A perfect predictive model would classify 100% of the positive targets as True Positive.
The predictive model created by Smart Predict (the validation curve) would classify 96% of the positive targets as
True Positive.
The Lorenz Curves

Check [1-Sensitivity] or Speci city against the population using the Lorenz Curve.
Using the selector, you can display the cumulative percentage for:
[1-Sensitivity], where Sensitivity is the proportion of the actual positive targets that have been correctly predicted,
Speci city, which is the proportion of actual negative targets that have been correctly predicted..
How to read the [1-Sensitivity] curve?

The [1-Sensitivity] curve also known as the "Lorenz Good" displays the cumulative proportion of false negative targets with
regard to the selected population threshold.
6/30/2021
The X Axis shows the percentage of the population ordered from the lowest to the highest probability. The Y Axis shows the [1-
Sensitivity], that is [1- the proportion of positive targets classi ed as True Positive]. This is equivalent to the proportion of the
missed positive targets.
The results are ordered from the lowest probability (on the left) to the highest probability (on the right).
How to read the Speci city curve?

The Speci city curve, also known as the "Lorenz Bad " curve, displays the cumulative proportion of actual negative targets that
have been correctly predicted with regard to the selected population threshold.
The X Axis shows the percentage of the population ordered from the lowest to the highest probability whereas the Y Axis shows
the Speci city.
Examples of Lorenz Curves

If we consider a banking predictive model to detect a risk to grant a credit:
The positive targets would represent the population with a high risk: this population should not be granted a credit.
The negative targets would represent the population with a low risk: This population could be granted a credit.
For the following examples, we will consider a threshold set at 80% of the population with the lowest probability that the
customers cannot reimburse the credit.
 Example
We got the following [1-Sensitivity] Lorenz Curve:
we can see that the following is true:
A "random predictive model" (that is no predictive model) would not identify 80% of the population with a high risk (=
population that should not be granted a credit).
6/30/2021
A perfect predictive model would not identify 17% of the population with a high risk (= population that should not be
granted a credit).
The predictive model created by Smart Predict (the validation curve) would not identify 40% of the population with a
high risk (= population that should not be granted a credit).
We got the following Speci city Lorenz Curve:
We can see that the following is true:
A random predictive model would classify 80% of the population with a low risk as True Negative (= population that
could be granted a credit).
A perfect predictive model would classify 100% of the population with a low risk as True Negative (= population that
could be granted a credit).
The predictive model created by Smart Predict (the validation curve) would classify 93% of the population with a low
risk as True Negative (= population that could be granted a credit).
The Density Curves
Understand how the positive targets and the negative targets are distributed in your predictive model.
The Density curves display the density function of the score (probability that an observation belongs to each class) for positive
and negative targets.
The estimated density function in an interval is equal to:
(Number of observations in the interval/Total number of observation)/ Length of the interval.
6/30/2021
The length of an interval is its upper bound minus its lower bound.
The X axis shows the score and the Y axis shows the density.
As a default view, a line chart is displayed with the following density curves:
The blue curve, Positives: This curve displays the distribution of population with positive target value per score value.
The yellow curve, Negatives: This curve displays the distribution of population with negative target value per score value.
As an example, check the density curves below. The rst example is a good model because there is a small overlapping zone
with low density. This means the predictive model is pretty good at separating the positive and negative cases. Whereas in the
second example, you see a large zone with high density for both positive and negative cases.
The fewer observations there are and the smaller the score interval for the overlap zone, the better it is.
 Example
A good predictive model:
 Example
A bad predictive model:
6/30/2021
Analyzing the Results of Your Regression Predictive Model

Once you've trained your regression predictive model, you can analyze its performance to make sure it's as accurate as
possible.
Use the dropdown list to access and analyze the reports on in uencers and predictive model performance.
What do the values of the two main How Which Which Can I see any What's next?
Clic performance indicators mean? does in uencers group of errors in my
k the have the categories predictive You have two
the target highest has the model ? Is my possibilities:
are
6/30/2021
a Quickly check if your predictive model is value impact on most predictive Your are
for accurate and robust, checking the two main appear the target? in uence model satis ed with
morperformance indicators: in the on the producing your predictive
e differe Check how target? accurate model's
info Root Mean Squared Error (RMSE) nt data the top ve predictions? performance.
rma measures the average difference Then you can
source in uencers In the
tion between values predicted by your impact on the In uencer Compare the use it:
s?
. predictive model and the actual values. target. Only Contribution prediction Generating
The smaller the RMSE value, the more the top ve s report, accuracy of your Your
Get
accurate the predictive model is. contributing analyze the predictive model Predictions.
some
Prediction Con dence indicates the descrip in uencers in uence of to a perfect

You would like
tive are displayed different predictive model
capacity of your predictive model to to see if you
statisti as a default. categories of using a graph and
achieve the same degree of accuracy can improve
an in uencer detect the
when you apply it to a new data source, cs on your predictive
For more on the predictive model
which has the same characteristics as the model's
information, target: errors very
the training data source. It takes a value target performance:
value refer to quickly.
between 0% and 100%. This value
In uencer If the Duplicat
should be as close as possible to 100%. per
Contributions. in ue For more e your
To improve your Prediction Con dence, data
source. nce information, refer current
you can add new rows to your data
value to Predicted vs. predicti
source, for example.
For is Actual. ve
more positi model
For more information, refer to Root Mean
informa ve, we and
Squared Error (RMSE) and Prediction
tion, are experim
Con dence.
refer to more ent with
Target likely updated
Statisti to get settings
cs. "mino . You
rity can
value" then
. compar
e the
If the
two
in ue
versions
nce
and nd
value
the best
is
one.
negat
See
ive we
Duplicat
are
ing a
less
Predicti
likely
ve
to get
Model.
"mino
rity Update
value" the
. settings
of your
The in uence predicti
of a category ve
can be model
positive or and
negative. retrain
it. See
6/30/2021
For more De ne
information, Setting
refer to s and
Category Train a
In uence, Classi c
Grouped ation or
Category Regress
In uence and ion
Grouped Predicti
Category ve
Statistics. Model.

Cau
tion
You
will
eras
e the
previ
ous
versi
on!
Delete
your
predicti
ve
model.
See
Deletin
ga
Predicti
ve
Model.
Predicted vs. Actual

Quickly identify the predictive model errors thanks to the Predicted vs. Actual chart.
This chart shows the accuracy of your predictive model. It displays the actual target value as a function of the prediction.
How does Smart Predict compute the graph?

During the training phase, predictions are calculated using the training data source.
To build the graph, Smart Predict groups these predictions on 20 segments (or bins). Each segment represents roughly 5% of
the population.
For each of these segments, some basic statistics are computed:
the mean of the predictions on each segment (Segment Mean)
6/30/2021
the mean of the actual target values (Target Mean)
the variance of this target within each segment (Target Variance)
How to read the chart?

By default, the following curves are displayed (but you can still customize the graph to t your needs):
The Validation - Actual curve shows the actual target values as a function of the predictions.
The hypothetical Perfect Model curve shows that all the predictions are equal to the actual values.
The Validation - Error Min and Validation - Error Max curves show the range for the actual target values.
For each curve, a dot on the graph corresponds to the segment mean on the X-Axis, and the target mean on th Y-axis.
The area between the Error Max and Error Min represents the possible deviation of your current predictive model: It's the
con dence interval around the predictions.
What can the chart tell you about your predictive model's accuracy?
You can draw three main conclusions from your Predicted vs. Actual chart depending on the relative positions of the curves on
the graph:
The Validation and Perfect Model curves don't match.
Your predictive model isn't accurate. Con rm this conclusion by checking the prediction con dence indicators. If the
indicators con rm your predictive model isn't reliable, you can improve its accuracy by adding more observations or
in uencers to your training data source.
The Validation and Perfect Model curves match closely.
Your predictive model is accurate. Con rm this conclusion by checking the predictive con dence indicators. If the
indicators con rm its reliability, you can trust your predictive model and use its predictions.
The Validation and Perfect Model curves match closely but diverge signi cantly on a segment.
Your predictive model is accurate, but its performance is hindered in the diverging segment. Con rm this conclusion by
checking the predictive con dence indicators. If the indicators con rm its overall reliability, you can improve that
segment's predictions by adding more observations or in uencers to it in your training data source.
If your Predicted vs. Actual chart is between any of these three cases, the prediction con dence indicators remains the best
way to assess your predictive model's accuracy.
 Example
You are working for an insurance company. You want to adapt client’s premium rates according to their age while accounting
for their risk of sudden death. You want to make sure the age tiering is accurate.
6/30/2021
The predictive model debrief displays the following Predicted vs. Actual graph:
In our example, when the prediction (in blue) is 45 years old, the actual value ("validation value" taken from our historical
data) is 44.75 years old. The error min and error max calculated by our predictive model are respectively 33.17 years old and
56.34 years old.
As you can see, the blue curve (our predictive model) and the green curve (the hypothetical perfect model) are very similar,
then it means that you can rely on the predictions.
Target Statistics
For continuous target, the Target Statistics give descriptive statistics for the target variable in each data source:
Name Means
Minimum Minimum value found in the data source for the target variable.
Maximum Maximum value found in the data source for the target variable.
Mean Mean of the target variable.
Standard Deviation Measure of the extent to which the target values are spread around
their average.
Analyzing the Results of Your Time Series Predictive Model

Once you've trained your Time Series predictive model, you can analyze its performance to make sure it's as accurate as
possible.
Analyze the reports to get information on your predictive model composition and evaluate your predictive model performance.
6/30/2021
Is the main performance What are How accurate is my What are the past data What's next?
Clic indicator high enough to the predictive model? that most in uences the
k consider my predictive predicted signal? You have two
the model robust and accurate? values Use the Forecast vs. possibilities:
are provided Actual graph to visualize Identify whether the signal
a Check the quality of your the predicted values is in uenced by the recent Your are satis ed
by the
for predictive model performance with your
predictive (predictive forecast) and past or far past in the case
mo over the Horizon-Wide MAPE. actual values (signal) for of an autoregressive predictive model's
model?
re the training data source. component. performance.
infoThe Horizon-Wide MAPE is the You can then quickly see Then you can use
Analyze
rm evaluation of the "error" that how accurate your The lags are numbered it: Saving
the
ati would be made if the forecast Predictive
predicted predictive model is, what with negative integers
on. was calculated in the past Forecasts
values for are the outliers, the zone representing their distance
where the actual values are of possible errors. in the past from the Generated by a
the
known. A Horizon-Wide MAPE of predictive forecast. Lag -1 is the point Time Series
zero indicates a perfect For more information, referin the past just before the Predictive Model
model over
predictive model. The lower the a set of to The Forecast vs. Actual forecast. Lag -5 is ve into a Dataset or
Horizon-Wide MAPE, the better known data Graph and The Signal points in the past. The Saving Predictive
your predictive model Outliers. higher the absolute value, Forecasts Back
from the
performance. the further the point is in into Your Planning
training
the past. Model.
data
For more information, refer to source. You would like to
Horizon-Wide MAPE. For more information, refer
see if you can
to Past Signal Value
Check if improve your
there are Contributions.
predictive model's
outliers in performance:
the
forecasts Duplicate
and detect your
anomalies current
on the predictive
signal. model and
experiment
For more with
informatio updated
n, refer to settings.
The You can
Predictive then
Forecasts, compare
6/30/2021
The Signal the two
Outliers versions
and The and nd
Signal the best
Anomalies. one. See
Duplicating
a
Predictive
Model.
Update the
settings of
your
predictive
model and
retrain it.
See De ne
Settings
and Train a
Time
Series
Predictive
Model.

Cautio
n
You will
erase
the
previou
s
version!
Delete
your
predictive
model. See
Deleting a
Predictive
Model.
Viewing Entities
If you choose to get predictive forecasts per entity, the reports for each entity are available in the Forecast and Explanation
tabs. If there are less than 20 entities, then these reports are available automatically following training. You select the column
values that appear together forming an entity, for example Product X, Store Y, from the top left dropdown list in both tabs to
view its report.
If a predictive model contains more than 20 entities, the reports for each entity are not available automatically following
training, but are accessed on demand. You just have to select the entity, and after a slight delay, the reports are created and
made available. This is to ensure that time isn't lost creating reports for predictive models with a high numbers of entities, when
not all of those reports may be required all at once. Once a report is available, you can then access it immediately any time
afterwards.
6/30/2021
The Signal Statistics

Get information on the signal.
The Signal Statistics is described in detail in the Forecast tab of your report. The Signal Statistics describe information on the
signal (target), the minimum, maximum, and average (mean) values, as well as the standard deviation measure.
 Note
If you choose to get predictive forecasts per entity, you have this information for each entity.
The Forecast vs. Actual Graph

Visualize a graphic representation of the signal and of the predictive model.
The Forecast vs. Actual graph appears in the Forecast tab of your report. The Forecast vs. Actual graph shows curves for the
predicted values (forecast) and actual values (signal) for the time series data source. You can then quickly see how accurate
your predictive model is. The predictions are displayed at end of the graph.
For each forecasted value, the predictive model shows an estimation of the minimum and maximum error. The area between
this upper and lower limit of the possible errors in the predictive forecasts produced by your predictive model, is called the
con dence interval. It's only displayed for the predictive forecasts.
Outliers are values marked with a red circle on the graph (see The Signal Outliers for more information). The forecasting error
indicator is the absolute difference between the actual and predicted values. This is also called the residue. The residue
abnormal threshold is set to 3 times the standard deviation of the residue values on an estimation (or validation) data source.
The forecasted value and error limit values are listed in the table for each predictive forecast.
 Note
 Note
You can display the data as a table. See Customizing the Visualization of Your Debrief.
Related Information
The Signal Outliers
The Predictive Forecasts
The Predicted Forecasts are described in the Forecast tab of your report.
The following details about the predictive forecasts are displayed:
Forecast: Predicted values for the predictive model.
Error min and max: Minimum and maximum deviation measures of the values around the predictive forecasts.
 Note
6/30/2021
The Signal Anomalies

The Signal Anomalies are described in the Forecast tab of your report.
Anomalies are signal values that are outside the zone of possible error for the predictive forecast, which is de ned by the upper
and lower limit.
The signal is compared to all predictive forecasts.
 Example
Your facilities department wants to monitor the electrical consumption of your building. The signal is very regular with
consumption peaks in the day time, low consumption in the night, and some seasonalities related to vacations, for example.
A predictive model based on this signal will forecast a very low consumption at 11:00 PM.
At 11:15pm the predictive model is re-forecasted and the actual consumption for 11:00 PM is known. It is very far from what
was expected by the predictive model: an anomaly is detected.
 Note
The Signal Outliers

The Signal Outliers are described in the Forecast tab of your report.
The default view (table) displays details about the outliers on the signal and the predictive forecasts. An actual signal value is
quali ed as outlier once its corresponding forecasting error is considered to be abnormal relative to the forecasting error mean
observed on the estimation data source. The forecasting error indicator is the absolute difference between the actual and
predicted values. This is also called the residue. The residue abnormal threshold is set to 3 times the standard deviation of the
residue values on an estimation (or validation) data source.
 Note
The Signal Decomposition

The Signal Decomposition is described in detail in the Explanation tab of your report. The elements in the report can be
different depending on the type of modeling technique used to build your predictive model.
 Note
We describe the modeling techniques used by Smart Predict in the two tables below. In the rst table, we describe the
breakdown modeling technique that may be applied when you're working with a time series with limited disruptions. In the
second table, we describe the smoothing technique that may be applied when you're working with a disrupted time series
that doesn't follow a regular trend or cycle.
1. When a time series breakdown technique is used to create your predictive model,
your report can contain the following information:
Information Description
6/30/2021
Textual explanation The textual information describes the modeling technique that is
used to calculate the forecast. This is textual explanation you see
in the report:
The predictive model was built by breaking down the time series
into different components.
Actual The Actual is the observed historical data.
Trend The Trend is the general orientation of the time series. The report
can show linear or quadratic trends.
Cycles A time series predictive scenario can detect xed length or

seasonal cycles. Fixed length cycles recur every N observations.
The recurrence of seasonal cycles is based on a calendar time unit
such as day, week, month etc. For seasonal cycles, the report
shows the recurrence of the cyclic pattern as well as the time
granularity that makes the cyclic pattern appear. The following
seasonal cycles can be detected:
a pattern recurring every year when observed on a half

monthly basis
a pattern recurring every year when observed on a monthly

basis
a pattern recurring every year when observed on a

semester basis
a pattern recurring every year when observed on a weekly

basis
a pattern recurring every quarter when observed on a

monthly basis
a pattern recurring every semester when observed on a

monthly basis
a pattern recurring every month when observed on a

weekly basis
a pattern recurring every year when observed on a daily

basis
a pattern recurring every month when observed on a daily

basis
a pattern recurring every week when observed on a daily

basis
a pattern recurring every hour when observed on a minute

basis
a pattern recurring every day when observed on an hourly

basis
a pattern recurring every minute when observed on a

second basis
6/30/2021
Fluctuations Fluctuations represent the part of the signal detected by the

predictive model that is dependent on past values of the signal.
The report shows the in uence of the last observations before the
predictive forecast. Fluctuations re ect changes that are not
detected at the trend and cycle level.
For example, the predictive model can detect that the previous 37
values have an impact on the actual values.
For more information you can refer the the chapter called Past
Signal Value Contributions.
Residuals Residuals refer to what is left when the trend, cycles, and
uctuations have been extracted from the initial time series.
Residuals are neither systematic nor predictable. They re ect the
part of the signal that Smart Predict can't explain or model. The
smaller the residuals, the better the predictive model. A good
predictive model produces residual data that contains no pattern.
2. When a time series smoothing technique is used to create your predictive model,
your report can contain the following information:
Textual explanation The textual explanation describes the modeling technique that is
used to calculate the forecast. This is textual explanation you see
in the report:
The predictive model was built incrementally by smoothing the
time series, with more weight given to recent observations.
Actual The Actual is the observed historical data.
Forecast The Forecast is the result of the prediction in the future.
Trend The Trend is the orientation of the forecast data. It is calculated

using an algorithm that applies an exponential smoothing on the
past data over time.
6/30/2021
Cycles A time series predictive scenario can detect seasonal cycles, with
or without amplitude variations. These cycles are calculated using
an algorithm that applies an exponential smoothing techniques on
the past data over time. The recurrence of seasonal cycles is
based on a calendar time unit such as day, week, month etc. The
report shows the recurrence of the cyclic pattern. The following
seasonal cycles can be detected:
a pattern recurring every semester
a pattern recurring every quarter
a pattern recurring every month
a pattern recurring every two weeks
a pattern recurring every week
a pattern recurring every day
a pattern recurring every hour
a pattern recurring every minute
a pattern recurring every second
Residuals Residuals refer to what is left when the trend and cycles have been
extracted from the initial time series. Residuals are neither
systematic nor predictable. The smaller the residuals, the better
the predictive model. A good predictive model produces residual
data that contains no pattern.
 Note
Related Information
Past Signal Value Contributions
Past Signal Value Contributions

The Past Signal Value Contributions identify the past observations that most in uence the forecast.
The Past Signal Value Contributions are described in the Fluctuations section of the Explanation tab of your report.
At the step of identifying the model components, Smart Predict found that previous values of the time series have an impact on
the actual values.
The Past Signal Value Contributions graph shows how the signal is in uenced by the recent past, or distant past in the case of
an autoregressive component.
The lags are numbered with negative integers representing their distance in the past from the predictive forecast. Lag -1 is the
point in the past just before the forecast. Lag -5 is ve points in the past.
 Example
Let's take the following example: we have created a predictive model to forecast the ozone rate for the next 12 months.
6/30/2021
We have obtained the following Past Signal Value Contributions graph:
Thanks to this graph, you can identify if the ozone rate is in uenced by observed values in the recent or distant past. It also
shows the most important dates. The lags are numbered with negative integers that represent how far back in the past they
are from the predictive forecasts. Smart Predict found that the 10 previous values have an impact on the subsequent values.
This is why the graph stops at 10. Using these lags, you can analyze how the previous values in uenced the subsequent ones.
Here you see that the lags -7, and -6 are very in uential.
 Note
Customizing the Visualization of Your Debrief

For each type of debrief, Smart Predict proposes a default overview. You can customize the generated debrief by selecting the
icon.
Context
To customize the generated debrief, use the Visualizations Settings dialog box. Depending on the metrics you want to display,
different options are available.
 Remember
The customized settings are stored for each type of predictive model and user. For example, if you customize a classi cation
debrief for a given predictive model, the debrief of all the other classi cation predictive models is updated with the same
change. However, the debrief settings for other users are not affected; their customizations are unchanged.
 Note
To return to the default visualization settings, click Reset.
Procedure
1. Click the icon next to the area you want to customize.
2. In the Data section, change the sorting/ranking options if required. These depend on the active metric.
6/30/2021
3. In the Display section, select another chart from the Type list. The visualization types available depend on what kind of
metrics you have.
 Note
For example, a scatter plot visualization is only available if the metrics include at least two measures.
4. In the Analysis section, select the information for the type of chart that you want to display.
According to the type of visualization you have chosen at step 2, you can assign data elds to the feeding area of your
visualization.
For example, if you select a column chart to display the metrics, you de ne which data eld to display in the X and Y axis.
Also, you can map the data series of the chart to different metrics using the One color per option.
 Note
When you change the type of visualization, the displayed information elds change accordingly. For example, if you
choose a bar chart, you can select the category in uencers to display on the X axis, the in uencer or measure for the
Y axis, and the colors for either the in uencer or measure values. You can also choose to display a single chart, or
separate charts for each iun uencer or measure. These display options would change if you chose a table, for which
you would only have columns to select.
5. In the Interactivity section, select an element that limits the chart display to data that interacts with only that element.
Depending on the type of chart, this could be a single in uencer or measure.
 Note
Some elements are mandatory because they are data elds that qualify the metrics. You must assign the mandatory
elements to the boxes that specify the chart elements. Otherwise, the missing data eld is automatically entered in
the Selectors box.
Related Information
Choosing the Right Chart for Your Debrief
Debrie ng Classi cation Predictive Model Results
Debrie ng Regression Predictive Model Results
Debrie ng Time Series Predictive Model Results
Choosing the Right Chart for Your Debrief
The type of chart available depends on whether it is appropriate for visualizing the type of in uencer analysis in your debrief.
Depending on how you are visualizing your model data, you can choose from the following chart types:
Chart type Use to Example
Pie Compare categorical data as percentages. If you have Show the predicted percentage breakdown of contributing voting
more than 10 contributing in uencers, then using a bar regions to the results of a national election.
or column chart would show a clearer view of the data
spread.
Bar Compare categorical data along the vertical axis by You have an employee turnover predictive model that predicts
the category count or percentage on the horizontal axis potential churn level for staff. Your target variable is Employee
displayed as bars. Churn Estimate. The in uencers Marital-status, Age, Quali cation
Level, Salary, Recent Promotion, and Training Participation are
Column Show the same information as a bar chart, with the plotted along one axis, and the percentage contribution of each
axes interchanged: inlfuencers along the horizontal category to Churn Estimate is plotted as a bar or column on the
axis by the group count, or percentage values on the other.
vertical axis displayed as columns. For a table, the columns would be the category in uencers
Marital-status, Age, Quali cation Level, Salary, Recent Promotion,
and Training Participation, and the percentage contribution of
6/30/2021
each category to Employee Churn Estimate appears in each cell
Chart type Use to Example
row.
Table Represent the same type of information as column
and bar charts, but in a table format where the
categorical inlfuencers are represented as columns,
with the count or percentage values in row cells.
Radar Display data for multiple in uencers in two You have a predictive model to predict sales of candy. Your target
dimensions with multiple categories represented on variable is Chocolate Sales, and you plot different chocolate
radial axes. avors around the radial axes of a bubble chart. Your categorical
in uencer that is measured over the axes is three brands of
chocolate. The spread of sales gures around the axes would give
a good idea of which different brands would do better for the
same avor than others.
Tag Cloud Represent category in uencer names as text A retail chain selling multimedia and cultural products wants to
juxtaposed geographically on a canvas, where the font venture into publishing to produce a compilation of "retro" styled
size of each text label indicates the in uence on the detective stories. The target audience is younger readers not
target variable. Tag charts are useful when the familiar with traditional detective characters. They develop a
in uencer names have semantic signi cance, for predictivve model including in uencers such as education level,
example keywords in a twitter feed, country names, age, buying history for DVDs, books, games, MP3s and streaming
business companies' stock market values, or different video, to predict a possible taste for different detective pro les.
television shows' audience ratings for a night's The results could be easily represented as a tag cloud with the
viewing. names most likely or not to appeal, for example Sherlock Holmes,
Father Brown, Miss Marple, Hercule Poirot, Auguste Dupin, Philip
Marlowe, and others.
Line Show a model performance curve. You want to see the performance curve of your training predictive
model compared to the validation and random plots for a
predictive model that predicts what percentage of a population
are identi ed positively as having a disease, after being tested
using a new screening test.
Bubble See the correlation between two in uencers, one You have a predictive model to predict fatal car accidents. Using
dependent on the other. The correlation is represented a bubble chart, you could evaluate dependency between
by third in uencer at the plot position and the area of in uencers such as "Car Accident Frequency" and "Speed", with a
the plot shows the magnitude of the relationship. categorical in uencer of Yes or No for Fatality.
Generating Your Predictions
You've assessed the performance of your predictive model and you're con dent using it to generate predictions.
To generate the predictions, the process can differ following the type of predictive model and the type of data source used.
6/30/2021
Generating and Saving the Predictions for a Classi cation or Regression Predictive
Model
Context
You want to generate and save the predictions for a predictive model of type classi cation or regression.
Procedure
1. Open the relevant predictive model.
2. Click Apply Predictive Model .
The Apply Predictive Model window opens.
3. In the Apply To Population section, select the application you want to apply your predictive model on. Don't forget that
this dataset must be prepared beforehand, it cannot be created at this step.
4. In the Generated Dataset section, you select the additional columns you want to have in your generated dataset:
Replicated Column: select which columns from the training data source that should replicated in the generated
dataset.
 Restriction
If your application dataset contains more columns than your training dataset, the additional columns will be
ignrored by the application process.
Statistics & Predictions: This is information about your predictive model that you want to have in the generated
dataset.
Information Description Comments
Apply Date It's the start date of the predictive model application. The type of the column is TIMESTAMP.
Train Date It's the start date of the predictive model training. The type of the column is TIMESTAMP.
6/30/2021
Statistics: select the statistics regarding the in uencers you want to save in your dataset:
Statistic Description
Assigned Bin When selected, individuals in the application population are assigned to referring
quantiles de ned on the validation population.
Assigned bins explained: The validation population during training is spread out in
quantiles (bins), each de ned by a range of scores, to serve as references (assigned
bins) to an application population. When a predictive model is applied, each individual
in the application population is allocated to an assigned bin based on its predicted
score. As each assigned bin represents 10% of the training population, if the
population structure is unchanged, this % value should remain stable on the
application population. If this is not the case, it doesn’t mean that the predictive
model is no longer accurate, rather that the structure of the population has changed.
For example there are more or less potential churners now, than in the past. The
accuracy of the predictions should be monitored to back up the decisions.
 Note
The number of bin is set to 10 and isn't customizable.
See the section How does Smart Predict Create Assigned Bins? for information on
using assigned bins.
Outlier Indicator For each row in the application dataset, the Outlier Indicator is 1 if the row is an outlier
with respect to the target, otherwise 0.
An observation is considered an outlier when the prediction error is greater than 3

times the average prediction error found on similar observations.
Predictions: select the predictions to include in the output table:
Prediction Description
Predicted Category For each row in the application dataset, the Predicted Category is the target category
determined by the predictive model.
Classi cation predictive models
(nominal target with 2 values The percentage of predicted target categories found in the application dataset
only) corresponds to the Contacted Population percentage that is set by default when
entering the Confusion Matrix.
Any change done by the user in the Confusion Matrix does not affect the Predicted
Category in the generated dataset.
An alternate way could be to generate the Prediction Probability (instead of the

Predicted Category) and set a decision threshold (see How is a Decision Made For a
Classi cation Result?) on the value of the probability based on the business
requirements.
Prediction Probability For each row in the application dataset, the Prediction Probability is the probability
that the Predicted Category is the target value.
Classi cation predictive models
(nominal target with 2 values
only)
Predicted Value For each row in the application dataset, the Predicted Value is the value predicted for
the target.
Regression predictive models
(continuous target)
 Note
6/30/2021
If you do not select any statistics or predictions, only the target and the key in uencer(s) are included.
Output as: Give a name to your generated dataset.
5. Click Apply.
The status of your predictive model is updated to <Applied> and you can nd your generated dataset with the
predictions under Main Menu Browse Files . You can then access to your results directly by opening the generated
dataset or depending on your business needs, consume the output dataset in a BI story.
How does Smart Predict Create Assigned Bins?
While applying a classi cation or regression predictive model to an application dataset, you can require to get the statistics
information on Assigned bins. But what are Assigned bins and how should they be leveraged?
During the training step, Smart Predict uses past observations compiled in a training dataset to create a predictive model.
For a classi cation predictive model: Smart Predict associates to each observation (customer, product, etc…) a
probability that an event (target) occurs. Then, it uses this probability to group the list of observations, ranged in
decreasing order from the most probable to the least probable in 10 bins (or groups). Each bin represents 10% of those
observations and in each bin, the observations have the same level of probability.
For a regression predictive model: Smart Predict associates to each observation a predicted value. Based on this value,
it groups the list of observations ranged from the highest to the lowest predicted value in 10 bins (or groups). Each bin
represents 10% of those observations and in each bin, the observations have the same value or range of values.
During the application step, Smart Predict refers to the bins de ned in the training step to assign the current observations
from the application dataset to the relevant bin. It compares each value obtained by the predictive model with the limits of each
Assigned bin de ned in the training step. Then it assigns each observation to the relevant bin.
Why use Assigned Bins in a Classi cation Predictive Model?
 Example
Let's take the following example: you want to know if customers will buy your new product "P". You train your predictive
model using a training dataset containing past observed observations for 1,000 customers. As a result, Smart Predict has
ranged your observations as follows:
Bin number Number of customers in the bin Average probability to buy "P"
1 100 customers (= 10% of the dataset) 20%
6/30/2021
Then, you use your predictive model to get predictions on a new set of customers. Let's say your application dataset
contains observations on 700 customers.
Smart Predict will give you the following result in the generated dataset:
Bin number Number of customers in the bin Estimation of the probability to buy "P"
1 200 customers (~ 29% of the dataset) 20%
2 100 customers (~ 14%) 18%
3 43 customers (~ 6%) 15%
4 27 customers (~ 4%) 13%
5 80 customers (~ 11%) 11%
6 45 customers (~ 6%) 8%
7 50 customers (~ 7%) 7%
8 35 customers (~ 5%) 4%
9 32 customers (~ 5%) 3%
10 88 customers (~13%) 1%
You can use Assigned Bins for two purposes:
Use for assigned bins Description Example
Simulating/Estimating the number of At the training step, Smart Predict has

 Example
positives cases. assigned each observation to a bin (one bin
equals 10% of the dataset), which Let's have a look at our example above.
corresponds to a probability to be a positive
At the training step, you know the actual
case.
number of positive targets by bins as you
Smart Predict associates to each customer train your predictive model on known data.
his/her probability to buy the product P and
At the application step, you don't know
check if this probability makes the customer
that. But once the predictive model is
belongs to bin 1, 2, 3, etc. by referring to the
applied, you know for each customer of the
bins de ned in the training step. As each bin
application dataset to which bin it belongs
is associated to an average percentage of
to. You can therefore estimate the total of
positive cases, you can easily estimate the
customers who would buy "P".
number of positive cases.
 Note
It can happen that the distribution of the
observations is not similar (10% of
observations in each bin). It's not because
the structure of the population has
changed that the predictive model is not
relevant anymore (see next point).
6/30/2021
Use for assigned bins Description Example
Monitoring the population structure Dividing the dataset into bins means that
 Example
each bin should contain +/-10% of the
observations. However, if this changes, then it Having a look back at the example above,
indicates that your population is changing. For you can see that the distribution per bin in
example, there could be an effect that the generated dataset is not similar as in
advertising on social media sites might the training dataset. For example, for bin 1,
in uence and attract more young customers, we have 200 customers, which correspond
rather than other age groups. It doesn't mean to 28% of the dataset. It could simply be
that the predictive model is not efficient because you have more young customers,
anymore. But it may be an alert to check this but with the same buying behaviour as
performance with more data from the recent young customers in the training population.
past (than the ones used to train the model).
Monitoring the predictive model Once the predictive model has been applied,
performance it is easier to analyze the classi cation
performance by bins, rather than interpreting
the performance curve. Use the classi cation
rate (see The Metrics) calculated at the
training step associated with each bin, and
detect any variation of this rate when applying
your predictive model.
Why use Assigned Bins in a Regression Predictive Model?
 Example
In the following example, you want to predict the deal values for the next quarter. Your training dataset contains
observations on 3,000 customers.
As a result, Smart Predict has ranged your observations as follows:
Bin Number Number of customers per bin Description
1 300 customers (= 10% of the dataset) Predicted values between 90,001 and
100,000 $
90,000 $
80,000 $
70,000 $
60,000 $
50,000 $
40,000 $
30,000 $
20,000 $
6/30/2021
Bin Number Number of customers per bin Description
10 300 customers (= 10% of the dataset) Predicted values between 0 and 10,000 $
Then, you use your predictive model to get predictions on a new set of customers. Let's say your application dataset
contains observations on 800 customers.
Smart Predict will give you the following result in the generated dataset:
Bin number Number of customers per bin Description
1 110 customers (~ 14% of the dataset) Predicted values between 90,001 and
100,000 $
90,000 $

80,000 $

70,000 $

60,000 $

50,000 $

40,000 $
8 60 customers (~ 7.5% of the dataset) Predicted values between 20,001 and

30,000 $
9 60 customers (~ 7.5% of the dataset) Predicted values between 10,001 and

20,000 $
10 40 customers (~ 5% of the dataset) Predicted values between 0 and 10,000 $
You can use Assigned Bins to monitor the population structure: As each bin should contain +/-10% of the observations, if these
gures increase or decrease for one or several bins, it indicates that your population is changing and you might need to retrain
your predictive model with more recent data. For example, having a look back at the example above, you can see that the
distribution per bins is quite similar in the generated dataset as in the training dataset. However, we could have different
results. For example, for bin 1, we could have 300 customers, which correspond to 37.5% of the dataset.
Saving Predictive Forecasts Generated by a Time Series Predictive Model into a

Dataset
You've assessed the performance of your predictive model and you're con dent saving the predictive forecasts into a dataset.
Context
To save the predictive forecast results into a dataset:
Procedure
6/30/2021
1. Open the relevant predictive model.
2. Under the icon bar, click the Save Forecasts icon.
3. Under Save As give a name to your generated dataset.
4. Click Save.
In the predictive model list, the status of your predictive model is updated to <Applied>. You can nd your generated
dataset with the forecasts under Main Menu Browse Files .
Here are the columns that are added to your generated dataset:
Column Name Description
Forecast This is the column where you nd the forecast values for the
signal based on the number of requested forecasts speci ed in
the predictive model settings.
Error Min For each requested forecast at a given horizon H, the predictive
model calculates a con dence interval. The Error Min value is
the lower bound of this con dence interval. It is equal to the
forecasted value – sigma(RMSE)*1.96, where sigma(RMSE)
represents the standard deviation of RMSE between the actual
and forecasted signal value at horizon H. The weighted value of
1.96 corresponds to a con dence level of 95%.
Error Max For each requested forecast at a given horizon H, the predictive
model calculates a con dence interval. The Error Max value is
the upper bound of this con dence interval. It is equal to the
forecasted value + sigma(RMSE)*1.96, where sigma(RMSE)
represents the standard deviation of RMSE between the actual
and forecasted signal value at horizon H. The weighted value of
1.96 corresponds to a con dence level of 95%.
Saving Predictive Forecasts Back into Your Planning Model

You've assessed the performance of your predictive model and you're con dent using the predictive forecasts.You want to
integrate the predictive forecasts into your planning model.
To save the predictive forecasts back into your planning model:
1. Select the (Save Forecasts) button from within your report.
2. Specify the Private Version to which you want to save the generated forecasts.
 Tip
Private versions are only initially visible to the person who created them.
You can make them visible to other users by sharing the private versions. Shared versions are private versions
that you allow other people to see.
3. To only save forecasts for future periods, select Save.
4. To also save forecasts for past periods, select Advanced Settings Save Forecasts for Past Period , and change the
default setting from Off to On.
 Note
6/30/2021
When the default setting of Save Forecasts for Past Period is set to Off, only forecasts for future periods are saved to
the private version of your input planning model, in the measure that was selected as Signal. When the default
setting is set to On, it allows you to save forecasts for a past period to the private version of your planning model. This
means you can assess the performance of your predictive forecast by using all the visual and modeling powers right
there in your story, to compare the difference between your predictive forecast and the actuals, plans or budget.
Related Information
About Version Management
Using the Generated Dataset in SAP Analytics Cloud
Once the predictions are generated, a new dataset is created. You can augment your SAP Analytics Cloud stories or models
using the data available in this generated dataset. The process might differs depending on the type of dataset (acquired or live)
you used to generate this dataset and the insights you want to reuse in SAP Analytics Cloud.
Related Information
Using Your Acquired Dataset Generated by a Classi cation or Regression Predictive Scenario
Using Your Acquired Dataset Generated by a Time Series Predictive Scenario
Using Your Acquired Generated Dataset in a Story
Using Your Live Generated Dataset in an SAP Analytics Cloud Model
Using Your Acquired Generated Dataset in a Story
You can use your acquired generated dataset either directly in a story, or in a story via an SAP Analytics Cloud model.
Depending on the type of predictive model, you need to keep different elements in your generated dataset to consume it.
Consuming the predictions stored in an acquired dataset

The generated dataset containing your predictions can be consumed in a story using some features available in SAP Analytics
Cloud.
 Note
If you want to consume your generated dataset in a story or via an SAP Analytics Cloud model in SAP Analytics Cloud, only
the rst 100 columns will be taken into account.
6/30/2021
 Note
You need to keep speci c information in your generated dataset in order to consume it in a story. The type of information you
need, depends on the predictive scenario type. For more information, refer to the related links.
You can consume your generated dataset:
Directly in a story:
 Note
When you upload the generated dataset in a story, it implicitly becomes an "embedded" model. For more information,
see Models in Stories.
If you update the generated dataset later on, the story will be updated as well.
6/30/2021
In a story via a SAP Analytics Cloud model:
 Note
The SAP Analytics Cloud model can be shared with other users.
If you update the generated dataset later on, the SAP Analytic Cloud model will not be updated.
Related Information
Using Your Acquired Dataset Generated by a Classi cation or Regression Predictive Scenario
Using Your Acquired Dataset Generated by a Classi cation or Regression Predictive

Scenario
Depending on how you want to use your Classi cation or Regression Predictive Scenario, you need to keep different information
in your generated dataset containing predictions.
Consuming your generated dataset in a story

To consume your generated dataset in a story, you need to keep some information:
The application dataset in uencers, if these in uencers are not available from another source.
The predictions
The key in uencer (s)
 Note
If you have speci ed the key in uencer(s) during the training of your predictive model, it is automatically added.
You can then consume the generated dataset directly in a story:
6/30/2021
Go to Files Create Story , then Access & Explore Data. Select Data acquired from an existing dataset or model and browse up
to your generated dataset. As soon as a change will be done in the generated dataset, the story will be automatically updated.
 Note
When you upload the generated dataset in a story, it implicitly becomes an "embedded" model.
For more information, refer to the related link Blending Data .
Combining existing data with the generated predictions and consuming them in a
story
To combine existing data with your predictions, you need to keep the following information in your generated dataset:
The predictions
The key in uencer(s)
 Note
If you have speci ed the key in uencer(s) during the training of your predictive model, it is automatically added.
Thanks to the key in uencer(s), you can blend the predictions with other data sources, in the context of a story.
For more information, refer to the related link Blending Data .
Importing the generated dataset in SAP Analytics Cloud models

You can rst use the generated dataset to create a SAC model and then consume it in a story.
 Note
You will then be able to easily share this model with other users.
For more information, refer to the related link Creating a New Model and Model preferences.
To do so:
1. Create a model (from the main menu Create Model ).
2. Select Get data from a data source.
3. From the proposed list, select Dataset and browse up to your generated dataset.
Related Information
Creating a New Story
Creating a New Model
Blending Data
Depending on how you want to use your Times Series Predictive Scenario, you need to keep different information in your
generated dataset containing the predictive forecasts.
6/30/2021
Consuming your generated dataset with forecasts in a story

To consume your generated dataset in a story, you need to keep at least the predictive forecasts.
 Note
The date and signal are selected automatically.
You can then consume the generated dataset directly in a story:
Go to Files Create Story , then Access & Explore Data. Select Data acquired from an existing dataset or model and browse up
to your generated dataset. As soon as a change will be done in the generated dataset, the story will be automatically updated.
 Note
When you upload the generated dataset in a story, it implicitly becomes an "embedded" model.
For more information, refer to the related link Creating a new story.
Combining existing data with the generated predictions and consuming them in a
story
To combine existing data with your predictions, you need to keep the forecasts in your generated dataset.
 Note
The date and signal are selected automatically.
Thanks to the date variable, you can blend the predictions with other data sources, in the context of a story.
For more information, refer to Blending Data .
Importing the generated dataset containing the forecast in SAP Analytics Cloud
models
You can rst use the generated dataset to create a model and then consume it in a story.
To do so:
1. Create a model (from the main menu Create Model ).
2. Select Get data from a data source.
3. From the proposed list, select Dataset and browse up to your generated dataset.
 Note
You will then be able to easily share this model with other users.
For more information, refer to the related link Creating a New Model and Model preferences.
Related Information
Creating a New Story
6/30/2021
Creating a New Model
Blending Data
Using Your Live Generated Dataset in an SAP Analytics Cloud Model

You have generated the predictions in a live dataset and you now want to use the predictions in SAP Analytics Cloud.
Context
Live datasets cannot be consumed in SAP Analytics Cloud as they are. To be able to use your predictions, you need the help of
an IT administrator and to go through some additional steps.
Here is an overview of the full process you have to follow while working with live datasets (see related links for more
information):
 Caution
As of Google Chrome version 80, you need to con gure your SAP on-premise data source to issue cookies with
SameSite=None; Secure attributes. If the SameSite attribute is not set, cookies issued by your SAP data source
system will no longer work with SAP Analytics Cloud. Refer to SameSite Cookie Con guration for Live Data Connections for
more information.
 Restriction
Your IT administrator user has to create calculation views before you can consume the generated tables containing your
predictions. For more information, see Creating Calculation Views to Consume Live Output Datasets.
Procedure
1. In the main menu of SAP Analytics Cloud, select Create Model .
6/30/2021
2. Select Get data from a data source
3. Click on Live Data Connection under Connect to live data area.
4. In the Create Model From Live Data Connection window, enter the following information:
System Type: SAP HANA
Connection: Select the previously created data connection (to this SAP HANA system)
Data Source: Enter rst 3 letters of the calculation view created by your SAP HANA technical user.
You can then create a story using this model as a source.
Related Information
Creating Calculation Views to Consume Live Output Datasets
Setting up Live SAP HANA Data Access for Smart Predict
Publishing a Predictive Model to a PAi Application
Context
You can publish a predictive model to a Predictive Analytics Integrator (PAi) application as a new predictive scenario, or as a
predictive model within a scenario.
 Note
You can nd more information Predictive Analytics Integrator at https://help.sap.com/pai.
 Restriction
You can only publish classi cation and regression predictive models trained with an acquired dataset to a PAi Application. It
is not possible to publish predictive models trained with a live dataset.
Before you can publish to PAi, check the following pre-requisites with your SAP Analytics Cloud administrator:
The PAi connection is available in SAP Analytics Cloud.
You have the application role of Predictive Admin assigned to your user pro le.
Procedure
1. From the main menu, select Browse Predictive Scenarios .
2. Click a predictive scenario in the list.
The debrief page for the predictive model appears showing its statistics and KPIs.
3. Expand the Predictive Models pane at the bottom of the page.
The trained predictive models for the predictive scenario are listed.
 Note
Only a trained predictive model can be published to PAi. It is published to the PAi application as a model speci cation
either as a new predictive scenario, or as a model within a predictive scenario.
4. Click the predictive model version that you want to publish to your PAi application.
6/30/2021
5. Click the Publish icon in the toolbar menu for the page.
The Publish to PAi Application box opens.
6. Depending on whether you want to publish the predictive model as a new predictive scenario or a model in a predictive
scenario, enter information for the following elds:
Field Actions
PAi Connection Select a PAi connection. If your connection doesn't appear in the list, see the SAP Analytics
Cloud administrator to make it available. Creating a PAi connection in SAP Analytics Cloud is
described here: SAP Analytics Cloud Connection to PAi for SAP S/4HANA Cloud
PAi Predictive Scenario Browse to the relevant catalog, and either select a predictive scenario to receive the model, or
select New Predictive Scenario to create a new one.
New Predictive Scenario

Click to create a new predictive scenario with the published predictive model.
Add a predictive scenario name. The name must comply with Namespace rules in
S/4HANA:
S/4HANA Cloud: YY1_
S/4HANA on Premise: Begins with Z or Y
The name of a Predictive Scenario must be all uppercase, and have a maximum of 20
characters.
Add a description to your scenario.
Model: Name Name the predictive model.
Model: Description Add a description.
7. Click Publish.
The publishing progress is indicated by messages in the Status column for the predictive model. The status messages
are described here: Predictive Model Publishing Status Messages.
You can refer to this blog for more information.
Predictive Model Publishing Status Messages
When you publish a predictive model to a PAi application, the following status levels can be displayed that indicate the
publishing progress:
Status Description
Publish Pending Publishing hasn't started, but it is in a queue, and will start as soon as possible.
Publishing Publishing is in progress.
Published The predictive model has been published successfully without warnings or errors.
Publish Failed Publishing couldn't be completed. Click the status icon for more information.
Published with The predictive model has been published, but with one or more issues. Click the status icon for more
Warning information.
Smart Predict Quick Start Guide
6/30/2021
Short tour guide around Smart Predict with the Help topics accessible from the application interface.
These topics are accessed directly from the Smart Predict application. We've grouped them together as sort of a short tour
guide around the application interface. You'll be able to create a predictive scenario, as well as add and train a new predictive
model without needing to have a background in the predictive analytics eld. However, to get the most out of your new
predictive scenarios, we suggest that you take a bit of time to browse about the more in depth topics available in this guide as
well.

AugmentedAnalytics B

Uploaded by

Copyright:

Available Formats

AugmentedAnalytics B

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

AugmentedAnalytics B

Uploaded by

Copyright:

Available Formats

6/30/2021

SAP Analytics Cloud Help

SAP Analytics Cloud | Q2 2021 (2021.7)

Original content: https://help.sap.com/viewer/00f68c2e08b941f081002fd3691d86a7/release/en-US

For more information, please visit the https://help.sap.com/viewer/disclaimer.

Smart Predict – Using Predictive Scenarios

The following restrictions currently apply to Smart Predict.

General Restrictions While Using Smart Predict

Restriction on Information on Restrictions

Restriction on Information on Restrictions

SAP Business ByDesign Analytics

SAP Cloud for Customer

SAP Cloud for Customer Analytics

SAP Integrated Business Planning

SAP Cloud Platform Open Connectors

Restriction on Information on Restrictions

Dataset - Storage formats

Date and Date & Time, in the following formats:

Numbers (any number with decimal point)

Integers (any number without decimal point)

UTF-8 encoding is supported.

Time Series Predictive Scenario

Restriction on Information on Restrictions

Year expressed as YYYY-01-01 where YYYY is variable (moving year).

Quarter or Month expressed as YYYY-MM-01 where YYYY-MM is variable

Day (calendar dates) expressed as YYYY-MM-DD where YYYY-MM-DD is

Number of forecasts (independent of the number of entities): 500 maximum

Number of entities: 1000 maximum

For a granularity of day, use 3 years of historical data

For a granularity of hour, use 3 months of historical data

Restriction on Information on Restrictions

Columns identi ed as having the data type <Time>.

Columns with the data type <Textual>.

For more information, see Variables in Smart Predict

Restrictions Using Live Dataset With Smart Predict

SAP HANA 2.0 SPS00, SPS01, and SPS02

SAP HANA 2.0 SPS03 and upwards

For more information, see Datetime Data Type.

Restrictions Using Planning Model as Data Source for Smart Predict

Restrictions on Information on Restrictions

Input planning models

Version: You can use public or private versions. The input

For classic account models, there are speci cities while

Prompt: Planning models with prompts are supported. If

Restrictions on Information on Restrictions

Entities (crossing of multiple dimensions)

The maximum number of entities supported is 1000.

Attributes and Hierarchies:

You can select the attributes (custom properties)

You can select system properties like the currency

The levels can be selected indirectly by selecting

One entity can combine one or multiple dimension

Restrictions on Information on Restrictions

Signal: a valid signal is a numeric value that is data

Smart Predict doesn't support calculated measures

Time granularity: indicates the time granularity

Date: the date dimension used to create the time

Dimensions or Attributes: you can combine

Number of Forecasts: number of forecast values you want

Restrictions on Information on Restrictions