Driverless AI Booklet
http://docs.h2o.ai
3 Key Features
4 Supported Algorithms
6 Launching Driverless AI
6.1 Messages
8 Running an Experiment
8.1 Before You Begin
8.2 New Experiment
8.3 Completed Experiment
8.3.1 Experiment Summary
8.4 Viewing Experiments
8.4.1 Checkpointing, Rerunning, and Retraining
8.4.2 Deleting Experiments
9 Diagnosing a Model
10 Project Workspace
10.1 Linking Datasets
10.1.1 Selecting Datasets
10.2 Linking Experiments
10.2.1 New Experiments
10.2.2 Checkpointing Experiments
11 Interpreting a Model
11.1 Interpret this Model button - Regular Experiments
11.2 Interpret this Model button - Time-Series Experiments
11.2.1 Multi-Group Time Series MLI
11.2.2 Single Time Series MLI
11.3 Model Interpretation - Driverless AI Models
11.4 Model Interpretation - External Models
11.5 Understanding the Model Interpretation Page
11.5.1 Summary Page
11.5.2 DAI Model Dropdown
11.5.3 Surrogate Models Dropdown
11.5.4 Random Forest Dropdown
11.5.5 Dashboard Page
11.6 General Considerations
11.6.1 Machine Learning and Approximate Explanations
11.6.2 The Multiplicity of Good Models in Machine Learning
11.6.3 Expectations for Consistency Between Explanatory Techniques
12 Viewing Explanations
16 Deployment
16.1 Additional Resources
16.2 Deployments Overview Page
16.3 AWS Lambda Deployment
16.3.1 Driverless AI Prerequisites
16.3.2 AWS Access Permissions Prerequisites
16.3.3 Deploying the Lambda
16.3.4 Testing the Lambda Deployment
16.3.5 AWS Deployment Issues
16.4 REST Server Deployment
16.4.1 Prerequisites
16.4.2 Deploying on REST Server
16.4.3 Testing the REST Server Deployment
16.4.4 REST Server Deployment Issues
18 Logs
18.1 Sending Logs to H2O
19 References
20 Authors
1 Overview
H2O Driverless AI is an artificial intelligence (AI) platform for automatic
machine learning. Driverless AI automates some of the most difficult data
science and machine learning workflows such as feature engineering, model
validation, model tuning, model selection, and model deployment. It aims to
achieve the highest predictive accuracy, comparable to that of expert data scientists,
but in a much shorter time thanks to end-to-end automation. Driverless AI also offers
automatic visualizations and machine learning interpretability (MLI). Especially
in regulated industries, model transparency and explanation are just as important
as predictive performance. Modeling pipelines (feature engineering and models)
are exported (in full fidelity, without approximations) both as Python modules
and as pure Java standalone scoring artifacts.
1.1 Citation
To cite this booklet, use the following: Hall, P., Kurka, M., and Bartz, A. (Sept
2018). Using H2O Driverless AI. http://docs.h2o.ai
How do you frame business problems in a data set for Driverless AI?
The data that is read into Driverless AI must contain one entity per row, like
a customer, patient, piece of equipment, or financial transaction. That row
must also contain information about what you will be trying to predict using
similar data in the future, like whether that customer in the row of data used a
promotion, whether that patient was readmitted to the hospital within thirty
days of being released, whether that piece of equipment required maintenance,
or whether that financial transaction was fraudulent. (In data science speak,
Driverless AI requires "labeled" data.) Driverless AI runs through your data
many, many times looking for interactions, insights, and business drivers of the
phenomenon described by the provided dataset. Driverless AI can handle simple
data quality problems, but it currently requires all data for a single predictive
model to be in the same dataset, and that dataset must have already undergone
standard ETL, cleaning, and normalization routines before being loaded into
Driverless AI.
3 Key Features
Below are some of the key features available in Driverless AI.
4 Supported Algorithms
XGBoost
XGBoost is a supervised learning algorithm that implements a process called
boosting to yield accurate models. Boosting refers to the ensemble learning
technique of building many models sequentially, with each new model attempting
to correct for the deficiencies in the previous model. In tree boosting, each
new model that is added to the ensemble is a decision tree. XGBoost provides
parallel tree boosting (also known as GBDT, GBM) that solves many data
science problems in a fast and accurate way. For many problems, XGBoost is
one of the best gradient boosting machine (GBM) frameworks today.
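Outside of Driverless AI, the boosting idea described above can be tried directly with the open source XGBoost package. The following is a minimal sketch; the synthetic data and parameter values are illustrative assumptions only, not settings used by Driverless AI.

# Minimal XGBoost sketch: 100 trees built sequentially, each new tree
# correcting the errors of the ensemble so far. Illustrative only.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic binary target

dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1}
booster = xgb.train(params, dtrain, num_boost_round=100)
preds = booster.predict(dtrain)                  # predicted probabilities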
LightGBM
LightGBM is a gradient boosting framework developed by Microsoft that uses
tree-based learning algorithms. It was specifically designed for lower memory
usage, faster training speed, and higher efficiency. Similar to XGBoost, it is
one of the best gradient boosting implementations available. It is also used for
fitting Random Forest models inside of Driverless AI.
GLM
Generalized Linear Models (GLM) estimate regression models for outcomes
following exponential distributions. GLMs are an extension of traditional linear
models. They have gained popularity in statistical data analysis due to:
• the flexibility of the model structure unifying the typical regression methods
(such as linear regression and logistic regression for binary classification)
• the recent availability of model-fitting software
• the ability to scale well with large datasets
TensorFlow
TensorFlow is an open source software library for performing high performance
numerical computation. Driverless AI includes a TensorFlow NLP recipe based
on CNN (convolutional neural network) deep learning models.
RuleFit
The RuleFit ([3]) algorithm creates an optimal set of decision rules by first
fitting a tree model, and then fitting a Lasso (L1-regularized) GLM model to
create a linear model consisting of the most important tree leaves (rules).
FTRL
Follow the Regularized Leader (FTRL) is a DataTable implementation ([13]) of
the FTRL-Proximal online learning algorithm proposed in "Ad click prediction:
a view from the trenches" ([10]). This implementation uses a hashing trick
and the Hogwild approach ([11]) for parallelization. FTRL supports binomial and
multinomial classification for categorical targets, as well as regression for
continuous targets.
6 Launching Driverless AI
1. After Driverless AI is installed and started, open a browser and navigate
to <driverless-ai-host-machine>:12345.
2. The first time you log in to Driverless AI, you will be prompted to read
and accept the Evaluation Agreement. You must accept the terms before
continuing. Review the agreement, then click I agree to these terms
to continue.
3. Log in by entering unique credentials. For example:
Username: h2oai
Password: h2oai
Note that these credentials do not restrict access to Driverless AI; they are
used to tie experiments to users. If you log in with different credentials,
for example, then you will not see any previously run experiments.
4. As with accepting the Evaluation Agreement, the first time you log in,
you will be prompted to enter your License Key. Paste the License Key
into the License Key entry field, and then click Save to continue. This
license key will be saved in the host machine’s /license folder.
Upon successful completion, you will be ready to add datasets and run
experiments.
6.1 Messages
A Messages menu option is available when you launch Driverless AI. Click this
to view news and upcoming events regarding Driverless AI.
1. Click the Add Dataset or Drag and Drop button to upload or add a
dataset.
Notes:
• Upload File, File System, HDFS, and S3 are enabled by default.
These can be disabled by removing them from the enabled_file_systems
setting in the config.toml file.
• If File System is disabled, Driverless AI will open a local file browser
by default.
• If Driverless AI was started with data connectors enabled for HDFS,
S3, Azure Blob Store, BlueData DataTap, Google Cloud Storage,
Google BigQuery, JDBC, KDB+, Minio, and/or Snowflake, then
a dropdown will appear allowing you to specify where to begin
browsing for the dataset. Refer to the Data Connectors section in
the Driverless AI User Guide for more information.
Notes:
• When importing a folder, the entire folder and all of its contents are
read into Driverless AI as a single file.
• When importing a folder, all of the files in the folder must have the
same columns.
Upon completion, the datasets will appear in the Datasets Overview page. Click
on a dataset or click the [Click for Actions] button to open a submenu. From
this menu, you can specify to view Details, Split, Visualize, Predict, or Delete a
dataset. You can also delete an unused dataset by hovering over it, clicking the
X button, and then confirming the delete. Note: You cannot delete a dataset
that was used in an active experiment. You have to delete the experiment first.
The Dataset Details page provides a summary of the dataset. This summary
lists each column that is included in the dataset along with the type, the count,
the mean, minimum, maximum, standard deviation, frequency, and the number
of unique values. Note: Driverless AI recognizes the following column types:
integer, string, real, and boolean. Date columns are given a str type.
Hover over the top of a column to view a summary of the first 20 rows of that
column.
To view information for a specific column, type the column name in the field
above the graph.
To switch the view and preview the dataset, click the Dataset Rows button in
the top right portion of the UI. Then click the Dataset Overview button to
return to the original view.
In Driverless AI, you can download datasets from the Datasets Overview page.
To download a dataset, click on the dataset or select the [Click for Actions]
button beside the dataset that you want to download, and then select Download
from the submenu that appears.
Upon completion, the split datasets will be available on the Datasets page.
The Visualization page shows all available graphs for the selected dataset. Note
that the graphs on the Visualization page can vary based on the information in
your dataset. You can also view and download logs that were generated during
the visualization.
The images on this page are thumbnails. You can click on any of the graphs to
view and download a full-scale image.
8 Running an Experiment
8.1 Before You Begin
This section describes how to run an experiment using the Driverless AI UI.
Before you begin, it is best that you understand the available options that you
can specify. Note that only a dataset and a target column are required to be
specified, but Driverless AI provides a variety of experiment and expert settings
that you can use to build your models. Hover over each option in the UI, or
review the Experiments section in the Driverless AI User Guide for information
about these options.
After you have a comfortable working knowledge of these options, you are ready
to start your own experiment.
target columns; and the count of the least frequent class for numeric
multinomial target columns.
• For data imported in version 1.0.23 (and later), TARGET FREQ
is the frequency of the target class for binomial target columns,
and MOST FREQ is the most frequent class for multinomial target
columns.
5. The next step is to set the parameters and settings for the experiment.
(Hover over each option and/or refer to the Experiment Settings section in
the Driverless AI User Guide for detailed information about these settings.)
You can set the parameters individually, or you can let Driverless AI infer
the parameters and then override any that you disagree with. Note that
Driverless AI will automatically infer the best settings for Accuracy, Time,
and Interpretability and provide you with an experiment preview based on
those suggestions. If you adjust these knobs, the experiment preview will
automatically update based on the new settings.
Expert settings (optional):
Optionally specify additional expert settings for the experiment. Refer to
the Expert Settings section in the Driverless AI User Guide for detailed
information about these settings. Note that the default values for these
options are derived from the environment variables in the config.toml file.
Additional settings (optional):
• Classification or Regression button. Driverless AI automatically
determines the problem type based on the response column. Though
not recommended, you can override this setting by clicking this
button.
• Reproducible: Click this button to build the experiment with a random
seed so that results are reproducible.
• Enable GPUs: Specify whether to enable GPUs. (Note that this
option is ignored on CPU-only systems.)
The files within the experiment summary zip provide textual explanations of the
graphical representations that are shown on the Driverless AI UI. For example,
the preview.txt file provides the same information that was included on the
UI before starting the experiment; the summary.txt file provides the same
summary that appears in the lower-right portion of the UI for the experiment;
the features.txt file provides the relative importance values and descriptions
for the top features.
Experiment Report
A report file is included in the experiment summary. This report provides insight
into the training data and any detected shifts in distribution, the validation
schema selected, model parameter tuning, feature evolution and the final set of
features chosen during the experiment.
• report.docx: The report available in Word format
Experiment Overview Artifacts
The Experiment Summary contains artifacts that provide overviews of the
experiment.
• preview.txt: Provides a preview of the experiment. (This is the same
information that was included on the UI before starting the experiment.)
• summary.txt: Provides the same summary that appears in the lower-right
portion of the UI for the experiment.
Tuning Artifacts
During the Driverless AI experiment, model tuning is performed to determine
the optimal algorithm and parameter settings for the provided dataset. For
regression problems, target tuning is also performed to determine the best
way to represent the target column (i.e., whether taking the log of the target
column improves results). The results from these tuning steps are available in
the Experiment Summary.
• tuning_leaderboard: A table of the model tuning performed along with
the score generated from the model and training time. (Available in txt
or json format.)
• target_transform_tuning_leaderboard.txt: A table of the transforms
applied to the target column along with the score generated from the
model and training time. (This will be empty for binary and multiclass
use cases.)
Features Artifacts
Driverless AI performs feature engineering on the dataset to determine the
optimal representation of the data. The top features used in the final model
can be seen in the GUI. The complete list of features used in the final model is
available in the Experiment Summary artifacts.
The Experiment Summary also provides a list of the original features and their
estimated feature importance. For example, given the features in the final
Driverless AI model, we can estimate the feature importance of the original
features.
• ensemble_scores.json: The scores of the final model for our list of
scorers.
• ensemble_confusion_matrix: The confusion matrix for the internal vali-
dation and test data if test data is provided.
• ensemble_gains: The lift and gains table for the internal validation and
test data if test data is provided. (Visualization of lift and gains can be
seen in the UI.)
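As a convenience, the JSON artifacts in a downloaded experiment summary can be inspected programmatically. The sketch below assumes an archive named h2oai_experiment_summary.zip containing ensemble_scores.json; the exact archive and member names may differ by version, so check the contents of your own download:

# Minimal sketch: list and read artifacts from a downloaded experiment summary.
# The archive name below is an assumption; adjust it to your downloaded file.
import json
import zipfile

with zipfile.ZipFile("h2oai_experiment_summary.zip") as archive:
    print(archive.namelist())                    # all artifacts in the summary
    with archive.open("ensemble_scores.json") as f:
        scores = json.load(f)                    # final-model scores per scorer
print(scores)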
Click this link to open the Experiments page. From this page, you can rename
an experiment, view previous experiments, begin a new experiment, rerun an
experiment, and delete an experiment.
In Driverless AI, you can retry an experiment from the last checkpoint, you
can run a new experiment using an existing experiment’s settings, and you can
retrain an experiment’s final pipeline.
Checkpointing Experiments
In real-world scenarios, data can change. For example, you may have a model
currently in production that was built using 1 million records. At a later date,
you may receive several hundred thousand more records. Rather than building
a new model from scratch, Driverless AI includes H2O.ai Brain, which enables
caching and smart re-use of prior models to generate features for new models.
You can configure one of the following Brain levels in the experiment’s Expert
Settings.
• Level 0: Don't use any brain cache but still write to cache
• Level 3: Smart checkpoint like level 1, but for the entire population. Tune
only if the brain population is of insufficient size.
• Level 4: Smart checkpoint like level 2, but for the entire population. Tune
only if the brain population is of insufficient size.
• Level 5: Smart checkpoint like level 4, but will scan over the entire brain
cache of populations (starting from resumed experiment if chosen) in
order to get the best scored individuals.
If you choose Level 2 (default), then Level 1 is also done when appropriate.
To make use of smart checkpointing, be sure that the new data has:
• The same data column names as the old experiment
• The same data types for each column as the old experiment. (This won’t
match if, e.g., a column was all int and then had one string row.)
• The same target as the old experiment
• The same target classes (if classification) as the old experiment
• For time series, all choices for intervals and gaps must be the same
When the above conditions are met, then you can:
• Start the same kind of experiment, just rerun for longer.
• Use a smaller or larger data set (i.e. fewer or more rows).
• Effectively do a final ensemble re-fit by varying the data rows and starting
an experiment with a new accuracy, time=1, and interpretability. Check
the experiment preview for what the ensemble will be.
• Restart/Resume a cancelled, aborted, or completed experiment
To run smart checkpointing on an existing experiment, click the right side of the
experiment that you want to retry, then select Restart from Last Checkpoint.
The experiment settings page opens. Specify the new dataset. If desired, you
can also change experiment settings, though the target column must be the
same. Click Launch Experiment to resume the experiment from the last
checkpoint and build a new experiment.
The smart checkpointing continues by adding a prior model as another model
used during tuning. If that prior model is better (which is likely if it was run for
more iterations), then that smart checkpoint model will be used during feature
evolution iterations and final ensemble.
Notes:
• Driverless AI does not guarantee exact continuation, only smart continua-
tion from any last point.
• The directory where the H2O.ai Brain meta model files are stored is
tmp/H2O.ai_brain. In addition, the default maximum brain size is
20GB. Both the directory and the maximum size can be changed in the
config.toml file.
Rerunning Experiments
To run a new experiment using an existing experiment’s settings, click the
right side of the experiment that you want to use as the basis for the new
experiment, then select New Model with Same Params. This opens the
experiment settings page. From this page, you can rerun the experiment using
the original settings, or you can specify to use new data and/or specify different
experiment settings. Click Launch Experiment to create a new experiment
with the same options.
Retrain Final Pipeline
To retrain an experiment’s final pipeline, click the right side of the experiment
that you want to use as the basis for the new experiment, then select Retrain
Final Pipeline. This opens the experiment settings page with the same settings
as the original experiment except that Time is set to 0. This retrain mode is
equivalent to setting feature_brain_level to 3 with time 0 (no tuning or feature
evolution iterations).
To delete an experiment, hover over the experiment that you want to delete.
An "X" option displays. Click this to delete the experiment. A confirmation
message will display asking you to confirm the delete. Click OK to delete the
experiment or Cancel to return to the experiments page without deleting.
9 Diagnosing a Model
The Diagnose Model on New Dataset option allows you to view model
performance for multiple scorers based on an existing model and dataset.
On the completed experiment page, click the Diagnose Model on New
Dataset button.
Notes:
• You can also diagnose a model by selecting Diagnostic from the top
menu, then selecting an experiment and test dataset.
• The Model Diagnostics page also automatically populates with any ex-
periments that were scored from the Project Leaderboard on the Projects
page.
Select a dataset to use when diagnosing this experiment. At this point, Driverless
AI will begin calculating all available scores for the experiment.
When the diagnosis is complete, it will be available on the Model Diagnostics
page. Click on the new diagnosis. From this page, you can download predictions.
You can also view scores and metric plots. The plots are interactive. Click a
graph to enlarge. In the enlarged view, you can hover over the graph to view
details for a specific point. You can also download the graph.
Classification metric plots include the following graphs:
• ROC Curve
• Precision-Recall Curve
• Cumulative Gains
• Lift Chart
• Kolmogorov-Smirnov Chart
• Confusion Matrix
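If you download the diagnostic predictions, comparable metrics can be reproduced outside of Driverless AI. The following is a minimal sketch using scikit-learn; the file name and column names are assumptions about the downloaded predictions, not a documented format:

# Minimal sketch: recompute a few classification metrics from downloaded
# predictions. File and column names are assumptions for illustration.
import pandas as pd
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

preds = pd.read_csv("diagnostic_predictions.csv")
y_true = preds["actual"]
y_score = preds["predicted_probability"]

print("AUC:", roc_auc_score(y_true, y_score))
fpr, tpr, thresholds = roc_curve(y_true, y_score)             # points on the ROC curve
print(confusion_matrix(y_true, (y_score > 0.5).astype(int)))  # illustrative 0.5 cutoff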
10 Project Workspace
Driverless AI provides a Project Workspace for managing datasets and exper-
iments related to a specific business problem or use case. Whether you are
trying to detect fraud or predict user retention, datasets and experiments can
be stored and saved in the individual projects. A Leaderboard on the Projects
page allows you to easily compare performance and results and identify the best
solution for your problem.
To create a Project Workspace:
1. Click the Projects option on the top menu.
2. Click New Project.
3. Specify a name for the project and provide a description.
4. Click Create Project. This creates an empty Project page.
From the Projects page, you can link datasets and/or experiments, and you can
run new experiments. When you link an existing experiment to a Project, the
datasets used for the experiment will automatically be linked to this project (if
not already linked).
You can link a Training, Validation, or Test dataset by selecting the Train-
ing, Validation, or Test tab, clicking Link Dataset, and then selecting the
dataset(s) to include. The list of available datasets includes those that were added
on the Datasets page, or you can browse datasets in your file system. Be sure to
select the correct tab before linking a training, validation, or test dataset. This
is because, when you run a new experiment in the project, the training data,
validation data, and test data options for that experiment come from the list of
datasets linked here. You will not be able to, for example, select any datasets
from within the Training tab when specifying a test dataset on the experiment.
When datasets are linked, the same menu options are available here as on the
Datasets page.
In the Datasets section, click Select, then select a training and/or valida-
tion and/or testing dataset. The combination of selected datasets will show
experiments in the Project that use that combination of datasets.
When experiments are run from within a Project, only linked datasets can be
used.
2. Select your training data and optionally your validation and/or testing
data.
3. Specify your desired experiment settings, and then click Launch Experi-
ment.
When experiments are linked to a Project, the same checkpointing options for
experiments are available here as on the Experiments page.
The Leaderboard allows you to view scoring information for a variety of scorers.
You can change the scorer used by clicking the Scorer link and then selecting
a scorer.
Notes:
• If an experiment has already scored a dataset, it will not score it again.
The scoring step is deterministic, so for a particular scorer, dataset, and
experiment combination, the score will be the same regardless of how many
times you repeat it.
• The scorer dataset must contain all the columns that are
expected by the various experiments you are scoring it on. However, the
columns of the scorer dataset need not be exactly the same as input
features expected by the experiment. There can be additional columns
in the scorer dataset. If these columns were not used for training, they
will be ignored. This feature gives you the ability to train experiments
on different training datasets (i.e., having different features), and if you
have an "uber test dataset" that includes all these feature columns, then
you can use the same dataset to score these experiments.
• You will notice a Score Time in the Experiments Leaderboard. This
value shows the total time (in seconds) that it took to calculate the
experiment scores for all applicable scorers for the experiment type. This
is valuable to users who need to estimate the runtime performance of an
experiment.
You can compare two or three experiments and view side-by-side detailed
information about each.
1. Click the Select button at the top of the Leaderboard and select either
two or three experiments that you want to compare. You cannot compare
more than three experiments.
This opens the Compare Experiments page. This page includes the
experiment summary for each experiment as well as metric plots. The
metric plots vary depending on whether this is a classification or regression
experiment.
For classification experiments, this page includes:
• Variable Importance list
• Confusion Matrix
• ROC Curve
• Precision Recall Curve
• Lift Chart
• Gains Chart
• Kolmogorov-Smirnov Chart
For regression experiments, this page includes:
• Variable Importance list
• Actual vs. Predicted Graph
11 Interpreting a Model
Driverless AI provides robust interpretability of machine learning models to
explain modeling results in a human-readable format. In the Machine Learning
Interpretability (MLI) view, Driverless AI employs a host of different techniques
and methodologies for interpreting and explaining the results of its models.
A number of charts are generated automatically, including K-LIME, Shapley,
Variable Importance, Decision Tree Surrogate, Partial Dependence, Individual
Conditional Expectation, and more. Additionally, you can download a CSV of
LIME and Shapley reason codes from this view.
• Using the MLI link in the upper right corner of the UI to interpret either
a Driverless AI model or an external model.
Notes:
• This release deprecates experiments and MLI models from 1.7.0 and
earlier.
• MLI is not available for NLP experiments or for multiclass Time Series.
• For time series experiments, when the test set contains actuals, you will
see the time series metric plot and the group metrics table. If there are
no actuals, MLI will run, but you will see only the prediction value time
series and a Shapley table.
For regular experiments, this page provides several visual explanations of the
trained Driverless AI model and its results. More information about this page is
available in the Understanding the Model Interpretation Page section later in
this chapter.
This section describes how to run MLI on time series data for multiple groups.
1. Click the Interpret this Model button on a completed time series ex-
periment to launch Model Interpretation for that experiment. This page
includes the following:
• A Help panel describing how to read and use this page. Click the
Hide Help Button to hide this text.
• If a test set is provided and the test set includes actuals, then a panel
will display showing a time series plot and the top and bottom group
matrix tables based on the scorer that was used in the experiment.
The metric plot will show the metric of interest per time point
for holdout predictions and the test set. Likewise, the actual vs.
predicted plot will show actuals vs. predicted values per time point
for the holdout set and the test set. Note that this panel can be
resized if necessary.
• If a test set is not provided, then internal validation predictions will
be used. The metric plot will only show the metric of interest per
time point for holdout predictions. Likewise, the actual vs. predicted
plot will only show actuals vs. predicted values per time point for
the holdout set.
2. Scroll to the bottom of the panel and select a grouping in the Group
Search field to view a graph of Actual vs. Predicted values for the group.
The outputted graph can be downloaded to your local machine.
3. Click on a prediction point in the plot (white line) to view Shapley values
for that prediction point. The Shapley values plot can also be downloaded
to your local machine.
4. Click Add Panel to add a new MLI Time Series panel. This allows you to
compare different groups in the same model and also provides the flexibility to
do a "side-by-side" comparison between different models.
Time Series MLI can also be run when only one group is available.
1. Click the Interpret this Model button on a completed time series ex-
periment to launch Model Interpretation for that experiment. This page
includes the following:
• A Help panel describing how to read and use this page. Click the
Hide Help Button to hide this text.
• If a test set is provided and the test set includes actuals, then a panel
will display showing a time series plot and the top and bottom group
matrix tables based on the scorer that was used in the experiment.
The metric plot will show the metric of interest per time point
for holdout predictions and the test set. Likewise, the actual vs.
predicted plot will show actuals vs. predicted values per time point
for the holdout set and the test set. Note that this panel can be
resized if necessary.
• If a test set is not provided, then internal validation predictions will
be used. The metric plot will only show the metric of interest per
time point for holdout predictions. Likewise, the actual vs. predicted
plot will only show actuals vs. predicted values per time point for
the holdout set.
2. Scroll to the bottom of the panel and select an option in the Group
Search field to view a graph of Actual vs. Predicted values for the group.
(Note that for Single Time Series MLI, there will only be one option in this
field.) The outputted graph can be downloaded to your local machine.
3. Click on a prediction point in the plot (white line) to view Shapley values
for that prediction point. The Shapley values plot can also be downloaded
to your local machine.
4. Click Add Panel to add a new MLI Time Series panel. This allows you
to do a "side-by-side" comparison between different models.
setting for mli_sample_size.) Turn this toggle off to run MLI on the
entire dataset.
9. Optionally specify weight and dropped columns.
10. For K-LIME interpretations, optionally specify a clustering column. Note
that this column should be categorical. Also note that this is only available
when K-LIME is used as the LIME method and when Use Original
Features is enabled. If the LIME method is changed to LIME-SUP, then
this option is no longer available.
11. Optionally specify the number of surrogate cross-validation folds to use
(from 0 to 10). When running experiments, Driverless AI automatically
splits the training data and uses the validation data to determine the
performance of the model parameter tuning and feature engineering steps.
For a new interpretation, Driverless AI uses 3 cross-validation folds by
default for the interpretation.
12. For K-LIME interpretations, optionally specify one or more columns to
generate decile bins (uniform distribution) to help with MLI accuracy.
Columns selected are added to the top n columns for quantile binning selection.
If a column is not numeric or not in the dataset (transformed features),
then the column will be skipped. Note: This option is only available
when Use Original Features is enabled.
13. For K-LIME interpretations, optionally specify the number of top variable
importance numeric columns to run decile binning to help with MLI
accuracy. (Note that variable importances are generated from a Random
Forest model.) This defaults to 0, and the maximum value is 10. Note:
This option is only available when Use Original Features is enabled.
14. Click the Launch MLI button.
12. For K-LIME interpretations, optionally specify the number of top variable
importance numeric columns to run decile binning to help with MLI
accuracy. (Note that variable importances are generated from a Random
Forest model.) This value is combined with any specific columns selected
for quantile binning. This defaults to 0, and the maximum value is 10.
13. Click the Launch MLI button.
• MLI Docs: A link to the Interpreting a Model section in the online help.
• Download MLI Logs: Downloads a zip file of the logs that were gener-
ated during this interpretation.
• Scoring Pipeline:
The Summary page is the first page that opens when you view an interpretation.
This page provides an overview of the interpretation, including the dataset and
Driverless AI experiment (if available) that were used for the interpretation
along with the feature space (original or transformed), target column, problem
type, and K-LIME information. If the interpretation was created from a Driverless
AI model, then a table with the Driverless AI model summary is also included
along with the top variables for the model.
This menu provides a Feature Importance plot and a Shapley plot (not supported
for RuleFit and TensorFlow models) for transformed features as well as a Partial
Dependence plot and Disparate Impact Analysis (DIA) for Driverless AI models.
Feature Importance
This plot is available for all models for binary classification, multiclass classifica-
tion, and regression experiments.
Shapley Plot
This plot is not available for RuleFit or TensorFlow models. For all other
models, this plot is available for binary classification, multiclass classification,
and regression experiments.
Shapley explanations are a technique with credible theoretical support that
presents consistent global and local variable contributions. Local numeric
Shapley values are calculated by tracing single rows of data through a trained
tree ensemble and aggregating the contribution of each input variable as the
row of data moves through the trained ensemble. For regression tasks, Shapley
values sum to the prediction of the Driverless AI model. For classification
problems, Shapley values sum to the prediction of the Driverless AI model
before applying the link function. Global Shapley values are the average of the
absolute Shapley values over every row of a data set.
More information is available at https://arxiv.org/abs/1706.06060.
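The additivity property described above can also be observed outside of Driverless AI with the open source shap package and a tree model. This is a minimal, illustrative sketch with synthetic data, not a representation of Driverless AI's internal Shapley implementation:

# Minimal sketch: local Shapley values for a tree model sum (together with the
# expected value) to the model output. Synthetic data; illustrative only.
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=500)

model = xgb.XGBRegressor(n_estimators=50, max_depth=3).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])               # local contributions for one row
print(shap_values.sum() + explainer.expected_value)      # approximately equals...
print(model.predict(X[:1])[0])                           # ...the model prediction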
You can view a Shapley explanations plot by selecting the Interpret this Model
on Transformed Features button in an experiment.
Taking the Driverless AI model as F(X), assuming credit scores vary from 500
to 800 in the training data, and that increments of 30 are used to plot the ICE
curve, ICE is calculated as follows:
ICE(credit_score = 500) = F(30, 500, 1000)
ICE(credit_score = 530) = F(30, 530, 1000)
ICE(credit_score = 560) = F(30, 560, 1000)
...
ICE(credit_score = 800) = F(30, 800, 1000)
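A minimal sketch of this calculation, treating the model as a black-box scoring function, is shown below; model_predict is a stand-in for any scorer (for example, a downloaded scoring pipeline) and is an assumption, not a Driverless AI API:

# Minimal sketch: ICE for one row by sweeping credit_score while holding the
# other inputs fixed. `model_predict` is a hypothetical scoring function.
def ice_curve(model_predict, row, feature, grid):
    curve = []
    for value in grid:
        probe = dict(row)
        probe[feature] = value               # vary only the feature of interest
        curve.append((value, model_predict(probe)))
    return curve

row = {"debt_to_income_ratio": 30, "credit_score": 600, "savings_acct_balance": 1000}
grid = range(500, 801, 30)                   # 500, 530, ..., 800 as in the example above
# ice = ice_curve(model_predict, row, "credit_score", grid)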
The one-dimensional partial dependence plots displayed here do not take inter-
actions into account. Large differences in partial dependence and ICE are an
indication that strong variable interactions may be present. In this case partial
dependence plots may be misleading because average model behavior may not
accurately reflect local behavior.
Overlaying ICE plots onto partial dependence plots allows the comparison of
the Driverless AI model’s treatment of certain examples or individuals to the
model’s average predictions over the domain of an input variable of interest.
This plot shows the partial dependence when a variable is selected and the ICE
values when a specific row is selected. Users may select a point on the graph
to see the specific value at that point. Partial dependence (yellow) portrays the
average prediction behavior of the Driverless AI model across the domain of an
input variable along with +/- 1 standard deviation bands. ICE (grey) displays
the prediction behavior for an individual row of data when an input variable
is toggled across its domain. Currently, partial dependence and ICE are only
available for the top ten most important original input variables. Categorical
variables with 20 or more unique values are never included in these plots.
group that receives the potentially harmful outcome is divided by the proportion
of the privileged group that receives the same outcome; the resulting proportion
is then used to determine whether the model is biased. Refer to the Summary
section to determine if a categorical level (for example, Fairness Female) is
fair in comparison to the specified reference level and user-defined thresholds.
Fairness All is a true or false value that is only true if every category is fair in
comparison to the reference level.
Disparate impact testing is best suited for use with constrained models in Driver-
less AI, such as linear models, monotonic GBMs, or RuleFit. The average group
metrics reported in most cases by DIA may miss cases of local discrimination,
especially with complex, unconstrained models that can treat individuals very
differently based on small changes in their data attributes.
DIA allows you to specify a disparate impact variable (the group variable that
is analyzed), a reference level (the group level that other groups are compared
to), and user-defined thresholds for disparity. Several tables are provided as
part of the analysis:
• Group metrics: The aggregated metrics calculated per group. For
example, true positive rates per group.
• Group disparity: This is calculated by dividing the metric for each group by
the reference group metric. Disparity is observed if this value falls outside
of the user-defined thresholds.
• Group parity: This builds on Group disparity by converting the above
calculation to a true or false value by applying the user-defined thresholds
to the disparity values.
In accordance with the established four-fifths rule, user-defined thresholds are set
to 0.8 and 1.25 by default. These thresholds will generally detect if the model is
(on average) treating the non-reference group 20% more or less favorably than
the reference group. Users are encouraged to set the user-defined thresholds to
align with their organization’s guidance on fairness thresholds.
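The disparity and parity calculations described above reduce to a simple ratio test. The sketch below uses hypothetical group metric values; in Driverless AI these numbers come from the DIA group metrics table:

# Minimal sketch of the group disparity and parity calculation. The metric
# values are hypothetical; Driverless AI computes them per group in DIA.
group_metric = {"group_a": 0.62, "group_b": 0.48}     # e.g., true positive rate per group
reference = "group_a"
low, high = 0.8, 1.25                                  # default four-fifths-rule thresholds

disparity = {g: m / group_metric[reference] for g, m in group_metric.items()}
parity = {g: low <= d <= high for g, d in disparity.items()}
print(disparity)   # {'group_a': 1.0, 'group_b': 0.774...}
print(parity)      # {'group_a': True, 'group_b': False}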
Notes:
• Although the process of DIA is the same for both classification and re-
gression experiments, the returned information is dependent on the type
of experiment being interpreted. An analysis of a regression experiment
returns an actual vs. predicted plot, while an analysis of a binary clas-
sification experiment returns confusion matrices. The above tables are
provided for both types of experiments.
• Users are encouraged to consider the explanation dashboard to understand
and augment results from disparate impact analysis. In addition to its
established use as a fairness tool, users may want to consider disparate
impact for broader model debugging purposes. For example, users can
analyze the supplied confusion matrices and group metrics for important,
non-demographic features in the Driverless AI model.
Classification Experiment
Regression Experiment
K-LIME creates one global surrogate GLM on the entire training data and
also creates numerous local surrogate GLMs on samples formed from K-means
clusters in the training data. The features used for K-means are selected from
the Random Forest surrogate model’s variable importance. The number of
features used for K-means is the minimum of the top 25 percent of variables
from the Random Forest surrogate model’s variable importance and the max
number of variables that can be used for K-means, which is set by the user in
the config.toml setting for mli_max_number_cluster_vars. (Note: if the
number of features in the dataset is less than or equal to 6, then all features
are used for K-means clustering.) The previous setting can be turned off to use
all features for K-means by setting use_all_columns_klime_kmeans in the
config.toml file to true. All penalized GLM surrogates are trained to model
the predictions of the Driverless AI model. The number of clusters for local
explanations is chosen by a grid search in which the R2 between the Driverless
AI model predictions and all of the local K-LIME model predictions is maximized.
The global and local linear model’s intercepts, coefficients, R2 values, accuracy,
and predictions can all be used to debug and develop explanations for the
Driverless AI model’s behavior.
The parameters of the global K-LIME model give an indication of overall linear
feature importance and the overall average direction in which an input variable
influences the Driverless AI model predictions. The global model is also used
to generate explanations for very small clusters (N < 20) where fitting a local
linear model is inappropriate.
The in-cluster linear model parameters can be used to profile the local region,
to give an average description of the important variables in the local region,
and to understand the average direction in which an input variable affects the
Driverless AI model predictions. For a point within a cluster, the sum of the local
linear model intercept and the products of each coefficient with their respective
input variable value are the K-LIME prediction. By disaggregating the K-LIME
predictions into individual coefficient and input variable value products, the
local linear impact of the variable can be determined. This product is sometimes
referred to as a reason code and is used to create explanations for the Driverless
AI model’s behavior.
In the following example, reason codes are created by evaluating and disaggre-
gating a local linear model.
Given the row of input data with its corresponding Driverless AI and K-LIME
predictions:
It can be seen that the local linear contributions for each variable are:
• debt_to_income_ratio: 0.01 * 30 = 0.3
• credit_score: 0.0005 * 600 = 0.3
• savings_acct_balance: 0.0002 * 1000 = 0.2
Each local contribution is positive and thus contributes positively to the
Driverless AI model's prediction of 0.85 for H2OAI_predicted_default. By taking into
consideration the value of each contribution, reason codes for the Driverless AI
decision can be derived. debt_to_income_ratio and credit_score would be the
two largest negative reason codes, followed by savings_acct_balance.
The local linear model intercept and the products of each coefficient and
corresponding value sum to the K-LIME prediction.
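A small sketch of this disaggregation follows, using the coefficients and feature values from the example above; the intercept value is a made-up placeholder because it is not shown here:

# Minimal sketch: disaggregating a local K-LIME linear model into reason codes.
# Coefficients and feature values come from the example above; the intercept
# is hypothetical.
intercept = 0.1                               # hypothetical local model intercept
coefficients = {"debt_to_income_ratio": 0.01,
                "credit_score": 0.0005,
                "savings_acct_balance": 0.0002}
row = {"debt_to_income_ratio": 30, "credit_score": 600, "savings_acct_balance": 1000}

reason_codes = {name: coef * row[name] for name, coef in coefficients.items()}
klime_prediction = intercept + sum(reason_codes.values())   # intercept + sum of contributions
print(reason_codes)        # {'debt_to_income_ratio': 0.3, 'credit_score': 0.3, 'savings_acct_balance': 0.2}
print(klime_prediction)    # 0.9 with the hypothetical intercept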
Like all LIME explanations based on linear models, the local explanations are
linear in nature and are offsets from the baseline prediction, or intercept, which
represents the average of the penalized linear model residuals. Of course, linear
approximations to complex non-linear response functions will not always create
suitable explanations and users are urged to check the K-LIME plot, the local
model R2, and the accuracy of the K-LIME prediction to understand the validity
of the K-LIME local explanations. When K-LIME accuracy for a given point
or set of points is quite low, this can be an indication of extremely nonlinear
behavior or the presence of strong or high-degree interactions in this local region
of the Driverless AI response function. In cases where K-LIME linear models
are not fitting the Driverless AI model well, nonlinear LOCO feature importance
values may be a better explanatory tool for local model behavior. As K-LIME
local explanations rely on the creation of k-means clusters, extremely wide input
data or strong correlation between input variables may also degrade the quality
of K-LIME local explanations.
This plot shows Driverless AI model predictions and LIME model predictions
in sorted order by the Driverless AI model predictions. This graph is interac-
tive. Hover over the Model Prediction, LIME Model Prediction, or Actual
Target radio buttons to magnify the selected predictions. Or click those radio
buttons to disable the view in the graph. You can also hover over any point in
the graph to view LIME reason codes for that value. By default, this plot shows
information for the global LIME model, but you can change the plot view to
show local results from a specific cluster. The LIME plot also provides a visual
indication of the linearity of the Driverless AI model and the trustworthiness
of the LIME explanations. The closer the local linear model approximates the
Driverless AI model predictions, the more linear the Driverless AI model and
the more accurate the explanation generated by the LIME local linear models.
Decision Tree
The decision tree surrogate model increases the transparency of the Driverless
AI model by displaying an approximate flowchart of the complex Driverless
AI model’s decision making process. The decision tree surrogate model also
displays the most important variables in the Driverless AI model and the most
important interactions in the Driverless AI model. The decision tree surrogate
model can be used for visualizing, validating, and debugging the Driverless AI
model by comparing the displayed decision-process, important variables, and
important interactions to known standards, domain knowledge, and reasonable
expectations.
In the decision tree plot, the highlighted row shows the path to the highest
probability leaf node and indicates the globally important variables and inter-
actions that influence the Driverless AI model prediction for that row. The
decision tree plot is available for binary classification and regression models.
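The general idea of a surrogate tree can be illustrated outside of Driverless AI by training a shallow decision tree on another model's predictions. This sketch uses scikit-learn and synthetic data and is not Driverless AI's internal implementation:

# Minimal sketch: a decision tree surrogate is trained on the predictions of a
# more complex model rather than on the original target. Illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(scale=0.1, size=1000)

complex_model = GradientBoostingRegressor().fit(X, y)
surrogate = DecisionTreeRegressor(max_depth=3)              # shallow, human-readable
surrogate.fit(X, complex_model.predict(X))                  # approximate the complex model
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(4)]))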
LOCO
Local feature importance describes how the combination of the learned model
rules or parameters and an individual row’s attributes affect a model’s prediction
for that row while taking nonlinearity and interactions into account. Local feature
importance values reported here are based on a variant of the leave-one-covariate-
out (LOCO) method (Lei et al, 2017 [9]).
In the LOCO-variant method, each local feature importance is found by re-
scoring the trained Driverless AI model for each feature in the row of interest,
while removing the contribution to the model prediction of splitting rules that
contain that feature throughout the ensemble. The original prediction is then
subtracted from this modified prediction to find the raw, signed importance
for the feature. All local feature importance values for the row are then scaled
between 0 and 1 for direct comparison with global feature importance values.
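The exact mechanics above (removing a feature's splitting rules throughout the ensemble) are internal to Driverless AI, but a simplified leave-one-covariate-out sketch conveys the idea; here a feature is crudely blanked out instead, which is a stated simplification, and model_predict is a hypothetical scoring function:

# Simplified LOCO-style sketch: re-score the row with each feature blanked out,
# take the signed difference from the original prediction, then scale to [0, 1].
# `model_predict` is a hypothetical black-box scoring function.
def loco_importance(model_predict, row):
    base = model_predict(row)
    raw = {}
    for feature in row:
        probe = dict(row)
        probe[feature] = None                        # crude stand-in for removing the feature
        raw[feature] = model_predict(probe) - base   # modified minus original prediction
    top = max(abs(v) for v in raw.values()) or 1.0
    return {f: abs(v) / top for f, v in raw.items()} # scaled for comparison with global values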
Given the row of input data with its corresponding Driverless AI and K-LIME
predictions:
A Partial Dependence and ICE plot is available for both Driverless AI and
surrogate models. Refer to the previous Partial Dependence and Individual
Conditional Expectation section for more information about this plot.
For years, common sense has deemed the complex, intricate formulas created by
training machine learning algorithms to be uninterpretable. While great advances
have been made in recent years to make these often nonlinear, non-monotonic,
and non-continuous machine-learned response functions more understandable
(Hall et al, 2017 [7]), it is likely that such functions will never be as directly or
universally interpretable as more traditional linear models.
It is well understood that for the same set of input variables and prediction
targets, complex machine learning algorithms can produce multiple accurate
models with very similar, but not exactly the same, internal architectures
(Breiman, 2001 [1]). This alone is an obstacle to interpretation, but when using
these types of algorithms as interpretation tools or with interpretation tools it is
important to remember that details of explanations will change across multiple
accurate models.
12 Viewing Explanations
Note: Not all explanatory functionality is available for multinomial classification
scenarios.
Driverless AI provides easy-to-read explanations for a completed model. You
can view these by clicking the Explanations button on the Model Interpretation
page. Note that this button is only available for completed experiments. Click
Close when you are done to return to the Model Interpretations page.
The UI allows you to view global, cluster-specific, and local reason codes. You
can also export the explanations to CSV.
• Global Reason Codes: To view global reason codes, select the Global
plot from the Cluster dropdown.
With Global selected, click the Explanations button beside the Cluster drop-
down.
• Local Reason Codes by Row Number: To view local reason codes for
a specific row, select a point on the graph or type a value in the Value
field.
• Local Reason Codes by ID: To view local reason codes for a specific
row, change the dropdown to ID and then type a value in the ID field.
1. Click the Experiments link in the top menu and select the experiment
that you want to use.
3. Locate the new dataset that you want to score on. Note that this new
dataset must include the same columns as the dataset used in the selected
experiment.
4. Click Select at the top of the screen. This immediately starts the scoring
process.
Follow these steps to transform another dataset. Note that this assumes the
new dataset has been added to Driverless AI already.
Note: Transform Another Dataset is not available for Time Series experi-
ments.
1. On the completed experiment page for the original dataset, click the
Transform Another Dataset button.
2. Select the new training dataset that you want to transform. Note that
this must have the same number of columns as the original dataset.
3. In the Select drop down, specify a validation dataset to use with this
dataset, or specify to split the training data. If you specify to split the
data, then you also specify the split value (defaults to 25 percent) and
the seed (defaults to 1234). Note: To ensure the transformed dataset
respects the row order, choose a validation dataset instead of splitting
the training data. Splitting the training data will result in a shuffling of
the row order.
4. Optionally specify a test dataset. If specified, then the output will also
include the final test dataset for final scoring.
5. Click Launch Transformation.
The following datasets will be available for download upon successful completion:
• Training dataset (not for cross validation)
• Validation dataset for parameter tuning
• Test dataset for final scoring. This option is available if a test dataset
was used.
This is the recommended method for running the Python Scoring Pipeline. Use
this method if:
• You have an air gapped environment with no access to the Internet.
• You are running Power.
• You want an easy quick start approach.
Prerequisites
• A valid Driverless AI license key
• A completed Driverless AI experiment
• Downloaded Python Scoring Pipeline
Running the Python Scoring Pipeline - Recommended
1. On https://www.h2o.ai/download/, download the TAR SH version of
Driverless AI (for either Linux or IBM Power).
2. Use bash to execute the download. This creates a new dai-nnn folder.
3. Change directories into the new Driverless AI folder:
cd dai-nnn
4. Run the following to install the Python Scoring Pipeline for your completed
Driverless AI experiment:
./dai-env.sh pip install /path/to/your/scoring_experiment.whl
5. Run the following command to run the included scoring pipeline example:
DRIVERLESS_AI_LICENSE_KEY="pastekeyhere"
SCORING_PIPELINE_INSTALL_DEPENDENCIES=0 ./dai-env.sh
This section describes an alternative method for running the Python Scoring
Pipeline. This version requires Internet access. It is also not supported on
Power machines.
Prerequisites
The following are required in order to run the downloaded scoring pipeline.
• The scoring module and scoring service are supported only on Linux
x86_64 with Python 3.6 and OpenBLAS.
• The scoring module and scoring service download additional packages
at install time and require Internet access. Depending on your network
environment, you might need to set up internet access via a proxy.
• Valid Driverless AI license. Driverless AI requires a license to be specified
in order to run the Python Scoring Pipeline.
• Apache Thrift (to run the TCP scoring service)
• Linux x86_64 environment
• Python 3.6
• libopenblas-dev (required for H2O4GPU)
• Internet access to download and install packages. Note that depending
on your environment, you may also need to set up a proxy.
• OpenCL
Examples of how to install these prerequisites are below:
Installing Python 3.6 and OpenBLAS on Ubuntu 16.10+
$ sudo apt install python3.6 python3.6-dev python3-pip python3-dev \
python-virtualenv python3-virtualenv libopenblas-dev
Run the following to refresh the shared runtime after installing Thrift:
$ sudo ldconfig /usr/local/lib
$ export DRIVERLESS_AI_LICENSE_FILE="/path/to/license.sig"
$ bash run_http_server.sh
$ bash run_http_client.sh
Note: By default, the run_*.sh scripts mentioned above create a virtual
environment using virtualenv and pip, within which the Python code is executed.
The scripts can also leverage Conda (Anaconda/Miniconda) to create a Conda
virtual environment and install required package dependencies. The package
manager to use is provided as an argument to the script.
# to use conda package manager
$ export DRIVERLESS_AI_LICENSE_FILE="/path/to/license.sig"
$ bash run_example.sh --pm conda
Note: If you experience errors while running any of the above scripts, please
check to make sure your system has a properly installed and configured Python
3.6 installation. Refer to the Troubleshooting Python Environment Issues section
at the end of this chapter to see how to set up and test the scoring module
using a cleanroom Ubuntu 16.04 virtual machine.
The scoring module is a Python module bundled into a standalone wheel file
(named scoring_*.whl). All the prerequisites for the scoring module to work
correctly are listed in the requirements.txt file. To use the scoring module, all
you have to do is create a Python virtualenv, install the prerequisites, and then
import and use the scoring module as follows:
# See 'example.py' for a complete example.
from scoring_487931_20170921174120_b4066 import Scorer
scorer = Scorer() # Create instance.
score = scorer.score([ # Call score()
7.416, # sepal_len
3.562, # sepal_wid
1.049, # petal_len
2.388, # petal_wid
])
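If you need to score several rows locally, a simple approach is to call score() once per row. The sketch below reuses the module name from the example above; the second row's values are purely illustrative.
# Sketch: score multiple rows by calling score() once per row.
from scoring_487931_20170921174120_b4066 import Scorer

scorer = Scorer()
rows = [
    [7.416, 3.562, 1.049, 2.388],
    [6.334, 2.038, 4.929, 0.551],  # illustrative values
]
scores = [scorer.score(row) for row in rows]
print(scores)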
The scoring service hosts the scoring module as an HTTP or TCP service.
Doing this exposes all the functions of the scoring module through remote
procedure calls (RPC). In effect, this mechanism allows you to invoke scoring
functions from languages other than Python on the same computer or from
another computer on a shared network or on the Internet.
The scoring service can be started in two ways:
• In TCP mode, the scoring service provides high-performance RPC calls
via Apache Thrift (https://thrift.apache.org/) using a binary
wire protocol.
• In HTTP mode, the scoring service provides JSON-RPC 2.0 calls served
by Tornado (http://www.tornadoweb.org).
Scoring operations can be performed on individual rows (row-by-row) or in
batch mode (multiple rows at a time).
Scoring Service - TCP Mode (Thrift)
The TCP mode allows you to use the scoring service from any language supported
by Thrift, including C, C++, C#, Cocoa, D, Dart, Delphi, Go, Haxe, Java, Node.js, Lua, Perl, PHP, Python, Ruby, and Smalltalk.
To start the scoring service in TCP mode, you will need to generate the Thrift
bindings once, then run the server:
# See 'run_tcp_server.sh' for a complete example.
$ thrift --gen py scoring.thrift
$ python tcp_server.py --port=9090
Note that the Thrift compiler is only required at build time. It is not a runtime dependency; once the scoring services are built and tested, you do not need to repeat this installation process on the machines where the scoring services are intended to be deployed.
To call the scoring service, simply generate the Thrift bindings for your language
of choice, then make RPC calls via TCP sockets using Thrift’s buffered transport
in conjunction with its binary protocol.
# See 'run_tcp_client.sh' for a complete example.
$ thrift --gen py scoring.thrift
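A minimal Python client might then look like the following sketch. The generated package name (scoring), the service name (ScoringService), and the exact argument type of score() are assumptions taken from scoring.thrift and the Java example below; consult run_tcp_client.sh for the authoritative version.
# Sketch of a Thrift TCP client; names are assumptions based on scoring.thrift.
import sys
sys.path.append('gen-py')  # output directory of: thrift --gen py scoring.thrift

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from scoring import ScoringService  # generated service stub (assumed name)

socket = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(socket)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ScoringService.Client(protocol)

transport.open()
# The row may need to be wrapped in the generated Row struct
# (see scoring.ttypes) rather than passed as a plain list.
print(client.score([7.416, 3.562, 1.049, 2.388]))
transport.close()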
You can reproduce the exact same result from other languages, e.g. Java:
$ thrift --gen java scoring.thrift
// Dependencies:
// commons-codec-1.9.jar
// commons-logging-1.2.jar
// httpclient-4.4.1.jar
// httpcore-4.4.1.jar
// libthrift-0.10.0.jar
// slf4j-api-1.7.12.jar
import ai.h2o.scoring.Row;
import ai.h2o.scoring.ScoringService;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import java.util.List;

public class Main {
  public static void main(String[] args) {
    try {
      // Illustrative reconstruction; see the downloaded package for the
      // complete example. Connect via Thrift's binary protocol over TCP.
      TTransport transport = new TSocket("localhost", 9090);
      transport.open();
      ScoringService.Client client =
          new ScoringService.Client(new TBinaryProtocol(transport));

      // Populate the row as defined by the generated Row class.
      Row row = new Row();

      // Call score() and print the prediction(s).
      System.out.println(client.score(row));

      transport.close();
    } catch (TException ex) {
      ex.printStackTrace();
    }
  }
}
The HTTP mode allows you to use the scoring service via plaintext JSON-RPC calls. This is usually less performant than Thrift, but it has the advantage of being usable from any HTTP client library in your language of choice, without any dependency on Thrift.
You can use any HTTP client library to call the service (see run_http_client.sh for a complete example). For example, from Python, you can use the requests module as follows:
import requests

row = [7.486, 3.277, 4.755, 2.354]
req = dict(id=1, method='score', params=dict(row=row))
# Send the JSON-RPC request as a JSON body.
res = requests.post('http://localhost:9090/rpc', json=req)
print(res.json()['result'])
#!/usr/bin/env bash
# end of bootstrap.sh
Vagrant.configure(2) do |config|
config.vm.box = "ubuntu/xenial64"
config.vm.provision :shell, path: "bootstrap.sh", privileged: false
config.vm.hostname = "h2o"
config.vm.provider "virtualbox" do |vb|
vb.memory = "4096"
end
end
# end of Vagrantfile
2. Launch the VM and SSH into it. Note that we are also placing the scoring
pipeline in the same directory so that we can access it later inside the
VM.
cp /path/to/scorer.zip .
vagrant up
vagrant ssh
At this point, you should see scores printed out on the terminal. If not, contact
us at support@h2o.ai.
This is the recommended method for running the MLI Scoring Pipeline. Use
this method if:
• You have an air-gapped environment with no access to the Internet.
• You are running on IBM Power.
• You want a quick and easy way to get started.
Prerequisites
• A valid Driverless AI license key.
• A completed Driverless AI experiment.
• Downloaded MLI Scoring Pipeline.
Steps 1-4 are the same as for the Python Scoring Pipeline above, except that in step 4 you install the downloaded MLI scoring pipeline wheel file.
5. Run the following command to run the included scoring pipeline example:
DRIVERLESS_AI_LICENSE_KEY="pastekeyhere" SCORING_PIPELINE_INSTALL_DEPENDENCIES=0 ./dai-env.sh /path/to/your/run_example.sh
This section describes an alternative method for running the MLI Standalone
Scoring Pipeline. This version requires Internet access. It is also not supported
on Power machines.
15.3.4 Prerequisites
Run the following to refresh the shared runtime libraries after installing Thrift.
$ sudo ldconfig /usr/local/lib
Run the TCP scoring server example. Use two terminal windows. (This requires Linux x86_64, Python 3.6 and Thrift.)
$ bash run_tcp_server.sh
$ bash run_tcp_client.sh
Run the HTTP scoring server example. Use two terminal windows. (This requires Linux x86_64, Python 3.6 and Thrift.)
$ bash run_http_server.sh
$ bash run_http_client.sh
The MLI scoring module is a Python module bundled into a standalone wheel file (named scoring_*.whl). All the prerequisites for the scoring module to work correctly are listed in the requirements.txt file. To use the scoring module, create a Python virtualenv, install the prerequisites, and then import and use the scoring module as follows:
----- See 'example.py' for complete example. -----
from scoring_487931_20170921174120_b4066 import KLimeScorer
scorer = KLimeScorer() # Create instance.
score = scorer.score_reason_codes([ # Call score_reason_codes()
7.416, # sepal_len
3.562, # sepal_wid
1.049, # petal_len
2.388, # petal_wid
])
There are times when the K-LIME model score is not close to the Driverless AI model score. In such cases it may be better to generate reason codes with the Shapley method on the Driverless AI model. Note: The reason codes from Shapley will be in the transformed feature space.
To see an example of using both K-LIME and Driverless AI Shapley reason
codes in the same Python session, run:
$ bash run_example_shapley.sh
For this batch script to succeed, MLI must be run on a Driverless AI model.
If you have run MLI in standalone (external model) mode, there will not be a
Driverless AI scoring pipeline.
If MLI was run with transformed features, the Shapley example scripts will not
be exported. You can generate exact reason codes directly from the Driverless
AI model scoring pipeline.
The MLI scoring service hosts the scoring module as an HTTP or TCP service. Doing this exposes all the functions of the scoring module through remote procedure calls (RPC).
In effect, this mechanism allows you to invoke scoring functions from languages other than Python on the same computer, or from another computer on a shared network or the Internet.
The scoring service can be started in two ways:
• In TCP mode, the scoring service provides high-performance RPC calls via
Apache Thrift (https://thrift.apache.org/) using a binary wire protocol.
• In HTTP mode, the scoring service provides JSON-RPC 2.0 calls served
by Tornado (http://www.tornadoweb.org).
Scoring operations can be performed on individual rows (row-by-row) using score or in batch mode (multiple rows at a time) using score_batch. Both functions allow you to specify pred_contribs=[True|False] to get MLI predictions (K-LIME/Shapley) on a new dataset. See the example_shapley.py file for more information.
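As a hypothetical illustration only, a JSON-RPC call using the pred_contribs flag described above might look like the sketch below; the exact parameter names and serialization should be taken from example_shapley.py in the downloaded package.
# Hypothetical sketch; verify parameter names against example_shapley.py.
import requests

row = [7.486, 3.277, 4.755, 2.354]
req = dict(id=1, method='score', params=dict(row=row, pred_contribs=True))
res = requests.post('http://localhost:9090/rpc', json=req)
print(res.json()['result'])  # reason codes rather than a plain prediction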
MLI Scoring Service - TCP Mode (Thrift)
The TCP mode allows you to use the scoring service from any language supported
by Thrift, including C, C++, C#, Cocoa, D, Dart, Delphi, Go, Haxe, Java, Node.js, Lua, Perl, PHP, Python, Ruby, and Smalltalk.
To start the scoring service in TCP mode, you will need to generate the Thrift
bindings once, then run the server:
----- See 'run_tcp_server.sh' for complete example. -----
$ thrift --gen py scoring.thrift
$ python tcp_server.py --port=9090
Note that the Thrift compiler is only required at build time. It is not a runtime dependency; once the scoring services are built and tested, you do not need to repeat this installation process on the machines where the scoring services are intended to be deployed.
To call the scoring service, simply generate the Thrift bindings for your language
of choice, then make RPC calls via TCP sockets using Thrift’s buffered transport
in conjunction with its binary protocol.
----- See 'run_tcp_client.sh' for complete example. -----
$ thrift --gen py scoring.thrift
You can reproduce the exact same result from other languages, e.g. Java:
// Dependencies:
// commons-codec-1.9.jar
// commons-logging-1.2.jar
// httpclient-4.4.1.jar
// httpcore-4.4.1.jar
// libthrift-0.10.0.jar
// slf4j-api-1.7.12.jar
import ai.h2o.scoring.Row;
import ai.h2o.scoring.ScoringService;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import java.util.List;

public class Main {
  public static void main(String[] args) {
    try {
      // Illustrative reconstruction; see the downloaded package for the
      // complete example. Connect via Thrift's binary protocol over TCP.
      TTransport transport = new TSocket("localhost", 9090);
      transport.open();
      ScoringService.Client client =
          new ScoringService.Client(new TBinaryProtocol(transport));

      // Populate the row as defined by the generated Row class.
      Row row = new Row();

      // Call score_reason_codes() and print the K-LIME reason codes.
      System.out.println(client.score_reason_codes(row));

      transport.close();
    } catch (TException ex) {
      ex.printStackTrace();
    }
  }
}
You can also use any HTTP client library to call the MLI scoring service over JSON-RPC. For example, from Python, you can use the requests module as follows:
import requests

row = [7.486, 3.277, 4.755, 2.354]
req = dict(id=1, method='score_reason_codes', params=dict(row=row))
# Send the JSON-RPC request as a JSON body.
res = requests.post('http://localhost:9090/rpc', json=req)
print(res.json()['result'])
15.4 Driverless AI MOJO Scoring Pipeline
Keep in mind that, as with H2O-3, MOJOs are tied to experiments. Experiments and MOJOs are not automatically upgraded when Driverless AI is upgraded.
15.4.1 Prerequisites
The following are required in order to run the MOJO scoring pipeline.
• Java 7 runtime (JDK 1.7) or newer.
• Valid Driverless AI license. You can download the license.sig file
from the machine hosting Driverless AI (usually in the license folder).
Copy the license file into the downloaded mojo-pipeline folder.
• mojo2-runtime.jar file. This is available from the top navigation menu in
the Driverless AI UI and in the downloaded mojo-pipeline.zip file for an
experiment.
License Specification
Driverless AI requires a license to be specified in order to run the MOJO Scoring
Pipeline. The license can be specified in one of the following ways:
• Via an environment variable:
– DRIVERLESS_AI_LICENSE_FILE: Path to the Driverless AI license file, or
– DRIVERLESS_AI_LICENSE_KEY: The Driverless AI license key (Base64-encoded string)
• Via a JVM system property (-D option):
– ai.h2o.mojos.runtime.license.file: Path to the Driverless AI license file, or
– ai.h2o.mojos.runtime.license.key: The Driverless AI license key (Base64-encoded string)
• Via the application classpath:
– The license is loaded from a resource called /license.sig.
– The default resource name can be changed via the JVM system property ai.h2o.mojos.runtime.license.filename.
For example:
$ java -Dai.h2o.mojos.runtime.license.file=/etc/dai/license.sig \
    -cp mojo2-runtime.jar ai.h2o.mojos.ExecuteMojo pipeline.mojo example.csv
To enable MOJO Scoring Pipelines for each experiment, stop Driverless AI, then restart it using the DRIVERLESS_AI_MAKE_MOJO_SCORING_PIPELINE=1 flag. (Refer to the Config.toml File section in the User Guide for more information.)
For example:
nvidia-docker run \
--add-host name.node:172.16.2.186 \
-e DRIVERLESS_AI_MAKE_MOJO_SCORING_PIPELINE=1 \
-p 12345:12345 \
--init -it --rm \
-v /tmp/dtmp/:/tmp \
-v /tmp/dlog/:/log \
-u $(id -u):$(id -g) \
opsh2oai/h2oai-runtime
Or you can change the value of make_mojo_scoring_pipeline to true in the config.toml file and specify that file when restarting Driverless AI.
15.4.4 Quickstart
Before running the quickstart examples, be sure that the MOJO scoring pipeline
is already downloaded and unzipped:
1. On the completed Experiment page, click the Download MOJO Scoring Pipeline button to download the mojo-pipeline.zip file for this experiment onto your local machine.
import ai.h2o.mojos.runtime.MojoPipeline;
import ai.h2o.mojos.runtime.frame.MojoFrame;
import ai.h2o.mojos.runtime.frame.MojoFrameBuilder;
import ai.h2o.mojos.runtime.frame.MojoRowBuilder;
import ai.h2o.mojos.runtime.utils.SimpleCSV;
The C++ Scoring Pipeline is provided as R and Python packages for the protobuf-based MOJO2 protocol. The packages are self-contained, so no additional software is required. Simply build the MOJO Scoring Pipeline and
begin using your preferred method. To download the MOJO Scoring Pipeline
onto your local machine, click the Download MOJO Scoring Pipeline button,
then click the same button again in the pop-up menu that appears. Refer to
the provided instructions for Java, Python, or R.
Notes:
• MOJOs are currently not available for TensorFlow, RuleFit, or FTRL
models.
• The Download MOJO Scoring Pipeline button appears as Build
MOJO Scoring Pipeline if the MOJO Scoring Pipeline is disabled.
Examples
The following examples show how to use the R and Python APIs of the C++
MOJO runtime.
R Example
Prerequisites
• methods
• Rcpp (≥1.0.0)
• data.table
# Load the MOJO
library(daimojo)
m <- load.mojo("../data/dai/pipeline.mojo")
create.time(m)
## [1] "2018-12-17 22:00:24 UTC"
uuid(m)
## [1] "65875c15-943a-4bc0-a162-b8984fe8e50d"
Python Example
Prerequisites
• Python 3.6
• datatable. Run the following to install:
pip install https://s3.amazonaws.com/h2o-release/datatable/stable/datatable-0.8.0/datatable-0.8.0-cp36-cp36m-linux_x86_64.whl
• Python MOJO runtime. Run the following after downloading from the GUI (Note: For PowerPC, replace x86_64 with ppc64le):
pip install daimojo-2.0.1+master.478-cp36-cp36m-linux_x86_64.whl
’mitoses’]
res.stypes
(stype.float64, stype.float64)
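The Python example above is truncated in this copy. A fuller sketch of the typical flow (load the pipeline, read a frame with datatable, predict) might look like the following; treat the daimojo attribute and method names as assumptions to verify against the package you installed.
# Sketch only: daimojo names are assumptions and may differ by release.
import datatable as dt
import daimojo.model

m = daimojo.model('./pipeline.mojo')  # load the MOJO pipeline
print(m.feature_names)                # training columns, e.g. ending with 'mitoses'

frame = dt.fread('example.csv')       # rows to score
res = m.predict(frame)                # returns a datatable Frame
print(res.stypes)                     # e.g. (stype.float64, stype.float64)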
16 Deployment
Driverless AI can deploy the MOJO scoring pipeline for you to test and/or to
integrate into a final product.
Notes:
• This section describes how to deploy a MOJO scoring pipeline and assumes
that a MOJO scoring pipeline exists.
The following AWS access permissions need to be provided to the role in order
for Driverless AI Lambda deployment to succeed.
• AWSLambdaFullAccess
• IAMFullAccess
• AmazonAPIGatewayAdministrator
The policy can be further stripped down to restrict Lambda and S3 rights using
the JSON policy definition as follows:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"iam:GetPolicyVersion",
"iam:DeletePolicy",
"iam:CreateRole",
"iam:AttachRolePolicy",
"iam:ListInstanceProfilesForRole",
"iam:PassRole",
"iam:DetachRolePolicy",
"iam:ListAttachedRolePolicies",
"iam:GetRole",
"iam:GetPolicy",
"iam:DeleteRole",
"iam:CreatePolicy",
"iam:ListPolicyVersions"
],
"Resource": [
"arn:aws:iam::*:role/h2oai*",
"arn:aws:iam::*:policy/h2oai*"
]
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": "apigateway:*",
"Resource": "*"
},
{
"Sid": "VisualEditor2",
"Effect": "Allow",
"Action": [
"lambda:CreateFunction",
"lambda:ListFunctions",
"lambda:InvokeFunction",
"lambda:GetFunction",
"lambda:UpdateFunctionConfiguration",
"lambda:DeleteFunctionConcurrency",
"lambda:RemovePermission",
"lambda:UpdateFunctionCode",
"lambda:AddPermission",
"lambda:ListVersionsByFunction",
"lambda:GetFunctionConfiguration",
"lambda:DeleteFunction",
"lambda:PutFunctionConcurrency",
"lambda:GetPolicy"
],
"Resource": "arn:aws:lambda:*:*:function:h2oai*"
},
{
"Sid": "VisualEditor3",
"Effect": "Allow",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::h2oai*/*",
"arn:aws:s3:::h2oai*"
]
}
]
}
This option opens a new dialog for setting the AWS account credentials (or use
those supplied in the Driverless AI configuration file or environment variables),
AWS region, and the desired deployment name (which must be unique per
Driverless AI user and AWS account used).
Note that the actual scoring endpoint is located at the path /score. In addition, to prevent DDoS and other malicious activities, the resulting AWS Lambda is protected by an API key, i.e., a secret that has to be passed in as part of the request using the x-api-key HTTP header.
The request is a JSON object containing the following attributes:
• fields: A list of input column names that should correspond to the training
data columns.
• rows: A list of rows that are in turn lists of cell values to predict the
target values for.
• (optional) includeFieldsInOutput: A list of input columns that should
be included in the output.
An example request that provides two input columns and asks for one column to be copied to the output looks as follows:
{
"fields": [
"age", "salary"
],
"includeFieldsInOutput": [
"salary"
],
"rows": [
[
"48.0", "15000.0"
],
[
"35.0", "35000.0"
],
[
"18.0", "22000.0"
]
]
}
Assuming the request is stored locally in a file named test.json, the request to
the endpoint can be sent, e.g., using the curl utility, as follows:
$ URL={place the endpoint URL here}
$ API_KEY={place the endpoint API key here}
$ curl \
-X POST \
-H "x-api-key: ${API_KEY}" \
-d @test.json ${URL}/score
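If you prefer Python over curl, the same request can be sent with the requests module; the sketch below uses the same placeholders for the endpoint URL and API key provided by the deployment dialog.
import json
import requests

url = '{place the endpoint URL here}'
api_key = '{place the endpoint API key here}'

with open('test.json') as f:
    payload = json.load(f)

# The Lambda is protected by an API key passed via the x-api-key header.
res = requests.post(url + '/score', headers={'x-api-key': api_key}, json=payload)
print(res.json()['score'])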
The response is a JSON object with a single attribute "score", which contains the list of rows with the optional copied input values and the predictions.
For the example above with a two class target field, the result is likely to look
something like the following snippet. The particular values would of course
depend on the scoring pipeline:
{
"score": [
[
"48.0",
"0.6240277982943945",
"0.045458571508101536"
],
[
"35.0",
"0.7209441819603676",
"0.06299909138586585"
],
[
"18.0",
"0.7209441819603676",
"0.06299909138586585"
]
]
}
We create a new S3 bucket per AWS Lambda deployment. The bucket names
have to be unique throughout AWS S3, and one user can create a maximum
of 100 buckets. Therefore, we recommend setting the bucket name used for deployment with the deployment_aws_bucket_name config option.
16.4.1 Prerequisites
This option opens a new dialog for setting the REST Server deployment name,
port number, and maximum heap size (optional).
1. Specify a name for the REST scorer in order to help track the deployed
REST scorers.
2. Provide a port number on which the REST scorer will run. For example, if port 8081 is selected, the scorer will be available at http://my-ip-address:8081/models.
3. Optionally specify the maximum heap size for the Java Virtual Machine (JVM) running the REST scorer. Because the REST scorer runs on the same machine as Driverless AI, it can be helpful to limit the amount of memory allocated to it. This option caps the memory the REST scorer can use, but it will also produce an error if the allocated memory is not enough to run the scorer. (The amount of memory required depends mostly on the size of the MOJO. See Prerequisites for more information.)
An example request that provides two input columns and asks for one column to be copied to the output looks as follows:
{
"fields": [
"age", "salary"
],
"includeFieldsInOutput": [
"salary"
],
"rows": [
[
"48.0", "15000.0"
],
[
"35.0", "35000.0"
],
[
"18.0", "22000.0"
]
]
}
Assuming the request is stored locally in a file named test.json, the request to the endpoint can be sent, e.g., using the curl utility, as follows:
URL={place the endpoint URL here}
curl \
-X POST \
-H "Content-Type: application/json" \
-d @test.json \
${URL}/score
The response is a JSON object with a single attribute score, which contains
the list of rows with the optional copied input values and the predictions.
For the example above with a two class target field, the result is likely to look
something like the following snippet (the exact values depend on the scoring
pipeline):
{
"score": [
[
"48.0",
"0.6240277982943945",
"0.045458571508101536"
],
[
"35.0",
"0.7209441819603676",
"0.06299909138586585"
],
[
"18.0",
"0.7209441819603676",
"0.06299909138586585"
]
]
}
When using Docker, local REST scorers are deployed within the same container
as Driverless AI. As a result, all REST scorers will be turned off if the Driverless
AI container is closed. When using native installs (rpm/deb/tar.sh), the REST
scorers will continue to run even if Driverless AI is shut down.
and then calculates the mean of the response column for each group. The mean of the response for the bin is used as a new feature. Cross validation is used to calculate the mean response to prevent overfitting.
• NumToCatWoEMonotonic Transformer: The Numeric to Categorical Weight of Evidence Transformer converts a numeric column to categorical by binning and then calculates the Weight of Evidence for each bin. The Weight of Evidence is used as a new feature. Weight of Evidence measures the strength of a grouping for separating good and bad risk and is calculated by taking the log of the ratio of distributions for a binary response column (a small numeric sketch follows after this list).
• Original Transformer: The Original Transformer applies an identity
transformation to a numeric column.
• TruncSVDNum Transformer: Truncated SVD Transformer trains a Truncated SVD model on selected numeric columns and uses the components of the truncated SVD matrix as new features.
This only works with a binary target variable. The likelihood needs to be created within a stratified k-fold if a fit_transform method is used. More information can be found here: http://ucanalytics.com/blogs/information-value-and-weight-of-evidencebanking-case/.
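The following small sketch illustrates the Weight of Evidence calculation described above for a single bin. It is a simplified illustration, not the exact Driverless AI implementation (which may, for example, apply smoothing).
# Simplified Weight of Evidence for one bin of a binned numeric feature.
import math

def weight_of_evidence(goods_in_bin, bads_in_bin, total_goods, total_bads):
    pct_good = goods_in_bin / total_goods  # bin's share of 'good' outcomes
    pct_bad = bads_in_bin / total_bads     # bin's share of 'bad' outcomes
    return math.log(pct_good / pct_bad)

# Example: a bin holding 30 of 100 goods and 10 of 100 bads.
print(weight_of_evidence(30, 10, 100, 100))  # ~1.10, a 'good'-leaning bin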
18 Logs
Driverless AI provides a number of logs that can be retrieved while visualizing
datasets, while an experiment is running, and after an experiment is completed.
When running Autovisualization, you can access the Autoviz logs by clicking
the Display Logs button on the Visualize Datasets page.
This page presents logs created while the dataset visualization was being
performed. You can download the vis-data-server.log file by clicking the
Download Logs button on this page. This file can be used to troubleshoot
any issues encountered during dataset visualization.
While an Experiment is Running
While the experiment is running, you can access the logs by clicking on the
Log button on the experiment screen. The Log button can be found in the
CPU/Memory section. Clicking on the Log button will present the experiment
logs in real time. You can download these logs by clicking on the Download
Logs button in the upper right corner.
This will download a zip file which includes the following logs:
• h2oai_experiment.log: This is the log corresponding to the experiment.
• h2oai_experiment_anonymized.log: This is the log corresponding to the experiment where all data in the log is anonymized.
• h2oai_server.log: Contains the logs for all experiments and all users.
• h2oai_server_anonymized.log: Contains the logs for all experiments and all users where all data in the log is anonymized.
• h2o.log: This is the log corresponding to H2O-3. (H2O-3 is used
internally for parts of Driverless AI.)
For troubleshooting purposes, view the complete h2oai_experiment.log or the h2oai_experiment_anonymized.log.
The following additional information about your particular experiment will also
be included in the zip file:
• tuning_leaderboard.txt: The results of the parameter tuning stage. This contains the model parameters investigated and their performance.
• gene_summary.txt: A summary of the feature transformations available for each gene over the feature engineering iterations.
• features.txt: The features used in the final Driverless AI model along with feature importance and feature description.
• details folder: Contains standard streams for each of the subprocesses performed by Driverless AI. This information is for debugging purposes.
After Model Interpretation
You can view an MLI log for completed model interpretations by selecting the
Download MLI Logs link on the MLI page.
This will download a zip file which includes the following logs:
• h2oai_experiment_mli_key.log: This is the log corresponding to the model interpretation.
• h2oai_experiment_mli_key_anonymized.log: This is the log corresponding to the model interpretation where all data in the log is anonymized.
These files can be used to view logging information for successful interpretations. If MLI fails, then those logs are in ./tmp/h2oai_experiment_mli_key.log and ./tmp/h2oai_experiment_mli_key_anonymized.log.
You can also retrieve the h2oai_experiment.log for the corresponding experiment in the Driverless AI ./tmp folder.
19 References
1. L. Breiman. Statistical modeling: The two cultures (with comments
and a rejoinder by the author). Statistical Science, 16(3), 2001. URL
https://projecteuclid.org/euclid.ss/1009213726
20 Authors
Patrick Hall
Patrick Hall is senior director for data science products at H2O.ai where he
focuses mainly on model interpretability. Patrick is also currently an adjunct
professor in the Department of Decision Sciences at George Washington University, where he teaches graduate classes in data mining and machine learning.
Prior to joining H2O.ai, Patrick held global customer-facing roles and research and development roles at SAS Institute.
Follow him on Twitter: @jpatrickhall
Megan Kurka
Megan is a customer data scientist at H2O.ai. Prior to working at H2O.ai,
she worked as a data scientist building products driven by machine learning for
B2B customers. Megan has experience working with customers across multiple
industries, identifying common problems, and designing robust and automated
solutions.
Angela Bartz
Angela is the doc whisperer at H2O.ai. With extensive experience in technical
communication, she brings our products to life by documenting the features and
functionality of the entire suite of H2O products. Having worked for companies
both large and small, she is an expert at understanding her audience and
translating complex ideas into consumable documents. Angela has a BA degree
in English from the University of Detroit Mercy.