Forecast ML
Nickalus Redell
2020-04-19
Purpose
The purpose of forecastML is to provide a series of functions and visualizations that simplify
the process of multi-step-ahead forecasting with standard machine learning
algorithms. It’s a wrapper package aimed at providing maximum flexibility in model building
(choose any machine learning algorithm from any R or Python package) while
helping the user quickly assess the (a) accuracy, (b) stability, and (c) generalizability of
grouped (i.e., multiple related time series) and ungrouped forecasts produced from potentially high-dimensional
modeling datasets.
This package is inspired by Bergmeir, Hyndman, and Koo’s 2018 paper “A note on the validity of cross-validation
for evaluating autoregressive time series prediction”, which supports, under certain conditions, forecasting with
high-dimensional ML models without having to use methods that are time series specific.
The following quote from Bergmeir et al.’s article nicely sums up the aim of this package:
More information, cheat sheets, and worked examples can be found at https://github.com/nredell/forecastML.
Direct Forecasting
In contrast to the recursive or iterated method for producing multi-step-ahead forecasts used in traditional
forecasting methods like ARIMA, direct forecasting involves creating a series of distinct horizon-specific models.
Though several hybrid methods exist for producing multi-step forecasts, the simple direct forecasting method
used in forecastML lets us avoid the exponentially more difficult problem of having to “predict the predictors” for
forecast horizons beyond 1-step-ahead.
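As a rough illustration of the direct method, the sketch below fits one model per forecast horizon on a simulated series. It is a base-R example built on stats::lm, not forecastML's implementation; the simulated data, the choice of two lags per model, and the helper name fit_direct are assumptions made for this example. Because every lag is at least as old as the horizon, each forecast uses observed values only.

```r
set.seed(1)
# Simulated series: AR(1) noise around a level of 100.
y <- 100 + as.numeric(arima.sim(list(ar = 0.7), n = 120))

# Hypothetical helper: fit one model for horizon h using lags h and h + 1.
fit_direct <- function(y, h) {
  lags <- c(h, h + 1)
  X <- stats::embed(y, max(lags) + 1)  # Column j holds y lagged by j - 1 steps.
  d <- data.frame(y = X[, 1], lag_a = X[, lags[1] + 1], lag_b = X[, lags[2] + 1])
  list(model = stats::lm(y ~ lag_a + lag_b, data = d), lags = lags)
}

horizons <- c(1, 3, 6)
fits <- lapply(horizons, fit_direct, y = y)

# Forecast time n + h directly: y[n + h - lag] is observed because lag >= h.
n <- length(y)
forecasts <- vapply(seq_along(fits), function(i) {
  h <- horizons[i]
  lags <- fits[[i]]$lags
  new_data <- data.frame(lag_a = y[n + h - lags[1]], lag_b = y[n + h - lags[2]])
  unname(predict(fits[[i]]$model, new_data))
}, numeric(1))

data.frame(horizon = horizons, forecast = forecasts)
```

A recursive approach would instead feed the 1-step-ahead forecast back in as a feature to reach longer horizons.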
The direct forecasting approach used in forecastML involves the following steps:
Below is a plot of 5 forecast models used to produce a single 12-step-ahead forecast where each color
represents a distinct horizon-specific ML model. From left to right these models are:
1: A feed-forward neural network (purple); 2: An ensemble of ML models; 3: A boosted tree model; 4: A LASSO
regression model; 5: A LASSO regression model (yellow).
Multi-Output Forecasting
The multi-output forecasting approach used in forecastML involves the following steps:
1. Build a single multi-output model that simultaneously forecasts over both short- and long-term forecast
horizons.
2. Assess model generalization performance across a variety of heldout datasets through time.
3. Select the hyperparameters that minimize forecast error over the relevant forecast horizons and re-train.
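To make step 1 concrete, here is a minimal base-R sketch in which a single model object forecasts several horizons at once. A linear regression with a matrix response is used purely as a stand-in for a multi-output model; the simulated series, the two features, and the column names are assumptions for this example and do not come from forecastML.

```r
set.seed(2)
y <- 50 + cumsum(rnorm(100))
n <- length(y)
max_h <- 3

# Rows are time points t; features are y[t] and y[t - 1];
# outcomes are y[t + 1], y[t + 2], and y[t + 3].
t_idx <- 2:(n - max_h)
d <- data.frame(lag_0 = y[t_idx], lag_1 = y[t_idx - 1],
                h1 = y[t_idx + 1], h2 = y[t_idx + 2], h3 = y[t_idx + 3])

# A matrix response makes lm() return one "mlm" object covering all horizons.
fit <- stats::lm(cbind(h1, h2, h3) ~ lag_0 + lag_1, data = d)

# One predict() call returns a 1 x 3 matrix of forecasts for n + 1, n + 2, n + 3.
forecast <- predict(fit, data.frame(lag_0 = y[n], lag_1 = y[n - 1]))
forecast
```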
Key Functions
1. fill_gaps: Optional if no temporal gaps/missing rows in data collection. Fill gaps in data collection and
prepare a dataset of evenly-spaced time series for modeling with lagged features. Returns a ‘data.frame’
with missing rows added in so that you can either (a) impute, remove, or ignore NAs prior to the forecastML
pipeline or (b) impute, remove, or ignore them in the user-defined modeling function–depending on the NA
handling capabilities of the user-specified model.
2. create_lagged_df: Create model training and forecasting datasets with lagged, grouped, dynamic, and
static features.
3. create_windows: Create time-contiguous validation datasets for model evaluation.
4. train_model: Train the user-defined model across forecast horizons and validation datasets.
5. return_error: Compute forecast error across forecast horizons and validation datasets.
6. return_hyper: Return user-defined model hyperparameters across validation datasets.
7. combine_forecasts: Combine multiple horizon-specific forecast models to produce one forecast.
Install forecastML
install.packages("forecastML")
library(forecastML)
library(dplyr)
library(DT)
library(ggplot2)
library(glmnet)
library(randomForest)
data("data_seatbelts", package = "forecastML")
data <- data_seatbelts
date_frequency <- "1 month"  # Time step frequency.
# The date indices, which don't come with the stock dataset, should not be included in the modeling
# data.frame.
dates <- seq(as.Date("1969-01-01"), as.Date("1984-12-01"), by = date_frequency)
  DriversKilled   kms PetrolPrice law
2            97  7685       0.102   0
4            87 10955       0.101   0
Train-Test Split
We’ll build our models on data_train and evaluate their out-of-sample performance on data_test.
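One way to make such a split is sketched below. It uses the copy of the seatbelts data that ships with base R (datasets::Seatbelts) so that it runs without forecastML, and it assumes that the test set is the final 12 months (1984), which matches the forecast dates used later in this document. The variable n_test is introduced here for illustration.

```r
# Built-in copy of the seatbelts data: 192 monthly observations, 1969 to 1984.
data <- as.data.frame(datasets::Seatbelts)
data <- data[, c("DriversKilled", "kms", "PetrolPrice", "law")]

n_test <- 12  # Assumed test set size: the final 12 months.

# Time-ordered split: no shuffling, the test set is strictly after the training set.
data_train <- data[1:(nrow(data) - n_test), ]
data_test <- data[(nrow(data) - n_test + 1):nrow(data), ]

c(train = nrow(data_train), test = nrow(data_test))
```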
forecastML::create_lagged_df
We’ll create a list of datasets for model training, one for each forecast horizon.
horizons <- c(1, 3, 6, 12) # 4 models that forecast 1, 1:3, 1:6, and 1:12 time steps ahead.
# A lookback across select time steps in the past. Feature lags 1 through 9, for instance, will be
# silently dropped from the 12-step-ahead model.
lookback <- c(1:6, 9, 12, 15)
# A non-lagged feature that changes through time whose value we either know (e.g., month) or whose
# value we would like to forecast.
dynamic_features <- "law"
Let’s view the modeling dataset for a forecast horizon of 6. Notice that “lag” has been appended to all lagged
features. Dynamic features keep their original names.
The plot below illustrates, for a given lagged feature, the number and position (in dataset rows) of lagged
features created for each forecast horizon/model. The lookback argument in create_lagged_df() was set to
create lagged features from a minimum of 1 lag to a maximum of 15 lags; however, feature lags that don’t
support direct forecasting at a given forecast horizon are silently removed from the modeling dataset.
plot(data_list)
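The rule for which lags survive at each horizon can be sketched in a few lines of base R. This is an illustration of the idea described above (a lag shorter than the forecast horizon would require values that are not yet observed), not the package's internal code; the list name usable_lags is an assumption.

```r
horizons <- c(1, 3, 6, 12)
lookback <- c(1:6, 9, 12, 15)

# For a direct forecast h steps ahead, only lags >= h are observed at forecast time.
usable_lags <- lapply(horizons, function(h) lookback[lookback >= h])
names(usable_lags) <- paste0("horizon_", horizons)

usable_lags
```

With these settings the 12-step-ahead model keeps only lags 12 and 15, while the 1-step-ahead model keeps all 9 lags.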
forecastML::create_windows
create_windows() creates indices for partitioning the training dataset in the outer loop of a nested cross-validation
setup. The validation datasets are created in contiguous blocks of window_length, as opposed to randomly
selected rows, to mimic forecasting over multi-step-ahead forecast horizons. The skip, window_start, and
window_stop arguments take dataset indices–or dates if a vector of dates is supplied to create_lagged_df()–that
allow the user to adjust the number and placement of outer loop validation datasets.
windows <- forecastML::create_windows(lagged_df = data_list, window_length = 12, skip = 48,
                                      window_start = NULL, window_stop = NULL,
                                      include_partial_window = TRUE)
windows
Below is a plot of the nested cross-validation outer loop datasets or windows. In our example, a window_length of
12 (months) resulted in 3 validation windows.
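The arithmetic behind the 3 windows can be reproduced with a small base-R sketch. It assumes that the usable training rows run from 16 to 180 (the first 15 rows are lost to the 15-step lookback) and that windows are laid out from the first usable row with skip rows between them; the variable names are illustrative and this is not the package's code.

```r
first_row <- 16   # Assumed first usable row after a maximum lookback of 15.
last_row <- 180   # Assumed last row of the training data.
window_length <- 12
skip <- 48

# Each window starts window_length + skip rows after the previous one.
window_start <- seq(first_row, last_row, by = window_length + skip)
# A window that runs past the data is truncated (a partial window).
window_stop <- pmin(window_start + window_length - 1, last_row)

windows_sketch <- data.frame(window = seq_along(window_start),
                             start = window_start, stop = window_stop)
windows_sketch
```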
In this nested cross-validation setup, a model is trained with data from 2 windows and forecast accuracy is
assessed on the left-out window. This means that we’ll need to train 3 models for each direct forecast horizon,
each potentially selecting different optimal hyperparameters and having different coefficients–if available–from
the inner cross-validation loop. Assessing the differences between these models is a good way to determine the
stability of a given modeling approach under various time series dynamics.
After model training and exploration, it’s entirely possible that a single multi-step-ahead forecast may use
different ML algorithms (e.g., a neural network for shorter horizons and linear regression for
longer horizons) to produce the short- and long-term forecasts.
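The outer loop described above can be sketched in base R as follows. The simulated data, the linear model, and the three validation blocks are assumptions for illustration; in this simplified version each model is trained on every row outside its validation window and scored on the held-out block.

```r
set.seed(3)
d <- data.frame(x = rnorm(180))
d$y <- 2 * d$x + rnorm(180, sd = 0.5)

# Assumed validation row blocks, one per outer-loop window.
windows_sketch <- list(16:27, 76:87, 136:147)

cv_mae <- vapply(windows_sketch, function(valid_rows) {
  # Train on everything outside the validation window.
  fit <- stats::lm(y ~ x, data = d[-valid_rows, ])
  # Score on the held-out, time-contiguous block.
  mean(abs(d$y[valid_rows] - predict(fit, d[valid_rows, ])))
}, numeric(1))

cv_mae
```

Comparing the fitted coefficients or the error across the three fits gives a first impression of how stable the modeling approach is through time.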
Model Training
We’ll compare the forecasting performance of two models: (a) a cross-validated LASSO and (b) a non-tuned
Random Forest. The following user-defined functions are needed for each model:
A user-defined wrapper function for model training that takes the following arguments:
1: A horizon-specific data.frame made with create_lagged_df(..., type = "train") (e.g.,
my_lagged_df$horizon_h),
2: optionally, any number of additional named arguments which can be passed as ‘…’ in
train_model() or set with default arguments in the model function.
and returns a model object that will be passed into the user-defined predict() function.
Any data transformations, hyperparameter tuning, or inner loop cross-validation procedures should take place
within this function, with the limitation that it ultimately needs to return() a model suitable for the user-defined
predict() function; a list can be returned to capture meta-data and data pre-processing pipelines.
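# Example 1 - LASSO
# Alternatively, we could define an outcome column identifier argument, say, 'outcome_col = 1' in
# this function or just 'outcome_col' and then set the argument as 'outcome_col = 1' in train_model().
model_function <- function(data) {
  # The 'law' feature is constant during some of our outer-loop validation datasets so we'll
  # simply drop it so that glmnet converges.
  constant_features <- which(unlist(lapply(data[, -1], function(x) {!(length(unique(x)) > 1)})))
  if (length(constant_features) > 0) {
    data <- data[, -c(constant_features + 1)]  # +1 because we're skipping over the outcome column.
  }
  x <- as.matrix(data[, -1, drop = FALSE])  # Features.
  y <- as.matrix(data[, 1, drop = FALSE])  # Outcome, assumed to be in column 1.
  model <- glmnet::cv.glmnet(x, y, nfolds = 3)  # Inner-loop cross-validation over lambda.
  # Return the dropped columns as well so that the prediction function can mirror this step.
  return(list("model" = model, "constant_features" = constant_features))
}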
forecastML::train_model
For each modeling approach, LASSO and Random Forest, a total of N forecast horizons * N validation windows
models are trained. In this example, that means training 4 horizons * 3 windows = 12 models for each algorithm.
These models could be trained in parallel on any OS with the very flexible future package by un-commenting the
code below and setting use_future = TRUE. To avoid nested parallelization, models are either trained in parallel
across forecast horizons or validation windows, whichever is longer (when equal, the default is parallel across
forecast horizons).
#future::plan(future::multisession)
# Example 1 - LASSO.
prediction_function <- function(model, data_features) {
The predict.forecast_model() S3 method takes any number of trained models from train_model() and a list of
user-defined prediction functions. The list of prediction functions should appear in the same order as the models.
Note that the prediction_function and data arguments need to be named because the first function argument is
'...'.
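The contract between a training function and its prediction function can be illustrated without any modeling packages. The pair below is a sketch that swaps in stats::lm for the LASSO; the function names, the list element names, and the toy data are assumptions for this example. The training function returns a list so that its pre-processing (here, dropping constant features) can be repeated at prediction time, and the prediction function returns its predictions as a one-column data.frame.

```r
# Training wrapper: the outcome is assumed to be in column 1, features in the rest.
model_function_lm <- function(data) {
  names(data)[1] <- "outcome"
  # Flag features with more than 1 unique value; constant features are dropped.
  keep <- vapply(data[, -1, drop = FALSE],
                 function(x) length(unique(x)) > 1, logical(1))
  data <- data[, c(TRUE, keep), drop = FALSE]
  # Store the fitted model together with the names of the features it was fit on.
  list(model = stats::lm(outcome ~ ., data = data), features = names(data)[-1])
}

# Prediction wrapper: re-applies the feature selection, then predicts.
prediction_function_lm <- function(model, data_features) {
  new_data <- data_features[, model$features, drop = FALSE]
  data.frame(y_pred = unname(predict(model$model, new_data)))
}

# Toy data: y is an exact linear function of x_lag_1; 'constant' never varies.
toy <- data.frame(y = c(3, 5, 7, 9, 11), x_lag_1 = 1:5, constant = 1)
fit <- model_function_lm(toy)
pred <- prediction_function_lm(fit, data.frame(x_lag_1 = 6, constant = 1))
pred
```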
Outer loop nested cross-validation forecasts are returned for each user-defined model, forecast horizon, and
validation window.
   model model_forecast_horizon window_length window_number valid_indices
1  LASSO                      1            12             1            16
2  LASSO                      1            12             1            17
3  LASSO                      1            12             1            18
4  LASSO                      1            12             1            19
5  LASSO                      1            12             1            20
6  LASSO                      1            12             1            21
7  LASSO                      1            12             1            22
8  LASSO                      1            12             1            23
9  LASSO                      1            12             1            24
10 LASSO                      1            12             1            25
Below is a plot of the historical forecasts for each validation window at select forecast horizons.
Below is a plot of the historical forecast error for select validation windows at select forecast horizons.
Model Performance
forecastML::return_error
Let’s calculate several common forecast error metrics for our holdout data sets in the training data.
The forecast errors for nested cross-validation are returned at 3 levels of granularity:
# Global error.
data_error$error_global[, -1] <- lapply(data_error$error_global[, -1], round, 1)
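For reference, three widely used forecast error metrics can be computed by hand as sketched below. These are the textbook definitions applied to made-up numbers; they are shown to clarify what such metrics measure and are not claimed to match return_error()'s exact computations or defaults.

```r
actual <- c(100, 120, 90, 110)
predicted <- c(95, 125, 100, 110)
residual <- actual - predicted

metrics <- c(
  mae = mean(abs(residual)),                       # Mean absolute error.
  mape = mean(abs(residual) / abs(actual)) * 100,  # Mean absolute percentage error.
  rmse = sqrt(mean(residual^2))                    # Root mean squared error.
)

round(metrics, 2)
```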
Below is a plot of error metrics across time for select validation windows and forecast horizons.
Below is a plot of forecast error metrics by forecast model horizon collapsed across validation windows.
Hyperparameters
While it may be reasonable to have distinct models for each forecast horizon or even forecasting model
ensembles across horizons, at this point we still have slightly different LASSO and Random Forest models from
the outer loop of the nested cross-validation within each horizon-specific model. Here, we’ll take a look at the
stability of the hyperparameters for the LASSO model to better understand if we can train one model across
forecast horizons or if we need additional predictors or modeling strategies to forecast well under various
conditions or time series dynamics.
forecastML::return_hyper
Below are two plots which show (a) univariate hyperparameter variability across the training data and (b) the
relationship between each error metric and hyperparameter values.
forecastML::create_lagged_df
To forecast with the direct forecasting method, we need to create another dataset of forward-looking features. We
can do this by running create_lagged_df() and setting type = "forecast".
Below is the forecast dataset for a 6-step-ahead forecast.
The forecast dataset has the following columns:
index: A column giving the row index or date of the forecast periods (e.g., a 100 row non-date-based
training dataset would start with an index of 101).
horizon: A column that indicates the forecast period from 1:max(horizons).
“features”: Lagged, dynamic, group, and static features identical to the type = "train" dataset.
       index horizon DriversKilled_lag_6 DriversKilled_lag_9 DriversKilled_lag_12
1 1984-01-01       1                  60                  89                  120
2 1984-02-01       2                  84                  82                   95
4 1984-04-01       4                 126                  60                   89
5 1984-05-01       5                 122                  84                   82
6 1984-06-01       6                 118                 113                   89
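How forward-looking lagged features are assembled can be sketched with the built-in datasets::Seatbelts series. The sketch assumes a training set of the first 180 months, a 6-step horizon, and lags of 6, 9, and 12; the column naming is an assumption, and the code illustrates the indexing only, not the package's implementation.

```r
# Outcome over the assumed training period, January 1969 to December 1983.
y <- as.numeric(datasets::Seatbelts[, "DriversKilled"])[1:180]
n <- length(y)

horizon <- 6
lags <- c(6, 9, 12)  # Assumed subset of the lookback that survives a 6-step horizon.

forecast_df <- data.frame(
  index = seq(as.Date("1984-01-01"), by = "1 month", length.out = horizon),
  horizon = 1:horizon
)

for (lag in lags) {
  # Forecast period k looks back 'lag' steps from time n + k; because lag >= horizon,
  # the referenced row is always inside the observed training data.
  forecast_df[[paste0("DriversKilled_lag_", lag)]] <- y[n + (1:horizon) - lag]
}

forecast_df
```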
Dynamic features
Because we didn’t treat law as a lagged feature, we’ll have to fill in its future values when direct forecasting 1, 3,
6, and 12 steps ahead. In this example, we know that law = 1 for the next 12 months. If we did not know the
future values of law we would either have to use a class of models that can predict with missing features or
forecast the value of law 1 to 12 months ahead.
for (i in seq_along(data_forecast_list)) {
  data_forecast_list[[i]]$law <- 1
}
Forecast results
Running the predict method, predict.forecast_model(), on the dataset created above–with type = "forecast"–
and placing it in the data argument in predict.forecast_model() below, returns a data.frame of forecasts.
An S3 object of class, forecast_results, is returned. This object will have different plotting and error methods
than the training_results class from earlier.
   model model_forecast_horizon horizon window_length window_number forecast_period
1  LASSO                      1       1            12             1      1984-01-01
2  LASSO                      1       1            12             2      1984-01-01
3  LASSO                      1       1            12             3      1984-01-01
4  LASSO                      3       1            12             1      1984-01-01
5  LASSO                      3       2            12             1      1984-02-01
6  LASSO                      3       3            12             1      1984-03-01
7  LASSO                      3       1            12             2      1984-01-01
8  LASSO                      3       2            12             2      1984-02-01
9  LASSO                      3       3            12             2      1984-03-01
10 LASSO                      3       1            12             3      1984-01-01
plot(data_forecast,
     data_actual = data[-(1:150), ],  # Actuals from the training and test data sets.
     actual_indices = dates[-(1:150)],
     horizons = c(1, 6, 12))
Forecast Error
forecastML::return_error
Finally, we’ll look at our out-of-sample forecast error by forecast horizon for our two models by setting data_test
= data_test.
If the first argument of return_error() is an object of class forecast_results and the data_test argument is a
data.frame like data_test from our beginning train-test split, a data.frame of forecast error metrics with the
following columns is returned:
model: User-supplied model name in train_model().
model_forecast_horizon: The direct-forecasting time horizon that the model was trained on.
“error_metrics”: Forecast error metrics.
Because our LASSO model is both more stable and accurate, we’ll re-train this model across the entire training
dataset to get our final 4 models–1 for each forecast horizon. Note that for a real-world forecasting problem this
is when we would do additional model tuning to improve forecast accuracy across validation windows as well as
narrow the hyperparameter search in the user-specified modeling functions.
   model model_forecast_horizon window_length window_number valid_indices
1  LASSO                      1             0             1            16
2  LASSO                      1             0             1            17
3  LASSO                      1             0             1            18
4  LASSO                      1             0             1            19
5  LASSO                      1             0             1            20
6  LASSO                      1             0             1            21
7  LASSO                      1             0             1            22
8  LASSO                      1             0             1            23
9  LASSO                      1             0             1            24
10 LASSO                      1             0             1            25
Below is the training error collapsed across our 4 direct forecast horizons/models.
for (i in seq_along(data_forecast_list)) {
  data_forecast_list[[i]]$law <- 1
}
The final step in the forecastML framework is to combine multiple direct-horizon forecast models with
combine_forecasts() to produce a single h-step-ahead forecast.
The default approach, type = 'horizon', is to combine forecasts across models such that short-term
models produce the shorter-term forecasts and long-term models produce the longer-term forecasts. This
implies that, for our 12-month-ahead forecast,
the 1-step-ahead model forecasts the next month,
the 3-step-ahead model forecasts from months 2 through 3,
the 6-step-ahead model forecasts from months 4 through 6, and
the 12-step-ahead model forecasts from months 7 through 12.
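The assignment of forecast periods to models listed above follows a simple rule: each period goes to the shortest-horizon model that can reach it. The base-R sketch below reproduces that mapping; it illustrates the rule only and is not the code inside combine_forecasts(). The variable names are assumptions.

```r
horizons <- c(1, 3, 6, 12)
forecast_periods <- 1:12

# Each forecast period is assigned to the shortest-horizon model that can reach it.
model_for_period <- vapply(forecast_periods,
                           function(k) min(horizons[horizons >= k]),
                           numeric(1))

data.frame(forecast_period = forecast_periods,
           model_forecast_horizon = model_for_period)
```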