
CHAPTER

6
Model Deployment for Time
Series Forecasting

Throughout the book, I introduced a few real-world data science scenarios that
I used to showcase some of the key time series concepts, steps, and code. In
this final chapter, I will walk you through the process of building and deploy-
ing some of the time series forecasting solutions by employing some of these
use cases and data sets.
The purpose of this chapter is to provide a complete overview of tools to
build and deploy your own time series forecasting solutions by discussing the
following topics:
➤➤ Experimental Set Up and Introduction to Azure Machine Learning SDK for
Python – In this section, I will introduce Azure Machine Learning SDK
for Python to build and run machine learning workflows. You will get an
overview of some of the most important classes in the SDK and how you
can use them to build, train, and deploy a machine learning model on
Azure.
Specifically, in this section you will discover the following concepts and
assets:
■■ Workspace, which is a foundational resource in the cloud that you use
to experiment, train, and deploy machine learning models.
■■ Experiment, which is another foundational cloud resource that repre-
sents a collection of trials (individual model runs).


■■ Run, which represents a single trial of an experiment.


■■ Model, which is used for working with cloud representations of the
machine learning model.
■■ ComputeTarget, RunConfiguration, ScriptRunConfig, which are abstract parent classes for creating and managing compute targets. A
compute target represents a variety of resources you can use to train
your machine learning models.
■■ Image, which is an abstract parent class for packaging models into
container images that include the runtime environment and
dependencies.
■■ Webservice, which is the abstract parent class for creating and deploy-
ing web services for your models.

■■ Machine Learning Model Deployment – In this section, we will talk more about machine learning model deployment, that is, the method of integrating a machine learning model into an existing production environment in order to begin developing practical business decisions based on data.
Through machine learning model deployment, companies can begin to
take full advantage of the predictive and intelligent models they build
and, therefore, transform themselves into actual AI-driven businesses.
■■ Solution Architecture for Time Series Forecasting with Deployment Examples
- In this final section of the chapter, we will build, train, and deploy a
demand forecasting solution. I will demonstrate how to build an end-to-
end data pipeline architecture and deployment code that can be general-
ized for different time series forecasting solutions.

Experimental Set Up and Introduction to Azure Machine Learning SDK for Python
Azure Machine Learning provides SDKs and services for data scientists and
developers to prepare data and train and deploy machine learning models. In
this chapter, we will use Azure Machine Learning SDK for Python (aka.ms/
AzureMLSDK) to build and run machine learning workflows.
The following sections are a summary of some of the most important classes
in the SDK that you can use to build your time series forecasting solution: you
can find all information about the classes below on the official website of Azure
Machine Learning SDK for Python.
Workspace
The Workspace is a foundational cloud resource (represented by a Python class) that you can use to experiment, train,
and deploy machine learning models. You can import the class and create a
new workspace by using the following code:
from azureml.core import Workspace

ws = Workspace.create(name='myworkspace',
                      subscription_id='<your-azure-subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='eastus2')

It is recommended that you set create_resource_group to False if you have a previously existing Azure resource group that you want to use for the work-
space. Some functions might prompt for Azure authentication credentials. For
more information on the Workspace class in Azure ML SDK for Python, visit
aka.ms/AzureMLSDK.
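
If the workspace already exists, a minimal sketch of retrieving it instead of creating a new one (the names below are placeholders) is:

from azureml.core import Workspace

# Retrieve an existing workspace instead of creating a new one
ws = Workspace.get(name='myworkspace',
                   subscription_id='<your-azure-subscription-id>',
                   resource_group='myresourcegroup')

# Persist the configuration locally so later scripts can call Workspace.from_config()
ws.write_config()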

Experiment
The Experiment is a cloud resource that embodies a collection of trials (individual
model runs). The following code fetches an experiment object from within
Workspace by name, or it creates a new experiment object if the name does not
exist (aka.ms/AzureMLSDK):
from azureml.core.experiment import Experiment
experiment = Experiment(workspace=ws, name='test-experiment')

You can get a list of all experiment objects contained in the Workspace by running the following code:
list_experiments = Experiment.list(ws)

For more information on the Experiment class in Azure ML SDK for Python,
visit aka.ms/AzureMLSDK.

Run
The Run class represents a single trial of an experiment. A Run is an object that
you use to monitor the asynchronous execution of a trial, store the output of the
trial, analyze results, and access generated artifacts. You can use a Run inside
your experimentation code to log metrics and artifacts to the Run History ser-
vice (aka.ms/AzureMLSDK).
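
For example, a minimal sketch (not taken from the book's scripts) of logging metrics from inside a submitted training script could look like this:

from azureml.core.run import Run

# Inside a script submitted through an experiment, get the current run context
run = Run.get_context()

# Log a single numeric metric and a list of values (illustrative numbers)
run.log("mape", 12.5)
run.log_list("validation_errors", [0.9, 0.7, 0.55])

# Mark the run as complete so it shows as finished in the Run History
run.complete()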
In the following code, I show how to create a run object by submitting an experiment object with a run configuration object:
tags = {"prod": "phase-1-model-tests"}
run = experiment.submit(config=your_config_object, tags=tags)

As you notice, you can use the tags parameter to attach custom categories
and labels to your runs. Moreover, you can use the static list function to get a
list of all run objects from an experiment. You need to specify the tags param-
eter to filter by your previously created tag:
from azureml.core.run import Run
filtered_list_runs = Run.list(experiment, tags=tags)

For more information on the Run class in Azure ML SDK for Python, visit
aka.ms/AzureMLSDK.

Model
The Model class is used for working with cloud representations of different
machine learning models. You can use model registration to store and ver-
sion your models in your workspace in the Azure cloud. Registered models
are identified by name and version. Each time you register a model with the
same name as an existing one, the registry increments the version (aka.ms/
AzureMLSDK).
The following example shows how to build a simple local classification model
with scikit-learn, register the model in the workspace, and download the model
from the cloud:
from sklearn import svm
import joblib
import numpy as np

# customer ages
X_train = np.array([50, 17, 35, 23, 28, 40, 31, 29, 19, 62])
X_train = X_train.reshape(-1, 1)
# churn y/n
y_train = ["yes", "no", "no", "no", "yes", "yes", "yes", "no", "no", "yes"]

clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)

joblib.dump(value=clf, filename="churn-model.pkl")
Moreover, you can use the register function to register the model in your
workspace:
from azureml.core.model import Model
model = Model.register(workspace=ws,
model_path="churn-model.pkl",
model_name="churn-model-test")

After you have a registered model, deploying it as a web service is a simple process:
1. You need to create and register an image. This step configures the Python environment and its dependencies.
2. As the second step, you need to define a deployment configuration for the compute target that will host the web service.
3. Finally, you need to deploy the image as a web service.
For more information on the Model class in Azure ML SDK for Python, visit
aka.ms/AzureMLSDK.

Compute Target, RunConfiguration, and ScriptRunConfig


The ComputeTarget class is a parent class for creating and managing compute
targets. A compute target represents a variety of resources where you can train
your machine learning models. A compute target can be either a local machine or
a cloud resource, such as Azure Machine Learning Compute, Azure HDInsight,
or a remote virtual machine (aka.ms/AzureMLSDK).
First of all, you need to set up an AmlCompute (child class of ComputeTar-
get) target. For the sample below, we can reuse the simple scikit-learn churn
model and build it into its own file, train.py, in the current directory (aka.ms/
AzureMLSDK). At the end of the file, we create a new directory called outputs to
store your trained model that joblib.dump() serialized:
# train.py
from sklearn import svm
import numpy as np
import joblib
import os

# customer ages
X_train = np.array([50, 17, 35, 23, 28, 40, 31, 29, 19, 62])
X_train = X_train.reshape(-1, 1)
# churn y/n
y_train = ["cat", "dog", "dog", "dog", "cat", "cat", "cat", "dog",
"dog", "cat"]
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)

os.makedirs("outputs", exist_ok=True)
joblib.dump(value=clf, filename="outputs/churn-model.pkl")

Next, you can create the compute target by instantiating a RunConfiguration object and setting the type and size (aka.ms/AzureMLSDK):
from azureml.core.runconfig import RunConfiguration
from azureml.core.compute import AmlCompute
list_vms = AmlCompute.supported_vmsizes(workspace=ws)
compute_config = RunConfiguration()
compute_config.target = "amlcompute"
compute_config.amlcompute.vm_size = "STANDARD_D1_V2"

Now you are ready to submit the experiment by using the ScriptRunConfig
and specifying the config parameter of the submit() function:
from azureml.core.experiment import Experiment
from azureml.core import ScriptRunConfig
script_run_config = ScriptRunConfig(source_directory=os.getcwd(),
script="train.py", run_config=compute_config)
experiment = Experiment(workspace=ws, name="compute_target_test")
run = experiment.submit(config=script_run_config)
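
Once submitted, you can monitor the run from the same session; a short sketch using the run object created above:

# Block until the remote run finishes and stream the log output
run.wait_for_completion(show_output=True)

# Inspect the run status and any metrics logged by train.py
print(run.get_status())
print(run.get_metrics())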

For more information on these classes in Azure ML SDK for Python, visit
aka.ms/AzureMLSDK.

Image and Webservice
The Image class is a parent class for packaging models into container images
that include the runtime environment and dependencies. The Webservice class
is another parent class for creating and deploying web services for your models
(aka.ms/AzureMLSDK).
The following code shows a basic example of creating an image and using it
to deploy a web service. The ContainerImage class extends Image and cre-
ates a Docker image.
from azureml.core.image import ContainerImage

image_config = ContainerImage.image_configuration(execution_script="score.py",
                                                  runtime="python",
                                                  conda_file="myenv.yml",
                                                  description="test-image-config")

In this example, score.py processes the request/response for the web service.
The script defines two methods: init() and run().
image = ContainerImage.create(name="test-image",
                              models=[model],
                              image_config=image_config,
                              workspace=ws)

To deploy the image as a web service, you first need to build a deployment
configuration, as shown in the following sample code:
from azureml.core.webservice import AciWebservice
deploy_config = AciWebservice.deploy_configuration(cpu_cores=1,
memory_gb=1)

Afterward, you can use the deployment configuration to create a web service, as
shown in the sample code below:
from azureml.core.webservice import Webservice
service = Webservice.deploy_from_image(deployment_config=deploy_config,
image=image,
name=service_name,
workspace=ws
)
service.wait_for_deployment(show_output=True)
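
Once the deployment completes, you can send a quick test request to the service. The payload below is only an assumption, since the expected JSON shape depends on how score.py parses requests:

import json

# The scoring endpoint exposed by the web service
print(service.scoring_uri)

# Hypothetical payload; the exact JSON shape depends on the score.py entry script
test_sample = json.dumps({"data": [[35], [48]]})
result = service.run(input_data=test_sample)
print(result)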

In this first part of this chapter, I introduced some of the most important
classes in the SDK (for more information, visit aka.ms/AzureMLSDK) and common
design patterns for using them. In the next section, we will look at machine learning model deployment on Azure Machine Learning.

Machine Learning Model Deployment


Model deployment is the method of integrating a machine learning model into an
existing production environment in order to begin developing practical business
decisions based on data. It is only once models are deployed to production that
they start adding value, making deployment a crucial step (Lazzeri 2019c).
Model deployment is a fundamental step of the machine learning model
workflow (Figure 6.1). Through machine learning model deployment, companies
can begin to take full advantage of the predictive and intelligent models they
build and, therefore, transform themselves into actual data-driven businesses.
When we think about machine learning, we focus our attention on key compo-
nents such as data sources, data pipelines, how to test machine learning models
at the core of our machine learning application, how to engineer our features,
and which variables to use to make the models more accurate. All these steps
are important; however, thinking about how we are going to consume those
models and data over time is also a critical step in the machine learning pipeline.
We can only begin extracting real value and business benefits from a model’s
predictions when it has been deployed and operationalized.
[Figure 6.1 depicts the machine learning model workflow: a data engineer prepares data (data catalog, data lake), a data scientist trains the model (featurize, train, evaluate, register to the model registry), and an ML engineer releases the model (package, validate, profile, approve, deploy), with collected data feeding back into the cycle.]

Figure 6.1: The machine learning model workflow

Successful model deployment is fundamental for data-driven enterprises for the following key reasons:
■■ Deployment of machine learning models means making models available
to external customers and/or other teams and stakeholders in your company.
■■ When you deploy models, other teams in your company can use them,
send data to them, and get their predictions, which are in turn populated
back into the company systems to increase training data quality and quantity.
■■ Once this process is initiated, companies will start building and deploy-
ing higher numbers of machine learning models in production and master
robust and repeatable ways to move models from development environ-
ments into business operations systems (Lazzeri 2019c).
From an organizational perspective, many companies see AI-enablement as
a technical effort. However, it is more of a business-driven initiative that starts
within the company; in order to become an AI-driven company, it is important
that the people who successfully operate and understand the business today
are also the ones who are responsible for building and driving the machine
learning pipeline, from model training to model deployment and monitoring.
Right from the first day of a machine learning process, machine learning
teams should interact with business partners. It is essential to maintain constant
interaction to understand the model experimentation process parallel to the
model deployment and consumption steps. Most organizations struggle to
unlock machine learning’s potential to optimize their operational processes
and get data scientists, analysts, and business teams speaking the same lan-
guage (Lazzeri 2019c).
Moreover, machine learning models must be trained on historical data, which demands the creation of a prediction data pipeline, an activity requiring multiple
tasks including data processing, feature engineering, and tuning. Each task,
down to versions of libraries and handling of missing values, must be exactly
duplicated from the development to the production environment. Sometimes,
differences in technology used in development and in production contribute
to difficulties in deploying machine learning models.
Companies can use machine learning pipelines to create and manage work-
flows that stitch together machine learning phases. For example, a pipeline might
include data preparation, model training, model deployment, and inference/
scoring phases. Each phase can encompass multiple steps, each of which can
run unattended in various compute targets.
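
As a hedged sketch of such a pipeline (the step names, scripts, and source directory below are assumptions rather than the book's code), two chained steps might be defined as follows:

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# Two illustrative steps; prep.py and train.py are hypothetical scripts,
# and compute_target is a compute cluster created as shown earlier
prep_step = PythonScriptStep(name="prepare_data",
                             script_name="prep.py",
                             compute_target=compute_target,
                             source_directory="./pipeline_steps")

train_step = PythonScriptStep(name="train_model",
                              script_name="train.py",
                              compute_target=compute_target,
                              source_directory="./pipeline_steps")

# Run the training step only after the preparation step has finished
train_step.run_after(prep_step)

pipeline = Pipeline(workspace=ws, steps=[train_step])
pipeline_run = Experiment(ws, "pipeline-demo").submit(pipeline)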

How to Select the Right Tools to Succeed with Model Deployment
Current approaches of handcrafting machine learning models are too slow and
unproductive for companies intent on transforming their operations with AI.
Even after months of development, which delivers a model based on a single
algorithm, the management team has little means of knowing whether their data
scientists have created a great model and how to operationalize it (Lazzeri 2019c).
Below, I share a few guidelines on how a company can select the right tools
to succeed with model deployment. I will illustrate this workflow using Azure
Machine Learning Service, but it can also be used with any machine learning
product of your choice.
The model deployment workflow should be based on the following three
simple steps:
■■ Register the model.
■■ Prepare to deploy (specify assets, usage, compute target).
■■ Deploy the model to the compute target.
As we saw in the previous section, Model is the logical container for one or
more files that make up your model. For example, if you have a model that is
stored in multiple files, you can register them as a single model in the workspace.
After registration, you can then download or deploy the registered model and
receive all the files that were registered.
Machine learning models are registered in your Azure Machine
Learning workspace. The model can come from Azure Machine Learning or it
can come from somewhere else.
To deploy your model as a web service, you must create an inference config-
uration (InferenceConfig) and a deployment configuration. Inference, or model
scoring, is the phase where the deployed model is used for prediction, most com-
monly on production data. In the InferenceConfig, you specify the scripts and
dependencies needed to serve your model. In the deployment configuration you
specify details of how to serve the model on the compute target (Lazzeri 2019c).
The entry script receives data submitted to a deployed web service and passes
it to the model. It then takes the response returned by the model and returns
that to the client. The script is specific to your model; it must understand the
data that the model expects and returns.
The script contains two functions that load and run the model:
■■ init() – Typically, this function loads the model into a global object. This
function is run only once when the Docker container for your web service
is started.
■■ run(input_data) – This function uses the model to predict a value based
on the input data. Inputs and outputs to the run typically use JSON for
serialization and de-serialization. You can also work with raw binary
data. You can transform the data before sending to the model or before
returning to the client.
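
A minimal skeleton of such an entry script is sketched below; the registered model name and the payload shape are placeholders, and the full score.py used in this chapter appears later in the deployment example:

import json
import pickle

from azureml.core.model import Model


def init():
    # Runs once when the container starts: load the model into a global object
    global model
    model_path = Model.get_model_path("your-registered-model-name")  # placeholder name
    with open(model_path, "rb") as f:
        model = pickle.load(f)


def run(input_data):
    # Runs on every request: parse JSON, predict, and return a JSON-serializable result
    try:
        data = json.loads(input_data)["data"]  # hypothetical payload shape
        prediction = model.predict(data)
        return json.dumps({"prediction": list(prediction)})
    except Exception as e:
        return json.dumps({"error": str(e)})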
When you register a model, you provide a model name used for managing
the model in the registry. You use this name with the Model.get_model_path()
to retrieve the path of the model file(s) on the local file system. If you register
a folder or a collection of files, this API returns the path to the directory that
contains those files.
Finally, before deploying, you must define the deployment configuration. The
deployment configuration is specific to the compute target that will host the
web service. For example, when deploying locally, you must specify the port
where the service accepts requests. The following compute targets, or compute
resources, can be used to host your web service deployment:
■■ Local web service and notebook virtual machine (VM) web service – Both com-
pute targets are used for testing and debugging. They are considered good
for limited testing and troubleshooting.
■■ Azure Kubernetes Service (AKS) – This compute target is used for real-time
inference. It is considered good for high-scale production deployments.
■■ Azure Container Instances (ACI) – This compute target is used for testing.
It is considered good for low-scale, CPU-based workloads requiring <48
GB RAM.
■■ Azure Machine Learning Compute – This compute target is used for batch
inference as it is able to run batch scoring on serverless compute targets.
■■ Azure IoT Edge – This is an IoT module, able to deploy and serve machine
learning models on IoT devices.
■■ Azure Stack Edge – Developers and data scientists can use this compute
target via IoT Edge.
In this section, I introduced some common challenges of machine learning
model deployment, and we discussed why successful model deployment is
fundamental to unlock the full potential of AI, why companies struggle with
model deployment, and how to select the right tools to succeed with model
deployment.
Next, we will apply what you learned in the first two sections of this chapter
to a real demand forecasting use case.

Solution Architecture for Time Series Forecasting with Deployment Examples
In this final section of this chapter, we will build, train, and deploy an energy
demand forecasting solution on Azure. For this specific use case, we will use data
from the GEFCom2014 energy forecasting competition. For more information,
please refer to “Probabilistic Energy Forecasting: Global Energy Forecasting
Competition 2014 and Beyond” (Tao Hong et al. 2016).
The raw data consists of rows and columns. Each measurement is repre-
sented as a single row of data. Each row of data includes multiple columns (also
referred to as features or fields). After identifying the required data sources,
we would like to ensure that the raw data that has been collected includes the
correct data features. To build a reliable demand forecast model, we would need
to ensure that the data collected includes data elements that can help predict
future demand. Here are some basic requirements concerning the data struc-
ture (schema) of the raw data:
■■ Time stamp – The time stamp field represents the actual time the measurement
was recorded. It should comply with one of the common date/time for-
mats. Both date and time parts should be included. In most cases there is
no need for the time to be recorded down to the second level of granularity. It
is important to specify the time zone in which the data is recorded.
■■ Load – Hourly historical load data for the utility were provided. This is
the actual consumption at a given date/time. The consumption can be
measured in kWh (kilowatt-hour) or any other preferred unit. It is impor-
tant to note that the measurement unit must stay consistent across all
measurements in the data. In some cases, consumption can be supplied
over three power phases. In that case we would need to collect all the
independent consumption phases.
■■ Temperature – Hourly historical temperature data for the utility were provided. The temperature is typically collected from an independent
source. However, it should be compatible with the consumption data. It
should include a time stamp as described above that will allow it to be
synchronized with the actual consumption data. The temperature value
can be specified in degrees Celsius or Fahrenheit but should stay consis-
tent across all measurements.
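
As a small illustration of these requirements (the file and column names below are hypothetical), you can check the schema of the raw data with pandas before modeling:

import pandas as pd

# Hypothetical raw file with the schema described above
raw = pd.read_csv("raw_energy_readings.csv", parse_dates=["timestamp"])

# Use the time stamp as index and check that readings are hourly and complete
raw = raw.set_index("timestamp").sort_index()
print(raw.index.to_series().diff().value_counts().head())   # ideally a single 1-hour gap size

# Confirm that load and temperature are numeric and free of missing values
print(raw[["load", "temperature"]].dtypes)
print(raw[["load", "temperature"]].isna().sum())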
The modeling phase is where the conversion of the data into a model takes
place. In the core of this process there are advanced algorithms that scan the his-
torical data (training data), extract patterns, and build a model. That model can
be later used to predict on new data that has not been used to build the model.

[Figure 6.2 depicts the modeling and scoring process: historical data (X, Y) feeds the training step, validation data is used to measure the prediction error (Ŷ − Y), and the resulting model scores new data (X) to produce prediction data (Ŷ).]

Figure 6.2: The modeling and scoring process

As can be seen from Figure 6.2, the historical data feeds the training module.
Historical data is structured where the independent feature is denoted as X and
the dependent (target) variable is denoted as Y. Both X and Y are produced dur-
ing the data preparation process. The training module consists of an algorithm
that scans the data and learns its features and patterns. The actual algorithm
is selected by the data scientist and should best match the type of the problem
we attempt to solve.
The training algorithms are usually categorized as regression (predict numeric
outcomes), classification (predict categorical outcomes), clustering (identify
groups), and forecasting. The training module generates the model as an object
that can be stored for future use. During training, we can also quantify the
prediction accuracy of the model by using validation data and measuring the
prediction error.
Once we have a working model, we can then use it to score new data that is
structured to include the required features (X). The scoring process will make
use of the persisted model (object from the training phase) and predict the target
variable that is denoted by Ŷ.
In case of demand forecasting, we make use of historical data that is ordered
by time. We generally refer to data that includes the time dimension as time
series. The goal in time series modeling is to find time-related trends, seasonality,
and autocorrelation (correlation over time) and formulate those into a model. In
recent years, advanced algorithms have been developed to accommodate time
series forecasting and to improve forecasting accuracy.

Train and Deploy an ARIMA Model


In the next few sections, I will show how to build, train, and deploy an ARIMA
model for energy demand forecasting. Let’s start with the data setup: the data
in this example is taken from the GEFCom2014 forecasting competition. It con-
sists of three years of hourly electricity load and temperature values between
2012 and 2014.
Let’s import the necessary Python modules to get started:
# Import modules
import os
import shutil
import matplotlib.pyplot as plt
from common.utils import load_data, extract_data, download_file
%matplotlib inline

As a second step, you need to download the data and store it in a data folder:
data_dir = './data'

if not os.path.exists(data_dir):
    os.mkdir(data_dir)

if not os.path.exists(os.path.join(data_dir, 'energy.csv')):
    # Download and move the zip file
    download_file("https://mlftsfwp.blob.core.windows.net/mlftsfwp/GEFCom2014.zip")
    shutil.move("GEFCom2014.zip", os.path.join(data_dir, "GEFCom2014.zip"))
    # If not done already, extract zipped data and save as csv
    extract_data(data_dir)

Once you have completed the task above, you are ready to load the data from
CSV into a pandas DataFrame:
energy = load_data(data_dir)[['load']]
energy.head()
This code will produce the output illustrated in Figure 6.3.

                        load
2012-01-01 00:00:00   2698.0
2012-01-01 01:00:00   2558.0
2012-01-01 02:00:00   2444.0
2012-01-01 03:00:00   2402.0
2012-01-01 04:00:00   2403.0

Figure 6.3: First few rows of the energy data set

In order to visualize our data set and make sure that all data was uploaded,
let’s first plot all available load data (January 2012 to December 2014):
energy.plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)
plt.xlabel('timestamp', fontsize=12)
plt.ylabel('load', fontsize=12)
plt.show()

The code above will output the plot shown in Figure 6.4.

[Plot of hourly load from January 2012 to December 2014; x-axis: time stamp, y-axis: load (roughly 2,000–5,000).]

Figure 6.4: Load data set plot

In the preceding example (Figure 6.4), we plot the first column of our data set (the time stamp is taken as the index of the DataFrame). If you want to plot another column, the column_to_plot variable used in the next example can be adjusted.
Now let’s visualize a subsample of the data, by plotting the first week of July
2014:
column_to_plot = 'load'
energy['7/1/2014':'7/7/2014'].plot(y=column_to_plot, subplots=True,
                                   figsize=(15, 8), fontsize=12)
plt.xlabel('timestamp', fontsize=12)
plt.ylabel(column_to_plot, fontsize=12)
plt.show()

This will create the plot with the data points from the first week of July 2014,
as shown in Figure 6.5.

5000 load

4500

4000
load

3500

3000

2500

01 l 02 03 04 05 06 07 08
Ju
14
20 time stamp

Figure 6.5: Load data set plot of the first week of July 2014

If you are able to run this notebook successfully and see all the visualizations,
you are ready to move to the training step. Let’s start with the configuration
part: At this point you need to set up your Azure Machine Learning services
workspace and configure your notebook library. For more information, visit
aka.ms/AzureMLConfiguration and follow the instructions in the notebook.
The training script executes a training experiment. Once the data is prepared,
you can train a model and see the results on Azure.
There are several steps to follow:
■■ Configure the workspace.
■■ Create an experiment.
■■ Create or attach a compute cluster.
■■ Upload the data to Azure.
■■ Create an estimator.
■■ Submit the work to the remote cluster.
■■ Register the model.
■■ Deploy the model.
Let’s start with importing the Azure Machine Learning Python SDK library
and other modules and configuring the workspace.

Configure the Workspace
First of all, you need to import Azure Machine Learning Python SDK and other
Python modules that you will need for the training script:
import datetime as dt
import math
import os
import urllib.request
import warnings

import azureml.core
import azureml.dataprep as dprep
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from azureml.core import Experiment, Workspace
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.environment import Environment
from azureml.train.estimator import Estimator
from IPython.display import Image, display
from sklearn.preprocessing import MinMaxScaler
from statsmodels.tsa.statespace.sarimax import SARIMAX

get_ipython().run_line_magic("matplotlib", "inline")
pd.options.display.float_format = "{:,.2f}".format
np.set_printoptions(precision=2)
warnings.filterwarnings("ignore") # specify to ignore warning messages

For the second step, you need to configure your workspace. You can set up
your Azure Machine Learning (Azure ML) service (aka.ms/AzureMLservice)
workspace and configure your notebook library by running the following code:
# Configure the workspace, if no config file has been downloaded.
subscription_id = os.getenv("SUBSCRIPTION_ID", default="<Your Subscription ID>")
resource_group = os.getenv("RESOURCE_GROUP", default="<Your Resource Group>")
workspace_name = os.getenv("WORKSPACE_NAME", default="<Your Workspace Name>")
workspace_region = os.getenv("WORKSPACE_REGION", default="<Your Workspace Region>")

try:
    ws = Workspace(subscription_id=subscription_id,
                   resource_group=resource_group,
                   workspace_name=workspace_name)
    ws.write_config()
    print("Workspace configuration succeeded")
except:
    print("Workspace not accessible. "
          "Change your parameters or create a new workspace below")

# Or take the configuration of the existing config.json file
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

Make sure that you have the correct version of Azure ML SDK. If that’s not
the case, you can run the following code:
!pip install --upgrade azureml-sdk[automl,notebooks,explain]
!pip install --upgrade azuremlftk

Then configure your workspace and write the configuration to a config.json file, or read your config.json file to get your workspace. As a second option, you can copy the config file from the Azure workspace into an azureml folder. In an Azure workspace you will find the following items:
■■ Experiment results
■■ Trained models
■■ Compute targets
■■ Deployment containers
■■ Snapshots
For more information about the AML services workspace setup, visit aka.ms/
AzureMLConfiguration and follow the instructions in the notebook.

Create an Experiment
We now create an Azure Machine Learning experiment, which will help keep
track of the specific data used as well as the model training job logs. If the
experiment already exists on the selected workspace, the run will be added to
the existing experiment. If not, the experiment will be added to the workspace,
as shown in the following code:
experiment_name = 'energydemandforecasting'
exp = Experiment(workspace=ws, name=experiment_name)
Create or Attach a Compute Cluster


At this point, you need to create or attach an existing compute cluster. For
training an ARIMA model, a CPU cluster is enough, as illustrated in the code
below. Note the min_nodes parameter is 0, meaning by default this will have
no machines in the cluster:
# choose a name for your cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "cpucluster")
compute_min_nodes = os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0)
compute_max_nodes = os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4)

# This example uses a CPU VM. For using a GPU VM, set the SKU to STANDARD_NC6
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_D2_V2")

if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(
        vm_size=vm_size,
        min_nodes=compute_min_nodes,
        max_nodes=compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)

    # can poll for a minimum number of nodes and for a specific timeout
    compute_target.wait_for_completion(show_output=True, min_node_count=None,
                                       timeout_in_minutes=20)

    # For a more detailed view of current AmlCompute status, use 'get_status()'
    print(compute_target.get_status().serialize())

Upload the Data to Azure


Now you need to make the data accessible remotely by uploading it from your
local machine into Azure. Then it can be accessed for remote training. The
datastore is a convenient construct associated with your workspace for you
to upload or download data. You can also interact with it from your remote
compute targets. It’s backed by an Azure Blob storage account. The energy file
is uploaded into a directory named energy_data at the root of the datastore:
■■ First, you can download the GEFCom2014 data set and save the files into
a data directory locally, which can be done by executing the commented
lines in the cell. The data in this example is taken from the GEFCom2014
forecasting competition. It consists of three years of hourly electricity load
and temperature values between 2012 and 2014.
■■ Then, the data is uploaded to the default blob data storage attached to
your workspace. The energy file is uploaded into a directory named
energy_data at the root of the datastore. The upload of data must be run
only the first time. If you run it again, it will skip the uploading of files
already present on the datastore.

# save the files into a data directory locally
data_folder = './data'
# data_folder = os.path.join(os.getcwd(), 'data')
os.makedirs(data_folder, exist_ok=True)

# get the default datastore
ds = ws.get_default_datastore()
print(ds.name, ds.datastore_type, ds.account_name, ds.container_name,
      sep='\n')

# upload the data
ds.upload(src_dir=data_folder,
          target_path='energy_data',
          overwrite=True,
          show_progress=True)

ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)

Now we need to create a training script:


# ## Training script
# This script will be given to the estimator
# which is configured in the AML training script.
# It is parameterized for training on `energy.csv` data.

#%% [markdown]
# ### Import packages.
# utils.py needs to be in the same directory as this script,
# i.e., in the source directory `energydemandforecasting`.

#%%
import argparse
import os
import numpy as np
import pandas as pd
import azureml.data
import pickle

from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.preprocessing import MinMaxScaler
from utils import load_data, mape
from azureml.core import Run

#%% [markdown]
# ### Parameters
# * COLUMN_OF_INTEREST: The column containing data that will be forecasted.
# * NUMBER_OF_TRAINING_SAMPLES:
#   The number of training samples that will be trained on.
# * ORDER:
#   A tuple of three non-negative integers
#   specifying the parameters p, d, q of an ARIMA(p,d,q) model,
#   where:
#   * p: number of time lags in the autoregressive model,
#   * d: the degree of differencing,
#   * q: order of the moving average model.
# * SEASONAL_ORDER:
#   A tuple of four non-negative integers
#   where the first three numbers
#   specify P, D, Q of the ARIMA terms
#   of the seasonal component, as in ARIMA(p,d,q)(P,D,Q).
#   The fourth integer specifies m,
#   i.e., the number of periods in each season.

#%%
COLUMN_OF_INTEREST = 'load'
NUMBER_OF_TRAINING_SAMPLES = 2500
ORDER = (4, 1, 0)
SEASONAL_ORDER = (1, 1, 0, 24)

#%% [markdown]
# ### Import script arguments
# Here, Azure will read in the parameters specified in the AML training.

#%%
parser = argparse.ArgumentParser(description='Process input arguments')
parser.add_argument('--data-folder',
                    default='./data/',
                    type=str,
                    dest='data_folder')
parser.add_argument('--filename',
                    default='energy.csv',
                    type=str,
                    dest='filename')
parser.add_argument('--output', default='outputs', type=str, dest='output')
args = parser.parse_args()
data_folder = args.data_folder
filename = args.filename
output = args.output
print('output', output)

#%% [markdown]
# ### Prepare data for training
# * Import data as pandas dataframe
# * Set index to datetime
# * Specify the part of the data that the model will be fitted on
# * Scale the data to the interval [0, 1]

#%%
# Import data
energy = load_data(os.path.join(data_folder, filename))

# As we are dealing with time series, the index can be set to datetime.
energy.index = pd.to_datetime(energy.index, infer_datetime_format=True)

# Specify the part of the data that the model will be fitted on.
train = energy.iloc[0:NUMBER_OF_TRAINING_SAMPLES, :]

# Scale the data to the interval [0, 1].
scaler = MinMaxScaler()
train[COLUMN_OF_INTEREST] = scaler.fit_transform(
    np.array(train.loc[:, COLUMN_OF_INTEREST].values).reshape(-1, 1))

#%% [markdown]
# ### Fit the model

#%%
model = SARIMAX(endog=train[COLUMN_OF_INTEREST].tolist(),
                order=ORDER,
                seasonal_order=SEASONAL_ORDER)
model.fit()

#%% [markdown]
# ### Save the model
# The model will be saved on Azure in the specified directory as a pickle file.

#%%
# Create a directory on Azure in which the model will be saved.
os.makedirs(output, exist_ok=True)

# Write the model as a .pkl file to the specified directory on Azure.
with open(output + '/arimamodel.pkl', 'wb') as m:
    pickle.dump(model, m)

# with open('arimamodel.pkl', 'wb') as m:
#     pickle.dump(model, m)

#%%

Create an Estimator
Let’s see now how to create an estimator. In order to start this process, we need to
create some parameters. The following parameters will be given to the estimator:
■■ source_directory: the directory that will be uploaded to Azure and con-
tains the script train.py
■■ entry_script: the script that will be executed (train.py)
■■ script_params: the parameters that will be given to the entry script
■■ compute_target: the compute cluster that was created above
■■ conda_dependencies_file: the packages in your conda environment that
the script needs.

script_params = {
"--data-folder": ds.path("energy_data").as_mount(),
"--filename": "energy.csv",
}
script_folder = os.path.join(os.getcwd(), "energydemandforecasting")

est = Estimator(
source_directory=script_folder,
script_params=script_params,
compute_target=compute_target,
entry_script="train.py",
conda_dependencies_file="azureml-env.yml",
)

Submit the Job to the Remote Cluster


You can create and manage a compute target using the Azure Machine Learning
SDK, Azure portal, Azure CLI, or Azure Machine Learning VS Code extension. The
following sample code shows you how to submit your work to the remote cluster:
run = exp.submit(config=est)

# specify show_output to True for a verbose log
run.wait_for_completion(show_output=False)
Register the Model
The last step in the training script wrote the file outputs/arimamodel.pkl in a
directory named outputs in the VM of the cluster where the job is run. Outputs is
a special directory in that all content in this directory is automatically uploaded
to your workspace. This content appears in the run record in the experiment
under your workspace. So the model file is now also available in your workspace.
You can also see files associated with that run. As a last step, we register the
model in the workspace, which saves it under Models on Azure, so that you
and other collaborators can later query, examine, and deploy this model. By
registering the model, it is now available on your workspace:
# see files associated with that run
print(run.get_file_names())

# register model
model = run.register_model(model_name='arimamodel',
model_path='outputs/arimamodel.pkl')

Deployment
Once we have nailed down the modeling phase and validated the model
performance, we are ready to go into the deployment phase. In this context,
deployment means enabling the customer to consume the model by running
actual predictions on it at large scale. The concept of deployment is key in Azure
ML since our main goal is to constantly invoke predictions as opposed to just
obtaining the insight from the data. The deployment phase is the part where
we enable the model to be consumed at large scale.
Within the context of energy demand forecast, our aim is to invoke contin-
uous and periodical forecasts while ensuring that fresh data is available for the
model and that the forecasted data is sent back to the consuming client.
The main deployable building block in Azure ML is the web service. This
is the most effective way to enable consumption of a predictive model in the
cloud. The web service encapsulates the model and wraps it up with a REST
API (application programming interface). The API can be used as part of any
client code, as illustrated in Figure 6.6.
The web service is deployed on the cloud and can be invoked over its exposed
REST API endpoint, which you can see in Figure 6.6. Different types of clients
across various domains can invoke the service through the Web API simulta-
neously. The web service can also scale to support thousands of concurrent calls.
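
As a hedged illustration of such client code (the URI and payload are placeholders, and an authorization header is only needed when authentication is enabled on the service), a plain HTTP call could look like this:

import json

import requests

scoring_uri = "<your-web-service-scoring-uri>"   # e.g., printed from service.scoring_uri
headers = {"Content-Type": "application/json"}

# Hypothetical payload; the shape must match what the entry script expects
payload = json.dumps({"energy": {"load": {"2012-01-01T00:00:00": 2698.0}}})

response = requests.post(scoring_uri, data=payload, headers=headers)
print(response.status_code, response.json())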
Deploying the model requires the following components:
■■ An entry script. This script accepts requests, scores the request using the
model, and returns the results.
[Figure 6.6 depicts a web service that wraps the Azure ML experiment's model and exposes it through a REST API endpoint called by client applications.]

Figure 6.6: Web service deployment and consumption

■■ Dependencies, such as helper scripts or Python/Conda packages required to run the entry script or model.
■■ The deployment configuration for the compute target that hosts the deployed
model. This configuration describes things like memory and CPU require-
ments needed to run the model.
These entities are encapsulated into an inference configuration and a deploy-
ment configuration. The inference configuration references the entry script and
other dependencies. These configurations are defined programmatically when
using the SDK and as JSON files when using the CLI to perform the deployment.

Define Your Entry Script and Dependencies


The entry script receives data submitted to a deployed web service and passes
it to the model. It then takes the response returned by the model and returns
that to the client. The script is specific to your model; it must understand the
data that the model expects and returns.
The script contains two functions that load and run the model: the init() and
run(input_data) functions. When you register a model, you provide a model
name used for managing the model in the registry. You use this name with the
Model.get_model_path() to retrieve the path of the model file(s) on the local file
system. If you register a folder or a collection of files, this API returns the path
to the directory that contains those files.
When you register a model, you give it a name which corresponds to where
the model is placed, either locally or during service deployment. The follow-
ing example will return a path to a single file called sklearn_mnist_model.pkl
(which was registered with the name sklearn_mnist):
model_path = Model.get_model_path('sklearn_mnist')
Automatic Schema Generation


To automatically generate a schema for your web service, you need to provide
a sample of the input and output in the constructor for one of the defined type
objects, and the type and sample are used to automatically create the schema
(aka.ms/ModelDeployment).
To use schema generation, include the inference-schema package in your Conda
environment file. After this, you need to define the input and output sample for-
mats in the input_sample and output_sample variables. The following example
demonstrates how to accept and return JSON data for our energy demand fore-
casting solution. First, the workspace that was used for training must be retrieved:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

We already registered the model in the training script. But if the model you
want to use is only saved locally, you can uncomment and run the following cell
that will register your model in the workspace. Parameters may need adjustment:
# model = Model.register(model_path = "path_of_your_model",
#                        model_name = "name_of_your_model",
#                        tags = {'type': "Time series ARIMA model"},
#                        description = "Time series ARIMA model",
#                        workspace = ws)

# get the already registered model
model = Model.list(ws, name='arimamodel')[0]
print(model)

We now need to get or register an environment for our model deployment (aka.ms/ModelDeployment). Since, in our example, we already registered the environment in the training script, we can just retrieve it:

my_azureml_env = Environment.get(workspace=ws, name="my_azureml_env")

inference_config = InferenceConfig(
    entry_script="energydemandforecasting/score.py",
    environment=my_azureml_env
)

After this, the deployment configuration can be arranged, as illustrated in the sample code below:
# Set deployment configuration
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

aci_service_name = "aci-service-arima"
Finally, the deployment configuration and web service name and location to
deploy can be defined, as illustrated in the sample code below:
# Define the web service
service = Model.deploy(
workspace=ws,
name=aci_service_name,
models=[model],
inference_config=inference_config,
deployment_config=deployment_config,
)
service.wait_for_deployment(True)
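
After the deployment call returns, it is useful to confirm that the service is healthy; a short sketch using the service object created above:

# Check the deployment state and the endpoint that clients will call
print(service.state)          # expected to be "Healthy" when deployment succeeds
print(service.scoring_uri)

# If the deployment failed, the container logs usually explain why
print(service.get_logs())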

Below is an overview of the code in the scoring file, named score.py:


### score.py
#### Import packages
import pickle
import json
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

from azureml.core.model import Model

MODEL_NAME = 'arimamodel'
DATA_NAME = 'energy'
DATA_COLUMN_NAME = 'load'
NUMBER_OF_TRAINING_SAMPLES = 2500
HORIZON = 10

#### Init function
def init():
    global model
    model_path = Model.get_model_path(MODEL_NAME)
    # deserialize the model file back into a model object
    with open(model_path, 'rb') as m:
        model = pickle.load(m)

#### Run function
def run(energy):
    try:
        # load data as pandas dataframe from the json object
        energy = pd.DataFrame(json.loads(energy)[DATA_NAME])
        # take the training samples
        energy = energy.iloc[0:NUMBER_OF_TRAINING_SAMPLES, :]

        scaler = MinMaxScaler()
        energy[DATA_COLUMN_NAME] = scaler.fit_transform(energy[[DATA_COLUMN_NAME]])
        model_fit = model.fit()

        prediction = model_fit.forecast(steps=HORIZON)
        prediction = pd.Series.to_json(pd.DataFrame(prediction), date_format='iso')

        # you can return any data type as long as it is JSON-serializable
        return prediction
    except Exception as e:
        error = str(e)
        return error
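
If you also want the automatic schema generation described earlier, a hedged sketch of decorating run() with the inference-schema package (the sample objects below are assumptions, not the book's score.py) could look like this:

import pandas as pd
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType

# Illustrative samples; real samples should mirror the service's actual input and output
input_sample = pd.DataFrame({"load": [2698.0, 2558.0]})
output_sample = pd.DataFrame({"forecast": [2600.0, 2580.0]})


@input_schema("data", PandasParameterType(input_sample))
@output_schema(PandasParameterType(output_sample))
def run(data):
    # 'data' arrives as a DataFrame matching input_sample's schema;
    # in the real score.py, this is where the forecast would be produced
    return output_sample.to_json(date_format="iso")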

Before deploying, you must define the deployment configuration. The deploy-
ment configuration is specific to the compute target that will host the web ser-
vice. The deployment configuration is not part of your entry script. It is used
to define the characteristics of the compute target that will host the model and
entry script (aka.ms/ModelDeployment). You may also need to create the com-
pute resource—for example, if you do not already have an Azure Kubernetes
Service associated with your workspace.
Table 6.1 provides an example of creating a deployment configuration for
each compute target:

Table 6.1: Creating a deployment configuration for each compute target

COMPUTE TARGET               DEPLOYMENT CONFIGURATION EXAMPLE
Local web service            deployment_config = LocalWebservice.deploy_configuration(port=8890)
Azure Container Instances    deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
Azure Kubernetes Service     deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

For this specific example, we are going to create an Azure Container Instances
(ACI), which is typically used when you need to quickly deploy and validate
your model or you are testing a model that is under development.
First, you must configure the service with the number of CPU cores, the size of
the memory, and other parameters like the description. Then, you must deploy
the service from the image.
You can deploy the service only once. If you want to deploy it again, change
the name of the service or delete the existing service directly on Azure:
# load the data to use for testing and encode it in json
energy_pd = load_data('./data/energy.csv')
energy = pd.DataFrame.to_json(energy_pd, date_format='iso')
energy = json.loads(energy)
energy = json.dumps({"energy": energy})

# Call the service to get the prediction for this time series
prediction = aci_service.run(energy)

If you want, at this final step, you can plot the prediction results. The follow-
ing sample will help you achieve the following three tasks:
■■ Convert the prediction to a DataFrame containing correct indices and
columns.
■■ Scale the original data as in the training.
■■ Plot the original data and the prediction.

# prediction is a string, convert it to a dictionary
prediction = ast.literal_eval(prediction)

# convert the dictionary to pandas dataframe
prediction_df = pd.DataFrame.from_dict(prediction)
prediction_df.columns = ['load']
prediction_df.index = energy_pd.iloc[2500:2510].index

# Scale the original data
scaler = MinMaxScaler()
energy_pd['load'] = scaler.fit_transform(
    np.array(energy_pd.loc[:, 'load'].values).reshape(-1, 1))

# Visualize a part of the data before the forecasting
original_data = energy_pd.iloc[1500:2501]

# Plot the forecasted data points
fig = plt.figure(figsize=(15, 8))
plt.plot_date(x=original_data.index, y=original_data, fmt='-',
              xdate=True, label="original load", color='red')
plt.plot_date(x=prediction_df.index, y=prediction_df, fmt='-',
              xdate=True, label="predicted load", color='yellow')

When deploying an energy demand forecasting solution, we are interested in deploying an end-to-end solution that goes beyond the prediction web service
and facilitates the entire data flow. At the time we invoke a new forecast, we
would need to make sure that the model is fed with the up-to-date data features.
That implies that the newly collected raw data is constantly ingested, processed,
and transformed into the required feature set on which the model was built.
At the same time, we would like to make the forecasted data available for
the end consuming clients. An example data flow cycle (or data pipeline) is
illustrated in Figure 6.7.

[Figure 6.7 depicts the eight-step energy demand forecast data flow, from data collection on the power grid through raw data upload, aggregation, feature processing, re-training, scoring, forecast storage, and consumption by the client.]

Figure 6.7: Energy demand forecast end-to-end data flow

These are the steps that take place as part of the energy demand forecast cycle:
1. Millions of deployed data meters are constantly generating power con-
sumption data in real time.
2. This data is being collected and uploaded onto a cloud repository (such
as Azure Storage).
3. Before being processed, the raw data is aggregated to a substation or
regional level as defined by the business.
4. The feature processing then takes place and produces the data that is
required for model training or scoring—the feature set data is stored in
a database (such as Azure SQL Database).
5. The retraining service is invoked to retrain the forecasting model. That
updated version of the model is persisted so that it can be used by the
scoring web service.
6. The scoring web service is invoked on a schedule that fits the required
forecast frequency.
7. The forecasted data is stored in a database that can be accessed by the end
consumption client.
8. The consumption client retrieves the forecasts and applies it back into the
grid and consumes it in accordance with the required use case.
It is important to note that this entire cycle is fully automated and runs on
a schedule.
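
As one possible way to automate that schedule, the sketch below assumes the retraining and scoring steps have been published as an Azure Machine Learning pipeline; the names and the published_pipeline object are placeholders:

from azureml.pipeline.core import Schedule, ScheduleRecurrence

# Run the published forecasting pipeline once per hour
recurrence = ScheduleRecurrence(frequency="Hour", interval=1)

schedule = Schedule.create(ws,
                           name="energy-forecast-hourly",
                           description="Hourly energy demand forecast",
                           pipeline_id=published_pipeline.id,   # a previously published pipeline
                           experiment_name="energydemandforecasting",
                           recurrence=recurrence)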
Conclusion
In this final chapter, we looked closely into the process of building and deploying
some of the time series forecasting solutions. Specifically, this chapter provided
a complete overview of tools to build and deploy your own time series fore-
casting solutions:
■■ Experimental Set Up and Introduction to Azure Machine Learning SDK for
Python – I introduced Azure Machine Learning SDK for Python to build
and run machine learning workflows, and you learned the following
concepts and assets:
■■ Workspace, which is a foundational resource in the cloud that you use
to experiment, train, and deploy machine learning models.
■■ Experiment, which is another foundational cloud resource that repre-
sents a collection of trials (individual model runs).
■■ Run, which represents a single trial of an experiment.
■■ Model, which is used for working with cloud representations of the
machine learning model.
■■ ComputeTarget, RunConfiguration, ScriptRunConfig, which are abstract parent classes for creating and managing compute targets. A
compute target represents a variety of resources where you can train
your machine learning models.
■■ Image, which is an abstract parent class for packaging models into con-
tainer images that include the runtime environment and
dependencies.
■■ Webservice, which is the abstract parent class for creating and deploy-
ing web services for your models.
■■ Machine Learning Model Deployment – This section introduced the machine
learning model deployment, that is, the method of integrating a machine
learning model into an existing production environment in order to begin
developing practical business decisions based on data.
■■ Solution Architecture for Time Series Forecasting with Deployment Examples – In
this final section, we built, trained, and deployed an end-to-end data
pipeline architecture and walked through the deployment code and
examples.
