Chapter 9
Data Fabric and Data Mesh for the AI Lifecycle
1. See Reference [1] for more information on the Gartner Top Strategic Technology Trends for 2022.
AI Engineering
2. See References [2] and [3] for more information on reference models of the AI lifecycle.
3. See Reference [4] for more information on service-level requirements.
[Figure: The AI lifecycle as interlocking practices. DataOps covers data discovery, data quality, data cleansing, data governance, and data enrichment over the underlying databases; ModelOps covers data preparation, feature engineering, model selection, hyperparameter tuning, model evaluation, model deployment, model governance, and model monitoring; DevOps covers code, build, test, deploy, and operate.]
4. Please review Chapter 5.
5. See Reference [5] for more information on DataOps.
6. See Reference [6] for more information on the difference between ModelOps and MLOps.
7. See Reference [7] for more information on the CDO 2021 study.
8. See Chapter 8.
9. See Reference [8] for more information on Gartner's vision for data and analytics.
[Figure: On-premise and cloud environments.]
10. See Reference [9] for more information on data and sample projects in the IBM Cloud Pak for Data Gallery.
In this case study, a bank in the United States wants to offer a smart mortgage service for California residents. The interest rate on the loan is based on a combination of the applicant's personal credit score and the latest interest rate regulations. To implement this service, data engineers at the bank need to collect all key information about the applicant and the recommended rates. This key information is spread across different database systems, as depicted in Figure 9-3:
First, adding the loan amount and the credit card debt gives the total debt of an applicant. Then, the interest rate table in MongoDB is queried with the credit score to find the corresponding interest rate. Finally, the recommended interest rate for the mortgage applicant is generated. The whole data pipeline looks like Figure 9-5.
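The following is a minimal sketch of such a pipeline in Python; the connection strings, table, collection, and column names are illustrative assumptions, not the bank's actual schema.

```python
# Minimal pipeline sketch; all names below are hypothetical placeholders.
import pandas as pd
from sqlalchemy import create_engine
from pymongo import MongoClient

# Loan balances, credit card debt, and credit scores are assumed to live in PostgreSQL.
pg = create_engine("postgresql://user:pass@onprem-host:5432/lending")
loans = pd.read_sql("SELECT applicant_id, loan_amount FROM loans", pg)
cards = pd.read_sql("SELECT applicant_id, card_debt FROM credit_cards", pg)
scores = pd.read_sql("SELECT applicant_id, credit_score FROM credit_scores", pg)

# Step 1: join the sources and compute the total debt per applicant.
debt = loans.merge(cards, on="applicant_id").merge(scores, on="applicant_id")
debt["total_debt"] = debt["loan_amount"] + debt["card_debt"]

# Step 2: look up the matching interest rate band in MongoDB by credit score.
rates = MongoClient("mongodb://onprem-host:27017")["rates"]["interest_rates"]

def rate_for(score: int) -> float:
    doc = rates.find_one({"min_score": {"$lte": score}, "max_score": {"$gte": score}})
    return doc["annual_rate"]

# Step 3: generate the recommended mortgage rate for each applicant.
debt["recommended_rate"] = debt["credit_score"].map(rate_for)
```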
However, the interest rate data keeps changing. Loan and credit card debt amounts are updated monthly. Credit scores are calculated by other applications and pushed to the PostgreSQL database daily. Interest rates are updated daily. If the tasks of joining the data, calculating the total debt, and querying the interest rates are separated and run independently,
11. See Reference [10] for more information on why Andrew Ng advocates for data-centric AI.
12. See Reference [11] for more information on the impact of poor-quality data on business from Forbes.
There is a wealth of algorithms for training the model. Once the model is available, the data scientists save it to the project and use a holdout dataset to evaluate it. Figure 9-6 explains in plain English why the model makes a prediction with a high degree of confidence. When the performance of the model is satisfactory, it can be taken to production. Moreover, the status and production performance of the models can be monitored at any time from the model inventory, as shown in Figure 9-7.
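As a minimal sketch of a holdout evaluation, assuming a scikit-learn style workflow (the synthetic data, model choice, and 80/20 split are illustrative, not taken from the case study):

```python
# Minimal holdout evaluation sketch; data, model, and split are placeholders.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=10, random_state=42)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
# Evaluate on data the model has never seen before deciding whether to promote it.
print("Holdout AUC:", roc_auc_score(y_hold, model.predict_proba(X_hold)[:, 1]))
```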
There are a few challenges when operationalizing AI. The most common one is that deploying AI models into production is expensive and time-consuming. In many organizations, over 80% of the models are never operationalized. While data science teams build many models, very few are actually deployed into production, which is where the real value comes from. It often takes organizations 6 to 12 months to build, train, and deploy a model.
The issue of AI bias has attracted increasing public attention. Drift occurs as data patterns change, which reduces the accuracy of a model's predictions. When this happens, line-of-business leaders increasingly lose confidence that their models are producing actionable insights for their business. Fairness is also an area of concern. If the model produces favorable predictions for specific groups (by gender, age, or nationality), it can lead to AI ethics discussions and possibly even legal risks.
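As an illustration of how such drift can surface in monitoring, here is a minimal sketch using the population stability index (PSI); the synthetic score distributions, the bin count, and the 0.2 alert threshold are common rules of thumb and assumptions, not prescriptions from this chapter.

```python
# Minimal drift-check sketch: compare a feature's training-time distribution
# with its production distribution using the population stability index.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf            # cover any out-of-range values
    e = np.histogram(expected, cuts)[0] / len(expected) + 1e-6
    a = np.histogram(actual, cuts)[0] / len(actual) + 1e-6
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train_scores = rng.normal(650, 60, 10_000)         # credit scores seen at training time
prod_scores = rng.normal(620, 80, 10_000)          # credit scores seen in production
if psi(train_scores, prod_scores) > 0.2:
    print("Significant drift detected: consider retraining the model")
```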
Another AI trust issue that affects model deployment comes from the lack of model lineage analysis. This includes two aspects. One is how the model is built and which features play a decisive role in the model's final scoring results; this is where model interpretability comes into play. The other is data lineage: where the data used to train the model comes from, whether it is accurate and secure, and whether there is a possibility of tampering.
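As a minimal sketch of the interpretability aspect, the following estimates which features drive a model's scores via permutation importance; the synthetic dataset and the random forest model are placeholders, not the bank's actual model.

```python
# Minimal interpretability sketch: rank features by permutation importance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=1_000, n_features=6, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: importance {result.importances_mean[i]:.3f}")
```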
The fact sheets shown in Figure 9-7 are one example of how to help business users understand and trust the model.
These challenges need to be considered when an organization chooses
a Data Fabric and Data Mesh implementation. The goal of acquiring data
is to use it for a particular business purpose. Therefore, the best solution
is to have the capabilities to operationalize data and AI implemented
within the Data Fabric architecture. It helps organizations reduce the skills
required to build and manipulate AI models, speed up delivery time by
minimizing mundane tasks and data preparation challenges, and, at the
same time, optimize the quality and accuracy of AI models with real-time
governance.
14. RBFOpt is an open source library for black-box optimization with costly function evaluations.
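To illustrate why such libraries matter, here is a generic random-search sketch of black-box hyperparameter optimization where each evaluation is a costly training run; it shows the problem setting only and does not use the RBFOpt API.

```python
# Generic black-box search sketch: each objective call trains and cross-validates
# a model, so the evaluation budget is kept deliberately small.
import random
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2_000, random_state=0)

def costly_objective(learning_rate: float, max_depth: int) -> float:
    clf = GradientBoostingClassifier(learning_rate=learning_rate,
                                     max_depth=max_depth, random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()   # each call trains three models

best = None
for _ in range(10):                                  # small, fixed evaluation budget
    params = {"learning_rate": 10 ** random.uniform(-3, 0),
              "max_depth": random.randint(2, 6)}
    score = costly_objective(**params)
    if best is None or score > best[0]:
        best = (score, params)
print("Best accuracy %.3f with %s" % best)
```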
15. See Reference [12] for more information about the benefits AutoAI could bring to MLOps and the AI lifecycle.
16. A/B testing is a method of comparing two versions of a web page or app to determine which one performs better.
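In a ModelOps setting, the same idea applies to model deployments. The following is a minimal sketch of a random traffic split between two model versions; the endpoint URLs and the 90/10 split are illustrative assumptions.

```python
# Minimal A/B split sketch between a champion and a challenger model version.
import random
import requests

def score(payload: dict) -> dict:
    # Route roughly 10% of traffic to the challenger, the rest to the champion.
    endpoint = ("https://gateway.example.com/deployments/loan-approval-v2/predictions"
                if random.random() < 0.10
                else "https://gateway.example.com/deployments/loan-approval-v1/predictions")
    return requests.post(endpoint, json=payload, timeout=5).json()
```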
[Figure: Training the loan approval model on a public cloud machine learning platform. Data is virtualized, replicated, and transformed from on-premises sources for training, hyperparameter optimization, and model evaluation.]
Like the previous pattern, during the training phase, data is moved from the on-premises system to the public cloud, and after training is complete, the model is deployed directly to the public cloud. Applications running on the public cloud retrieve raw data either directly from the on-premises data source or through access to a near-real-time cache of that data source. Once applications get the data needed for the model inference APIs, they send requests to the API gateway, which dispatches the requests to a specific runtime for models, as described in the first pattern.
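A minimal sketch of this inference path follows; the gateway URL, cache keys, and payload fields are hypothetical, and the near-real-time cache is assumed to be a Redis replica of the on-premises source.

```python
# Minimal inference-path sketch: read features from a near-real-time cache
# (falling back to the on-premises source), then call the model via the gateway.
import json
import redis
import requests

cache = redis.Redis(host="nearline-cache", port=6379)

def get_applicant_features(applicant_id: str) -> dict:
    cached = cache.get(f"applicant:{applicant_id}")
    if cached is not None:                       # hit in the near-real-time replica
        return json.loads(cached)
    resp = requests.get(f"https://onprem.example.com/applicants/{applicant_id}", timeout=5)
    return resp.json()

def score_loan(applicant_id: str) -> dict:
    payload = {"input_data": [get_applicant_features(applicant_id)]}
    # The API gateway routes the request to the runtime hosting the deployed model.
    resp = requests.post(
        "https://gateway.example.com/deployments/loan-approval/predictions",
        json=payload, timeout=5)
    return resp.json()
```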
There are two important considerations when deploying this pattern:
[Figure 9-11: Edge deployment pattern. A model is trained centrally (with hyperparameter optimization and model evaluation) and the resulting DL model is deployed to multiple edge devices, each running applications written in C++.]
In these two examples, the production lines on the shop floor most likely don't have connections to the public cloud. Even if they do, the network overhead increases the latency of returning model inference results, and the performance requirements of the application cannot be met. That's why edge deployment has gained traction.
In this deployment pattern, as depicted in Figure 9-11, all data captured at the edge devices (including images and videos) is sent to the public cloud or on-premises systems for training. Since images and videos are unstructured data, manual annotation is usually required. When model training is completed, the model is deployed to multiple edge devices. One difficulty with this deployment pattern is that the model may need to be rewritten in a language like C++ due to the resource constraints of the edge devices and the extremely high performance requirements, which imposes additional difficulties for model upgrades and version management. It often requires an additional component to dispatch the models to the edge devices and manage the lifecycle of models at the edge.
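One way to bridge the Python training environment and a C++ edge runtime is to export the trained model to a portable format; the sketch below assumes a PyTorch model and ONNX as the exchange format, neither of which the chapter prescribes.

```python
# Minimal export sketch: the tiny network is a stand-in for the actual trained
# DL model, and ONNX is an assumed exchange format that a C++ application
# could load through a runtime such as ONNX Runtime.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)        # example image-shaped input
torch.onnx.export(model, dummy_input, "edge_model.onnx", opset_version=13)
# A model-dispatch component can then push edge_model.onnx to each edge device
# and track its version as part of the model lifecycle.
```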
Key Takeaways
We conclude this chapter with a few key takeaways as summarized in
Table 9-1.
References
[1] Gartner Top Strategic Technology Trends for 2022, www.gartner.com/en/information-technology/insights/top-technology-trends
[2] Mark Haakman, Luís Cruz, Hennie Huijgens, & Arie van Deursen, AI lifecycle models need to be revised, https://link.springer.com/article/10.1007/s10664-021-09993-1
[9] Data and Sample projects in IBM Cloud Pak for Data Gallery, https://dataplatform.cloud.ibm.com/gallery?context=cpdaas&format=project-template&topic=Data-fabric