Machine Learning in Production
Christian Kästner
This chapter covers part of the “From Models to AI-Enabled Systems (Systems Thinking)” lecture of our Machine Learning in Production course. For other chapters see the table of contents.
Figure: Architecture sketch of a transcription system, illustrating the central ML component for speech recognition and many non-ML components.
Figure: From Google’s 2015 technical debt paper, indicating that the code for actual model training is small compared to the extensive infrastructure code needed to automate model training, serving, and monitoring. These days, much of this infrastructure is readily available through competing MLOps tools (e.g., serving infrastructure, feature stores, cloud resource management, monitoring).
Researchers and consultants report that shifting a team’s mindset from models to machine-learning pipelines is challenging. Data scientists are often used to working with private datasets and local workspaces (e.g., in computational notebooks) to create models. Migrating code toward an automated machine-learning pipeline, where each step is automated and tested, requires a substantial shift in mindset and a strong engineering focus. This focus is not necessarily valued by all team members; for example, data scientists frequently report resenting having to do so much engineering work, which keeps them from focusing on their models, though many eventually come to appreciate the benefits of being able to experiment more rapidly in production and deploy improved models with confidence.
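To make this concrete, below is a minimal sketch (in Python, with hypothetical function names, paths, and thresholds) of what moving notebook code into an automated pipeline can look like: each stage becomes a plain, testable function, and the orchestration step enforces a quality bar before a model is released. This illustrates the idea only; it is not a prescription for any particular MLOps tool.

```python
# Minimal pipeline sketch: each stage is a separate, testable function
# rather than a sequence of notebook cells. All names, paths, and
# thresholds are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def ingest(path: str) -> pd.DataFrame:
    """Load raw data; a production pipeline might pull from a feature store instead."""
    return pd.read_csv(path)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Data cleaning as an explicit step that can be unit-tested in isolation."""
    return df.dropna()


def train(df: pd.DataFrame, label: str):
    """Train a model and measure held-out accuracy."""
    features = df.drop(columns=[label])
    X_train, X_test, y_train, y_test = train_test_split(
        features, df[label], random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


def run_pipeline(path: str, label: str, min_accuracy: float = 0.9):
    """Orchestrate all stages; refuse to release a model below a quality bar."""
    model, accuracy = train(clean(ingest(path)), label)
    if accuracy < min_accuracy:
        raise RuntimeError(f"Accuracy {accuracy:.2f} below threshold; not deploying")
    return model  # a real pipeline would serialize and serve the model here
```

Because every stage is an ordinary function, each can be tested on its own (for example, asserting that clean drops rows with missing values), which is difficult to do with code scattered across notebook cells.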
Figure: The ML pipeline comprises all activities for producing, deploying, and updating the ML component that is part of a larger system.
Systems Thinking
A system consists of components working together toward the system goal. The system is situated
in and interacts with the environment.
Figure: A smart safe-browsing feature uses machine learning to warn about malicious websites. In this case, the design is fairly forceful, prompting the user to make an explicit choice, but it stops short of fully automating the action.
Beyond forcefulness, another common user-interface design question is to what degree predictions should be explained to users. For example, should the tax software simply report an audit-risk score, explain how the prediction was made, or explain which inputs are most responsible for the predicted audit risk? As we will discuss in chapter Interpretability and Explainability, the need for explaining decisions depends heavily on the confidence of the model and the potential impact of mistakes on users: in high-risk situations, such as medical diagnosis, it is much more important that a human expert can understand and check a prediction based on an explanation than in routine, low-risk situations such as ranking hotel offers.
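As a small illustration of this design space, the following sketch contrasts reporting a bare risk score with also reporting the inputs most responsible for it. It assumes a hypothetical linear audit-risk model, where each feature’s contribution is simply its coefficient times its value; real explanation techniques (e.g., LIME or SHAP) are more sophisticated, but the user-facing question is the same.

```python
# Sketch: report an audit-risk score together with the inputs most
# responsible for it. The model here is a hypothetical linear model;
# all feature names and weights are made up for illustration.
import numpy as np

FEATURES = ["income", "deductions", "num_amendments"]  # hypothetical inputs
WEIGHTS = np.array([0.00001, 0.0002, 0.3])             # hypothetical coefficients
INTERCEPT = -2.0


def explain_audit_risk(x: np.ndarray, top_k: int = 2):
    """Return a risk score plus the top contributing features."""
    contributions = WEIGHTS * x
    score = 1.0 / (1.0 + np.exp(-(INTERCEPT + contributions.sum())))
    ranked = sorted(zip(FEATURES, contributions), key=lambda fc: -abs(fc[1]))
    return score, ranked[:top_k]


score, reasons = explain_audit_risk(np.array([85_000, 30_000, 2]))
print(f"Audit risk: {score:.0%}")                  # the bare score...
for name, contribution in reasons:                 # ...versus an explanation
    print(f"  {name}: {contribution:+.2f} toward risk")
```

Whether to surface only the score or also the ranked contributions is exactly the interface decision discussed above, and the right answer depends on the stakes of the decision.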
Figure: Consider the safety concerns of wrong predictions in a smart toaster and how to design a safe system regardless.
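One common answer is to not rely on the ML model for safety at all, but to enforce safety with traditional, non-ML mechanisms outside the model. The sketch below (with hypothetical limits and callbacks) lets an ML model decide when the toast looks done, while a hard-coded timeout and thermal cutoff bound the damage of any wrong prediction.

```python
# Sketch of safety mechanisms outside the ML model: the model decides
# when the toast is done, but hard-coded limits stop the toaster
# regardless of what the model predicts. Limits and callbacks are
# hypothetical placeholders.
import time

MAX_TOAST_SECONDS = 180   # non-ML safeguard: absolute time bound
MAX_TEMPERATURE_C = 250   # non-ML safeguard: thermal cutoff


def toast(model_predicts_done, read_temperature_c):
    """Heat until the model says done OR a safety limit triggers."""
    start = time.time()
    while True:
        if read_temperature_c() > MAX_TEMPERATURE_C:
            return "stopped: overheating (safety cutoff)"
        if time.time() - start > MAX_TOAST_SECONDS:
            return "stopped: timeout (safety cutoff)"
        if model_predicts_done():  # ML prediction, possibly wrong
            return "stopped: model predicted toast is done"
        time.sleep(0.1)
```

Even if the model never predicts “done,” or predicts it far too late, the toaster stops within the hard limits, because safety does not depend on the model being right.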
Interdisciplinary Teams
On Terminology
Unfortunately, there is no standard term for referring to building production systems with machine-learning components. In this quickly evolving field, many terms are in use, and they are largely not used consistently. In this book, we adopt the term “ML-enabled system” or simply the descriptive “production system with machine-learning components” to emphasize the broad focus on the entire system, in contrast to the narrower model-centric focus of data science education or even MLOps pipelines. The terms “ML-infused system” and “ML-based system” have been used with similar intentions.
Summary
Further Readings