Towards MLOps: A Framework and Maturity Model
Abstract—The adoption of continuous software engineering practices such as DevOps (Development and Operations) in business operations has contributed to significantly shorter software development and deployment cycles. Recently, the term MLOps (Machine Learning Operations) has gained increasing interest as a practice that brings together data scientists and operations teams. However, the adoption of MLOps in practice is still in its infancy and there are few common guidelines on how to effectively integrate it into existing software development practices. In this paper, we conduct a systematic literature review and a grey literature review to derive a framework that identifies the activities involved in the adoption of MLOps and the stages in which companies evolve as they become more mature and advanced. We validate this framework in three case companies and show how they have managed to adopt and integrate MLOps into their large-scale software development practices. The contribution of this paper is threefold. First, we review contemporary literature to provide an overview of the state-of-the-art in MLOps. Based on this review, we derive an MLOps framework that details the activities involved in the continuous development of machine learning models. Second, we present a maturity model in which we outline the different stages that companies go through in evolving their MLOps practices. Third, we validate our framework in three embedded systems case companies and map the companies to the stages in the maturity model.

Index Terms—MLOps, Framework, Maturity Model, SLR, GLR, Validation Study

I. INTRODUCTION

Machine Learning (ML) has a significant impact on the decision-making process in companies. As a result, companies can save significant costs in the long run while ensuring value for their customers [1], and ML also enables fundamentally new ways of doing business. To improve value creation and automate the end-to-end life cycle of ML, data scientists and operations teams in companies are trying to apply DevOps concepts to their ML systems [2]. DevOps is a “set of practices and tools focused on software and systems engineering” [3], built on close collaboration between developers and operations teams to improve quality of service [4]. ML models embedded in a larger software system [5] are only a small part of the overall system, so the interaction between the model and the rest of the software and its context is essential [6]. From the literature, it is apparent that ML processes are often not well integrated with continuous development and production in practice [2].

Despite the popularity of ML, there is little research on MLOps because it is a recent phenomenon. To advance understanding of how companies practice MLOps, including the collaboration between data science and operations teams, we use a Systematic Literature Review (SLR), a Grey Literature Review (GLR), and a validation study in three case companies. The paper makes three contributions:
• We conduct an SLR and a GLR to present the state-of-the-art regarding the adoption of MLOps in practice and derive a framework from the reviews
• We present a maturity model with the different stages in which companies evolve during MLOps adoption
• We validate the framework and map the three case companies to the stages of the maturity model

The remainder of the paper is organized as follows: Section II describes the background of the study, Section III describes the research methods used and Section IV addresses the threats to validity. Section V summarizes the findings from the literature review. Section VI describes the MLOps framework and maturity model. Section VII describes the validation study conducted in three case companies and Section VIII discusses the results. Section IX concludes our study.

II. BACKGROUND

This section discusses DevOps, the application of DevOps to ML systems (referred to as MLOps), and the challenges associated with it.

A. DevOps

DevOps [3] aims to “reduce the time between committing a change to a system and the change being placed into normal production while ensuring high quality” [7]. The goal is to merge development, quality assurance, and operations into a single continuous process. The key principles of DevOps are automation, continuous delivery, and rapid feedback. DevOps requires a “delivery cycle that involves planning, development, testing, deployment, release and monitoring as well as active cooperation between different team members” [3].
Continuous software engineering (SE) refers to iterative software development and related aspects such as continuous integration, continuous delivery, continuous testing and continuous deployment. Continuous SE enables development, deployment and feedback at a rapid pace [8] [9] and is divided into three phases: a) Business strategy and planning, b) Development and c) Operations. Software development activities such as continuous integration (CI) and continuous delivery (CD) support the operations phase. With CI [8], team members of software-intensive companies frequently integrate and merge development code, which enables a faster and more efficient delivery cycle and increases team productivity [9]. This facilitates the automation of software development and testing [10]. CD ensures that an application is not moved to the production phase until automated testing and quality checks have been successfully completed [11] [12]. It lowers deployment risk and cost, and provides rapid feedback to users [13] [14].
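As a concrete illustration of such a quality gate, the following minimal Python sketch promotes a build to production only if every automated check succeeds. The stage commands, test layout and deploy step are our own illustrative assumptions, not taken from the cited studies.

```python
# Minimal sketch of a CD quality gate: a build is promoted to production
# only if every automated check succeeds. Stage commands, test layout and
# the deploy step are illustrative assumptions, not from the cited studies.
import subprocess
import sys

STAGES = [
    ("unit tests", ["pytest", "tests/unit"]),
    ("integration tests", ["pytest", "tests/integration"]),
    ("static quality checks", ["flake8", "src"]),
]

def run_stage(name, command):
    """Run one pipeline stage and report whether it succeeded."""
    print(f"[pipeline] running stage: {name}")
    return subprocess.run(command).returncode == 0

if __name__ == "__main__":
    for name, command in STAGES:
        if not run_stage(name, command):
            # A failing check blocks promotion, which is the essence of CD:
            # the application never reaches production in a broken state.
            sys.exit(f"[pipeline] stage '{name}' failed; deployment blocked")
    print("[pipeline] all checks passed; promoting build to production")
```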
B. MLOps

With the successful adoption of DevOps, companies are looking for continuous practices in the development of ML systems. To unify the development and operation of ML systems, MLOps [5] extends DevOps principles [15]. In addition to traditional unit and integration testing, CI introduces additional testing procedures such as data and model validation. From the perspective of CD, processed datasets and trained models are automatically and continuously delivered by data scientists to ML systems engineers. From the perspective of continuous training (CT), the introduction of new data and model performance degradation require a trigger to retrain the model or improve model performance through online methods. In addition, appropriate monitoring facilities ensure proper execution of operations.
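As a rough sketch of such a CT trigger, the Python snippet below retrains when the monitored accuracy degrades or enough new data has accumulated. The thresholds, the metric source and the retraining entry point are hypothetical assumptions of ours, not a prescription from the literature.

```python
# Sketch of a continuous-training (CT) trigger: retrain when the monitored
# model performance degrades or enough new data has arrived. The metric
# source, thresholds and retraining entry point are hypothetical.
from dataclasses import dataclass

ACCURACY_THRESHOLD = 0.90    # assumed minimum acceptable live accuracy
NEW_DATA_THRESHOLD = 10_000  # assumed sample count that triggers retraining

@dataclass
class MonitoringSnapshot:
    live_accuracy: float  # accuracy measured on recent production traffic
    new_samples: int      # data points collected since the last training run

def should_retrain(snapshot):
    """Trigger retraining on performance degradation or new-data volume."""
    degraded = snapshot.live_accuracy < ACCURACY_THRESHOLD
    enough_new_data = snapshot.new_samples >= NEW_DATA_THRESHOLD
    return degraded or enough_new_data

if __name__ == "__main__":
    snapshot = MonitoringSnapshot(live_accuracy=0.87, new_samples=2_500)
    if should_retrain(snapshot):
        # In a real pipeline this would start data validation, training and
        # model validation rather than just printing.
        print("triggering retraining pipeline")
```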
C. Challenges associated with MLOps

In our own previous research [16] [17], we have identified a number of challenges related to the business case, data, modeling and deployment of ML or Deep Learning (DL) models. These include high AI costs and expectations, a scarcity of data scientists, the need for large datasets, privacy concerns and noisy data, lack of domain experts, labeling issues, increasing feature complexity, improper feature selection, introduction of bias when experimenting with models, highly complex DL models, the need for deep DL knowledge, difficulty in determining the final model, the model execution environment, a growing number of hyperparameter settings, and verification and validation. They also include limited DL deployment, integration issues, internal deployment, the need for an understandable model, training-serving skew, end-user communication, model drift, and maintaining robustness. Some of the challenges in MLOps practice [5] include tracking and comparing experiments, lack of version control, difficulty in deploying models, insufficient purchasing budgets and a challenging regulatory environment.

III. RESEARCH METHOD

The main objective of the study is to identify the activities associated with the adoption of MLOps and the stages in which companies evolve as they gain maturity and become more advanced. To achieve this objective, we developed the following research questions:
• RQ1: What is the state-of-the-art regarding the adoption of MLOps in practice and the different stages that companies go through in evolving their MLOps practices?
• RQ2: How do case companies evolve and advance their MLOps practices?
We performed an SLR [18] [19], a GLR [20] [21] and a validation case study [22] to address the two RQs.

A. SLR and GLR

The goal of the SLR is to find, examine and interpret relevant studies on the topic of interest [18] [19]. To answer the RQs, we defined search strings according to [18] and searched five popular scientific libraries. Figure 1 shows an overview of the SLR and GLR process used in this study. We integrated and exported the relevant studies into an Excel spreadsheet for deeper analysis. In the SLR, we included conference and journal studies that reported on MLOps. We excluded studies that were duplicate versions, published in a language other than English, not peer-reviewed, or not available electronically on the Internet.

Fig. 1. Overall SLR and GLR process used in the study

We conducted the GLR [20] to provide a detailed description of the state-of-practice and practitioner experiences in adopting MLOps. Compared to the SLR, the GLR provides the voice of practitioners on the topic under study. In the GLR, we included studies found through Google Search that address MLOps and are published in English in PDF format, as well as documents from companies, obtained by filtering sites under the domain name “.com”. To improve the reliability of the results retrieved in the GLR, we excluded peer-reviewed scientific articles and other sources of knowledge such as blogs, posts, etc.

For the SLR and the GLR, we used the search query “MLOps” OR “Machine Learning Operations” and restricted the search to the period between January 1, 2015 and March 31, 2021. This time interval was chosen because the term MLOps became prevalent after the concept of “Hidden Technical Debt in Machine Learning Systems” [6] was introduced in 2015. Based on the SLR and the GLR, we shortlisted 6 SLR studies ([23]-[28]) and 15 GLR studies ([29]-[43]). Based on these studies, we developed an MLOps framework and the various stages that companies go through in evolving their MLOps practices.

B. Validation Case Study

Following [44], we conducted a validation study to map companies to the stages in the maturity model derived from the literature reviews. Case study methodology is an empirical research approach based on an in-depth study of a contemporary phenomenon in its real-world environment that is difficult to study in isolation [45]. In SE, case studies are used to better understand how and why SE was done and thus improve the SE process and the resulting software products [46]. Throughout the validation study, we worked closely with practitioners in each case company. Table I provides a brief description of each case company, the practitioners (P*,
W*, M*, and S* represent interview, workshop, meeting, and stand-up meeting participants respectively) and their roles.

TABLE I
DESCRIPTION OF PRACTITIONERS IN THE VALIDATION STUDY

Case Company        Practitioner ID   Role
Telecommunications  P1, W1, S1        Senior Data Engineer
                    P2, W2, S2        Data Scientist
                    P3, W3, S3        Data Scientist
                    P4, W4, S4        Data Scientist
                    W5, S5            Senior Data Scientist
                    W6, S6            Data Scientist
                    W7, S7            Software Developer
                    W8, S8            Software Developer
                    S9                Operational Product Owner
                    S10               Sales Director
Automotive          W9                Expert Engineer
                    W10               Expert Engineer
Packaging           M1                Solution Architect
                    M2                Data Scientist

Case Companies: We present the three case companies and the use cases that were investigated in each company as part of our validation study.
1. Hardware Screening: The telecommunications company predicts faults in hardware to minimize the amount of hardware returned by customers for repair. In this use case, they focus on a) returning defect-free hardware back to the customer and b) sending defective hardware to the repair center.
2. Self-driving Vehicles: The automotive company strives to provide autonomous transportation solutions. The main use case is self-driving vehicles to increase productivity. The company also needs to ensure that the failure rate is low in this safety-critical use case.
3. Defect Detection: The packaging company provides packaging solutions as well as machines to customers. One of the main use cases is the detection of defects in finished/semi-finished packages.

Data Collection and Analysis: For data collection, we used interview studies, workshops, meetings and stand-up meetings in the companies. They were held in English via video conferencing. All interviews lasted 45 minutes, workshops and meetings lasted 30 minutes to one hour, and daily stand-up meetings lasted 15 minutes. We validated the MLOps framework in the case companies and present the different stages that companies go through when implementing MLOps. Transcripts from interviews and notes from workshops, meetings and stand-ups were used to capture the empirical data. Later, they were shared with the other authors by the primary author for detailed analysis. We applied elements of open coding to analyse and categorize the collected empirical data [47]. In order to obtain different perspectives on the topic under study, triangulation was used [48].

IV. THREATS TO VALIDITY

Potential validity threats were considered and minimized in this study [49]. Construct validity was improved by considering information from the SLR, the GLR and the validation case study. The authors and practitioners involved in this study are well versed in MLOps. Multiple techniques (semi-structured interviews, workshops, meetings, and stand-up meetings) and multiple sources (senior data engineer, data scientist, software developer, expert engineer, etc.) were used to collect and validate empirical data. Internal validity threats caused by faulty conclusions due to primary author bias in data selection or interpretation were mitigated by consulting with the other two authors. By extending our research to additional case companies, generalization of the results can be justified and thus threats to external validity can be mitigated.

V. LITERATURE REVIEW FINDINGS

Based on the SLR and the GLR, we extract insights from the literature to give an overview of the state-of-the-art of MLOps in practice. The findings are divided into three parts: a) Data for ML Development, b) ML Model Development and c) Release of ML Models. Below, we discuss each part in detail.

A. Data for ML Development

Aggregating heterogeneous data from different data sources [32] [31] [41], preprocessing it [27] and extracting relevant features are necessary to provide data for ML development. Later, the features are registered in a feature store [42], which can be used for the development of any ML model [42] and used
for inference when deploying the model. Also, the data points are stored in the data repository [39] after versioning. The data collected from various sources has to be properly stored and managed. Data anonymization and encryption [27] should be performed to comply with data regulations (e.g., GDPR [25]).
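To illustrate how a feature store and data versioning might interact, the following toy Python sketch registers a content-hashed feature set that can then be fetched identically for model development and for inference. The FeatureStore class and its methods are stand-ins of ours, not the API of any particular product.

```python
# Toy sketch of feature registration: feature sets are versioned by content
# hash and fetched identically for training and inference. FeatureStore is
# a stand-in class, not the API of any particular product.
import hashlib
import json

class FeatureStore:
    """In-memory feature store keyed by (feature set name, version)."""
    def __init__(self):
        self._store = {}

    def register(self, name, rows):
        # Version by content hash so every registered feature set is
        # reproducible and traceable, as data versioning requires.
        payload = json.dumps(rows, sort_keys=True).encode()
        version = hashlib.sha256(payload).hexdigest()[:12]
        self._store[(name, version)] = rows
        return version

    def get(self, name, version):
        return self._store[(name, version)]

store = FeatureStore()
rows = [{"device_id": 1, "temp_mean": 41.2, "restarts": 3}]
version = store.register("hardware_health", rows)       # used for training...
print(version, store.get("hardware_health", version))   # ...and for inference
```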
B. ML Model Development

In ML model development, provisions should be made to run experiments in parallel, optimize the chosen model with hyperparameters, and finally evaluate the model to ensure that it fits the business case. After versioning, the code is stored in the code repository [42] [23]. The model repository [39] keeps track of the models that will be used in production, and the metadata repository contains all the information about the models (e.g., hyperparameter information). Data scientists can collaborate on the same code base, which also allows them to run the code in different environments and against a variety of datasets. This facilitates scaling, the ability to track the execution of multiple experiments, and reproducibility [29].
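A minimal sketch of the experiment tracking implied here, assuming a simple in-memory record rather than any specific metadata repository, could look as follows; the hyperparameter grid and the scoring stub are placeholders.

```python
# Sketch of tracking parallel experiments: each run's hyperparameters and
# score are recorded so runs stay comparable and reproducible. The grid,
# the scoring stub and the record layout are placeholder assumptions.
import itertools
import json
import random

def train_and_evaluate(learning_rate, depth):
    """Placeholder training routine returning a fake validation score."""
    random.seed(hash((learning_rate, depth)) % 2**32)  # deterministic stub
    return round(random.uniform(0.70, 0.95), 3)

experiments = []
for run_id, (lr, depth) in enumerate(itertools.product([0.01, 0.1], [3, 5])):
    score = train_and_evaluate(lr, depth)
    # This record is the kind of information a metadata repository keeps
    # about each candidate model (hyperparameters plus evaluation result).
    experiments.append({"run": run_id, "lr": lr, "depth": depth, "score": score})

best = max(experiments, key=lambda e: e["score"])
print(json.dumps(experiments, indent=2))
print(f"candidate for the model repository: run {best['run']}")
```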
C. Release of ML Models

To release ML models, package [41], validate [41] and deploy the models [40] to production [41]. When deploying a model to production, it has to be integrated with other models as well as with existing applications [30] [41]. Once in production, the model serves requests. Although training is often a batch process, inference can be served through a REST endpoint/custom code, a streaming engine, micro-batches, etc. [35]. When performance drops, monitor the model [41] and enable the data feedback loop [41] to retrain the models. In a fully mature MLOps context, continuous integration and delivery are performed through a CI/CD pipeline, and continuous retraining through a CT pipeline [41] [31].
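As one possible shape for such serving, the sketch below exposes a REST-style inference endpoint using only Python's standard library and logs every request/prediction pair so that monitoring and a data feedback loop can pick them up later; the stub model, port and endpoint layout are our assumptions.

```python
# Sketch of a REST inference endpoint that feeds a monitoring loop: every
# request/prediction pair is logged so performance can be tracked and a
# data feedback loop can supply retraining data. The model is a stub and
# the endpoint shape is an illustrative assumption.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

FEEDBACK_LOG = []  # stand-in for a data repository capturing live traffic

def predict(features):
    """Stub model: flags hardware as faulty if restarts exceed a bound."""
    return int(features.get("restarts", 0) > 5)

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)
        label = predict(features)
        # Persist inputs and outputs for monitoring and later retraining.
        FEEDBACK_LOG.append({"features": features, "prediction": label})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"prediction": label}).encode())

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), InferenceHandler).serve_forever()
```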
From the literature review, we see that successful AI/ML operationalization ensures a safe, traceable, testable, and repeatable path for developing, training, deploying, and updating ML models in different environments [30]. The use of MLOps enables automation, versioning, reproducibility, etc., through successful collaboration across the required skills, such as data engineer, data scientist, and ML engineer/developer [40] [29]. For example, data scientists must acquire SE skills such as modularization, testing, and versioning [36]. Supporting processes formalized in policies serve as the basis for governance [31] and can be automated to ensure solution reliability and compliance [31]. MLOps also supports explainability (GDPR regulation [25]) and audit trails [40].
VI. MLOPS FRAMEWORK AND MATURITY MODEL

Based on the SLR and the GLR, we derive an MLOps framework that identifies the activities involved in MLOps adoption. Figure 2 depicts the MLOps framework. The entire framework is divided into three pipelines: a) Data Pipeline, b) Modeling Pipeline and c) Release Pipeline. After collecting the data relevant to the ML models from data sources, preprocessing of the data and feature extraction are performed. Once a suitable model has been experimented with and optimized with hyperparameters, the model is evaluated and packaged for production deployment. If performance degrades, retraining of the model is triggered by initiating a data feedback loop. Versioned data and code are stored in the data repository and the code repository. To track the deployable model version, it is stored in the model registry. Deployment cycles of ML models can be shortened using CI/CD/CT.
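To illustrate how the three pipelines compose, the following sketch chains placeholder data, modeling and release stages; each function merely stands in for the activities described above and is not a concrete implementation of the framework.

```python
# Sketch of the three-pipeline structure: data pipeline -> modeling
# pipeline -> release pipeline. Each stage is a placeholder for the
# activities described in the framework, not a real implementation.
def data_pipeline(raw_data):
    preprocessed = [x for x in raw_data if x is not None]  # preprocessing
    return [{"value": x} for x in preprocessed]            # feature extraction

def modeling_pipeline(features):
    # Stand-in for experimenting, optimizing and evaluating a model.
    return {"weights": len(features), "validated": True}

def release_pipeline(model):
    if model["validated"]:  # validation gate before packaging/deployment
        print("packaging and deploying model:", model)
    return model

if __name__ == "__main__":
    release_pipeline(modeling_pipeline(data_pipeline([1, None, 3])))
```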
MLOps Maturity Model: Based on the SLR and the GLR, we present a maturity model in which we outline four stages in which companies evolve when adopting MLOps practices. The four stages are a) Automated Data Collection, b) Automated Model Deployment, c) Semi-automated Model Monitoring and d) Fully-automated Model Monitoring. These stages capture key transition points in the adoption of MLOps in practice. Below, we detail each MLOps stage and the preconditions for a company to reach it.
A. Automated Data Collection: At this stage, companies start from manual processing of data, models, deployment and monitoring. With the adoption of MLOps, the company experiences a transition from this manual process to automated data collection for the (re)training process.
Preconditions: For the transition from a manual process to automated data collection, there is a need for a mechanism to aggregate data from different data sources so that the data can be stored and accessed whenever required [32]. In addition, it demands the capability to integrate and process new data sources, regardless of their variety, volume or velocity [31]. It also requires infrastructure resources for automated data collection [34], data preparation and collaboration [38]. Also, standardized and automated pipelines help to drive the ingestion, transformation and storage of analytic data into a database or data lake [31]. The same feature manipulation applied during training has to be replicated at inference time [35]. AI teams can promote trust by addressing data management challenges such as accountability, transparency, regulation and compliance, and ethics [37].
B. Automated Model Deployment: Companies at this stage have manual model deployment and monitoring. With the adoption of MLOps, they undergo a transition from manual model deployment and monitoring to automated deployment of the retrained model.
Preconditions: The transition can be achieved by implementing provisions for automated model deployment to environments [43] [39] [38], especially across development, Q/A and production environments [43] [34]. It encourages deployment freedom on-premise, in the cloud and at the edge [34] [38]. Automated deployment of the retrained model can be achieved by providing a dedicated infrastructure-centric CI/CD pipeline [31] and integration with DevOps for automation, scale and collaboration [35]. Sufficient infrastructure choices for deployment include model hosting, evaluation, and maintenance [32], as well as the means to register, package (containerization [38] [24]) and deploy models [43] [40] [39], and the integration of reusable software environments for training and deploying models [43]. Tracking experiments [43] [39] [40] and models [31],
Fig. 2. MLOps Framework
VIII. DISCUSSION