Experimentation, deployment and monitoring Machine Learning models: Approaches for applying MLOps

Diego Nogare
ORCID: 0000-0003-0796-9431
PPGEEC - Programa de Pós-Graduação em Engenharia Elétrica e Computação; ICTi - Instituto de Ciência e Tecnologia Itaú

Ismar Frango Silveira
ORCID: 0000-0001-8029-072X
PPGEEC - Programa de Pós-Graduação em Engenharia Elétrica e Computação

In recent years, Data Science has become increasingly relevant as a support tool for industry, enhancing decision-making to an unprecedented degree. In this context, the MLOps discipline has emerged as a way to automate the life cycle of Machine Learning models, from experimentation to monitoring in production environments. Research results show that MLOps is a constantly evolving discipline, with challenges and solutions for integrating development and production environments, publishing models to production, and monitoring models throughout the end-to-end development lifecycle. This paper contributes to the understanding of MLOps techniques and their diverse applications.
Keywords: MLOps, Model Experimentation, Model Deployment, Model Monitoring, Machine Learning, Machine Learning Operations

1 Introduction

In recent years, especially since 2010, Data Science has proven to be a fundamental discipline and a tool that helps industry make data-driven decisions. As the field has grown in relevance, the challenges of publishing the developed models into production, so that they deliver the intended value to end users, have become more prominent.

To address these challenges, the MLOps discipline has proven to be a promising approach, enabling the automation and governance of the processes of experimenting, publishing and monitoring Machine Learning models. The creation of MLOps pipelines is one of the main strategies to ensure the effectiveness and efficiency of these processes.

This work is expected to contribute to the advancement of AI by promoting more efficient and transparent methodologies for end-to-end Machine Learning project development. It seeks to answer the research question "What are the main challenges faced by companies when publishing Machine Learning models into production, and how can the discipline of MLOps help overcome them?", as well as more specific questions such as "Why is it necessary to carry out continuous monitoring throughout the entire development lifecycle of machine learning models?" and "What are the essential steps to ensure an automated, efficient, and secure environment for publishing machine learning models?"

The remainder of the paper is organised as follows: section 2, MLOps pipeline, explains the concepts and challenges of MLOps pipelines; section 3, Application and Case Study, presents applications and the benefits of implementing a solution covering experimentation, publication and monitoring, along with three case studies from different industry fields that benefited from adopting MLOps; and section 4, Conclusion, summarizes our views on each of the three major areas explored.

2 MLOps pipeline

According to [21], the key objectives of MLOps are to achieve faster development and deployment, mainly by applying DevOps practices such as CI/CD, and to manage the lifecycle of ML models with high quality, reproducibility and end-to-end tracking. MLOps also shortens the code build-deploy cycle and aims to automate and monitor all ML steps. [11] identifies the benefits of MLOps as automating the ML lifecycle, customizing models for product objectives, abstracting away software infrastructure concerns, and increasing productivity. Research by [24] highlights how MLOps unites the development and operation of ML systems, automates repetitive tasks such as model training and code deployment, and improves model quality. This, in turn, reduces errors and downtime and lets teams detect and resolve problems quickly. These benefits help development teams work together more efficiently and collaboratively, and they increase productivity by allowing data scientists and ML engineers to continuously monitor and adjust model performance with greater traceability and transparency, so that teams can track changes to a model and understand how it is being used.

In an attempt to answer the research questions, this paper details the steps for creating MLOps pipelines and identifies which elements are fundamental to strategies for developing data science projects, regardless of industry or field.

Just as traditional software development relies on clear and direct methodologies, so must the development stages of these pipelines. This paper therefore discusses the stages of the model development life cycle, grouped into three key categories: Experimentation, Deployment and Model Monitoring.

2.1 Experimentation

The Experimentation stage of the Machine Learning model development life cycle involves technical aspects such as component development and testing, as well as automated learning, which are widely researched in the academic literature, as pointed out by [19, 6, 23].

However, human factors such as collaborating on code, managing data science and software engineering work, dealing with deadlines and priorities, documenting interfaces and responsibilities, and planning, operating and evolving the system are less explored in publications, as highlighted by [4].

It is therefore important that artificial intelligence projects are planned and integrated with other systems in a clear and direct way, following established standards such as CRISP-DM (Cross-Industry Standard Process for Data Mining) or TDSP (Team Data Science Process), as pointed out by [14, 18, 7]. Doing so helps ensure the efficiency and quality of Machine Learning model development.

Iterative steps are part of the experimentation process in the Machine Learning model lifecycle and are executed repeatedly until a model achieves the expected result. These tasks are crucial to the efficiency and quality of model development and focus on data preparation, model development and training, and performance validation, as described by [6, 18, 20, 9, 2, 8].

Data preparation involves collecting, cleaning, processing and transforming data, selecting relevant features, and splitting the data into training, validation and test datasets. Model development and training involves choosing the architecture, defining hyperparameters, and applying supervised, unsupervised or reinforcement learning techniques. Model performance validation involves evaluating performance metrics such as accuracy, precision, recall and F1-score, and optimizing hyperparameters to improve model performance. Repeating these steps in a loop is essential to ensure that the model meets business expectations and performs well.
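The iterative loop described above can be illustrated with a minimal, self-contained Python sketch. It assumes a toy binary classification task with a single numeric feature; the helper names (`train_val_test_split`, `classification_metrics`, `train_model`, `experiment`) and the toy cutoff "model" are hypothetical stand-ins for a real learning algorithm. The data is split once, one hyperparameter is varied, each candidate is scored on the validation set, the loop stops early once a target F1-score is reached, and the selected model is evaluated a single time on the held-out test set.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle rows reproducibly and split them into train/validation/test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def train_model(train, shift):
    """Hypothetical toy model: a cutoff halfway between the class means, moved by `shift`."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    cutoff = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2 + shift
    return lambda x: 1 if x > cutoff else 0


def experiment(rows, shifts, target_f1=0.95):
    """Try each hyperparameter value until the validation F1 target is met."""
    train, val, test = train_val_test_split(rows)
    best = None
    for shift in shifts:
        model = train_model(train, shift)
        scores = classification_metrics([y for _, y in val], [model(x) for x, _ in val])
        if best is None or scores["f1"] > best[1]["f1"]:
            best = (shift, scores, model)
        if scores["f1"] >= target_f1:
            break  # expected result reached: stop iterating
    shift, val_scores, model = best
    # The test set is touched only once, after model selection.
    test_scores = classification_metrics([y for _, y in test], [model(x) for x, _ in test])
    return shift, val_scores, test_scores


rows = [(i / 10, 1 if i >= 50 else 0) for i in range(100)]
print(experiment(rows, shifts=[-1.0, -0.5, 0.0, 0.5, 1.0]))
```

Keeping the test set out of the loop is the design choice that matters here: hyperparameters are chosen on validation data only, so the final test score is not inflated by the search.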

2.2 Deployment

Although publishing Machine Learning models is an important and challenging topic, it is not a subject frequently addressed in academic literature, as pointed out by [23].

However, the literature highlights the importance of deploying the model to a production environment safely and efficiently, following established standards, as pointed out by [14, 18]. Continuous monitoring after deployment must also be in place to detect possible problems and confirm that the model is working correctly, as highlighted by [10]. The publishing step can include container creation and model serialization, as mentioned by [6, 13].
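As an illustration of the serialization part of the publishing step (container creation is not covered), the following Python sketch uses the standard-library `pickle`, `hashlib` and `json` modules to write a model artifact together with a small metadata file recording its version and SHA-256 checksum, and verifies that checksum before loading. The function names and the `<version>/model.pkl` directory layout are hypothetical. Because `pickle` can execute code when loading, artifacts must only come from trusted sources; the checksum only detects an artifact that no longer matches its recorded metadata and is not a security control by itself.

```python
import hashlib
import json
import pickle
from pathlib import Path


def publish_model(model, version, out_dir):
    """Serialize `model` under out_dir/<version>/ with a metadata file (hypothetical layout)."""
    target = Path(out_dir) / version
    target.mkdir(parents=True, exist_ok=True)
    payload = pickle.dumps(model)
    (target / "model.pkl").write_bytes(payload)
    metadata = {"version": version, "sha256": hashlib.sha256(payload).hexdigest()}
    (target / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return target


def load_model(model_dir):
    """Check the recorded checksum before unpickling; only load trusted artifacts."""
    model_dir = Path(model_dir)
    payload = (model_dir / "model.pkl").read_bytes()
    metadata = json.loads((model_dir / "metadata.json").read_text())
    if hashlib.sha256(payload).hexdigest() != metadata["sha256"]:
        raise ValueError("model artifact does not match its recorded checksum")
    return pickle.loads(payload)
```

For example, `load_model(publish_model({"cutoff": 4.95}, "v1", "models"))` round-trips a simple picklable model; a serving container built in a later step would then only need to call `load_model` at start-up.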

Several challenges must be addressed to publish a model efficiently and securely. The scientific literature highlights some of the main ones:

1. Variety of computing environments: One of the main challenges is the variety of computing environments in which the model can be deployed, such as an on-premises environment, in the cloud, in containers or at the edge, as mentioned by [13]. Each environment may have its own limitations and requirements, which can make large-scale deployment of models difficult.

2. Programming languages and frameworks: Another challenge is the variety of programming languages and frameworks available to implement Machine Learning or Deep Learning models, as presented by [13]. Each language and framework can have its own advantages and disadvantages, which can make it difficult to choose the best option for the model.

3. Portability: Portability is another important challenge, since a published model may not be easy to move to other platforms, which hinders its maintenance and reuse, as highlighted by [13].

4. Security: Security is a critical challenge as Machine Learning models can be vulnerable to malicious attacks such as data poisoning attacks or adversarial attacks, as pointed out by [6].

2.3 Model Monitoring

Continuous monitoring is a critical part of the MLOps pipeline lifecycle, as it helps identify and minimize the impact of data or concept drift, according to [20, 10].

Both ML models and the data they consume change constantly, sometimes unexpectedly, which can make a model obsolete and lead it to return less accurate results, as [6] explains. Continuous monitoring identifies trends and patterns in the data, which allows the system to be adjusted to improve performance, as explained by [20].

Continuous monitoring also involves quality assurance, management and maintenance of new models, as well as the specification of security policies, protection and other non-functional requirements of the project, as highlighted by [2]. It also makes it possible to identify risks and maintain the model in production in line with the metrics established by the business, according to [10]. This is essential to ensure the efficiency and quality of the model in production and to maintain customer satisfaction.
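One simple way to keep a production model aligned with a business metric is to track that metric over a sliding window of recent predictions and raise an alert when it falls below an agreed threshold. The Python sketch below assumes that ground-truth labels eventually arrive for each prediction and uses accuracy as the metric; the class name, window size and 0.8 threshold are hypothetical placeholders for whatever metric and level the business defines.

```python
from collections import deque


class RollingMetricMonitor:
    """Accuracy over the last `window` labelled predictions, with a threshold alert."""

    def __init__(self, window=500, min_accuracy=0.8):
        self.outcomes = deque(maxlen=window)  # oldest outcomes are dropped automatically
        self.min_accuracy = min_accuracy

    def record(self, y_true, y_pred):
        """Store 1 for a correct prediction and 0 for an incorrect one."""
        self.outcomes.append(1 if y_true == y_pred else 0)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def alert(self):
        """True only once the window is full and accuracy is below the threshold."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy
```

Waiting for a full window avoids alerts triggered by a handful of early predictions; in practice such an alert would feed a retraining or incident process.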

The literature highlights that what should be monitored depends on the specific context of the ML system and the relevant business metrics, as mentioned by [1, 2]. In general, however, the distributions of the model's input and output data must be monitored to detect data or concept drift, missing values, outliers or extreme values, and inconsistent data, without neglecting the identification of ethical or bias issues, as [12] highlights. It is also important to monitor the model's performance on evaluation metrics such as accuracy, precision, recall, F1-score, and the ROC curve and its AUC, among others, according to [2].
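The input-distribution checks listed above can be sketched in Python with the standard library only. The sketch compares a feature's production values with a reference sample (for example, the training data) using the Population Stability Index (PSI), one common drift statistic chosen here as an example since the text does not prescribe one, and it counts missing values and z-score outliers. The bin count, the z-score limit of 4 and the often-quoted PSI alert level of 0.25 are conventions assumed for illustration, not values taken from the cited works.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index over equal-width bins of the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values to edge bins
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0)

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def is_missing(v):
    return v is None or (isinstance(v, float) and math.isnan(v))


def monitor_feature(reference, current, z_max=4.0, psi_alert=0.25):
    """Missing values, z-score outliers and PSI drift for one numeric feature."""
    mean = sum(reference) / len(reference)
    std = math.sqrt(sum((v - mean) ** 2 for v in reference) / len(reference)) or 1.0
    present = [v for v in current if not is_missing(v)]
    drift = psi(reference, present) if present else None
    return {
        "missing": len(current) - len(present),
        "outliers": sum(1 for v in present if abs(v - mean) / std > z_max),
        "psi": drift,
        "drift_alert": drift is not None and drift > psi_alert,
    }
```

The same `psi` function can be applied to the model's output scores to watch the prediction distribution, which covers the "output data" half of the check.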

Another aspect of monitoring that must be considered is the growing demand for model transparency, seeking better explainability, interpretability and accountability of AI, as mentioned by [8, 22]. Monitoring ethical and bias issues is equally important, as mentioned by [12], because Machine Learning models can be subject to discrimination and fairness issues that can negatively affect the accuracy and effectiveness of the model, according to [1]. This happens largely because traditional models tend to operate as black boxes, which makes it hard to understand them beyond controlling and evaluating their operations; deep learning models, which solve most problems by learning automatically from the input data, are likewise difficult to understand and interpret, according to [6].
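Bias monitoring can start from simple group-level statistics. The Python sketch below computes the demographic parity gap, i.e. the largest difference in positive-prediction rate between groups defined by a sensitive attribute. This is only one of several fairness criteria and is chosen here as an assumption for illustration; the appropriate criterion and the acceptable gap depend on the application and are not specified by the cited works. The function names are hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Share of positive predictions (label 1) for each group value."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += 1 if pred == 1 else 0
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())
```

For predictions `[1, 1, 0, 0, 1, 0]` and groups `["a", "a", "a", "b", "b", "b"]`, group a has a positive rate of 2/3 and group b of 1/3, so the gap is 1/3; tracking this value over time next to accuracy makes changes in group-level behavior visible.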

3 Application and Case Study

According to [15], there is a lack of general knowledge and methodology to facilitate and guide the adoption of, and migration to, machine learning management tools by developers and data scientists, which increases the overhead of configuring and maintaining these environments and tools. [17] explains that MLOps can help address the challenges inherent in improving the efficiency and effectiveness of deploying machine learning models, which could lead to greater adoption of ML in business processes and, therefore, greater potential benefits for industry. According to [10], developing and deploying MLOps pipelines helps ensure model quality over an extended period of time while improving accuracy and stability, reducing training time and increasing model resilience. The same work notes that machine learning pipelines are responsible for continuously monitoring and ensuring the quality of the models developed during their operation.

3.1 Case Study

To show the applicability of MLOps across different fields of industry, three examples were selected from different fields that benefited from MLOps in their applications and obtained positive results.

3.1.1 Financial Services

Itaú Unibanco is the largest bank in Latin America, with more than 65 million customers and more than 70 PB of data. As explained by [16], the platform developed by Itaú helps simplify the management of Data Science models by enabling fast delivery while following good practices in software engineering, security and the architecture of machine learning solutions integrated with the production process. The platform also facilitates the observability of models and data, so that retraining moments can be defined and deviations in the statistical distributions of both data and predictions can be detected, improving reliability for the business. An additional advantage is greater transparency in the stages of publishing Data Science models to production, thanks to the governance these environments require, such as change management and mandatory log storage. The solution was designed around the main needs of an environment supporting the life cycle of Data Science models, including experimentation, deployment and monitoring, as mapped jointly by the business and technology areas of Itaú Unibanco.

3.1.2 Tech Solution Provider

The EXPLAIN (EXPLanatory Interactive Artificial Intelligence for INdustry) project was led by [3] and built from interviews carried out with four project partner companies, each with experience related to the AI/ML sector. One to four employees from each company were interviewed, including data scientists/analysts, project managers, IT specialists, software engineers, and ML architects. A structured interview guide was developed and used to conduct a semi-structured interview, which covered all research questions. The information collected was analyzed and used to determine the extent to which MLOps is implemented by the project partner companies, what the MLOps architectures, tools and requirements are from the company perspective, and the requirements for developing a new MLOps software architecture. Interviews carried out with the project’s partner companies showed that each of them uses MLOps differently, depending on the specific use case.

3.1.3 Energy Consumption Prediction

Research led by [5] presents a digital twin architecture for modeling energy consumption in homes. Using sensors installed in devices, the solution collects device-level energy consumption data to recognize micro-moments and provide personalized, well-timed recommendations to users. The solution is more granular than the collaborative filtering-based approach, since each family cluster and its users are modeled in a specialized ontology. It is also robust to missing data and supports multiple time granularities, from 1 to 1440 minutes (24 hours). Data versioning is essential to ensure the reproducibility of experiments carried out on previous versions. MLOps was a fundamental part of the project, as it allowed the integration of software development and IT operations, ensuring automation, monitoring, test integration and infrastructure-as-code management, among other techniques, thus advancing towards continuous delivery and deployment of the ML system.
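The role of data versioning in reproducibility can be illustrated with a small Python sketch based on content hashing: a dataset's version identifier is derived from its bytes, and each experiment run is logged with that identifier and its parameters, so a past result can be traced back to the exact data it used. The function names, the JSON Lines registry file and the example parameter are hypothetical; the cited project [5] may well rely on a dedicated data versioning tool instead.

```python
import hashlib
import json


def dataset_version(path, chunk_size=1 << 20):
    """Content hash of a data file: identical bytes always give the same version id."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):  # read in chunks to bound memory
            digest.update(chunk)
    return digest.hexdigest()[:12]


def record_experiment(registry_path, run_id, data_path, params):
    """Append one JSON line linking a run's parameters to the exact data version used."""
    entry = {"run_id": run_id, "data_version": dataset_version(data_path), "params": params}
    with open(registry_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Any change to the data file, even a single appended row, yields a new version identifier, so two runs that share an identifier are guaranteed to have seen the same bytes.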

4 Conclusion

Even though MLOps has no concrete, objective definition that fits in a single sentence, its importance to the process of publishing data science models is clear from the research covered in this paper. The model development life cycle and all its stages can be grouped into three key categories: Experimentation, Deployment and Monitoring.

Experimentation is an iterative stage in the model life cycle that involves everything from data preparation and model development and training to validating the model's performance. It is a critical task, as it is in this phase that we seek to achieve the expected result and identify possible problems and challenges that will help address the business problems.

Deploying Machine Learning models is a growing challenge in Data Science, and there is little academic literature on the subject. Nevertheless, publishing Machine Learning models involves technical challenges, such as choosing the appropriate infrastructure to serve the model, configuring the production environment and integrating with other existing systems.

In terms of monitoring, models must be monitored to avoid ethical and bias problems in Machine Learning projects and to minimize the risks of discrimination and unfairness; this is closely related to model transparency, which seeks better explainability, interpretability and accountability of AI. Operational monitoring is also needed to detect network failures, hardware failures and software errors, as well as service interruptions, response latency, scalability issues, and attempted attacks that could leave the model vulnerable.

Based on the steps grouped into these three key categories, it is expected that machine learning engineers can build environments that make it possible to take models from development to production in an automated, efficient and safe way, with continuous monitoring.

Future work can explore each of these steps in more detail in order to improve the processes.

5 Acknowledgment

We would like to express our deep gratitude to the Mackenzie Presbyterian Institute for the support provided during the development of this research. The financial support, infrastructure, and resources provided were fundamental to the success of this work.

We also want to extend our thanks to the Institute of Science and Technology Itaú (ICTi) for their continued encouragement and investment in Brazilian science. We firmly believe in the importance of their contribution to the advancement of knowledge and research in our country.

Any opinions, findings, and conclusions expressed in this manuscript are those of the authors and do not necessarily reflect the views, official policies or position of Itaú Unibanco.

References

[1] Aurélien Bourgais and Issam Ibnouhsein. Ethics-by-design: the next frontier of industrialization. AI and Ethics, 2:317–324, 5 2022.
[2] Alexandre Carqueja, Bruno Cabral, João Paulo Fernandes, and Nuno Lourenço. On the democratization of machine learning pipelines. pages 455–462. Institute of Electrical and Electronics Engineers Inc., 2022.
[3] Leonhard Faubel and Klaus Schmid. An analysis of MLOps practices. Technical report, 2023.
[4] Lukas Fischer, Lisa Ehrlinger, Verena Geist, Rudolf Ramler, Florian Sobiezky, Werner Zellinger, David Brunner, Mohit Kumar, and Bernhard Moser. AI system engineering - key challenges and lessons learned. 2020.
[5] Tiago Yukio Fujii, Victor Takashi Hayashi, Reginaldo Arakaki, Wilson Vicente Ruggiero, Romeo Bulla, Fabio Hirotsugu Hayashi, and Khalil Ahmad Khalil. A digital twin architecture model applied with MLOps techniques to improve short-term energy consumption prediction. Machines, 10(1), 2022.
[6] Gharib Gharibi, Vijay Walunj, Raju Nekadi, Raj Marri, and Yugyung Lee. Automated end-to-end management of the modeling lifecycle in deep learning. Empirical Software Engineering, 26, 3 2021.
[7] Mark Haakman, Luís Cruz, Hennie Huijgens, and Arie van Deursen. AI lifecycle models need to be revised: An exploratory study in fintech. Empirical Software Engineering, 26, 9 2021.
[8] Willem-Jan Van Den Heuvel and Damian A Tamburri. Model-driven ML-Ops for intelligent enterprise applications: Vision, approaches and challenges, 2020.
[9] Alexander Isenko, Ruben Mayer, Jeffrey Jedele, and Hans Arno Jacobsen. Where is my training bottleneck? Hidden trade-offs in deep learning preprocessing pipelines. pages 1825–1839. Association for Computing Machinery, 6 2022.
[10] Eyad Kannout, Michal Grodzki, and Marek Grzegorowski. Considering various aspects of models’ quality in the ML pipeline - application in the logistics sector. pages 403–412. Institute of Electrical and Electronics Engineers Inc., 2022.
[11] Igor L. Markov, Hanson Wang, Nitya S. Kasturi, Shaun Singh, Mia R. Garrard, Yin Huang, Sze Wai Celeste Yuen, Sarah Tran, Zehui Wang, Igor Glotov, Tanvi Gupta, Peng Chen, Boshuang Huang, Xiaowen Xie, Michael Belkin, Sal Uryasev, Sam Howie, Eytan Bakshy, and Norm Zhou. Looper: An end-to-end ML platform for product decisions. pages 3513–3523. Association for Computing Machinery, 8 2022.
[12] Beatriz M.A. Matsui and Denise H. Goya. MLOps: A guide to its adoption in the context of responsible AI. pages 45–49. Institute of Electrical and Electronics Engineers Inc., 2022.
[13] Raúl Miñón, Josu Díaz-De-Arcaya, Ana I Torre-Bastida, Gorka Zarate, and Aitor Moreno-Fernandez-De-Leceta. MLPacker: A unified software tool for packaging and deploying atomic and distributed analytic pipelines, 2022.
[14] D. R. Niranjan and Mohana. Jenkins pipelines: A novel approach to machine learning operations (MLOps). pages 1292–1297. Institute of Electrical and Electronics Engineers Inc., 2022.
[15] Aquilas Tchanjou Njomou and Marios Fokaefs. On the challenges of migrating to machine learning life cycle management platforms, 2022.
[16] Diego Nogare, Rodrigo Fernandes Mello, and Marco Antonio Lopes. Automação no processo de publicação de modelos de ciência de dados. In Anais Estendidos do XIII Congresso Brasileiro de Software: Teoria e Prática, pages 40–43. SBC, 2022.
[17] Andrei Paleyes, Raoul Gabriel Urma, and Neil D. Lawrence. Challenges in deploying machine learning: A survey of case studies. ACM Computing Surveys, 55, 12 2022.
[18] Janvi Prasad, Arushi Jain, and Ushus Elizabeth Zachariah. Comparative evaluation of machine learning development lifecycle tools. pages 460–465. Institute of Electrical and Electronics Engineers Inc., 2022.
[19] Romesh Ranawana and Asoka S. Karunananda. An agile software development life cycle model for machine learning application development. Institute of Electrical and Electronics Engineers Inc., 2021.
[20] Nathalie Rauschmayr, Sami Kama, Muhyun Kim, Miyoung Choi, and Krishnaram Kenthapadi. Profiling deep learning workloads at scale using amazon sagemaker. pages 3801–3809. Association for Computing Machinery, 8 2022.
[21] Rakshith Subramanya, Seppo Sierla, and Valeriy Vyatkin. From DevOps to MLOps: Overview and application to electricity market forecasting. Applied Sciences (Switzerland), 12, 10 2022.
[22] Damian A. Tamburri. Sustainable MLOps: Trends and challenges. pages 17–23. Institute of Electrical and Electronics Engineers Inc., 9 2020.
[23] Matteo Testi, Matteo Ballabio, Emanuele Frontoni, Giulio Iannello, Sara Moccia, Paolo Soda, and Gennaro Vessio. MLOps: A taxonomy and a methodology, 2022.
[24] Yue Zhou, Yue Yu, and Bo Ding. Towards MLOps: A case study of ML pipeline platform. pages 494–500. Institute of Electrical and Electronics Engineers Inc., 10 2020.