
Unit 1 - Introduction to Machine Learning Operations

Introduction to concept of Machine Learning Operations:

Machine Learning: Machine learning is the field of study that gives computers the capability
to learn without being explicitly programmed. ML focuses on enabling systems to learn from
data, uncover patterns, and autonomously make decisions. It gives computers the capability
that makes them more similar to humans: the ability to learn. Machine learning is actively
used today, perhaps in many more places than one would expect. In today's era dominated by
data, ML is transforming industries ranging from healthcare to finance, offering robust tools
for predictive analytics, automation, and informed decision-making.

Machine Learning Operations:

MLOps stands for Machine Learning Operations. MLOps is a core function of Machine
Learning engineering, focused on streamlining the process of taking machine learning models
to production, and then maintaining and monitoring them. MLOps is a collaborative function,
often comprising data scientists, DevOps engineers, and IT. Machine learning operations
(MLOps) is a set of practices that automate and simplify machine learning (ML) workflows
and deployments. Machine learning and artificial intelligence (AI) are core capabilities that
you can implement to solve complex real-world problems and deliver value to your
customers. MLOps is an ML culture and practice that unifies ML application development
(Dev) with ML system deployment and operations (Ops). Your organization can use MLOps
to automate and standardize processes across the ML lifecycle. These processes include
model development, testing, integration, release, and infrastructure management.

How MLOps works: MLOps implements the machine learning lifecycle. These are the
stages that an ML model must undergo to become production-ready. The following are the
four cycles that make up the ML lifecycle:
1. Data cycle- The data cycle entails gathering and preparing data for training. First, raw data is
culled from appropriate sources, and then techniques such as feature engineering are used to
transform, manipulate and organize raw data into labeled data that's ready for model training.
2. Model cycle- This cycle is where the model is trained with the prepared data. Once a model is trained,
tracking future versions of it as it moves through the rest of the lifecycle is important. Certain
tools, such as the open source tool MLflow, can be used to simplify this.
3. Development cycle- Here, the model is further developed, tested and validated so that it can
be deployed to a production environment. Deployment can be automated using continuous
integration/continuous delivery (CI/CD) pipelines and configurations that reduce the number
of manual tasks.
4. Operations cycle- The operations cycle is an end-to-end monitoring process that ensures the
production model continues working and is retrained to improve performance over time.
MLOps can automatically retrain an ML model either on a set schedule or when triggered by
an event, such as a model performance metric falling below a certain threshold.
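To make the operations cycle concrete, the following is a minimal sketch, assuming an event-based retraining trigger in which a monitored accuracy metric is compared against an agreed threshold. The function names, threshold value, and data arguments are illustrative assumptions, not part of any specific MLOps tool.

# Minimal sketch of an event-based retraining trigger (illustrative only).
# `retrain_model` is a hypothetical placeholder for an organization's own training job.

ACCURACY_THRESHOLD = 0.85  # assumed acceptable accuracy for the production model


def evaluate_model(model, recent_features, recent_labels) -> float:
    """Placeholder: score the deployed model on recently collected, labeled data."""
    return model.score(recent_features, recent_labels)


def monitor_and_maybe_retrain(model, recent_features, recent_labels, retrain_model):
    """Retrain when the monitored metric falls below the agreed threshold."""
    accuracy = evaluate_model(model, recent_features, recent_labels)
    if accuracy < ACCURACY_THRESHOLD:
        print(f"Accuracy {accuracy:.3f} is below {ACCURACY_THRESHOLD}; triggering retraining.")
        return retrain_model(recent_features, recent_labels)
    print(f"Accuracy {accuracy:.3f} is acceptable; keeping the current model.")
    return model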

Main components of MLOps:


Various components make up the MLOps model building process. They're usually
implemented sequentially and ensure the reproducibility of the process. The four steps in the
MLOps lifecycle provide an overview of the process, but these cycles can be broken down
into the more detailed components:
● Data collection and analysis- Valuable data must be identified and collected.
● Data preparation- Developers clean and prepare the data to ensure consistent formatting and
readability before it's introduced to the model.
● Model development and training- The prepared data is used to train the ML model, which is
tested to ensure it produces the insights, predictions and other outputs needed.
● Model deployment- The model is put into production, making it accessible to users after it's
developed and tested.
● Model monitoring- The model's performance is monitored to ensure it runs smoothly.
Any debugging that's needed happens at this stage.
● Model retraining- Models require new data to continue producing accurate and up-to-date
insights and predictions. Retraining is an ongoing process.
● CI/CD- This component applies throughout the process, from development and testing to
deployment and retraining. It automates and streamlines these processes.
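As a minimal, hedged illustration of how the first few components fit together (data preparation, model development and training, and model testing), the following scikit-learn sketch uses a built-in dataset; the dataset, preprocessing, and model choice are assumptions for illustration, not a prescribed MLOps pipeline.

# Minimal sketch: data preparation, training, and evaluation with scikit-learn.
# The dataset, preprocessing, and model choice are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data collection and preparation: load and split a labeled dataset.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model development and training: scaling plus a simple classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Model testing: check that the trained model produces the expected quality of output.
print(f"Held-out accuracy: {model.score(X_test, y_test):.3f}")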

Importance of MLOps in Machine Learning:

Machine learning helps organizations analyze data and derive insights for decision-making.
However, it's an innovative and experimental field that comes with its own set of challenges.
Sensitive data protection, small budgets, skills shortages, and continuously evolving
technology limit a project's success. Without control and guidance, costs may spiral, and data
science teams may not achieve their desired outcomes.
MLOps provides a map to guide ML projects toward success, no matter the constraints. Here
are some key benefits of MLOps.
● Faster time to market-
MLOps provides your organization with a framework to achieve your data science goals
more quickly and efficiently. Your developers and managers can become more strategic and
agile in model management. ML engineers can provision infrastructure through declarative
configuration files to get projects started more smoothly. Automating model creation and
deployment results in faster go-to-market times with lower operational costs. Data scientists
can rapidly explore an organization's data to deliver more business value to all.
● Improved productivity-
MLOps practices boost productivity and accelerate the development of ML models. For
instance, you can standardize the development or experiment environment. Then, your ML
engineers can launch new projects, rotate between projects, and reuse ML models across
applications. They can create repeatable processes for rapid experimentation and model
training. Software engineering teams can collaborate and coordinate through the ML software
development lifecycle for greater efficiency.
● Efficient model deployment-
MLOps improves troubleshooting and model management in production. For instance,
software engineers can monitor model performance and reproduce behavior for
troubleshooting. They can track and centrally manage model versions and pick and choose
the right one for different business use cases. When you integrate model workflows with
continuous integration and continuous delivery (CI/CD) pipelines, you limit performance
degradation and maintain quality for your model. This is true even after upgrades and model
tuning.
● Speed and efficiency-
MLOps automates many of the repetitive tasks in ML development and within the ML
pipeline. For example, automating initial data preparation procedures reduces development
time and cuts down on human error in the model.
● Scalability-
ML models often must be scaled to handle increased workloads, larger data sets and new
features. To provide scalability, MLOps uses technology such as containerized software and
data pipelines that can handle large amounts of data efficiently.
● Reliability-
MLOps model testing and validation catch problems in the development phase, increasing
reliability early on. Operational processes also ensure models comply with the policies that an
organization has in place. This reduces risks such as data drift, in which the accuracy of a
model deteriorates over time because the data it was trained on has changed (a minimal
drift-check sketch follows this list).
● Risk reduction-
Machine learning models are often subject to regulatory scrutiny and drift checks. MLOps
enables greater transparency and faster response to such requests and ensures greater
compliance with an organization's or industry's policies.
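As flagged under Reliability above, data drift can be made concrete with a simple statistical check. This is a minimal sketch, assuming a two-sample Kolmogorov-Smirnov test from SciPy is an acceptable drift signal for a single numeric feature; the data and significance level are illustrative assumptions, and production systems typically rely on more complete tooling.

# Minimal sketch of a data drift check for one numeric feature, using a
# two-sample Kolmogorov-Smirnov test from SciPy. Data and thresholds are assumptions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)    # data the model was trained on
production_feature = rng.normal(loc=0.5, scale=1.0, size=5000)  # data arriving in production

result = ks_2samp(training_feature, production_feature)
if result.pvalue < 0.01:  # assumed significance level
    print(f"Possible data drift (KS statistic={result.statistic:.3f}, p={result.pvalue:.4f}).")
else:
    print("No significant drift detected for this feature.")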

Key use cases for MLOps


On the surface, MLOps appears to be exclusive to the tech industry; however, other industries
find value in using MLOps practices to enhance their operations:
● Finance- ML is valuable for analyzing millions of data points fast. This lets financial services
companies use ML to analyze many transactions and quickly detect fraud, for example.
● Retail and e-commerce- Retail relies on MLOps to produce models that analyze customer
purchase data and make predictions on future sales.
● Healthcare- MLOps-enabled software is used to analyze data sets of patient diseases to help
institutions make better-informed diagnoses.
● Travel- The travel industry analyzes customers' travel data to better target them with
advertisements for their next trips.
● Logistics- MLOps-enabled software is used to analyze performance data on different modes of
transportation to predict failures and risks. This practice is known as predictive maintenance.
● Manufacturing- MLOps tools are used to monitor manufacturing equipment and provide
predictive maintenance capabilities.
● Oil and gas- In the oil and gas industry, MLOps monitors equipment and analyzes geological
data to identify suitable areas for drilling and extraction of oil and natural gas.

Introduction to Responsible AI
Responsible AI is a set of practices that ensure AI systems are designed, deployed, and used
in an ethical and legal way. The goal of responsible AI is to use AI in a safe, trustworthy, and
ethical fashion. Responsible AI considers the societal impact of the development and scale of
these technologies, including potential harms and benefits. Google's AI Principles provide a
framework to create the most helpful, safe, and trusted experiences for all. The principles
include objectives for AI applications as well as applications we will not pursue in the
development of AI systems. Responsible AI (RAI) designs practices to develop, deploy, and
scale AI for constructive causes and good intentions to impact people and society positively.
It nurtures people’s trust and confidence in the AI system. RAI helps transform AI
applications into more accountable, ethical, and transparent. It evaluates organizational AI
efforts from both ethical and legal points of view. With the growing use of AI systems in all
domains, many questions arise around AI ethics, trust, legality, and data governance. AI-led
decisions need to be evaluated on different fronts like business risks, data privacy, health -
safety issues, and equality. AI applications are business-critical and deal with sensitive data.
Therefore people need to understand the role of AI in depth.
Why is Responsible AI Important?


With Responsible AI, enterprises set key objectives and establish governance strategies for
AI initiatives. RAI enables:
● Reduced bias in datasets: Responsible AI helps ensure that both the algorithm and the
underlying data are unbiased and representative of ground truth.
● Ethical AI: RAI brings a security-first approach to ensure ethical use of data. It
protects the privacy and security of your sensitive data to avoid its unethical use by
any means. It helps mitigate risk and benefits people, organizations, and society as a
whole.
● AI transparency: Responsible AI drives transparency across processes and functions.
It enables human-understandable explanations for predictions made in contrast to
traditional black-box ML.
● Effective governance: ML development processes should be documented to prevent them
from being altered with malicious intent.
● Adaptability: Models supporting AI initiatives should be adapted to complex
environments without introducing bias.

Practicing Responsible AI
Responsible AI is an enabler of market growth, development, and competitive edge. The RAI
journey requires putting the following best practices in place:
● Building an all-inclusive and diverse team
● Ensuring transparent and explainable AI systems
● Ensuring measurable processes and tasks wherever possible
● Developing and executing guidelines on how RAI is implemented in the organization
● Performing periodic RAI checks for ML algorithms and data platforms
● Using automated tools for fairness, monitoring, and explainability. AI observability solutions
such as Censius AI Observability Platform can help here. It automates ML model monitoring
for desired performance metrics.

MLOps for Responsible AI


A responsible use of machine learning (more commonly referred to as Responsible AI)
covers two main dimensions: intentionality and accountability.
Intentionality: Ensuring that models are designed and behave in ways aligned with their
purpose. This includes assurance that data used for AI projects comes from compliant and
unbiased sources plus a collaborative approach to AI projects that ensures multiple checks
and balances on potential model bias. Intentionality also includes explainability, meaning the
results of AI systems should be explainable by humans (ideally, not just the humans who
created the system).
Accountability: Centrally controlling, managing, and auditing the enterprise AI effort—no
shadow IT! Accountability is about having an overall view of which teams are using what
data, how, and in which models. It also includes the need for trust that data is reliable and
being collected in accordance with regulations as well as a centralized understanding of
which models are used for what business processes. This is closely tied to traceability: if
something goes wrong, is it easy to find where in the pipeline it happened?
These principles may seem obvious, but it’s important to consider that machine learning
models lack the transparency of traditional imperative code. In other words, it is much harder
to understand what features are used to determine a prediction, which in turn can make it
much harder to demonstrate that models comply with the necessary regulatory or internal
governance requirements.
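One hedged way to recover some of that transparency is to inspect which features drive a model's predictions. The sketch below uses scikit-learn's permutation importance as one simple, model-agnostic example; the dataset and model are illustrative assumptions, and this is not presented as the only or required approach to explainability.

# Minimal sketch: model-agnostic feature importance via permutation importance.
# The dataset and model are illustrative assumptions.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much the test score degrades.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for index in result.importances_mean.argsort()[::-1][:5]:
    print(f"{data.feature_names[index]}: {result.importances_mean[index]:.4f}")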
The reality is that introducing automation vis-à-vis machine learning models shifts the
fundamental onus of accountability from the bottom of the hierarchy to the top. That is,
decisions that were perhaps previously made by individual contributors who operated within
a margin of guidelines (for example, what the price of a given product should be or whether
or not a person should be accepted for a loan) are now being made by a model. The person
responsible for the automated decisions of said model is likely a data team manager or even
executive, and that brings the concept of Responsible AI even more to the forefront.
Given the previously discussed risks as well as these particular challenges and principles, it’s
easy to see the interplay between MLOps and Responsible AI. Teams must have good
MLOps principles to practice Responsible AI, and Responsible AI necessitates MLOps
strategies. Given the gravity of this topic, we’ll come back to it multiple times throughout this
book, examining how it should be addressed at each stage of the ML model life cycle.

MLOps to Mitigate Risk:


Risk assessment and advantages
The risks of a machine learning application are many, and MLOps, which adapts
methodologies developed for classic software applications, is a way to mitigate them.
When looking at MLOps as a way to mitigate risk, an analysis should therefore cover:
● The risk that the model is unavailable for a given period of time
● The risk that the model returns a bad prediction for a given sample
● The risk that the model accuracy or fairness decreases over time
● The risk that the skills necessary to maintain the model (i.e., data science talent) are lost.
Risks are usually larger for models that are deployed widely and used outside of the
organization. Risk assessment is generally based on two metrics: the probability and the
impact of the adverse event. Risk assessment should be performed at the beginning of each
project and reassessed periodically, as models may be used in ways that were not foreseen
initially.
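A minimal sketch of the probability-and-impact idea described above: each risk from the list is given an assumed probability and impact score, and their product yields a rough priority ranking. All numbers below are purely illustrative assumptions, not recommended values.

# Minimal sketch: ranking ML risks by probability x impact.
# All probability and impact values are illustrative assumptions.
risks = {
    "model unavailable for a period of time": {"probability": 0.10, "impact": 7},
    "bad prediction for a given sample": {"probability": 0.30, "impact": 4},
    "accuracy or fairness decreases over time": {"probability": 0.50, "impact": 6},
    "loss of the skills needed to maintain the model": {"probability": 0.20, "impact": 8},
}

scored = sorted(
    ((name, values["probability"] * values["impact"]) for name, values in risks.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in scored:
    print(f"{score:.2f}  {name}")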

Risk Mitigation
MLOps really tips the scales as critical for risk mitigation when a centralized team (with
unique reporting of its activities, meaning that there can be multiple such teams at any given
enterprise) has more than a handful of operational models. At this point, it becomes difficult
to have a global view of the states of these models without the standardization that allows the
appropriate mitigation measures to be taken for each of them. Pushing machine learning
models into production without MLOps infrastructure is risky for many reasons, but first and
foremost because fully assessing the performance of a machine learning model can often only
be done in the production environment. Why? Because prediction models are only as good as
the data they are trained on, which means the training data must be a good reflection of the
data encountered in the production environment. If the production environment changes, then
the model performance is likely to decrease rapidly.
Another major risk factor is that machine learning model performance is often very sensitive
to the production environment it is running in, including the versions of software and
operating systems in use. They tend not to be buggy in the classic software sense, because
most weren’t written by hand, but rather were machine-generated. Instead, the problem is that
they are often built on a pile of open source software (e.g., libraries, like scikit-learn, Python,
or Linux), and having versions of this software in production that match those that the model
was verified on is critically important.
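A minimal sketch of that version-matching concern, assuming the team records the library versions the model was validated against and checks them in the production environment before serving. The package names and version numbers are assumptions for illustration.

# Minimal sketch: verify that production library versions match the versions
# the model was validated against. Package names and versions are assumptions.
import importlib.metadata

EXPECTED_VERSIONS = {
    "scikit-learn": "1.4.2",  # assumed version used during model verification
    "numpy": "1.26.4",
}


def check_environment(expected: dict) -> list:
    """Return (package, expected, installed) tuples for every mismatch."""
    mismatches = []
    for package, expected_version in expected.items():
        installed = importlib.metadata.version(package)
        if installed != expected_version:
            mismatches.append((package, expected_version, installed))
    return mismatches


if __name__ == "__main__":
    for package, expected, installed in check_environment(EXPECTED_VERSIONS):
        print(f"Version mismatch for {package}: expected {expected}, found {installed}")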
Ultimately, pushing models into production is not the final step of the machine learning life
cycle—far from it. It’s often just the beginning of monitoring its performance and ensuring
that it behaves as expected. As more data scientists start pushing more machine learning
models into production, MLOps becomes critical in mitigating the potential risks, which
(depending on the model) can be devastating for the business if things go wrong. Monitoring
is also essential so that the organization has a precise knowledge of how broadly each model
is used.

MLOps Lifecycle

Machine learning projects often fail due to a lack of collaboration between the data scientists
who develop algorithms and the engineers responsible for deploying them into production
systems. By unifying DevOps and MLOps methods, companies can ensure successful machine
learning projects.

MLOps Lifecycle:
The MLOps lifecycle encompasses several key stages, each playing an important role in
ensuring the successful and sustainable deployment of ML models.
Let's explore each stage:
1. Data Management: The foundation of any successful machine learning model lies in the
quality and management of data. In the MLOps lifecycle, data management is a crucial
component that ensures the quality, integrity, and traceability of the data used for model
training and deployment. This stage involves establishing data pipelines for data collection
and processing, implementing data versioning and lineage tracking, and enforcing quality
checks. Tools like Data Version Control (DVC) and Kubeflow Pipelines can organize the data
management process, ensuring consistent, reliable, and well-documented data for model
development and deployment.
2. Model Development: Once the data is prepared, the next stage is model development. During
this phase, machine learning models are created, trained, and tested. This phase involves
activities such as exploratory data analysis, feature engineering, model training, and
hyperparameter tuning. Collaboration between data scientists and MLOps engineers is crucial
to ensure models meet the desired performance criteria. To facilitate collaboration and
efficiency, tools like MLflow and TensorFlow Extended (TFX) can manage the model
development lifecycle, track experiment runs, and package models for deployment (a minimal
tracking sketch follows this list).
3. Model Deployment: After developing and testing the models, the next step is to deploy them
into production environments. This stage involves packaging the model, creating necessary
infrastructure (e.g., containers, Kubernetes clusters), and integrating the model with existing
applications or services. Cloud-based platforms like AWS SageMaker by Amazon, Google
Cloud AI Platform, and Azure Machine Learning Service can simplify the model deployment
process by providing managed services and streamlined deployment workflows. Additionally,
open-source tools like Polyaxon can help orchestrate model deployment across different
environments.
4. Model Monitoring and Maintenance: Once a model is deployed, it's essential to monitor its
performance, accuracy, and behavior over time. This stage involves setting up monitoring
systems to track key performance indicators (KPIs) such as prediction accuracy, response
times, or data drift, and identify potential issues or biases. Regular maintenance and updates,
including periodic retraining or fine-tuning, are crucial to keeping models optimized and
aligned with evolving data patterns.
5. Retiring and Replacing Models: Over time, models may become outdated or less effective
due to changes in data patterns, business requirements, or technology advancements. The
retiring and replacing stage involves assessing the need to retire existing models and
introducing newer, improved models. Careful planning and execution ensure a seamless
transition from the old model to the new one while minimizing disruptions in production
environments.
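Referring back to the model development stage above, the following is a minimal sketch of experiment tracking with MLflow: parameters, metrics, and the trained model are logged under a run so that later versions can be compared. The dataset, parameter values, and run setup are illustrative assumptions, not a prescribed workflow.

# Minimal sketch: logging an experiment run with MLflow.
# Dataset, parameter values, and run naming are illustrative assumptions.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="logreg-baseline"):
    C = 1.0  # assumed regularization strength to record with this run
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)

    mlflow.log_param("C", C)                                    # record the hyperparameter
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))  # record the result
    mlflow.sklearn.log_model(model, "model")                    # store the model artifact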

Application of MLOps:
MLOps, or Machine Learning Operations, is a set of practices that can be applied in a variety
of ways to improve the quality and efficiency of machine learning and AI solutions:
● Model development and production- MLOps can help increase the speed of model
development and production by using continuous integration and deployment (CI/CD)
practices.
● Model management- MLOps can improve model management and troubleshooting in
production by allowing users to track and manage model versions, monitor performance, and
reproduce behavior.
● Data management and governance- MLOps can help ensure the quality, integrity, and
compliance of data used in machine learning models.
● Predictive maintenance- MLOps can be used to analyze equipment data and predict machine
failures, which can help reduce downtime and repair costs.
● Real-time model monitoring and alerting- MLOps can continuously monitor the performance
of deployed models and generate alerts when issues are detected.
● Automated testing and validation- MLOps can help enable automated testing and validation
for machine learning models.
● Scalability and resource management- MLOps can help organizations efficiently train,
deploy, and manage machine learning models at scale.

Here are some examples of real-world applications of MLOps:


● Merck research labs: Uses MLOps to accelerate vaccine research and discovery
● GTS data processing company: Uses MLOps to enhance its ecosystem
● EY: Uses MLOps to accelerate model deployments
● KONUX: Uses MLOps to excel in predictive maintenance operations
● PadSquad: Uses MLOps to enhance ad performance in real-time
● Senko Logistics Group: Uses MLOps to enhance shipment volume accuracy
● Ocado: Uses MLOps to achieve a streamlined process for data management and governance.

Key Features of MLOps Solutions:


● End-to-End Automation
MLOps systems provide a complete solution for automating every stage of the ML process.
MLOps tech eliminates manual involvement and potential errors at each stage by combining
data collection, preparation, model training, deployment, and monitoring into a single
process.
Without the need for continual human supervision, this end-to-end automation guarantees
that machine learning models are created, implemented and maintained effectively. These
platforms make use of cutting-edge technologies like cloud infrastructure, CI/CD pipelines,
and containerization.
● Scalability Optimization
MLOps platforms are excellent at maximizing machine learning models' performance,
scalability, and deployment in a variety of operating contexts. This gives businesses the
opportunity to grow their AI initiatives effectively without sacrificing model performance or
reliability.
MLOps platforms ensure that machine learning models can be easily scaled up or down to
meet changing demands while maintaining consistent and reliable operation across a variety
of computing environments.
Businesses can swiftly implement and modify their AI-powered products to match changing
market demands and maintain an advantage over rivals thanks to this scalability optimization.
● Standardization
Another key feature of the MLOps platform is that it offers a thorough coordination of
workflows for ML, standardizing and automating the many processes involved. It comprises
tools that manage various phases of the machine learning lifecycle, from data preparation to
monitoring.
It eliminates manual errors and facilitates smooth collaboration between data scientists,
machine learning engineers, and IT specialists.
● Integration with DevOps and CI/CD
MLOps can be easily integrated with DevOps and CI/CD. The integration facilitates the
development, deployment, and maintenance of machine learning models by allowing data
science and engineering teams to work together harmoniously through the use of common
tools, procedures, and workflows.
This feature allows businesses to automate the model deployment process, apply stringent
quality checks, and guarantee the dependability and consistency of their AI-powered
solutions by integrating MLOps into the CI/CD pipeline.
● Democratization
A noteworthy feature called "MLOps democratization" aims to increase machine learning
accessibility for a wider group of stakeholders and users. MLOps platforms are removing
entry barriers for non-experts by offering low-code/no-code MLOps solutions and
user-friendly interfaces, making it simple for users to design, deploy, and manage machine
learning models.
Without requiring in-depth technical knowledge, this democratization of MLOps enables
business analysts, domain experts, and even citizen data scientists to actively engage in the
machine learning process.

Difference between traditional software development and MLOps:


Each aspect below contrasts traditional software development with MLOps:
● Code Nature
Traditional: Code is fixed and predictable (deterministic).
MLOps: Code and models learn from data and change over time.
● Development Process
Traditional: Linear process (design → code → test → deploy).
MLOps: Iterative process (train model → test → deploy → retrain).
● Testing
Traditional: Unit tests, integration tests, checks for bugs.
MLOps: Test models for accuracy, performance, and reliability.
● Deployment
Traditional: Deploy software (code) to servers or cloud.
MLOps: Deploy both models and code, with continuous updates.
● Scaling
Traditional: Scale by adding servers or resources to handle users.
MLOps: Scale by adding resources for model training and prediction (e.g., GPUs).
● Monitoring
Traditional: Monitor for bugs, crashes, or performance issues.
MLOps: Monitor model accuracy, data drift, and prediction performance.
● Collaboration
Traditional: Developers, QA testers, and business teams work together.
MLOps: Data scientists, developers, and operations teams work together.
● Version Control
Traditional: Version control for code (e.g., Git).
MLOps: Version control for code, models, datasets, and experiments.
● Automation
Traditional: Automate building, testing, and deployment (CI/CD).
MLOps: Automate training, deployment, monitoring, and retraining of models.
● Maintenance
Traditional: Fix bugs and update software versions.
MLOps: Continuously retrain models and improve predictions.
● Focus
Traditional: Building software with defined rules.
MLOps: Building and updating machine learning models with data.
● Complexity
Traditional: Less complex; focuses on software logic.
MLOps: More complex; involves data, models, and infrastructure.
