
Build a Bridge Between MLflow and Kubeflow

Uploaded by

dba.qin.cn

The Concurrent Project

Build a Bridge Between MLflow and Kubeflow
4th September 2022

OVERVIEW
Create a Kubeflow Component that enables users to run complex DAGs of AI computation in
Kubernetes, while using MLflow for Experiment Tracking, Artifact Management and Model
Management.

GOALS
1. Enable smooth integration of Kubeflow and MLflow such that users will enjoy the benefits
of best of breed tools
2. Integration with MLflow includes open source MLflow and commercial implementations
such as Databricks, Azure ML and InfinStor
3. We do not propose new helm charts or other mechanisms for creating an MLflow service
in Kubernetes; our intention is to ensure that our project can use an MLflow service
created by such mechanisms
4. We are keen on building a system that can manage DAGs across multiple Kubernetes
clusters
5. Data Management is integral to this project for two reasons:
a. Parallelization depends on partitioning of data
b. Cross-Kubernetes/cross-cloud access to data involves authentication
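
As an illustration of goal 5a, the sketch below splits a list of input objects into near-equal partitions, one per parallel worker. It is not taken from the Concurrent code base; the function name partition and the object keys are hypothetical.

```python
from typing import List, Sequence


def partition(items: Sequence[str], num_partitions: int) -> List[List[str]]:
    """Split items into num_partitions contiguous chunks of near-equal size."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # Every chunk gets `base` items; the first `extra` chunks get one more.
    base, extra = divmod(len(items), num_partitions)
    chunks: List[List[str]] = []
    start = 0
    for i in range(num_partitions):
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


# Hypothetical object keys; each chunk would be handed to one worker pod.
keys = [f"images/{n:04d}.jpg" for n in range(10)]
shards = partition(keys, 3)
```

Contiguous chunks keep each worker's input listing simple; a hash-based split would be an alternative if keys arrive in a skewed order.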

NOTE ON PROJECT NAME


We initially named this project MLflow Parallels. However, given our interest in turning it into a
bridge between MLflow and Kubeflow, we renamed the project Concurrent.

USE CASES
Concurrent is designed for complex pre-processing of AI data and for batch/micro-batch
inference. The DAG is well suited to decision making based on results of inference in prior steps
of the DAG.
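
A minimal sketch of this use case, assuming a three-step DAG in which the last step decides what to do based on the inference results of the step before it. The step names and functions are hypothetical stand-ins and do not reflect Concurrent's actual DAG format; only the standard library is used.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each node maps to the set of nodes it depends on.
dag = {
    "preprocess": set(),
    "detect": {"preprocess"},
    "classify": {"detect"},
}


def preprocess(results):
    # Stand-in for pre-processing of raw data.
    return ["frame-1", "frame-2"]


def detect(results):
    # Stand-in for batch inference over the pre-processed items.
    return [f for f in results["preprocess"] if f.endswith("2")]


def classify(results):
    # Decision based on a prior step: only act on items that were detected.
    return {f: "needs-review" for f in results["detect"]}


steps = {"preprocess": preprocess, "detect": detect, "classify": classify}


def run(dag, steps):
    """Run every step after its dependencies, passing along earlier results."""
    results = {}
    for name in TopologicalSorter(dag).static_order():
        results[name] = steps[name](results)
    return results


results = run(dag, steps)
```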

Concurrent is not suitable for real-time inference, i.e. it is not designed to start and operate a
fleet of inference containers. There are better tools in Kubeflow for that.

Concurrent is not suitable for distributed training - tools built into TensorFlow and PyTorch are
better suited for that.

MLFLOW ADVANTAGES AND DISADVANTAGES

MLflow works well using Spark for Structured Data in Data Warehouses, but lacks a compute engine for Unstructured Data
Databricks, the original creator of MLflow, has ensured the smooth operation of MLflow in Data
Warehouses with Spark as the computation engine. They appear to be less focussed on using
Kubernetes for AI on unstructured data.

MLflow works well for Experiment Tracking and Model Management


Experiment Tracking and Model Management are very successful features in MLflow. Users love
them and use them.

MAJOR COMPONENTS OF CONCURRENT


1. Control Plane: Initiates and controls DAGs, manages pods, etc.
2. Container Creation in k8s: Enables low resource environments such as web
browsers and serverless functions to run Concurrent DAGs
3. Access Control: Access to external MLflow, external storage, etc. needs to be handled
so that end user code in pods created by Concurrent has access to MLflow, to MLflow
Artifacts Storage, etc.
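
One way to picture the Access Control component is as the step that injects MLflow connection settings into each pod. The sketch below builds the env section of a Kubernetes container spec as plain Python data. MLFLOW_TRACKING_URI and MLFLOW_TRACKING_TOKEN are environment variables read by the MLflow client; the helper name mlflow_env, the secret name, the image and the URL are hypothetical, and this is not how Concurrent necessarily does it.

```python
from typing import Any, Dict, List


def mlflow_env(tracking_uri: str, secret_name: str,
               token_key: str = "token") -> List[Dict[str, Any]]:
    """Build container env entries that point user code at an external MLflow."""
    return [
        # Plain value: where the MLflow tracking server lives.
        {"name": "MLFLOW_TRACKING_URI", "value": tracking_uri},
        # The token is read from a Kubernetes Secret, not embedded in the spec.
        {
            "name": "MLFLOW_TRACKING_TOKEN",
            "valueFrom": {
                "secretKeyRef": {"name": secret_name, "key": token_key}
            },
        },
    ]


# Hypothetical container for one DAG node.
container = {
    "name": "dag-node",
    "image": "example.com/user-code:latest",
    "env": mlflow_env("https://mlflow.example.com", "mlflow-credentials"),
}
```

Access to artifact storage would need its own credentials, which depend on the storage backend and are not shown here.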

CURRENT STATE OF PROJECT


● An initial version is now available on GitHub at
https://github.com/concurrent-ai/concurrent.git. Its Control Plane runs in AWS (a new
control plane that runs in k8s needs to be developed), and it supports InfinStor MLflow
as the MLflow service and EKS and GKE for Kubernetes
● Development has just started on support for Kubeflow Pipeline Components
(https://www.kubeflow.org/docs/components/pipelines/v1/reference/component-spec/) as
a node in Concurrent’s DAG (the current code base only handles MLflow Projects)
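
To make the second item more concrete, the sketch below shows how a parsed Kubeflow Pipelines v1 component spec might be turned into something a DAG node can run. The fields implementation.container.image, command, args and the inputValue placeholder come from the v1 component spec; the function resolve_component, the example component and the returned node layout are hypothetical and are not part of Concurrent.

```python
from typing import Any, Dict


def resolve_component(spec: Dict[str, Any],
                      arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve a component spec into an image and a concrete command line."""
    container = spec["implementation"]["container"]
    resolved = []
    for part in container.get("command", []) + container.get("args", []):
        if isinstance(part, str):
            resolved.append(part)
        elif isinstance(part, dict) and "inputValue" in part:
            # Substitute the argument supplied for the named input.
            resolved.append(str(arguments[part["inputValue"]]))
        else:
            # inputPath, outputPath, etc. are left out of this sketch.
            raise NotImplementedError(f"unsupported placeholder: {part!r}")
    return {"image": container["image"], "command": resolved}


# Hypothetical component, as it would look after loading its YAML file.
spec = {
    "name": "Count rows",
    "inputs": [{"name": "path", "type": "String"}],
    "implementation": {
        "container": {
            "image": "python:3.11",
            "command": ["python", "count.py"],
            "args": ["--path", {"inputValue": "path"}],
        }
    },
}

node = resolve_component(spec, {"path": "s3://bucket/data.csv"})
```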
