Bridge Between MLflow and Kubeflow
OVERVIEW
Create a Kubeflow Component that enables users to run complex DAGs of AI computation in
Kubernetes, while using MLflow for Experiment Tracking, Artifact Management and Model
Management.
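As an illustration of the intended integration, here is a minimal sketch of a Kubeflow Pipelines
component that logs to MLflow. It assumes the Kubeflow Pipelines v2 SDK (kfp) and the mlflow
client; the component name, base image, and logged values are illustrative placeholders, not part
of this proposal.

    from kfp import dsl

    @dsl.component(base_image="python:3.10", packages_to_install=["mlflow"])
    def preprocess(tracking_uri: str, experiment: str) -> str:
        import mlflow
        # Point the MLflow client at an externally managed tracking service
        mlflow.set_tracking_uri(tracking_uri)
        mlflow.set_experiment(experiment)
        with mlflow.start_run() as run:
            # ... the actual pre-processing work would happen here ...
            mlflow.log_param("step", "preprocess")
            mlflow.log_metric("rows_processed", 1000)
            return run.info.run_id

A DAG of such components can then be tracked end to end in MLflow while Kubernetes handles the
compute.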
GOALS
1. Enable smooth integration of Kubeflow and MLflow so that users enjoy the benefits of
best-of-breed tools
2. Integration with MLflow includes open source MLflow and commercial implementations
such as Databricks, Azure ML and InfinStor
3. We do not propose new Helm charts or other mechanisms for creating an MLflow service
in Kubernetes; our intention is to ensure that our project can use an MLflow service
created in such a manner (see the sketch after this list)
4. We are keen on building a system that can manage DAGs across multiple Kubernetes
clusters
5. Data Management is integral to this project for two reasons:
a. Parallelizing depends on partitioning of data
b. Cross-Kubernetes/cross-cloud access to data involves authentication
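As a sketch of what goal 3 implies, the snippet below points the MLflow client at a tracking
service that was deployed separately, for example via an existing Helm chart or a managed offering
such as Databricks or Azure ML. The service URI and experiment name are hypothetical.

    import os
    import mlflow

    # URI of an MLflow service already running in the cluster (placeholder)
    os.environ["MLFLOW_TRACKING_URI"] = "http://mlflow.mlflow.svc.cluster.local:5000"

    mlflow.set_experiment("concurrent-demo")  # hypothetical experiment name
    with mlflow.start_run():
        mlflow.log_param("source", "existing-mlflow-service")

The same mechanism applies to commercial MLflow endpoints; only the tracking URI and credentials
change.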
USE CASES
Concurrent is designed for complex pre-processing of AI data and for batch/micro-batch
inference. A DAG is well suited to making decisions based on the results of inference in prior
steps, as sketched below.
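The sketch below shows what such decision making could look like with the Kubeflow Pipelines v2
SDK: a later step of the DAG is chosen based on the output of an earlier inference step. All
component and pipeline names are illustrative.

    from kfp import dsl

    @dsl.component
    def classify_batch() -> str:
        # Batch inference; returns a coarse label used to route the DAG
        return "anomaly"  # placeholder result

    @dsl.component
    def escalate(label: str):
        print(f"escalating batch labelled {label}")

    @dsl.component
    def archive(label: str):
        print(f"archiving batch labelled {label}")

    @dsl.pipeline(name="conditional-inference-dag")
    def inference_dag():
        result = classify_batch()
        # Choose the next step of the DAG from the inference result
        with dsl.Condition(result.output == "anomaly"):
            escalate(label=result.output)
        with dsl.Condition(result.output != "anomaly"):
            archive(label=result.output)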
Concurrent is not suitable for real-time inference, i.e. it is not designed to start and operate a
fleet of inference containers. There are better tools in Kubeflow for that.
Concurrent is not suitable for distributed training; tools built into TensorFlow and PyTorch are
better suited for that.
MLflow works well with Spark for Structured Data in Data Warehouses,
but lacks a compute engine for Unstructured Data
Databricks, the original inventor of MLflow, has ensured the smooth operation of MLflow in Data
Warehouses with Spark as the computation engine. They appear to be less focused on using
Kubernetes for AI on unstructured data.