Add DSS plugin for MLflow #191
Conversation
This pull request has been linked to Shortcut Story #75434: Write a mlflow.request_header_provider plugin for MLflow.
Can you please:
…34-mlflow-plugin-autentication
…ure/sc-75437-mlflow-plugin-artifacts
In load_dss_mlflow_plugin(), having a fixed name will be an issue:
- if we have multiple API clients running on the same host
- even more so if we change the content of the generated file in a later version
So, I would:
- randomize the dir name a bit
- give the user the opportunity to clean up at the end.
I would probably return here the full path of the created dir and have setup_mlflow in DSSClient return an object containing the set variables and this path.
In addition to this, I would add a context manager, so that we can write something like:
with client.setup_mlflow("pouet") as pouet:
    ...
and automatically unset the env variables and clean up the directory at the end.
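A minimal sketch of what such a context manager could look like, assuming setup_mlflow returns an object carrying the env variables it set and the path of the generated plugin dir (the MlflowHandle name and its attributes are hypothetical, not the actual implementation):

import os
import shutil

class MlflowHandle:
    """Hypothetical return value of DSSClient.setup_mlflow()."""

    def __init__(self, env_variables, plugin_dir):
        self.env_variables = env_variables  # dict of env variables set by setup_mlflow
        self.plugin_dir = plugin_dir        # randomized directory holding the generated plugin

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.clear()

    def clear(self):
        # unset the env variables and remove the generated directory
        for key in self.env_variables:
            os.environ.pop(key, None)
        shutil.rmtree(self.plugin_dir, ignore_errors=True)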
Co-authored-by: Ludovic Pénet <[email protected]>
LGTM + tested. Note: merging into the feature branch.
We follow the setup proposed in MLflow Tracking scenario 4 for tracking experiments with a remote backend server, which is implemented directly in the DSS public API (in https://github.com/dataiku/dip/tree/feature/mlflow-experiment-tracking), and an artifact store which is a DSS managed folder.
In order to communicate with the DSS public API and handle the managed folder used as the artifact store, this PR defines a new MLflow plugin (see the doc on MLflow plugins). What it does:
- add authentication and project key to MLflow client requests
In this part, the communication with the MLflow client is done by setting the env variables "DSS_MLFLOW_HEADER", "DSS_MLFLOW_TOKEN" and "DSS_MLFLOW_PROJECTKEY". The content of these env variables is simply added to the headers of MLflow client requests (see the header provider sketch after this list).
- add a connector to handle a managed folder for MLflow artifacts
MLflow provides connectors for many backends: S3, FileSystem, Databricks, etc. Implementing the connector here means creating a new one for managed folders, using the dataikuapi client's methods to manipulate them.
Indeed, in MLflow, to log artifacts, the MLflow client asks the backend for a URI and then uploads the artifact to the returned URI itself. So we need to give the MLflow client the ability to upload, download, list, delete, etc. from a managed folder. In the plugin, an instance of the dataikuapi client is spawned by the MLflow client to perform these operations (see the artifact repository sketch after this list).
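A rough sketch of the header provider part, assuming MLflow's RequestHeaderProvider plugin interface (in_context / request_headers). It also assumes that DSS_MLFLOW_HEADER holds the header name and DSS_MLFLOW_TOKEN its value; the x-dku-mlflow-project-key header name is an illustrative assumption, not necessarily what the PR uses:

import os

from mlflow.tracking.request_header.abstract_request_header_provider import RequestHeaderProvider

class DSSRequestHeaderProvider(RequestHeaderProvider):
    """Adds DSS authentication and project key headers to MLflow client requests."""

    def in_context(self):
        # only active once setup_mlflow has exported the DSS env variables
        return "DSS_MLFLOW_TOKEN" in os.environ

    def request_headers(self):
        headers = {}
        if "DSS_MLFLOW_HEADER" in os.environ and "DSS_MLFLOW_TOKEN" in os.environ:
            # assumption: DSS_MLFLOW_HEADER is the header name, DSS_MLFLOW_TOKEN its value
            headers[os.environ["DSS_MLFLOW_HEADER"]] = os.environ["DSS_MLFLOW_TOKEN"]
        if "DSS_MLFLOW_PROJECTKEY" in os.environ:
            # hypothetical header name for the project key
            headers["x-dku-mlflow-project-key"] = os.environ["DSS_MLFLOW_PROJECTKEY"]
        return headers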
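And a sketch of the managed folder connector, assuming MLflow's ArtifactRepository base class and the dataikuapi managed folder methods (put_file, get_file, list_contents). The URI scheme and the DSS_MLFLOW_HOST / DSS_MLFLOW_APIKEY variables below are illustrative assumptions, not the actual plugin code:

import os
import posixpath

import dataikuapi
from mlflow.entities import FileInfo
from mlflow.store.artifact.artifact_repo import ArtifactRepository

class DSSManagedFolderArtifactRepository(ArtifactRepository):
    """Stores MLflow artifacts in a DSS managed folder through the dataikuapi client."""

    def __init__(self, artifact_uri):
        super().__init__(artifact_uri)
        # hypothetical URI layout: dss-managed-folder://PROJECT_KEY/FOLDER_ID
        _, path = artifact_uri.split("://", 1)
        project_key, folder_id = path.split("/")[:2]
        client = dataikuapi.DSSClient(os.environ["DSS_MLFLOW_HOST"], os.environ["DSS_MLFLOW_APIKEY"])
        self.folder = client.get_project(project_key).get_managed_folder(folder_id)

    def log_artifact(self, local_file, artifact_path=None):
        dest = posixpath.join(artifact_path or "", os.path.basename(local_file))
        with open(local_file, "rb") as f:
            self.folder.put_file(dest, f)

    def log_artifacts(self, local_dir, artifact_path=None):
        for root, _, files in os.walk(local_dir):
            rel = os.path.relpath(root, local_dir)
            sub = artifact_path or ""
            if rel != ".":
                sub = posixpath.join(sub, rel)
            for name in files:
                self.log_artifact(os.path.join(root, name), sub or None)

    def list_artifacts(self, path=None):
        prefix = (path or "").strip("/")
        infos = []
        for item in self.folder.list_contents()["items"]:
            item_path = item["path"].lstrip("/")
            if item_path.startswith(prefix):
                infos.append(FileInfo(item_path, False, item.get("size")))
        return infos

    def _download_file(self, remote_file_path, local_path):
        # get_file returns the raw HTTP response from the DSS backend
        with open(local_path, "wb") as f:
            f.write(self.folder.get_file(remote_file_path).content)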
Note on the implementation:
To avoid loading the plugin for every user installing dataiku-api-client-python, the entry points of the plugin are added dynamically by the "load_dss_mlflow_plugin" function instead of being defined in setup.py. The plugin can be used together with the PR https://github.com/dataiku/dip/pull/14442. Here is a code sample:
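A hedged illustration of what using the plugin could look like, assuming setup_mlflow on DSSClient and the context-manager form discussed in the review above; the host, API key and the argument passed to setup_mlflow are placeholders, not the actual sample:

import dataikuapi
import mlflow

client = dataikuapi.DSSClient("https://dss.example.com:11200", "my-api-key")

# setup_mlflow loads the DSS MLflow plugin entry points and exports the
# DSS_MLFLOW_* env variables; argument and return value are assumptions here
with client.setup_mlflow("MYPROJECT") as mlflow_env:
    mlflow.set_experiment("my-experiment")
    with mlflow.start_run():
        mlflow.log_param("alpha", 0.5)
        mlflow.log_metric("accuracy", 0.92)
        mlflow.log_artifact("model.pkl")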
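For reference, registering the entry points dynamically (instead of declaring them in setup.py) could be done with pkg_resources along these lines; the distribution name, module path and the mlflow.artifact_repository group name are assumptions, and _ep_map is a private pkg_resources attribute:

import pkg_resources

def load_dss_mlflow_plugin_sketch(plugin_dir):
    # expose the generated plugin package as a distribution carrying the
    # MLflow plugin entry points, without any declaration in setup.py
    dist = pkg_resources.Distribution(
        location=plugin_dir,
        project_name="dss-mlflow-plugins",
        version="0.0.1",
    )
    dist._ep_map = {
        "mlflow.request_header_provider": {
            "unused": pkg_resources.EntryPoint.parse(
                "unused=dss_plugin:DSSRequestHeaderProvider", dist=dist),
        },
        "mlflow.artifact_repository": {
            "dss-managed-folder": pkg_resources.EntryPoint.parse(
                "dss-managed-folder=dss_plugin:DSSManagedFolderArtifactRepository", dist=dist),
        },
    }
    pkg_resources.working_set.add(dist)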