This repository contains sample pipelines developed using Ploomber.
Note: We recommend going through the first tutorial to learn the basics of Ploomber.
You can run the examples on Colab or Binder (free, hosted JupyterLab), or run them locally:
```sh
pip install ploomber

# list examples
ploomber examples

# download an example by name
ploomber examples --name {name}

# for example
ploomber examples --name templates/mlflow
```
Each example contains a `README.md` file that describes it; a `README.ipynb` with the same contents is also available in Jupyter notebook format, including command outputs. In addition, files for pip (`requirements.txt`) and conda (`environment.yml`) are provided for local execution.
Starting points for common use cases. Use them to ramp up a project quickly.
- `templates/etl`: Download a data file, upload it to a database, process it, and plot with Python and R.
- `templates/exploratory-analysis`: Sample pipeline that explores penguins data.
- `templates/google-cloud`: Use Google Cloud and Ploomber to develop a scalable and production-ready pipeline.
- `templates/ml-advanced`: ML pipeline using the Python API. Shows how to create a Python package, test it with pytest, and train models in parallel.
- `templates/ml-basic`: Download data, clean it, generate features, and train a model.
- `templates/ml-intermediate`: Training and serving ML pipelines with integration testing to evaluate training data quality.
- `templates/ml-online`: Load data, generate features, train a model, and deploy it with Flask.
- `templates/mlflow`: Train a grid of models and log them to MLflow.
- `templates/python-api`: Load, clean, and plot data using the Python API.
- `templates/pytorch`: Use GPUs to train models in Ploomber Cloud.
- `templates/shell`: Create a pipeline with shell scripts as tasks.
- `templates/spec-api-directory`: Create a pipeline from a directory of scripts (without a `pipeline.yaml` file).
- `templates/spec-api-r`: Load, clean, and plot data with R.
- `templates/spec-api-sql`: Use SQL scripts to manipulate data in a database, dump a table, and plot it with Python.
Short and to-the-point examples showing how to use a specific feature.
- `cookbook/dynamic-params`: Pipeline parameters whose values are computed at runtime.
- `cookbook/file-client`: Upload a task's products upon execution (local, S3, Google Cloud Storage).
- `cookbook/grid`: Create a grid of tasks to train models with different parameters.
- `cookbook/hooks`: Use task hooks.
- `cookbook/incremental`: A pipeline that processes new records from a database and uploads them.
- `cookbook/nested-cv`: Nested cross-validation for model selection and hyperparameter tuning.
- `cookbook/python-load`: Load a `pipeline.yaml` file in a Python session to customize initialization.
- `cookbook/report-generation`: Generate HTML/PDF reports.
- `cookbook/serialization`: Use the serializer and unserializer decorators.
- `cookbook/sql-dump`: A minimal example showing how to dump a table from a SQL database.
- `cookbook/variable-number-of-products`: Create tasks whose number of products depends on runtime conditions.
In-depth tutorials for learning. These are part of the documentation.
- `guides/cron`: Schedule Ploomber pipelines with cron.
- `guides/debugging`: Techniques for debugging pipelines.
- `guides/first-pipeline`: Introductory tutorial covering the basics of Ploomber.
- `guides/intro-to-ploomber`: Introductory tutorial covering the basics of Ploomber.
- `guides/logging`: Add logging to a pipeline.
- `guides/monitoring`: Monitor a running pipeline, demonstrated with the pre-built ml-basic template.
- `guides/parametrized`: Parametrize pipelines and change parameters from the command line.
- `guides/refactor`: Use Soorgeon to convert a notebook into a Ploomber pipeline.
- `guides/serialization`: How the serializer and unserializer fields in a `pipeline.yaml` file work.
- `guides/sql-templating`: Develop modular SQL pipelines.
- `guides/testing`: Use a task's on_finish hook to test data quality.
- `guides/versioning`: Version pipeline products.
The simplest way to get started with Ploomber is the Spec API, which lets you describe pipelines in a `pipeline.yaml` file; most examples in this repository use it. However, if you need more flexibility, you can write pipelines with Python.
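As a rough sketch of what a spec looks like, here is a minimal two-task `pipeline.yaml` (the script and output names are illustrative, not taken from any specific example): each task declares its source script and the products it generates, and Ploomber infers the dependency structure.

```yaml
# hypothetical two-task pipeline: each task is a script whose
# executed notebook and data file are registered as products
tasks:
  - source: scripts/get.py
    product:
      nb: output/get.ipynb
      data: output/raw.csv

  - source: scripts/clean.py
    product:
      nb: output/clean.ipynb
      data: output/clean.csv
```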
The `templates/python-api/` directory contains a project written with the Python API, and `python-api-examples/` includes tutorials and more examples.
In Ploomber 0.21, we introduced a simplified API to write pipelines in a single Jupyter notebook (or `.py`) file. This is a great option for small projects. You can find the examples in the `micro-pipelines/` directory.