Ploomber sample projects

This repository contains sample pipelines developed using Ploomber.

Note: We recommend going through the first tutorial to learn the basics of Ploomber.

Running examples

Use Colab:

Use Binder (free, hosted JupyterLab):

Or run locally:

pip install ploomber

# list examples
ploomber examples

# download example with name
ploomber examples --name {name}

# example
ploomber examples --name templates/mlflow

How to read the examples

Each example contains a README.md file that describes it. A README.ipynb file with the same content is also available; it is in Jupyter notebook format and includes command outputs. In addition, each example provides a requirements.txt file (for pip) and an environment.yml file (for conda) so you can run it locally.
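Since every example ships both dependency files, a small helper can pick one of them. The following is a minimal sketch, not part of Ploomber: the function name setup_example_env and the DRY_RUN variable are hypothetical. It assumes you prefer conda when environment.yml exists and conda is installed, and otherwise falls back to pip with requirements.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ploomber): install the dependencies of a
# downloaded example. Prefers conda when environment.yml exists and conda is
# on PATH; otherwise falls back to pip with requirements.txt.
# Set DRY_RUN=1 to print the command instead of running it.
setup_example_env() {
  local dir="${1:-.}"
  local -a cmd
  if [ -f "$dir/environment.yml" ] && command -v conda >/dev/null 2>&1; then
    # conda takes the environment name from the file's name: field
    cmd=(conda env create -f "$dir/environment.yml")
  elif [ -f "$dir/requirements.txt" ]; then
    cmd=(pip install -r "$dir/requirements.txt")
  else
    echo "no environment.yml or requirements.txt found in $dir" >&2
    return 1
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Show the command that would run, joined with spaces
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 setup_example_env path/to/example` prints the install command without running it. Here, `path/to/example` stands for wherever you downloaded the example.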

Index

Templates

Starting points for common use cases. Use them to ramp up a project quickly.

templates/etl: Download a data file, upload it to a database, process it, and plot with Python and R.
templates/exploratory-analysis: Sample pipeline that explores the penguins dataset.
templates/google-cloud: Use Google Cloud and Ploomber to develop a scalable, production-ready pipeline.
templates/ml-advanced: ML pipeline using the Python API. Shows how to create a Python package, test it with pytest, and train models in parallel.
templates/ml-basic: Download data, clean it, generate features, and train a model.
templates/ml-intermediate: Training and serving ML pipelines with integration testing to evaluate training data quality.
templates/ml-online: Load data, generate features, train a model, and deploy the model with Flask.
templates/mlflow: Train a grid of models and log them to MLflow.
templates/python-api: Load, clean, and plot data using the Python API.
templates/pytorch: Use GPUs to train models in Ploomber Cloud.
templates/shell: Create a pipeline with shell scripts as tasks.
templates/spec-api-directory: Create a pipeline from a directory of scripts (without a pipeline.yaml file).
templates/spec-api-r: Load, clean, and plot data with R.
templates/spec-api-sql: Use SQL scripts to manipulate data in a database, dump a table, and plot it with Python.

Cookbook

Short and to-the-point examples showing how to use a specific feature.

cookbook/dynamic-params: Pipeline parameters whose values are computed at runtime.
cookbook/file-client: Upload a task's products upon execution (local, S3, Google Cloud Storage).
cookbook/grid: An example showing how to create a grid of tasks to train models with different parameters.
cookbook/hooks: Use task hooks.
cookbook/incremental: A pipeline that processes new records from a database and uploads them.
cookbook/nested-cv: Nested cross-validation for model selection and hyperparameter tuning.
cookbook/python-load: Load a pipeline.yaml file in a Python session to customize initialization.
cookbook/report-generation: Generate HTML/PDF reports.
cookbook/serialization: Shows how to use the serializer and unserializer decorators.
cookbook/sql-dump: A minimal example showing how to dump a table from a SQL database.
cookbook/variable-number-of-products: Shows how to create tasks whose number of products depends on runtime conditions.

Guides

In-depth tutorials for learning. These are part of the documentation.

guides/cron: Schedule Ploomber pipelines using cron.
guides/debugging: Tutorial showing techniques for debugging pipelines.
guides/first-pipeline: Introductory tutorial to learn the basics of Ploomber.
guides/intro-to-ploomber: Introductory tutorial to learn the basics of Ploomber.
guides/logging: Tutorial showing how to add logging to a pipeline.
guides/monitoring: Shows pipeline monitoring capabilities by running the pre-built ml-basic template.
guides/parametrized: Tutorial showing how to parametrize pipelines and change parameters from the command line.
guides/refactor: Use Soorgeon to convert a notebook into a Ploomber pipeline.
guides/serialization: Tutorial explaining how the serializer and unserializer fields in a pipeline.yaml file work.
guides/sql-templating: Introductory tutorial teaching how to develop modular SQL pipelines.
guides/testing: Tutorial showing how to use a task's on_finish hook to test data quality.
guides/versioning: A tutorial showing how to version pipeline products.

Python API

The simplest way to get started with Ploomber is the Spec API, which lets you describe pipelines in a pipeline.yaml file; most examples in this repository use it. However, if you need more flexibility, you can write pipelines in Python.

The templates/python-api/ directory contains a project written using the Python API, and the python-api-examples/ directory includes tutorials and more examples.
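To make the Spec API concrete, the following sketch writes a minimal one-task project: a pipeline.yaml file plus the script it references. Several names are illustrative assumptions: the helper new_spec_project, the file names load.py and output/, and the pandas code in the script. The tasks/source/product layout and the cell tagged "parameters" follow Ploomber's Spec API conventions as commonly documented. Compare the result against the templates in this repository before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal, one-task Spec API project.
# The task is a Python script; Ploomber runs it as a notebook and stores
# the executed notebook plus the CSV it writes as the task's products.
new_spec_project() {
  local dir="$1"
  mkdir -p "$dir"

  # pipeline.yaml: the Spec API description of the pipeline
  cat > "$dir/pipeline.yaml" <<'EOF'
tasks:
  - source: load.py
    product:
      nb: output/load.ipynb
      data: output/data.csv
EOF

  # load.py: percent-format script; at runtime Ploomber injects values for
  # the variables declared in the cell tagged "parameters"
  cat > "$dir/load.py" <<'EOF'
# %% tags=["parameters"]
upstream = None
product = None

# %%
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})
df.to_csv(product["data"], index=False)
EOF
}
```

After `new_spec_project demo`, running `ploomber build` inside demo/ should execute the task, assuming Ploomber, pandas, and a Jupyter kernel are installed. With the Python API, the same pipeline would be built in code instead; templates/python-api/ shows that style.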

Micro-pipelines

In Ploomber 0.21, we introduced a simplified API to write pipelines in a single Jupyter notebook (or .py) file. This is a great option for small projects.

You can find the examples in the micro-pipelines/ directory.
