Airflow 2 X

The document discusses Apache Airflow 2.x, presented as the "new" workflow orchestrator. It provides an overview of Airflow's modern UI, strong DAG authoring capabilities, extensibility as an open source platform, and solid infrastructure. Airflow entered the Apache Incubator in 2016 and became a top-level Apache project in 2019; Airflow 2.0 was released in December 2020. The document outlines Airflow's timeline and many of its new features, including its UI, task dependencies, deferrable operators, notifiers, open standards integrations, and security improvements planned for the future.


`New` Workflow Orchestrator in town:

"Apache Airflow 2.x"

Jarek Potiuk
Apache Airflow Committer
https://github.com/potiuk

December 12-14 2023


Airflow as ..
● Modern UI

● With great DAG Authoring capabilities

● Being an extensible platform

● Being true Open Source, with as strong a community as you can get

● Solid infrastructure

● Shortly - modern orchestrator of your choice :)

OSA CON | December 12-14 2023 2


Airflow 2 timeline

Airbnb → ASF Incubator → ASF Top Level Project
Airflow 2.0 → 2.1 → 2.2 → 2.3 → 2.4 → 2.5 → 2.6 → 2.7 → 2.8

10th Anniversary next year:
Unofficial: Airflow Summit 2024, September, Bay Area, 1000+ attendees



Modern UI



Grid View



Graph View



Log view



Gantt view



Cluster Activity



DAG Authoring



Handling dependencies
● An issue in early Airflow 2.0 days - much less nowadays

● Multiple options to handle it


○ Python Virtualenv Operator, External Python Operator, Docker Operator, Kubernetes Pod Operator, Multiple Docker Images + Celery Queues

○ Coming soon -> Multi-tenancy setup with per-tenant dependencies

● See the "Mastering dependencies: The Airflow Way" talk from Airflow Summit 2023

● Plays very well with TaskFlow



TaskFlow



TaskFlow cases
Core:
● @dag
● @task.python
● @task.virtualenv
● @task.external_python
● @task.sensor
● @task.branch
● @task.short_circuit
● @task.bash (coming)
Providers:
● @task.docker
● @task.kubernetes
● @task.sftp_sensor
● …
● Providers can provide their own
… and
● @task_group
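
The decorator style listed above can be pictured with a toy model in plain Python. This is only a sketch of the idea (calling a decorated function records a graph node, and passing one result into another call creates the dependency); it is not Airflow's implementation or API, and `LazyResult`, `task`, `extract`, `transform` and `load` are hypothetical names made up for illustration.

```python
from typing import Any, Callable


class LazyResult:
    """Placeholder returned when a task function is called while building the graph."""

    def __init__(self, func: Callable[..., Any], args: tuple):
        self.func = func
        self.args = args
        # Any LazyResult among the arguments is an upstream dependency.
        self.upstream = [a for a in args if isinstance(a, LazyResult)]

    def run(self) -> Any:
        # Resolve upstream results first, then call the wrapped function.
        resolved = [a.run() if isinstance(a, LazyResult) else a for a in self.args]
        return self.func(*resolved)


def task(func: Callable[..., Any]) -> Callable[..., LazyResult]:
    """Hypothetical decorator: calling the function records it instead of running it."""
    def wrapper(*args: Any) -> LazyResult:
        return LazyResult(func, args)
    return wrapper


@task
def extract():
    return [1, 2, 3]


@task
def transform(values):
    return [v * 10 for v in values]


@task
def load(values):
    return sum(values)


# Nothing runs here; the nested calls only describe extract -> transform -> load.
final = load(transform(extract()))
chain = [final.func.__name__, final.upstream[0].func.__name__]
```

Calling `final.run()` executes the three functions in dependency order. In this toy model the ordering falls out of the function calls themselves, which is the property the decorator style is meant to give DAG authors.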



Task Groups



Dynamic Task and Group mapping

● Map-reduce style workflows, if you want Airflow to also “do stuff”
● You can parallelise even complex workflows
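
The map-reduce shape mentioned above can be sketched in plain Python. This is an analogy built on the standard library, not Airflow's mapping API: `count_words` and `total` are hypothetical task bodies, and a thread pool stands in for the parallel mapped task instances.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(line: str) -> int:
    """Hypothetical 'mapped' task: runs once per input item."""
    return len(line.split())


def total(counts: list[int]) -> int:
    """Hypothetical 'reduce' task: runs once over all mapped results."""
    return sum(counts)


def run_map_reduce(lines: list[str]) -> int:
    # The number of mapped calls is only known at run time, from the input.
    with ThreadPoolExecutor(max_workers=4) as pool:
        counts = list(pool.map(count_words, lines))
    return total(counts)


result = run_map_reduce(["a b c", "d e", "f"])
```

The point of the sketch is that the fan-out width is decided by the data rather than written into the workflow definition.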



Dynamic Task mapping



Deferrable (AsyncIO) operators

80% - 90% performance improvements
● no worker slots occupied while waiting (other jobs can run)
● several hundred Deferrable Operators out of the box
● tens of Triggers available
● you can roll your own Trigger
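
Why asynchronous waiting frees up capacity can be illustrated with plain `asyncio`. This is a minimal sketch of the general idea, not Airflow's Trigger API: `wait_for_event` is a hypothetical stand-in for a trigger, and `asyncio.sleep` stands in for polling an external system.

```python
import asyncio
import time


async def wait_for_event(name: str, delay: float) -> str:
    """Hypothetical trigger: waits without blocking a thread or worker slot."""
    await asyncio.sleep(delay)  # stands in for polling an external system
    return f"{name} ready"


async def run_triggers() -> list[str]:
    # All 50 waits run concurrently inside a single event loop.
    waits = [wait_for_event(f"job-{i}", 0.2) for i in range(50)]
    return await asyncio.gather(*waits)


start = time.monotonic()
results = asyncio.run(run_triggers())
elapsed = time.monotonic() - start
```

Run one after another, the fifty waits would take about ten seconds; sharing one event loop, they finish in roughly the time of a single wait.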



Setup/Teardown



Notifiers

● easily reusable notifiers for when your task fails (or succeeds)



Object storage - FsSpec (Coming in Airflow 2.8)
● Open standard
● Integrates with all object storages
● Modern Pythonic way of interacting
○ Pathlib
● Supported by:
○ Pandas, Polars, Parquet, DuckDB, Iceberg, PyArrow
● One way to rule them all
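
The "Pathlib" bullet refers to the path-object style of file access. Below is a minimal sketch of that style using only the standard library's `pathlib` against a temporary local directory. The slide suggests that similar code can target remote object stores in Airflow 2.8, but no Airflow or fsspec API is used here, and the bucket-like directory and file names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "my-bucket"  # hypothetical stand-in for a bucket
    base.mkdir()

    # Paths compose with "/" and carry their own read/write methods.
    report = base / "reports" / "2023-12.csv"
    report.parent.mkdir(parents=True)
    report.write_text("id,value\n1,42\n")

    # Listing and reading use the same path objects.
    names = sorted(p.name for p in base.rglob("*.csv"))
    content = report.read_text()
```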



Data-aware scheduling

● Micropipelines concept
● Still early days
● But mind-boggling things are coming (Object storage integration, Partial Datasets, Data-aware triggering, OpenLineage)



LLM Operators

Donated by Astronomer (yay!)

● OpenAI
● Cohere
● Weaviate
● pgvector
● Pinecone
● OpenSearch

Powering @AskAstro:
https://ask.astronomer.io/



Airflow as a platform



Open Lineage
● Integrated in Airflow

● Column level lineage

● Better TaskFlow support in the works

● Great adoption as an open standard



Open Lineage



Open Telemetry
● Integrated in Airflow

● Adopted by everyone

● Still early days

● Traces, Log support in the works



Astronomer’s Cosmos



Fully fledged REST API
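
As a rough illustration of calling the REST API from code, the sketch below builds (but does not send) a request that lists DAGs through the stable API's `/api/v1/dags` endpoint. It assumes a deployment with the basic-auth API backend enabled; the host `http://localhost:8080`, the `admin`/`admin` credentials, and the helper name `build_list_dags_request` are hypothetical.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def build_list_dags_request(base_url: str, user: str, password: str,
                            limit: int = 10) -> urllib.request.Request:
    """Build (but do not send) a request that lists DAGs."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    url = f"{base_url}/api/v1/dags?{urlencode({'limit': limit})}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}",
                 "Accept": "application/json"},
        method="GET",
    )


# Hypothetical local deployment and credentials.
req = build_list_dags_request("http://localhost:8080", "admin", "admin")
# To send it: urllib.request.urlopen(req), then JSON-decode the response body.
```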



Engineering friendliness
● Workflow as code, front and center
● Tests
○ airflow tasks test
○ airflow dags test
○ unit test guidelines
○ system tests support
● Running Airflow locally
○ airflow standalone
○ docker compose
○ airflowctl - by Kaxil, Airflow PMC member



Airflow IS Open Source
(and always will be)



Community - part of Apache Software Foundation
● The largest project in the ASF by contributor count (>2700)

● Licensing: ASF, permissive licence (that will NEVER change)

● Well established, strong governance

● 61 committers, 32 PMC members

● Stakeholders/Managed services/Vendor neutrality


○ Astronomer, Amazon, Google, Microsoft, …

● Security / Release process / Maintenance certainty



Tools integrating with Airflow
● DAG visual editors
● Declarative DAG authoring
● IDE integration
● CLIs to manage Airflow
● Debugging aids
● UI extensions
● …



Solid Infrastructure



Public Interface of Airflow



Providers
● Can be upgraded/downgraded separately
● Can provide:
○ Hooks/Operators/Sensors, Extra-links, Connection types,
○ Secret Backends, Triggers, Log Handlers,
○ Executors, Notifications, Configuration, Decorators
○ Filesystems (2.8)
● Full lifecycle of providers defined
○ Approval by community (or not)
○ Support lifecycle for multiple Airflow versions
○ Suspension/Resuming/Removal
● 3rd-party providers and registries



Extensible user management



Security - coming soon for everyone

● Regulations are coming (the Cyber Resilience Act, CRA, was just agreed in the EU trilogue)
● Airflow is part of the HackerOne OSS Bounty
● Highly functional Security Team: ~50 reports handled
● Sovereign Tech Fund funding for 4 Airflow contributors working on security:
○ Security Model and Security Policy
○ SBOM generated
○ Securing release process (reproducible builds)
○ Component Isolation (Multi-tenancy in progress)



Summary

● Airflow is a modern, solid orchestrator with strong foundations

● New, slick ways to interact with the Modern Data Stack

● True Open Source

● Community is huge, strong and supportive

● More exciting things are coming. Fast.



Q&A
https://github.com/potiuk

https://www.linkedin.com/in/jarekpotiuk

