
Airflow

Airflow is an open-source workflow management tool that uses directed acyclic graphs (DAGs) to define data pipelines. Tasks in the DAG can interface with external systems and are monitored by Airflow's scheduling engine. It allows users to address issues like retries, notifications, monitoring, and logging that traditional ETL pipelines lack.


Airflow: (2.5.0)
 Airflow is an open-source tool used to create, schedule, and monitor workflows.
 It was originally developed at Airbnb and is now maintained by the Apache Software
Foundation (ASF).
 Airflow allows users to define their workflows as code, which can be versioned,
tested, and reused.
 Airflow uses Directed Acyclic Graphs (DAGs) to define the structure of workflows,
with each task in the DAG representing a discrete unit of work.
 Tasks can be defined using a variety of operators, which can perform tasks such as
running SQL queries, executing scripts, and sending emails.
 Airflow's scheduling engine allows users to specify when tasks should be run, with
support for both cron-like schedules and more complex calendar-based schedules.
 Airflow also provides a web interface for monitoring the progress of workflows,
viewing logs, and manually triggering tasks.
 Airflow has become popular for its ability to integrate with a wide variety of
external systems and tools, such as databases, message queues, and cloud platforms.

 Traditional ETL data pipelines execute their steps sequentially.
 If any step fails midway, the entire pipeline has to be re-run, which costs both
time and resources.
 They also have no built-in mechanism for the following:

1. Retry and Notify
2. Monitoring
3. Logging
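
As a rough sketch of how Airflow addresses retry, notification, and logging, the settings below use standard task arguments (`retries`, `retry_delay`, `on_failure_callback`). The logger name and the `notify_failure` function are hypothetical; a real callback might send an email or a chat message instead of writing a log line.

```python
import logging
from datetime import timedelta

log = logging.getLogger("pipeline_alerts")  # hypothetical logger name


def notify_failure(context):
    """Build and log an alert for a failed task; returns the message."""
    # Airflow passes a context dict to the callback; "task_instance"
    # identifies the task that failed.
    ti = context["task_instance"]
    message = f"Task {ti.task_id} in DAG {ti.dag_id} failed"
    log.error(message)
    return message


default_args = {
    "retries": 3,                          # re-run a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),   # wait 5 minutes between attempts
    "on_failure_callback": notify_failure, # called once retries are exhausted
}
```

Passing this dict as `default_args=default_args` when constructing a `DAG` applies the settings to every task in it, so only the failed task is retried rather than the whole pipeline.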

What is Airflow?

 Airflow is a workflow management tool (NOT an ETL tool).
 It uses Directed Acyclic Graphs (DAGs) to create data pipelines.
 Workflows are written in Python.
 It is highly scalable: tasks can be distributed across multiple workers, for example
with the Celery or Kubernetes executor.

What is DAG?

 DAG stands for Directed Acyclic Graph: a graph whose edges have a direction and
which contains no cycles.
 Each edge points one way only, from an upstream task to a downstream task, so a task
can never depend on itself, directly or indirectly.
 In Airflow, each workflow is defined as a DAG of tasks.
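
The "directed and without cycles" property can be illustrated with plain Python, independent of Airflow. The sketch below uses the standard-library `graphlib` module (Python 3.9+); the task names are hypothetical, and this is not how Airflow validates DAGs internally, only a demonstration of the idea.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of upstream tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# An acyclic graph always has a valid execution order.
run_order = list(TopologicalSorter(pipeline).static_order())
print(run_order)  # ['extract', 'transform', 'load']

# Two tasks that depend on each other form a cycle, so no order exists.
broken = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(broken).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)  # True
```

Because a DAG has no cycles, a scheduler can always work out which tasks are ready to run next.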

 By default, Airflow uses SQLite as its metadata database. SQLite is intended for
development purposes only, since it allows only one writer at a time.
 With SQLite, Airflow therefore runs tasks one at a time, so multiple workflows cannot
execute in parallel.
 In production, use PostgreSQL or MySQL instead.
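
The single-writer limitation comes from SQLite itself and can be reproduced with Python's built-in `sqlite3` module. This is a small sketch with a hypothetical table name and a throwaway database file; it does not touch Airflow's own metadata database.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.db")
    # timeout=0: fail immediately instead of waiting for a lock.
    # isolation_level=None: transactions are controlled explicitly below.
    writer_a = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    writer_b = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    writer_a.execute("CREATE TABLE runs (id INTEGER)")

    # The first connection takes the write lock.
    writer_a.execute("BEGIN IMMEDIATE")
    writer_a.execute("INSERT INTO runs VALUES (1)")

    # A second writer is rejected while the first transaction is open.
    try:
        writer_b.execute("BEGIN IMMEDIATE")
        second_writer_blocked = False
    except sqlite3.OperationalError:
        second_writer_blocked = True

    # Once the first writer commits, the second one can proceed.
    writer_a.execute("COMMIT")
    writer_b.execute("BEGIN IMMEDIATE")
    writer_b.execute("INSERT INTO runs VALUES (2)")
    writer_b.execute("COMMIT")
    row_count = writer_b.execute("SELECT COUNT(*) FROM runs").fetchone()[0]

    writer_a.close()
    writer_b.close()

print(second_writer_blocked, row_count)
```

Server databases such as PostgreSQL and MySQL handle many concurrent writers, which is why they are the usual choice when tasks run in parallel.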