Airflow

Apache Airflow is a platform for authoring, scheduling, and monitoring workflows represented as Directed Acyclic Graphs (DAGs). It consists of key components including a Web Server for UI, a Scheduler for orchestrating tasks, and Executors for executing those tasks. Airflow allows users to define tasks in Python, manage task instances, and utilize various operators to execute commands or functions.

Airflow

Author, schedule and monitor workflows

A workflow is a sequence of tasks that processes a set of data. It starts on a scheduled date or is triggered manually.

• Define tasks in Python
• View past and present runs
• Nice UI
Airflow DAG

• Workflows are represented as DAGs
• DAGs are composed of tasks
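To make the "directed acyclic graph" idea concrete, here is a minimal sketch in plain Python (not Airflow code) that models tasks and their dependencies and derives a valid execution order. The task names `extract`, `transform` and `load` are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks that must finish before it (its upstream tasks).
upstream = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def execution_order(upstream):
    """Return one valid execution order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(upstream).static_order())
    except CycleError as err:
        # args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"not a DAG, cycle found: {err.args[1]}") from err


print(execution_order(upstream))  # ['extract', 'transform', 'load']
```

A graph with a cycle (for example, two tasks that each depend on the other) has no valid order, which is why workflows must be acyclic.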
Apache Airflow Architecture Components:
• Web Server:
– This is the Airflow UI, which gives an overview and visualization of the components and state of each DAG.

• Scheduler:
– This is the most important part of Airflow. It orchestrates the DAGs and their tasks, takes care of their
interdependencies, and limits the number of runs of each DAG so that one DAG doesn’t overwhelm the entire
system. It makes it easy for users to schedule and run DAGs on Airflow.

• Executor:
– While the Scheduler orchestrates the tasks, the executors are the components that actually execute tasks.
– There are various types of executors that come with Airflow, such as SequentialExecutor, LocalExecutor,
CeleryExecutor and the KubernetesExecutor.

• Metadata Database:
– Airflow uses SQLAlchemy, an Object Relational Mapping (ORM) library written in Python, to connect to the
metadata database. This means that a database supported by SQLAlchemy can be used to store the
Airflow metadata. By default it uses a SQLite database.

– This database stores metadata about DAGs, their runs, and other Airflow configurations like users, roles,
and connections.

– The Web Server reads the DAGs’ states and their runs from the database. The Scheduler also updates this
information in the metadata database.
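The split between the Scheduler writing state and the Web Server reading it can be sketched with a toy database. This is an illustration only: it uses the standard-library `sqlite3` module directly rather than SQLAlchemy, and the simplified `dag_run` table and the function names are assumptions, not Airflow's real schema or API.

```python
import sqlite3


def create_schema(conn):
    # Toy table: one row per DAG run, keyed by DAG id and run id.
    conn.execute(
        "CREATE TABLE dag_run ("
        "dag_id TEXT, run_id TEXT, state TEXT, "
        "PRIMARY KEY (dag_id, run_id))"
    )


def scheduler_set_state(conn, dag_id, run_id, state):
    """Scheduler side: insert the run, or overwrite its state if it already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO dag_run (dag_id, run_id, state) VALUES (?, ?, ?)",
        (dag_id, run_id, state),
    )
    conn.commit()


def webserver_list_runs(conn, dag_id):
    """Web server side: read-only view of the runs of one DAG."""
    rows = conn.execute(
        "SELECT run_id, state FROM dag_run WHERE dag_id = ? ORDER BY run_id",
        (dag_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
scheduler_set_state(conn, "etl", "2021-05-13", "running")
scheduler_set_state(conn, "etl", "2021-05-13", "success")
print(webserver_list_runs(conn, "etl"))  # [('2021-05-13', 'success')]
```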
Basic Airflow concepts
• Task: a defined unit of work (in Airflow, a task is created by instantiating an operator)
• Task instance: an individual run of a single task. Task instances also have
an indicative state, which could be “running”, “success”, “failed”,
“skipped”, “up for retry”, etc.
• DAG: Directed acyclic graph, a set of tasks (operators) with explicit
execution order, beginning, and end
• DAG run: individual execution/run of a DAG
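The states a task instance passes through can be illustrated with a simplified model in plain Python. This is a sketch, not Airflow's implementation: the `run_task_instance` helper and the retry logic are assumptions made for illustration, and only the state names come from the list above.

```python
from enum import Enum


class State(str, Enum):
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"
    UP_FOR_RETRY = "up_for_retry"


def run_task_instance(task, retries=0):
    """Run a callable and return the list of states the task instance goes through."""
    history = []
    attempts_left = retries + 1
    while attempts_left > 0:
        history.append(State.RUNNING)
        attempts_left -= 1
        try:
            task()
        except Exception:
            if attempts_left > 0:
                # Another attempt remains, so the instance waits for a retry.
                history.append(State.UP_FOR_RETRY)
                continue
            history.append(State.FAILED)
            return history
        history.append(State.SUCCESS)
        return history
    return history


def always_fails():
    raise RuntimeError("boom")


print([s.value for s in run_task_instance(always_fails, retries=1)])
# ['running', 'up_for_retry', 'running', 'failed']
```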

Some basic operators


• BashOperator – used to execute bash commands
• PythonOperator – takes a Python function as input and calls it
(this means the function's signature must accept the arguments it is given)
• SimpleHttpOperator – makes an HTTP request that can be used to trigger
actions on a remote system.
A basic DAG
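The slide's own example is not reproduced in this text, so the following is only a sketch of what a basic DAG file might look like, assuming Airflow 2.x (where the `schedule_interval` argument is accepted). The DAG id, task ids, bash command, start date and the `greet` function are hypothetical names chosen for illustration.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def greet():
    # Hypothetical callable run by the PythonOperator.
    print("Hello from Airflow")


with DAG(
    dag_id="basic_dag",                # hypothetical name
    start_date=datetime(2021, 5, 13),  # assumed start date
    schedule_interval="@daily",
    catchup=False,
) as dag:
    print_date = BashOperator(task_id="print_date", bash_command="date")
    say_hello = PythonOperator(task_id="say_hello", python_callable=greet)

    # print_date must finish before say_hello starts.
    print_date >> say_hello
```

Each operator instance becomes one task in the DAG, and `>>` declares the execution order between them.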
When a DAG is triggered
• DAGs are triggered at the end of the schedule period rather than the
beginning
• Time at which the first DagRun is triggered = start_date + schedule_interval

The DAG will be triggered at 2021-05-14T00:00:00
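The formula can be checked with a few lines of plain Python. The slide's DAG definition is not reproduced here, so the start date of 2021-05-13 and the daily interval below are assumptions chosen to match the trigger time quoted above.

```python
from datetime import datetime, timedelta


def first_trigger_time(start_date, schedule_interval):
    """The first DagRun fires at the END of the first schedule period."""
    return start_date + schedule_interval


start_date = datetime(2021, 5, 13)     # assumed, not stated in the text
schedule_interval = timedelta(days=1)  # assumed daily schedule

print(first_trigger_time(start_date, schedule_interval).isoformat())
# 2021-05-14T00:00:00
```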


Backfill and catchup
• If catchup is enabled, the scheduler will backfill all DagRun entries of a DAG
whose time dependency has been met
• If catchup is disabled, only the latest DagRun will be executed
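The difference between the two settings can be sketched in plain Python. This is a simplified model, not the scheduler's actual code: the `runs_to_schedule` helper, the dates and the daily interval are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def runs_to_schedule(start_date, schedule_interval, now, catchup):
    """Return the start of each schedule period for which a DagRun would be created."""
    completed = []
    period_start = start_date
    # A period qualifies only once it has fully elapsed.
    while period_start + schedule_interval <= now:
        completed.append(period_start)
        period_start += schedule_interval
    if catchup:
        return completed       # backfill every missed period
    return completed[-1:]      # only the most recent period


start = datetime(2021, 5, 10)
now = datetime(2021, 5, 14, 6, 0)
daily = timedelta(days=1)

print(len(runs_to_schedule(start, daily, now, catchup=True)))   # 4
print(len(runs_to_schedule(start, daily, now, catchup=False)))  # 1
```

With catchup enabled, the four periods starting on 10, 11, 12 and 13 May all get a run; with it disabled, only the period starting on 13 May does.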
