Airflow Chapter1

This document provides an introduction to Apache Airflow, including: - What is Airflow and how it can be used to program and schedule workflows. Airflow implements workflows as directed acyclic graphs (DAGs). - How DAGs represent the tasks that make up a workflow and their dependencies. A simple example DAG definition is shown. - How the Airflow web interface can be used to view DAGs and their details, schedules, recent task runs, and logs. The command line can also be used to interact with DAGs.

Introduction to Airflow
INTRODUCTION TO AIRFLOW IN PYTHON

Mike Metzger
Data Engineer
What is data engineering?
Data engineering is:

Taking any action involving data and turning it into a reliable, repeatable, and maintainable
process.

What is a workflow?
A workflow is:

A set of steps to accomplish a given data engineering task
Such as: downloading files, copying data, filtering information, writing to a database, etc.

Of varying levels of complexity

A term with various meanings depending on context

What is Airflow?
Airflow is a platform to program workflows, including:

Creation

Scheduling

Monitoring

Airflow continued...
Can implement programs from any language, but workflows are written in Python

Implements workflows as DAGs: Directed Acyclic Graphs

Accessed via code, command-line, or via web interface

1 https://airflow.apache.org/docs/stable/

Other workflow tools
Other tools:

Luigi

SSIS

Bash scripting

Quick introduction to DAGs
DAG stands for Directed Acyclic Graph

In Airflow, this represents the set of tasks that make up your workflow.

Consists of the tasks and the dependencies between tasks.

Created with various details about the DAG, including the name, start date, owner, etc.

Further depth in the next lesson.

DAG code example
Simple DAG definition:

from airflow.models import DAG

etl_dag = DAG(
    dag_id='etl_pipeline',
    default_args={"start_date": "2020-01-08"}
)

Running a workflow in Airflow
Running a simple Airflow task

airflow run <dag_id> <task_id> <start_date>

Using a DAG named example-etl, a task named download-file and a start date of 2020-01-10:

airflow run example-etl download-file 2020-01-10
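
A minimal sketch, in plain Python, of how the command above is assembled from its three arguments. build_run_command is a hypothetical helper, not part of Airflow; it only builds the argument list and does not execute anything. The slide shows the Airflow 1.x form; Airflow 2 moved this subcommand to airflow tasks run, which the airflow_major parameter (also an assumption of this sketch) selects.

```python
import shlex


def build_run_command(dag_id, task_id, date, airflow_major=1):
    """Hypothetical helper: return the argv list for running one task."""
    if airflow_major >= 2:
        # Airflow 2 groups task commands under "airflow tasks".
        return ["airflow", "tasks", "run", dag_id, task_id, date]
    # Airflow 1.x form, as used on the slide.
    return ["airflow", "run", dag_id, task_id, date]


legacy = build_run_command("example-etl", "download-file", "2020-01-10")
modern = build_run_command(
    "example-etl", "download-file", "2020-01-10", airflow_major=2
)

# Print the commands as they would be typed in a shell.
print(shlex.join(legacy))
print(shlex.join(modern))
```

Keeping the command as a list rather than a single string means it could later be handed to subprocess.run without shell quoting issues.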

Let's practice!
Airflow DAGs

What is a DAG?
DAG, or Directed Acyclic Graph:

Directed, there is an inherent flow representing dependencies between components.

Acyclic, does not loop / cycle / repeat.

Graph, the actual set of components.

Seen in Airflow, Apache Spark, Luigi

1 https://en.m.wikipedia.org/wiki/Directed_acyclic_graph
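
The "acyclic" property can be checked mechanically. The sketch below uses Python's standard graphlib module (Python 3.9 or later) rather than Airflow itself; is_acyclic and the node names are hypothetical, chosen only for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(graph):
    """Return True if the graph has no cycle.

    graph maps each node to the set of nodes it depends on.
    """
    try:
        # prepare() raises CycleError when the graph contains a cycle.
        TopologicalSorter(graph).prepare()
    except CycleError:
        return False
    return True


# b depends on a, c depends on b: a valid DAG.
valid = {"b": {"a"}, "c": {"b"}}

# a depends on c, closing a loop a -> b -> c -> a: not a DAG.
looping = {"a": {"c"}, "b": {"a"}, "c": {"b"}}

print(is_acyclic(valid))
print(is_acyclic(looping))
```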

DAG in Airflow
Within Airflow, DAGs:

Are written in Python (but can use components written in other languages).

Are made up of components (typically tasks) to be executed, such as operators, sensors, etc.

Contain dependencies defined explicitly or implicitly.

e.g., copy the file to the server before trying to import it to the database service.
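
The dependency in that example can be written down as data and turned into an execution order. This is a plain-Python illustration using the standard graphlib module (Python 3.9 or later), not Airflow's scheduler; the task names are assumptions made for the sketch.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "copy_file_to_server": set(),
    "import_to_database": {"copy_file_to_server"},
}

# static_order() yields tasks so that every dependency comes first.
order = list(TopologicalSorter(dependencies).static_order())

print(order)  # ['copy_file_to_server', 'import_to_database']
```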

Define a DAG
Example DAG:

from airflow.models import DAG
from datetime import datetime

default_arguments = {
    'owner': 'jdoe',
    'email': '[email protected]',
    'start_date': datetime(2020, 1, 20)
}

etl_dag = DAG('etl_workflow', default_args=default_arguments)
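
In Airflow, the values in default_args act as defaults for every task in the DAG, and a task that sets its own value overrides them. The sketch below only mimics that precedence rule in plain Python; resolve_task_args is a hypothetical helper and the owner name asmith is made up, so this is not Airflow's implementation.

```python
from datetime import datetime

default_arguments = {
    "owner": "jdoe",
    "start_date": datetime(2020, 1, 20),
}


def resolve_task_args(defaults, **task_args):
    """Hypothetical helper: task-level values win over DAG-level defaults."""
    # Later keys override earlier ones; the defaults dict is left unchanged.
    return {**defaults, **task_args}


# A task that sets nothing inherits every default.
inherited = resolve_task_args(default_arguments)

# A task that sets its own owner keeps the default start date.
overridden = resolve_task_args(default_arguments, owner="asmith")

print(inherited["owner"])
print(overridden["owner"])
print(overridden["start_date"])
```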

DAGs on the command line
Using airflow:

The airflow command line program contains many subcommands.

airflow -h for descriptions.

Many are related to DAGs.

airflow list_dags to show all recognized DAGs.

Command line vs Python
Use the command line tool to:

Start Airflow processes

Manually run DAGs / Tasks

Get logging information from Airflow

Use Python to:

Create a DAG

Edit the individual properties of a DAG

Let's practice!
Airflow web interface

DAGs view

DAGs view: DAGs

DAGs view: schedule

DAGs view: owner

DAGs view: recent tasks

DAGs view: last run

DAGs view: last three

DAGs view: links

DAGs view: example_dag

DAG detail view

DAG graph view

DAG code view

Logs


Web UI vs command line
In most cases:

Equally powerful depending on needs

Web UI is easier

Command line tool may be easier to access depending on settings

Let's practice!