Developing Elegant Workflows in Python Code With Apache Airflow
• I blog at http://michal.karzynski.pl
• sequence of tasks
• data warehousing
• A/B testing
• anomaly detection
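A workflow like the ones above is a set of tasks with dependencies between them, i.e. a directed acyclic graph. As a plain-Python illustration (task names are hypothetical, this is not Airflow code), resolving the order in which tasks can run looks like this:

```python
# Illustrative sketch only: a workflow as a tiny dependency graph,
# resolved in topological order (task names are hypothetical).
def topological_order(dependencies):
    """dependencies maps task -> set of upstream tasks it waits for."""
    order = []
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    while remaining:
        # Tasks whose upstream dependencies are all satisfied can run now.
        ready = sorted(t for t, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError('cycle detected - not a DAG')
        for task in ready:
            order.append(task)
            del remaining[task]
        for deps in remaining.values():
            deps.difference_update(ready)
    return order

workflow = {
    'extract': set(),
    'transform': {'extract'},
    'load': {'transform'},
    'report': {'load'},
}
print(topological_order(workflow))  # → ['extract', 'transform', 'load', 'report']
```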
• should be idempotent
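Idempotent means re-running a task with the same inputs leaves the system in the same state. A minimal sketch of the idea (names hypothetical, not Airflow code): key the task's output by the run's logical date so a rerun overwrites rather than duplicates.

```python
# Sketch of an idempotent task: output is keyed by the run's date,
# so a rerun overwrites the same record instead of appending a duplicate.
results = {}

def load_daily_summary(execution_date, value):
    # Same key on every rerun -> same final state, no duplicates.
    results[execution_date] = value

load_daily_summary('2017-07-12', 42)
load_daily_summary('2017-07-12', 42)  # rerun: state unchanged
print(results)  # → {'2017-07-12': 42}
```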
class MyFirstOperator(BaseOperator):

    @apply_defaults
    def __init__(self, my_param, *args, **kwargs):
        self.task_param = my_param
        super(MyFirstOperator, self).__init__(*args, **kwargs)
with dag:
    my_first_task = MyFirstOperator(my_param='This is a test.',
                                    task_id='my_task')
AIRFLOW CONCEPTS: SENSORS
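A sensor is an operator that keeps checking ("poking") a condition until it becomes true, and only then lets downstream tasks proceed. The poke loop can be sketched in plain Python like this (not Airflow's actual scheduler code; all names here are illustrative):

```python
import itertools

# Conceptual sketch of a sensor's poke loop: keep checking a condition
# until it becomes true, then succeed; give up after too many attempts.
def run_sensor(poke, max_pokes=10):
    for attempt in itertools.count(1):
        if poke():
            return attempt          # condition met: sensor succeeds
        if attempt >= max_pokes:
            raise TimeoutError('sensor timed out')

readings = iter([False, False, True])   # hypothetical condition checks
print(run_sensor(lambda: next(readings)))  # → 3
```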
XCom Pull:
def execute(self, context):
    ...
    task_instance = context['task_instance']
    sensors_minute = task_instance.xcom_pull('sensor_task_id', key='sensors_minute')
    log.info('Valid minute as determined by sensor: %s', sensors_minute)
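The pull above assumes an upstream task pushed a value under that key earlier. Conceptually, XCom behaves like a shared key-value store scoped by task id, which can be modeled in plain Python (this is a mental model, not Airflow's internals):

```python
# Conceptual model of XCom: a shared store where one task pushes a value
# under a key and a downstream task pulls it back by task id and key.
xcom_store = {}

def xcom_push(task_id, key, value):
    xcom_store[(task_id, key)] = value

def xcom_pull(task_id, key):
    return xcom_store.get((task_id, key))

xcom_push('sensor_task_id', 'sensors_minute', 42)
print(xcom_pull('sensor_task_id', 'sensors_minute'))  # → 42
```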
SCAN FOR INFORMATION UPSTREAM
task_instance = context['task_instance']
upstream_tasks = self.get_flat_relatives(upstream=True)
upstream_task_ids = [task.task_id for task in upstream_tasks]
upstream_database_ids = task_instance.xcom_pull(task_ids=upstream_task_ids, key='db_id')
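When pulling from a list of upstream task ids, not every upstream task will have pushed a `db_id`, so some slots in the result come back empty and typically need filtering (the pull result below is a hypothetical example, not real output):

```python
# Hedged sketch: some upstream tasks never pushed a 'db_id', so their
# entries come back as None and need to be filtered out.
upstream_database_ids = ['db_1', None, 'db_2', None]  # hypothetical pull result
database_ids = [db_id for db_id in upstream_database_ids if db_id is not None]
print(database_ids)  # → ['db_1', 'db_2']
```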
• sane defaults
• Operators
• XCom
• Sensor
CONDITIONAL EXECUTION: BRANCH OPERATOR
def choose():
    return 'first'

with dag:
    branching = BranchPythonOperator(task_id='branching', python_callable=choose)
    branching >> DummyOperator(task_id='first')
    branching >> DummyOperator(task_id='second')
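The branch operator runs the callable and only the branch whose task_id it returns continues; the other branches are skipped. A plain-Python simulation of that behavior (not Airflow's scheduler):

```python
# Plain-Python simulation of branching: only the branch whose task_id
# the callable returns gets run; every other branch is skipped.
def choose():
    return 'first'

def run_branch(chooser, branches):
    chosen = chooser()
    return {task_id: ('run' if task_id == chosen else 'skipped')
            for task_id in branches}

print(run_branch(choose, ['first', 'second']))
# → {'first': 'run', 'second': 'skipped'}
```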
CONDITIONAL EXECUTION: AIRFLOW SKIP EXCEPTION
def execute(self, context):
    ...
    if not conditions_met:
        log.info('Conditions not met, skipping.')
        raise AirflowSkipException()
• all other exceptions trigger retries and ultimately cause the task to fail
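The distinction can be simulated in plain Python (a hypothetical task runner, not Airflow's executor): a skip signal marks the task skipped immediately, while any other exception is retried until the retry budget is exhausted.

```python
class SkipTask(Exception):
    """Stand-in for AirflowSkipException in this sketch."""

def run_task(task, retries=2):
    # Hypothetical runner: a skip marks the task 'skipped' right away,
    # while any other exception is retried and finally fails the task.
    for _ in range(retries + 1):
        try:
            task()
            return 'success'
        except SkipTask:
            return 'skipped'
        except Exception:
            continue
    return 'failed'

def skipping_task():
    raise SkipTask()

def broken_task():
    raise RuntimeError('boom')

print(run_task(skipping_task))  # → skipped
print(run_task(broken_task))    # → failed
```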
BashOperator(
    task_id='templated',
    bash_command=templated_command,
    params={'my_param': 'Value I passed in'},
    dag=dag)
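`templated_command` is not shown above; a value of roughly this shape, using Airflow's Jinja templating, could look like the following sketch ({{ ds }} is the run's date stamp, and {{ params.my_param }} resolves from the params dict passed to the operator):

```python
# Hypothetical example of what templated_command might contain; the Jinja
# placeholders are rendered by Airflow before the command runs.
templated_command = """
echo "run date: {{ ds }}"
echo "param: {{ params.my_param }}"
"""
```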
AIRFLOW PLUGINS
• Subclass of AirflowPlugin
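A plugin declares a name and the components it contributes. The sketch below uses a stand-in base class so it runs without Airflow installed; in real code you would subclass `airflow.plugins_manager.AirflowPlugin`, and the class names here are only examples.

```python
class AirflowPlugin:
    """Stand-in for airflow.plugins_manager.AirflowPlugin, so this
    sketch runs without Airflow installed."""
    name = None
    operators = []

class MyFirstOperator:
    """Placeholder for a real BaseOperator subclass."""

# Hypothetical plugin exposing one custom operator.
class MyFirstPlugin(AirflowPlugin):
    name = 'my_first_plugin'
    operators = [MyFirstOperator]

print(MyFirstPlugin.name)  # → my_first_plugin
```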
michal.karzynski.pl
THANK YOU