Apache Airflow Certification - Study Guide For DAG Authoring
This study guide covers the Astronomer Certification DAG Authoring for Apache
Airflow. Apache Airflow is the leading orchestrator for authoring, scheduling, and
monitoring data pipelines. The exam consists of 75 questions, and you have 60
minutes to write it. The study guide below covers everything you need to know for it.
The exam includes scenarios (both text and images of Python code) where you need
to determine what the output will be, if any at all. To study for this exam I watched the
official Astronomer preparation course, which I highly recommend.
According to Astronomer, this exam will test the following:
“You have to show your capabilities of understanding the different features that
Airflow brings to create DAGs. What are the pros and cons of each one as well as their
limitations. You should be comfortable for recommending settings and design choices
for data pipelines according to different use cases. You should know the most
common operators as well as the specificities of others allowing to define DAG
dependencies, choose different branches, wait for events and so on”
Study Guide
• Variables
◦ Variables are a generic way to store and retrieve arbitrary content or settings as a
simple key-value store within Airflow. Variables can be listed, created, updated, and
deleted from the UI. Airflow uses Fernet to encrypt variables stored in the
metastore database, which guarantees that the content cannot be read or
manipulated without the encryption key. For information on configuring Fernet, see
the Fernet documentation.
Variables in Airflow UI
◦ Best practice: fetch variables inside your tasks (e.g. in the callable), not in top-level
DAG code, to avoid opening a useless connection to the metastore every time the
scheduler parses the file (every 30 seconds by default). A sketch of this follows the
snippet below.
◦ You can store a JSON value in a single variable and fetch several values with one
call, using the Jinja template engine. You can create the JSON variable in the Airflow UI
extract = PythonOperator(
    task_id="extract",
    python_callable=_extract,
    op_args=["{{ var.json.my_dag_partner.name }}"])
• Pools
◦ Airflow pools are used to limit the execution parallelism on arbitrary sets of tasks.
Each time a task is running, a slot is given to that task throughout its execution.
Once the task is finished, the slot is free again and ready to be given to another
task. A slot is given regardless of the resources a task needs; it's really just a slot, 1
task = 1 slot. If no slots are available, the tasks are queued and the number of
queued slots increases. By default, every task runs in the default_pool, which has 128
slots, so you can execute at most 128 tasks at the same time.
◦ The concurrency, i.e. the maximum number of running tasks for a given DAG, is set
to 16 by default (the dag_concurrency setting / the concurrency DAG argument). In
other words, you can run at most 16 tasks at the same time within the same DAG.
◦ There are three ways you can create and manage a pool in Airflow.
▪ Create a pool through the Airflow UI
▪ Go to Admin → Pools and add a new record. You can define a name, the
number of slots, and a description.
▪ Create a pool using the Airflow CLI with the airflow pools command
▪ Create a pool using the Airflow REST API
◦ By default, all tasks in Airflow get assigned to the default_pool which has 128 slots.
You can modify this value, but you can’t remove the default pool. Tasks can be
assigned to any other pool by updating the pool parameter. This parameter is part
of the BaseOperator, so it can be used with any operator.
task_a = PythonOperator(
    task_id='task_a',
    python_callable=sleep_function,
    pool='single_task_pool'
)
• Trigger Rules
◦ A trigger rule defines the condition on which a task gets triggered. By default, all
tasks have the trigger rule all_success, which means the task is triggered only if all
of its parents (upstream tasks) succeed. Only one trigger rule at a time can be
specified for a given task (a sketch of setting a trigger rule follows this list)
▪ all_done: the task is triggered once all upstream tasks (parents) are done with
their execution, whatever their state. Useful for notifications such as emails or
Slack messages that should be sent regardless of success or failure
▪ none_skipped: the task is triggered if no upstream task is skipped, i.e. they are
all either in success or failed. A typical use case is when you use the
BranchPythonOperator
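A minimal sketch of setting a trigger rule, assuming a hypothetical _notify callable that
should run whatever the outcome of its parents:

notify = PythonOperator(
    task_id='notify',
    python_callable=_notify,   # hypothetical callable
    trigger_rule='all_done'    # runs once every parent is done, success or failure
)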
• DAG Dependencies
◦ ExternalTaskSensor lets a DAG wait for the completion of a task in another DAG (a
minimal sketch follows the TriggerDagRunOperator example below). Its main
parameters are:
▪ external_task_id: the task_id of the task you want to wait for. If None, the
sensor waits for the whole DAG
▪ execution_delta: time difference with the previous execution to look at, the
default is the same execution_date as the current task or DAG. For yesterday, use
positive datetime.timedelta(days=1). Either execution_delta or
execution_date_fn can be passed to ExternalTaskSensor, but not both.
▪ allowed_states: expects a list of states. For example, with ['success'] the
sensor only succeeds once the task you are waiting for has succeeded
▪ reset_dag_run (a TriggerDagRunOperator parameter): whether or not to clear an
existing DAG run if it already exists. This is useful when backfilling or rerunning
an existing DAG run. When reset_dag_run=False and the DAG run exists,
DagRunAlreadyExists will be raised. When reset_dag_run=True and the DAG run
exists, the existing DAG run will be cleared so it can rerun.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator


def print_task_type(**kwargs):
    """
    Dummy function to call before and after the dependent DAG.
    """
    print(f"The {kwargs['task_type']} task has completed.")


with DAG('trigger-dagrun-dag',
         start_date=datetime(2021, 1, 1),
         max_active_runs=1,
         schedule_interval='@daily',
         default_args=default_args,
         catchup=False
         ) as dag:

    start_task = PythonOperator(
        task_id='starting_task',
        python_callable=print_task_type,
        op_kwargs={'task_type': 'starting'}
    )

    trigger_dependent_dag = TriggerDagRunOperator(
        task_id="trigger_dependent_dag",
        trigger_dag_id="dependent-dag",
        wait_for_completion=True
    )

    end_task = PythonOperator(
        task_id='end_task',
        python_callable=print_task_type,
        op_kwargs={'task_type': 'ending'}
    )

    start_task >> trigger_dependent_dag >> end_task
▪ set_downstream / set_upstream are the method equivalents of the >> and <<
bitshift operators
▪ If you want to create dependencies between two lists, you need to use cross
dependencies (see the combined sketch below)
▪ cross_downstream([t1,t2,t3],[t4,t5,t6])
▪ You cannot chain anything onto this call; further dependencies must be defined
on a new line
▪ [t4,t5,t6] >> t7
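Putting those two lines together, a minimal sketch (in Airflow 2, cross_downstream lives
in airflow.models.baseoperator):

from airflow.models.baseoperator import cross_downstream

# Every task in the first list becomes upstream of every task in the second list:
# t1 >> t4, t1 >> t5, t1 >> t6, t2 >> t4, ..., t3 >> t6
cross_downstream([t1, t2, t3], [t4, t5, t6])

# cross_downstream() returns None, so further dependencies go on a new line
[t4, t5, t6] >> t7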
◦ Chain Dependencies
▪ What if you want to set multiple parallel cross-dependencies? Unfortunately,
Airflow can't parse dependencies between two lists (e.g. [t0, t1] >> [t2, t3] throws
an error). If you need to set dependencies in this manner, you can use Airflow's
chain function:
from datetime import datetime
from airflow import DAG
from airflow.models.baseoperator import chain
from airflow.operators.dummy import DummyOperator

with DAG('dependencies', start_date=datetime(2021, 1, 1)) as dag:
    t0 = DummyOperator(task_id='t0')
    t1 = DummyOperator(task_id='t1')
    t2 = DummyOperator(task_id='t2')
    t3 = DummyOperator(task_id='t3')
    t4 = DummyOperator(task_id='t4')
    t5 = DummyOperator(task_id='t5')
    t6 = DummyOperator(task_id='t6')

    # t0 >> [t1, t2]; t1 >> t3 and t2 >> t4 (pairwise); [t3, t4] >> t5 >> t6
    chain(t0, [t1, t2], [t3, t4], t5, t6)
▪ depends_on_past: when set to True, a task instance is not triggered unless the
same task succeeded (or was skipped) in the previous DAG run. If the previous run
executed its tasks successfully, depends_on_past is not a factor and does not
affect the current run at all. It does not prevent the next DAG run from starting;
the blocked task instances simply remain with no status.
• Idempotency
◦ When designing data pipelines, always aim for idempotence and determinism
▪ Deterministic: when you execute your task with a certain input, you always get
the same output
▪ Idempotent: when you execute your task multiple times with the same input, it
always produces the same result, with no additional side effects
▪ PostgresOperator(task_id='create_table', sql='CREATE TABLE
my_table;') is not idempotent: the second run fails because the table already
exists (see the sketch below)
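A minimal sketch of an idempotent version of that task, using IF NOT EXISTS so rerunning
it leaves the database in the same state (the column definition is hypothetical):

create_table = PostgresOperator(
    task_id='create_table',
    # safe to run any number of times
    sql='CREATE TABLE IF NOT EXISTS my_table (id SERIAL PRIMARY KEY);'
)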
• Dynamic DAGs
◦ In Airflow, DAGs are defined as Python code. Airflow executes all Python code in
the dags_folder and loads any DAG objects that appear in globals(). The simplest
way of creating a DAG is to write it as a static Python file. However, sometimes
manually writing DAGs isn’t practical. Maybe you have hundreds or thousands of
DAGs that do similar things with just a parameter changing between them. Or
maybe you need a set of DAGs to load tables, but don’t want to manually update
DAGs every time those tables change. In these cases, and others, it can make more
sense to dynamically generate DAGs.
◦ Whenever you have multiple DAGs that share the same tasks and only the inputs
change between them, it can be better to generate those DAGs dynamically.
▪ Single-File Method: one Python file in the dags_folder generates every DAG.
▪ Benefits:
▪ It’s simple and easy to implement
▪ Drawbacks:
▪ Since a DAG file isn’t actually being created, your visibility into the code
behind any specific DAG is limited.
▪ Since this method requires a Python file in the dags_folder, the generation
code will be executed every time the dag is parsed
▪ Process
def create_dag(dag_id,
               schedule,
               dag_number,
               default_args):

    def hello_world_py(*args):
        print('Hello World')
        print('This is DAG: {}'.format(str(dag_number)))

    dag = DAG(dag_id,
              schedule_interval=schedule,
              default_args=default_args)

    with dag:
        t1 = PythonOperator(
            task_id='hello_world',
            python_callable=hello_world_py)

    return dag
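The single-file method then needs a loop that builds the DAG objects and registers them
in globals() so the scheduler picks them up; a minimal sketch assuming three numbered
DAGs:

from datetime import datetime

# One entry in globals() per generated dag_id
for n in range(1, 4):
    dag_id = 'hello_world_{}'.format(str(n))
    default_args = {'owner': 'airflow', 'start_date': datetime(2021, 1, 1)}
    globals()[dag_id] = create_dag(dag_id, '@daily', n, default_args)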
▪ Multi-File Method:
▪ This time, instead of having one single Python file in charge of generating your
DAGs, you use a script that creates a file for each generated DAG. At the end you
get one Python file per generated DAG.
▪ Benefits:
▪ It’s more scalable than single-file methods. Because the DAG files aren’t
being generated by parsing code in the dags_folder, the DAG generation
code isn’t executed on every scheduler heartbeat.
▪ Since DAG files are being explicitly created before deploying to Airflow, you
have full visibility into the DAG code, including from the Code button in the
Airflow UI.
▪ Drawbacks:
▪ It can be complex to set up.
▪ Process
▪ To start, we will create a DAG ‘template’ file that defines the DAG’s structure.
This looks just like a regular DAG file, but we have added specific variables
where we know information is going to be dynamically generated, namely
the dag_id, scheduletoreplace, and querytoreplace.
dag = DAG(dag_id,
          schedule_interval=scheduletoreplace,
          default_args=default_args,
          catchup=False)

with dag:
    t1 = PostgresOperator(
        task_id='postgres_query',
        postgres_conn_id=connection_id,
        sql=querytoreplace)
▪ Next we create a dag-config folder that will contain a JSON config file for
each DAG. The config file should define the parameters that we noted above,
the DAG Id, schedule interval, and query to be executed.
{
    "DagId": "dag_file_1",
    "Schedule": "'@daily'",
    "Query": "'SELECT * FROM table1;'"
}
▪ Finally, we create a Python script that will create the DAG files based on the
template and the config files. The script loops through every config file in the
dag-config/ folder, makes a copy of the template in the dags/ folder, and
overwrites the parameters in that file with the ones from the config file.
import json
import os
import shutil
import fileinput

config_filepath = 'include/dag-config/'
dag_template_filename = 'include/dag-template.py'

for filename in os.listdir(config_filepath):
    config = json.load(open(config_filepath + filename))
    new_filename = 'dags/' + config['DagId'] + '.py'
    shutil.copyfile(dag_template_filename, new_filename)
    # Overwrite the placeholders in the copied template with the config values
    for line in fileinput.input(new_filename, inplace=True):
        line = line.replace("dag_id", "'" + config['DagId'] + "'")
        line = line.replace("scheduletoreplace", config['Schedule'])
        line = line.replace("querytoreplace", config['Query'])
        print(line, end="")
▪ Now to generate our DAG files, we can either run this script ad-hoc or as
part of our CI/CD workflow. After running the script, our final directory would
look like the example below, where the include/ directory contains the files
shown above, and the dags/ directory contains the two dynamically
generated DAGs:
dags/
├── dag_file_1.py
├── dag_file_2.py
include/
├── dag-template.py
├── generate-dag-files.py
└── dag-config
├── dag1-config.json
└── dag2-config.json
• Versioning
◦ The issue is that if you remove a task from a DAG, you will not be able to view that
task's logs in past DAG runs
◦ A temporary way around this is to add a version suffix to your dag_id, such as
_1_0_0
• DAG Scheduling
◦ Important Parameters
▪ start_date: the date from which your DAG starts being scheduled; the first DAG
run is triggered once start_date + schedule_interval has elapsed
◦ Timedelta vs CRON
▪ CRON expressions are stateless: runs fire at the absolute times the expression
specifies, whereas a timedelta is applied relative to the latest execution date
▪ Use case: when you want to trigger your DAG every three days, timedelta is
simpler because you don't need to work out the actual dates (see the sketch
below)
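A minimal sketch of the two styles (the DAG id every_three_days is hypothetical):

from datetime import datetime, timedelta

from airflow import DAG

# timedelta schedule: each run is triggered 3 days after the previous execution date
with DAG('every_three_days',
         start_date=datetime(2021, 1, 1),
         schedule_interval=timedelta(days=3)) as dag:
    ...

# CRON alternative: schedule_interval='0 0 */3 * *' fires at fixed absolute times,
# which does not behave identically around month boundaries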
• Templating
◦ Airflow takes advantage of the power of Jinja templating, which is a powerful tool
to use in combination with macros. Jinja templating allows providing dynamic
content, using Python code, to otherwise static objects such as strings. In other
words, it allows you to fetch a specific chunk of data based on a parameter, rather
than fetching that exact same data every time. Since Airflow macros are evaluated
when the task runs, it is possible to provide parameters that can change during
execution, for example passing the result of one operator to another one that runs
after it.
◦ Also, parameters such as execution dates can be passed to templated fields. Each
operator defines which of its fields are template-able (its template_fields), and
only those fields can take macros as inputs.
◦ The argument must be compatible with templating; you can check the operator's
documentation to see which fields are templated (see the sketch below)
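A minimal sketch of a templated field: the BashOperator's bash_command is template-able,
and {{ ds }} is a built-in macro that renders to the execution date:

from airflow.operators.bash import BashOperator

process = BashOperator(
    task_id='process_file',
    # {{ ds }} is rendered at runtime as YYYY-MM-DD
    bash_command="echo 'processing data for {{ ds }}'"
)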
• TaskFlow API
◦ TaskFlow takes care of moving inputs and outputs between your Tasks using
XComs for you, as well as automatically calculating dependencies - when you call
a TaskFlow function in your DAG file, rather than executing it, you will get an
object representing the XCom for the result (an XComArg), that you can then use
as inputs to downstream tasks or operators.
Taskflow API
from typing import Dict
import logging
import requests

from airflow.decorators import task

@task
def extract_bitcoin_price() -> Dict[str, float]:
    return requests.get(API).json()['bitcoin']

@task(multiple_outputs=True)
def process_data(response: Dict[str, float]) -> Dict[str, float]:
    logging.info(response)
    return {'usd': response['usd'], 'change': response['usd_24h_change']}

@task
def store_data(data: Dict[str, float]):
    logging.info(f"Store: {data['usd']} with change {data['change']}")
◦ You cannot use the TaskFlow API to share data between a parent DAG and a
SubDAG. TaskFlow tries to create the dependencies automatically for you, which
would mean creating dependencies both from your DAG and from your SubDAG,
which is not possible.
◦ multiple_outputs: if you want one XCom per value, without calling xcom_push
more than once, you can set this argument to True. By setting it to True, you are
saying that the result is not one XCom with the dictionary as its value, but one
XCom per key of the dictionary, each with its own key and value (see the sketch
below).
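A minimal sketch of that behaviour, with a hypothetical get_rates task:

@task(multiple_outputs=True)
def get_rates() -> Dict[str, float]:
    # Pushes one XCom per key ('usd', 'eur') instead of a single XCom
    # holding the whole dictionary
    return {'usd': 1.0, 'eur': 0.85}

rates = get_rates()
usd_rate = rates['usd']   # an XComArg referencing only the 'usd' XCom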
• XCOMs
▪ Limited in size,
▪ 2 GB for SQLite,
▪ 1 GB for PostgreSQL,
▪ 64 KB for MySQL
◦ Use Case:
▪ This data pipeline trains different machine learning models based on a dataset
and the last task selects the model having the highest accuracy. The question is,
how can we get the accuracy of each model in the task Choosing Model to
choose the best one? Using XCOMs!
▪ You can think of an XCOM as an object that is stored in the metadata database
of Airflow with the following fields:
▪ The key is the identifier of your XCom. It does not need to be unique and is used
to get the XCom back from a given task.
▪ The value is the value of your XCom. What you want to share. Keep in mind
that your value must be serializable in JSON or pickable. Notice that serializing
with pickle is disabled by default to avoid RCE exploits/security issues.
▪ The execution date corresponds to the execution date of the DagRun that
generated the XCom. That's how Airflow avoids fetching an XCom coming from
another DagRun
from random import uniform

def _training_model(ti):
    accuracy = uniform(0.1, 10.0)
    print(f"model's accuracy: {accuracy}")
    # Push the accuracy under an explicit key so downstream tasks can fetch it
    ti.xcom_push(key='model_accuracy', value=accuracy)

def _choose_best_model(ti):
    # Pull the accuracy pushed by a single training task
    fetched_accuracy = ti.xcom_pull(key='model_accuracy',
                                    task_ids=['training_model_A'])
    print(f'choose best model: {fetched_accuracy}')

def _choose_best_model(ti):
    # Or pull the accuracies of all three training tasks at once
    fetched_accuracies = ti.xcom_pull(key='model_accuracy',
                                      task_ids=['training_model_A',
                                                'training_model_B',
                                                'training_model_C'])
    print(f'choose best model: {fetched_accuracies}')
• SubDAGs / TaskGroups
◦ If you have a lot of tasks in your DAG, it might be hard to understand what's going
on. You can group the tasks that belong together using SubDAGs and TaskGroups
◦ SubDAGs
▪ When a SubDAG is triggered, the SubDAG and child tasks take up worker slots
until the entire SubDAG is complete. This can delay other task processing and,
depending on your number of worker slots, can lead to deadlocking.
▪ SubDAGs have their own parameters, schedule, and enabled settings. When
these are not consistent with their parent DAG, unexpected behavior can occur.
◦ TaskGroups
▪ Unlike SubDAGs, Task Groups are just a UI grouping concept. Starting in Airflow
2.0, you can use Task Groups to organize tasks within your DAG’s graph view in
the Airflow UI. This avoids the added complexity and performance issues of
SubDAGs, all while using less code
▪ You can use the dependency operators (<< and >>) on Task Groups in the same
way that you can with individual tasks. Dependencies applied to a Task Group are
applied across its tasks. In the following code, we add dependencies from t0 and
to t3 around the Task Group, which automatically applies the same dependencies
across t1 and t2
t0 = DummyOperator(task_id='start')
# TaskGroup is imported from airflow.utils.task_group
with TaskGroup(group_id='group1') as tg1:
    t1 = DummyOperator(task_id='task1')
    t2 = DummyOperator(task_id='task2')
    t1 >> t2
# End Task Group definition
t3 = DummyOperator(task_id='end')
t0 >> tg1 >> t3   # applied across t1 and t2 as well
• Branching
◦ You are able to choose one task or another based on a condition. The
BranchPythonOperator allows you to choose one branch among the branches of
your DAG
▪ If the condition is true, you return the task_id corresponding to the task you want
to execute next
▪ For example, we can pass a function that returns one set of task IDs if the result
is greater than 0.5 and a different set if the result is less than or equal to 0.5 (a
sketch follows this list)
▪ When the BranchPythonOperator runs, the task whose task_id is returned is
triggered next and the others are skipped.
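A minimal sketch of such a branching callable (the upstream and downstream task ids here
are hypothetical):

from airflow.operators.python import BranchPythonOperator

def _choose_branch(ti):
    accuracy = ti.xcom_pull(task_ids='training_task')   # hypothetical upstream task
    if accuracy > 0.5:
        return 'accurate'      # task_id (or list of task_ids) to run next
    return 'inaccurate'

branching = BranchPythonOperator(
    task_id='branching',
    python_callable=_choose_branch
)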
• SLAs
◦ Note: If you trigger your DAG manually, your SLAs won't be checked (a sketch of
the sla parameter follows)
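SLAs are attached to tasks with the sla parameter (part of the BaseOperator); a minimal
sketch, assuming a hypothetical _check_data callable:

from datetime import timedelta

check = PythonOperator(
    task_id='check_data',
    python_callable=_check_data,    # hypothetical callable
    # an SLA miss is recorded if the task hasn't finished within 10 minutes
    # of the scheduled DAG run time
    sla=timedelta(minutes=10)
)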
• Miscellaneous
◦ By default, the scheduler only parses Python files whose content contains the
strings "dag" and "airflow" (DAG discovery safe mode)
◦ If you change the start_date of an existing DAG to an earlier date, the DAG runs
for that new past period won't be triggered automatically