Apache Airflow Fundamentals Study Guide


Certification Exam Guide

Apache Airflow Fundamentals

Introduction

Welcome to the Apache Airflow Fundamentals Certification Exam study guide! We are very excited you have
decided to get certified with us. This guide will give you an overview of the certification exam to help you
determine how to study and when to take it.

This guide covers the latest version of the exam, which was last updated on September 1, 2023. If you have
any questions or want to talk to our team, reach out to us at: [email protected].

Preparation Expectations

This exam is designed for data practitioners who want to establish a strong foundation in Airflow. Whether
you're looking to enhance your skills, validate your expertise, or advance your career, this exam is tailored to
help you achieve those goals.

To attain this certification, you must demonstrate an understanding of Apache Airflow's core concepts, such
as architecture, DAGs, the task lifecycle, and the scheduling process. You should be comfortable
recommending use cases, architectural needs, settings, and design choices for data pipelines. You should be
able to trigger, debug, and retry DAGs (and their associated tasks) and use the correct views in the UI to
monitor them.

At a minimum, it's recommended that you do the following:

Complete the Airflow 101 Astronomer Academy course
Review the list of covered exam topics in this guide
Leverage the additional resources mentioned in this guide to learn more about each topic

Additionally, most individuals who pass the exam have at least 3 months of Airflow experience.

Exam Details

Format: 75 multiple-choice questions
Time Allotted: 60 minutes
Passing Score: 70% (53 correct out of 75)
Cost: $150 USD
Language: English

Important Notes:
◦ Once enrolled in the exam, you will have 30 days to complete it. After 30 days, the exam expires
and must be purchased again.

◦ Once the exam is completed (both the exam itself and any other modules), our badge vendor,
Credly, will issue your digital badge to the email associated with your Astronomer Academy
account.

Additional Resources
This guide is intended to be only one of the many resources you can use to prepare for the certification
exam. We also recommend leveraging the following resources:

Astronomer Academy: A large catalog of free Airflow courses taught by the Astronomer experts behind
the project.
◦ Airflow 101 Learning Path: A curated learning path that guides you through the foundational skills
and knowledge you need to start with Apache Airflow.
Astronomer Docs: The official place to learn everything you need to know about Astro and Apache
Airflow.
◦ Airflow Concepts: Learn about the fundamentals of Apache Airflow.
◦ Airflow Tutorials: Step-by-step guides for writing DAGs and running Airflow.
Airflow Community Resources: A collection of resources for the Airflow community, including a newsletter
and Slack.

Exam Topics
This exam covers a variety of topics about Apache Airflow. The exam randomizes questions from a pool of
more than 90 questions that are categorized by topic. This means that you may encounter a different set of
questions each time you take the exam, but all topics will be covered. Use the learning outcomes below to
guide your study and prepare for the exam.

Topic 1: Airflow Use Cases
Given a specific scenario, identify if Airflow is an applicable solution

Topic 2: Dependencies
Identify the purpose of task-level dependencies in Airflow
Identify where DAG dependencies are set up in Airflow
Compare and contrast DAG task dependency relationships for equivalence
Match DAG task dependency graphs to their equivalent DAG dependency code

Topic 3: Operators
Identify what a Transfer Operator does
Identify what a Sensor Operator does
Identify the purpose of the Airflow PythonOperator

Topic 4: DAG Runs
Given a specific DAG scheduling scenario or DAG code, identify the number of DAG runs that will occur

Topic 5: DAG Scheduling
Identify the purpose of each DAG scheduling parameter:
◦ catchup
◦ start_date
◦ end_date
◦ schedule_interval
Identify which tools/commands make it possible to backfill DAGs when the catchup parameter is set to False
Identify the default value for the start_date parameter
Identify the valid values a DAG can accept for its schedule_interval parameter
Given a specific DAG scheduling goal (e.g., schedule a DAG every day at 2 PM), identify the correct DAG scheduling parameter values to accomplish the goal
Given specific DAG scheduling parameter values, identify if a specific scheduling goal will be accomplished
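
As a rough illustration of how the five cron fields in a schedule_interval value are read, the sketch below checks whether a given moment matches a cron expression. It is a simplified stand-in written for this guide, not Airflow's scheduler logic. The function names are hypothetical, and only `*`, `*/n`, single numbers, and comma-separated lists are handled (no ranges, no month or day names, no 7 for Sunday).

```python
from datetime import datetime


def field_matches(field: str, value: int, minimum: int) -> bool:
    """Return True if one cron field (e.g. '*/6' or '1,3') matches value."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # Step values count from the field's minimum (0 for hours, 1 for days).
            step = int(part[2:])
            if (value - minimum) % step == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a 5-field cron expression: minute hour day-of-month month day-of-week."""
    minute, hour, dom, month, dow = expr.split()
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_dow = (when.weekday() + 1) % 7
    return (
        field_matches(minute, when.minute, 0)
        and field_matches(hour, when.hour, 0)
        and field_matches(dom, when.day, 1)
        and field_matches(month, when.month, 1)
        and field_matches(dow, cron_dow, 0)
    )


# "Every day at 2 PM" corresponds to minute 0, hour 14, any day.
daily_2pm = "0 14 * * *"
print(cron_matches(daily_2pm, datetime(2023, 9, 1, 14, 0)))   # on the hour at 14:00
print(cron_matches(daily_2pm, datetime(2023, 9, 1, 14, 30)))  # half past does not match
```

Reading an expression field by field like this is usually enough to decide whether a proposed schedule_interval value meets a stated scheduling goal.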

Topic 6: Debugging
Given a specific Airflow issue or completed DAG code, identify the cause of it

Topic 7: XComs
Identify the purpose of each XCom method:
◦ xcom_push
◦ xcom_pull
Identify the limitations of using XComs

Topic 8: Airflow CLI
Identify the purpose of specific Airflow CLI commands:
◦ airflow tasks test
◦ airflow db init
◦ airflow info
◦ airflow config list
◦ airflow cheat-sheet
◦ airflow variables
◦ airflow users
◦ airflow standalone
◦ airflow version

Topic 9: Airflow Concepts
Identify which folder the Airflow Scheduler parses when searching for new DAG files
Identify what an Airflow provider is
Identify what a DAG run is
Identify the role of a worker in Airflow
Identify which programming language Airflow primarily uses
Identify the purpose of an XCom
Identify the purpose of a DAG
Identify the purpose of the default_args DAG parameter
Identify the default time zone of an Airflow instance
Identify the role of an executor in Airflow
Identify the core architectural components of Airflow
Identify the typical journey of a task
Identify what happens when two DAGs share the same dag_id value
Identify optional and non-optional DAG parameters
Identify what each of the task lifecycle stages does
Identify valid ways to define a DAG in Airflow

Topic 10: Best Practices
Given specific DAG code, identify ways to improve the code using Airflow best practices

Topic 11: Connections
Identify the different ways to create an Airflow connection
Identify the correct way to create an Airflow connection in a .env file
Given a specific Airflow connection string, identify the connection ID

Topic 12: Tasks
Given DAG code, identify the number of tasks that will run when the DAG is scheduled

Topic 13: Sensors
Identify the default timeout value of a sensor
Identify the mode to use in a DAG when the DAG's poke_interval parameter value is set to a specific duration
Given the code for a sensor in a DAG, identify if the sensor is properly configured to accomplish a specific goal

Topic 14: Variables
Identify the purpose of an Airflow variable
Identify the types of data that can be stored in a variable
Identify the correct way to define a variable with a specific value in Airflow
Identify how to fetch the value of an Airflow variable in specific formats (e.g., JSON)
Given a specific variable name, identify if a variable will be visible in the Airflow UI

Topic 15: Airflow UI
Identify the most helpful Airflow UI view to use for real-world scenarios
Identify the default time for DAGs to appear in the Airflow UI
Identify the result of deleting a DAG using the Airflow UI
Identify the purpose of core Airflow UI components (e.g., the Last Run column)
Given a specific Airflow UI, identify solutions to common issues
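
For the Connections topic, it may help to see how a connection defined in a .env file relates to its connection ID. Airflow reads connections from environment variables named AIRFLOW_CONN_ followed by the upper-cased connection ID, with the connection URI as the value. The sketch below only illustrates that naming convention using the Python standard library; it does not use Airflow, and the function name, the example line, and every value in it (host, credentials, database) are made up for illustration.

```python
from urllib.parse import unquote, urlsplit

CONN_PREFIX = "AIRFLOW_CONN_"


def parse_env_connection(line: str) -> dict:
    """Split one .env line into a connection ID and the parts of its URI."""
    name, _, uri = line.strip().partition("=")
    if not name.startswith(CONN_PREFIX):
        raise ValueError(f"{name!r} does not start with {CONN_PREFIX}")
    parts = urlsplit(uri)
    return {
        # The ID is whatever follows the prefix, conventionally written in lowercase.
        "conn_id": name[len(CONN_PREFIX):].lower(),
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "schema": parts.path.lstrip("/") or None,
    }


# Hypothetical .env entry; none of these values come from the exam guide.
example = "AIRFLOW_CONN_MY_POSTGRES=postgres://user:secret@db.example.com:5432/analytics"
conn = parse_env_connection(example)
print(conn["conn_id"], conn["conn_type"], conn["host"], conn["port"])
```

Under this convention, the connection ID for the example line would be my_postgres, and the rest of the URI supplies the connection type, host, port, credentials, and schema.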

Sample Questions

To help give you a sense of the types of questions that will be asked on the exam, we are providing five
sample questions. These questions are modified versions of similar questions you might find on the exam.
The answer key can be found at the end of this section.

Sample Exam Question 1 - DAG Scheduling

What would be the Cron value of the schedule_interval parameter of a DAG if it needed to be triggered
every two hours but only on weekends?

a. 0 */2 * * 0,6

b. 0 2 * * 6,7

c. 0 */2 * * 6,0

d. 0 0/2 * * 6,7

Sample Exam Question 2 - Airflow Concepts

What is the role of the Airflow scheduler?

a. To execute tasks.

b. To both trigger scheduled workflows and submit tasks to the executor to run.

c. To define how tasks are executed and on which system.

d. To define the interval of when a task is expected to be executed.

Sample Exam Question 3 - Dependencies

What task dependency relationship results in the following DAG?

a. a >> b >> c >> d

b. a >> bc >> d

c. a >> [b,c,d]

d. a >> [b,c] >> d

Sample Exam Question 4 - Debugging
Examine the following DAG:

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime

with DAG(
    'example_dag',
    schedule_interval='@daily',
    catchup=False
):
    task_1 = PythonOperator(
        python_callable=lambda: print("Task 1 executed.")
    )
    task_2 = PythonOperator(
        python_callable=lambda: print("Task 2 executed.")
    )
    task_1 >> task_2
    task_2 << task_1

Which of the following are issues with this DAG? (select all that apply)
a. The DAG is missing a start_date parameter
b. Both tasks are missing a task_id
c. The DAG has a cycle
d. The tasks are not assigned to the dag

Sample Exam Question 5 - DAG Runs

Assume a DAG is set to run daily but is paused on 2023/05/11 at 08:00 UTC. The DAG was then unpaused on
2023/05/15 at 10:00 UTC. How many DAG runs will occur if the DAG catchup parameter value is set to False?

a. 0
b. 1
c. 4
d. 5

Answer Key
1. A
2. B
3. D
4. B & C
5. B
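
To see why a list-based expression such as option d in question 3 is equivalent to writing each dependency separately, the toy model below mimics the `>>` operator with plain Python classes. It is a hypothetical stand-in written for this guide, not Airflow's implementation: the Task class, the make_tasks helper, and the edges helper are assumptions introduced for illustration only.

```python
class Task:
    """Toy task that records the IDs of its downstream tasks."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = set()

    def __rshift__(self, other):
        # task >> other, where other is a single task or a list of tasks.
        for task in (other if isinstance(other, list) else [other]):
            self.downstream.add(task.task_id)
        return other  # returning the right-hand side allows chaining

    def __rrshift__(self, other):
        # [task, task] >> self: Python falls back to the right operand for lists.
        for task in other:
            task >> self
        return self


def make_tasks():
    return [Task(name) for name in "abcd"]


def edges(tasks):
    """Return every (upstream, downstream) pair, sorted for comparison."""
    return sorted((t.task_id, d) for t in tasks for d in t.downstream)


# One chained statement using a list.
a, b, c, d = chained = make_tasks()
a >> [b, c] >> d

# The same graph written one dependency at a time.
a2, b2, c2, d2 = separate = make_tasks()
a2 >> b2
a2 >> c2
b2 >> d2
c2 >> d2

print(edges(chained))
print(edges(chained) == edges(separate))
```

Comparing the sorted edge lists is one way to practice the "compare dependency relationships for equivalence" outcome listed under Topic 2: two pieces of dependency code describe the same graph when they produce the same set of upstream and downstream pairs.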
