Airflow - Interview - Question - Answers - Manual 1

The document provides a comprehensive overview of Apache Airflow, detailing its components such as DAGs, Operators, and the Scheduler. It explains key concepts like task dependencies, task instances, and how to manage workflows, including scheduling, retries, and logging. Additionally, it covers the use of various operators, including the PythonOperator and TriggerDagRunOperator, and discusses the significance of Airflow Variables and the Airflow CLI.

Uploaded by

Shobhit

Basic Level Questions

1. What is Apache Airflow, and why is it used?

 Answer:

o Apache Airflow is an open-source platform used to

programmatically author, schedule, and monitor workflows. It
allows you to define workflows as directed acyclic graphs (DAGs),
where tasks are executed in a specific order based on their
dependencies. Airflow is commonly used for data pipeline
automation, ETL processes, and managing batch jobs.

2. What are Directed Acyclic Graphs (DAGs) in Apache Airflow?

 Answer:

o A DAG (Directed Acyclic Graph) is a collection of tasks organized

in such a way that each task has a defined execution order. It
represents the workflow in Airflow. Each task is a node, and the
dependencies between tasks are represented as directed edges.
"Acyclic" means there are no loops or cycles in the graph,
ensuring a clear, one-way flow of task execution.
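To make the "directed" and "acyclic" properties concrete, here is a minimal plain-Python sketch. It is not Airflow code: it models a workflow as a mapping from each task to its upstream tasks using the standard library's graphlib module, and the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each key maps to the set of tasks it depends on.
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

# A valid DAG always has at least one execution order that respects every edge.
order = list(TopologicalSorter(workflow).static_order())

# Making "extract" depend on "notify" closes a loop, so no valid order exists.
cyclic = dict(workflow, extract={"notify"})
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
```

Airflow performs a comparable check when it parses a DAG file and rejects graphs that contain a cycle.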

3. What are Operators in Apache Airflow?

 Answer:

o Operators are the building blocks of Airflow DAGs. They define

what work needs to be done. Common types of operators
include:

 Action Operators (e.g., BashOperator, PythonOperator):

These execute specific actions, such as running bash
commands or Python scripts.

 Transfer Operators (e.g., S3ToRedshiftOperator): These

move data between different systems or platforms.

 Sensor Operators (e.g., FileSensor): These wait for a

condition to be true, such as the existence of a file.

4. What is the difference between a Task and a DAG in Apache Airflow?

 Answer:

o A Task is a single unit of work in Airflow, defined by an operator.

It represents a step or action that is performed in the workflow.

o A DAG is a collection of tasks organized with dependencies. The

DAG defines the order of task execution and the relationships
between them.

5. How do you schedule a DAG in Apache Airflow?

 Answer:

o You can schedule a DAG in Airflow using the schedule_interval

parameter (superseded by the schedule parameter in Airflow 2.4
and later). It can be set to a cron expression, a timedelta
object, or a preset string such as @daily or @hourly. The
schedule determines how often the DAG runs.

o Example:

dag = DAG('my_dag', schedule_interval='0 12 * * *')

o This would run the DAG every day at 12:00 PM.

6. What is the role of the Scheduler in Apache Airflow?

 Answer:

o The Scheduler is responsible for managing the scheduling of

tasks. It monitors all DAGs and triggers tasks to run when their
scheduled times arrive. It also checks the dependencies between
tasks and determines which tasks are ready to execute.

7. What is the Airflow Web UI, and what are its key features?

 Answer:

o The Airflow Web UI is a user interface that allows users to

monitor, manage, and debug their workflows. Key features of the
Web UI include:
 Viewing DAGs and their current status.

 Inspecting the details of individual tasks, including logs and

execution history.

 Manually triggering tasks or pausing/resuming DAGs.

 Accessing task logs to help debug failures.

8. What are XComs in Apache Airflow?

 Answer:

o XComs (short for "Cross-communication") are a way for tasks in

Airflow to share data with each other. A task can push data to
XComs, and another task can pull that data using XComs. This is
useful when you need to pass small amounts of data between
tasks, such as parameters or results.
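The push/pull idea can be pictured with a small in-memory model. This is a hedged sketch rather than Airflow's implementation (real XComs are stored in the metadata database); the class name XComStore and the run and task identifiers are hypothetical, while the default key "return_value" mirrors the key Airflow uses for a task's return value.

```python
class XComStore:
    """Toy key-value store keyed by (run_id, task_id, key)."""

    def __init__(self) -> None:
        self._values = {}

    def push(self, run_id: str, task_id: str, value, key: str = "return_value") -> None:
        # A task publishes a small value under its own task_id.
        self._values[(run_id, task_id, key)] = value

    def pull(self, run_id: str, task_id: str, key: str = "return_value"):
        # A downstream task reads the value by naming the task that pushed it.
        return self._values.get((run_id, task_id, key))


store = XComStore()
store.push("run_2024_01_01", "count_rows", 1250)
row_count = store.pull("run_2024_01_01", "count_rows")
missing = store.pull("run_2024_01_02", "count_rows")  # different run, nothing pushed
```

Keying by run means values from one DAG run do not leak into another, which is also how XComs behave in practice.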

9. What is the role of the Executor in Apache Airflow?

 Answer:

o The Executor in Airflow is responsible for executing the tasks in

the DAG. It manages the parallelism of task execution. There are
different types of executors in Airflow, such as:

 SequentialExecutor: Executes tasks one at a time (mostly

used for testing or development).

 LocalExecutor: Executes tasks in parallel on a local

machine.

 CeleryExecutor: Allows for distributed task execution using

Celery.

 KubernetesExecutor: Runs tasks on Kubernetes clusters.

10. What is a Task Instance in Apache Airflow?

 Answer:
o A Task Instance is a specific run of a task in a DAG. It represents
the state of a task for a given execution of the DAG. Each task
instance has a status (e.g., success, failed, running, skipped) and
may contain logs and other metadata related to that specific run.

11. What is a Trigger Rule in Apache Airflow?

 Answer:

o A Trigger Rule in Airflow determines when a task should run

based on the state of its upstream tasks. The default trigger rule
is all_success, which means a task will run only if all of its
upstream tasks have succeeded. Other trigger rules include:

 all_failed: Task runs if all upstream tasks have failed.

 one_success: Task runs if at least one upstream task

succeeds.

 none_failed: Task runs if no upstream tasks have failed.
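The four rules listed above can be expressed as a small decision function. This is a simplified sketch, not Airflow's scheduler logic: it assumes every upstream task has already finished, and the function name should_run is hypothetical.

```python
def should_run(trigger_rule: str, upstream_states: list[str]) -> bool:
    """Decide whether a task may run, given the final states of its upstream tasks."""
    succeeded = sum(1 for state in upstream_states if state == "success")
    failed = sum(1 for state in upstream_states if state in ("failed", "upstream_failed"))
    total = len(upstream_states)

    if trigger_rule == "all_success":
        return succeeded == total
    if trigger_rule == "all_failed":
        return failed == total
    if trigger_rule == "one_success":
        return succeeded >= 1
    if trigger_rule == "none_failed":
        # Skipped upstream tasks are acceptable here, unlike with all_success.
        return failed == 0
    raise ValueError(f"rule not covered by this sketch: {trigger_rule}")
```

For example, with upstream states of success and skipped, all_success does not fire but none_failed does, which is why none_failed is often used on the task that joins the branches of a branching workflow.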

12. How does Apache Airflow handle retries for failed tasks?

 Answer:

o In Airflow, tasks can be configured to retry automatically when

they fail. The retries parameter specifies how many times a task
should be retried, and the retry_delay parameter defines the
time between retries. This allows workflows to be more resilient
to intermittent failures.

task = PythonOperator(
    task_id='example_task',
    python_callable=my_function,
    retries=3,
    retry_delay=timedelta(minutes=5),
    dag=dag
)

13. What is the difference between a DAG and a SubDag in Apache Airflow?

 Answer:

o A DAG is a complete workflow, whereas a SubDag is a smaller,

reusable workflow that is embedded within another DAG.
SubDags are useful when you need to encapsulate and reuse a
portion of a workflow that may have its own tasks and
dependencies.

o SubDags are executed as tasks in the parent DAG and are

treated like any other task, but they have their own DAG
definition. Note that SubDAGs have been deprecated since
Airflow 2.0 in favor of TaskGroups.

14. What are the different ways to trigger a DAG in Apache Airflow?

 Answer:

o You can trigger a DAG in Apache Airflow in several ways:

 Manually through the Airflow Web UI or CLI (airflow dags

trigger <dag_id>).

 Automatically based on a schedule defined by the

schedule_interval parameter.

 Using the API to trigger a DAG programmatically.

 From within another DAG, using a TriggerDagRunOperator

to trigger a DAG execution.

15. What are the common states of a Task Instance in Apache

Airflow?

Answer:

o A Task Instance can have several states, including:

 queued: The task is waiting for execution.

 running: The task is currently executing.

 success: The task has completed successfully.

 failed: The task has failed.

 skipped: The task was skipped, typically due to a trigger

rule or a condition.

 up_for_retry: The task is set to retry after failure.

 up_for_reschedule: A sensor in reschedule mode has released its worker slot and is waiting for its next check.

16. What is the role of the Airflow Scheduler?

 Answer:

o The Scheduler in Apache Airflow is responsible for triggering the

execution of tasks within a DAG at the appropriate time. It
continuously monitors the schedule and determines which tasks
are ready to be run based on their dependencies, schedule
intervals, and execution times. It ensures that tasks are executed
in the correct order according to the DAG's dependencies.

17. What is the difference between a Task and a DAG Run in Apache Airflow?

 Answer:

o A Task is a single unit of work in a DAG, which represents a

specific operation, like running a Python function, transferring
data, etc.

o A DAG Run represents a specific execution of a DAG, capturing a

point in time when the DAG was triggered to run. A single DAG
can have multiple runs, and each run consists of multiple task
instances, representing individual executions of tasks.

18. What is the significance of Airflow Variables?

 Answer:

o Airflow Variables are key-value pairs used to store configuration

values or dynamic data that can be accessed by tasks in a DAG.
They are helpful for parameterizing tasks, storing credentials,
and making workflows more flexible. Variables can be set and
fetched programmatically or via the Airflow UI.

19. What is the Airflow CLI, and how is it used?

 Answer:

o The Airflow Command Line Interface (CLI) allows users to interact

with the Airflow environment through commands executed in the
terminal. It provides several functionalities like triggering DAGs,
managing tasks, checking logs, and configuring the Airflow
environment. Common CLI commands include airflow dags list,
airflow tasks run, and airflow db init.

20. What is the purpose of the PythonOperator in Apache Airflow?

 Answer:

o The PythonOperator is one of the most commonly used operators

in Airflow. It allows you to execute Python functions as tasks in a
DAG. You define the Python function to be executed using the
python_callable parameter, and the operator takes care of
executing that function when the task is run.

21. How can you handle task dependencies in Apache Airflow?

 Answer:

o Task dependencies in Airflow are defined using the

set_upstream() and set_downstream() methods or using the >>
and << bitshift operators. These methods define the execution
order of tasks. A task will only run after its upstream tasks have
been completed successfully.

Example:

python

task1 >> task2 # task2 will run after task1 completes
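The bitshift syntax works because Python lets a class define what >> and << mean. The sketch below is a toy stand-in, not Airflow's operator base class; the class name Task and the task IDs are hypothetical, and it only records dependencies.

```python
class Task:
    """Toy stand-in for an operator; it only tracks upstream/downstream task IDs."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream: set[str] = set()
        self.downstream: set[str] = set()

    def set_downstream(self, other: "Task") -> None:
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)

    def set_upstream(self, other: "Task") -> None:
        other.set_downstream(self)

    def __rshift__(self, other: "Task") -> "Task":
        # a >> b is shorthand for a.set_downstream(b)
        self.set_downstream(other)
        return other  # returning the right-hand task allows chaining

    def __lshift__(self, other: "Task") -> "Task":
        # a << b is shorthand for a.set_upstream(b)
        self.set_upstream(other)
        return other


extract, transform, load = Task("extract"), Task("transform"), Task("load")
extract >> transform >> load  # same as two set_downstream() calls
```

Because each >> returns its right-hand operand, a chain such as extract >> transform >> load reads left to right in execution order.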

22. What is the role of the Airflow Web Server?

 Answer:
o The Airflow Web Server provides the user interface (UI) for
managing and monitoring DAGs and tasks. It allows users to view
the status of DAG runs, task instances, logs, and other metadata
related to their workflows. The web server is typically accessed
via a web browser and is an essential tool for debugging and
interacting with Airflow.

23. How do you create and manage connections in Airflow?

 Answer:

o Connections in Airflow are used to store information required to

connect to external systems, such as databases, APIs, cloud
services, etc. Connections are configured through the Airflow UI
under the "Admin" tab or using the CLI (airflow connections
command). Each connection stores credentials and configuration
details that tasks can use to interact with external systems.

24. What are Sensors in Apache Airflow, and when would you use
them?

 Answer:

o Sensors are a type of operator in Airflow that waits for a certain

condition to be true before continuing execution. They are
typically used to wait for the presence of a file, a change in a
database, or an external signal before proceeding with further
tasks. For example, a FileSensor can be used to wait for a file to
appear in a directory before starting a data processing task.

25. What is the TriggerDagRunOperator, and when would you use it?

 Answer:

o The TriggerDagRunOperator allows you to trigger the execution

of another DAG from within a DAG. This is useful for creating
complex workflows where the completion of one DAG triggers
another DAG to run. You can define conditions and pass
parameters when triggering the second DAG.

26. What are Task Instances, and how do you monitor their status?

 Answer:

o A Task Instance is a specific execution of a task in a DAG. It is a

combination of a task and a DAG run, with its own execution
status (e.g., success, failed, running). Task instances are tracked
in the Airflow database and can be monitored via the Airflow Web
UI or through the CLI. You can check the status of task instances,
view logs, and retry tasks if needed.

27. What is the purpose of SubDAGs in Apache Airflow?

 Answer:

o A SubDAG is a DAG defined within another DAG. It is useful for

organizing complex workflows by breaking them down into
smaller, reusable pieces. SubDAGs allow you to manage tasks
that need to be grouped together logically and can be used to
reduce the complexity of large DAGs. SubDAGs are executed as a
task in the parent DAG.

28. What are the different Executor types in Apache Airflow, and
what are their use cases?

 Answer:

o The Executor in Airflow determines how tasks are executed.

There are several types of executors:

 SequentialExecutor: Executes tasks one at a time, useful

for development or testing.

 LocalExecutor: Executes tasks locally in parallel on a single

machine.

 CeleryExecutor: Uses Celery to distribute tasks across

multiple worker nodes, ideal for scaling Airflow in a
distributed environment.
 KubernetesExecutor: Executes tasks on a Kubernetes
cluster, suitable for cloud-native environments where tasks
need to be run in isolated containers.

29. How do you define task retries in Apache Airflow?

 Answer:

o Task retries can be defined in Airflow using the retries and

retry_delay parameters. The retries parameter specifies how
many times a task should be retried if it fails, while retry_delay
sets the time interval between retries.

o Example:

python

task = PythonOperator(
    task_id='my_task',
    python_callable=my_function,
    retries=3,  # retry up to three times after the first failure
    retry_delay=timedelta(minutes=10),  # wait 10 minutes between attempts
    dag=dag,
)

30. What is Airflow’s "Backfilling" feature?

 Answer:

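o Backfilling in Airflow refers to running a DAG for past schedule

intervals that have no recorded run, such as the intervals
between the DAG's start_date and the date it was first enabled,
or intervals missed while the scheduler was down. The scheduler
creates these runs automatically when catchup is enabled, and in
Airflow 2.x you can also backfill a specific date range on demand
with the airflow dags backfill command. Filling in the missed
runs keeps the workflow's historical output consistent and complete.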

31. How does Airflow handle failed tasks?

 Answer:
o When a task fails in Airflow, the system typically attempts to
retry the task based on the configured retries and retry_delay
parameters. If a task fails beyond the number of allowed retries,
its status is marked as failed. You can also use trigger rules to
customize behavior for failed tasks, such as running downstream
tasks only if upstream tasks fail.

32. What is the default DAG execution order in Apache Airflow?

 Answer:

o By default, Airflow runs DAGs and their tasks in the order

specified by the task dependencies (i.e., task order is determined
by the >> and << operators or set_upstream() and
set_downstream() methods). Airflow ensures that a task can only
run after its upstream tasks have been completed successfully.

33. How can you trigger another DAG from within a DAG using the
TriggerDagRunOperator?

 Answer:

o The TriggerDagRunOperator is used to trigger another DAG run

from within the current DAG. It allows for creating more complex
workflows by chaining multiple DAGs together.

o Example:

python

trigger = TriggerDagRunOperator(
    task_id='trigger_another_dag',
    trigger_dag_id='another_dag',  # dag_id of the DAG to start
    dag=dag,
)

34. What is a DAG Run in Apache Airflow?

 Answer:

o A DAG Run represents a specific execution of a DAG for a

particular scheduled time or trigger. Each time the scheduler
triggers a DAG, a new DAG Run is created. This allows for
tracking the execution of all tasks in the workflow for that
specific run.

35. What is the difference between upstream and downstream tasks

in Airflow?

 Answer:

o An upstream task is a task that must complete before the current

task can run. In other words, it is a dependency for the current
task.

o A downstream task is a task that runs after the current task

completes. It depends on the current task.

36. How do you view logs for a task in Airflow?

 Answer:

o Task logs in Airflow can be viewed from the Airflow Web UI. To
view logs:

 Navigate to the DAG's task instance page.

 Click on a task that has run (e.g., success, failed, or

running).

 Click the log button to view detailed logs for that specific
task instance.

37. What is the use of the schedule_interval in a DAG?

 Answer:

o The schedule_interval parameter defines how frequently a DAG

should run. It can be set using:

 A cron expression (e.g., '0 12 * * *' for running at noon

every day).

 Presets, like @daily, @hourly, @once, etc.

 A timedelta object (e.g., timedelta(days=1) for running
once every day).

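o If schedule_interval is set to None, the DAG has no schedule and

runs only when triggered manually or externally. If the parameter
is omitted altogether, Airflow 2.x falls back to a default
interval of one day.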

Medium Level Questions

1. Explain the concept of DAGs in Airflow. How are they different

from tasks?

 Answer: A DAG (Directed Acyclic Graph) in Airflow is a collection of

tasks organized in a way that defines the order of execution. A DAG
represents the workflow, whereas tasks are the individual units of work
that are executed. The DAG determines how tasks are scheduled and
executed in Airflow, while tasks themselves define the operations to be
performed.

2. What is the role of the Airflow Scheduler, and how does it work?

 Answer: The Airflow Scheduler is responsible for triggering tasks based

on the schedule defined in the DAG. It monitors the DAGs and
determines when tasks need to be executed. It checks the conditions
for task execution (such as dependencies and schedules) and pushes
the tasks into the execution queue.

3. How can you handle task dependencies in Airflow?

 Answer: Task dependencies in Airflow can be set using the

set_upstream() and set_downstream() methods or by using the bitshift
operators (>> and <<). These define the order in which tasks should
be executed, ensuring that tasks are run only when their upstream
dependencies are complete.

4. What are Airflow Operators? Can you give examples of commonly

used ones?

 Answer: Operators in Airflow are pre-defined templates that define

what a task will do. Some common operators are:

o BashOperator: Executes bash commands.

o PythonOperator: Runs Python functions.

o HttpOperator: Makes HTTP requests.

o BranchPythonOperator: Allows branching logic in workflows.

o PostgresOperator: Executes SQL queries in a PostgreSQL

database.

5. What is the difference between Task Instance and DAG Run in

Airflow?

 Answer:

o A Task Instance represents a specific execution of a task in a

particular DAG run. It holds metadata such as execution date,
status, and logs.

o A DAG Run refers to an instance of the execution of the DAG

itself. It represents the entire execution of the DAG with all tasks
that belong to it.
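The relationship between the two can be sketched with a pair of dataclasses. This is a hypothetical model for illustration only (the helper create_dag_run and the IDs are made up); it shows that every DAG run owns its own set of task instances, one per task.

```python
from dataclasses import dataclass, field


@dataclass
class TaskInstance:
    task_id: str
    run_id: str
    state: str = "none"  # no status yet


@dataclass
class DagRun:
    dag_id: str
    run_id: str
    task_instances: dict = field(default_factory=dict)


def create_dag_run(dag_id: str, run_id: str, task_ids: list[str]) -> DagRun:
    """Create one DAG run holding a fresh task instance for every task in the DAG."""
    run = DagRun(dag_id, run_id)
    for task_id in task_ids:
        run.task_instances[task_id] = TaskInstance(task_id, run_id)
    return run


tasks = ["extract", "load"]
monday = create_dag_run("sales_etl", "scheduled__2024-01-01", tasks)
tuesday = create_dag_run("sales_etl", "scheduled__2024-01-02", tasks)
monday.task_instances["extract"].state = "success"
```

Marking Monday's extract as successful leaves Tuesday's extract untouched, because the two are separate task instances of the same task.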

6. How would you handle retries for a failed task in Airflow?

 Answer: Airflow provides the retries parameter in task definitions to

specify the number of retry attempts for a failed task. You can also
specify the retry_delay parameter to define the wait time between
retries. Additionally, the retry_exponential_backoff option can be set to
apply an exponentially increasing backoff between retries.

7. How do you manage task failures and retries in Airflow?

 Answer: Task failures can be managed using:

o Retries: By configuring the retries and retry_delay parameters in

the task definition.

o Callback functions: Airflow allows you to define callback functions

like on_failure_callback to send notifications or take corrective
actions when a task fails.

o Alerting and Monitoring: Integrating with monitoring systems

(e.g., Slack, email) to alert when tasks fail.

8. What is the role of Airflow's XCom?

 Answer: XCom (short for "Cross-Communication") is used to exchange

data between tasks in Airflow. Tasks can push and pull data to/from
XComs, which allows tasks to share information. For example, one task
can push a result to XCom, and a subsequent task can pull the result
for further processing.

9. How would you trigger a DAG manually in Airflow?

 Answer: A DAG can be triggered manually using the Airflow UI, CLI, or
API:

o UI: Go to the DAGs page and click the "Trigger DAG" button for
the desired DAG.

o CLI: Use the airflow dags trigger <dag_id> command.

o API: You can send a POST request to the Airflow REST API to
trigger a DAG.

10. How does Airflow handle parallelism and concurrency?

 Answer: Airflow has several parameters for controlling parallelism and

concurrency:

o DAG-level concurrency: Controlled by dag_concurrency, renamed

max_active_tasks_per_dag in Airflow 2.2 (maximum number of task
instances allowed to run simultaneously in a DAG).

o Task-level concurrency: Controlled by task_concurrency, renamed

max_active_tis_per_dag in Airflow 2.2 (maximum number of
instances of a single task allowed to run simultaneously).

o Global parallelism: Controlled by the parallelism setting in the

airflow.cfg file (maximum number of task instances allowed to
run across all DAGs).

11. What is the purpose of Pool in Airflow?

 Answer: Pools are a mechanism to limit the number of concurrent task

instances running in a given pool. This is useful when there are limited
resources, and you want to avoid overwhelming a particular service,
like a database. Tasks are assigned to a pool, and the number of
concurrent tasks within a pool is controlled by the pool's size.
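A pool behaves like a fixed number of slots. The following is a toy model under simplifying assumptions, not Airflow's scheduler: every task occupies exactly one slot, waiting tasks are started in arrival order, and the class name Pool and the task IDs are hypothetical.

```python
from collections import deque


class Pool:
    """Toy pool: at most `slots` tasks run at once, the rest wait in order."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.running: set[str] = set()
        self.waiting: deque[str] = deque()

    def submit(self, task_id: str) -> None:
        if len(self.running) < self.slots:
            self.running.add(task_id)
        else:
            self.waiting.append(task_id)  # no free slot, so the task stays queued

    def finish(self, task_id: str) -> None:
        self.running.remove(task_id)
        if self.waiting:
            # A freed slot is handed to the longest-waiting task.
            self.running.add(self.waiting.popleft())


db_pool = Pool(slots=2)
for task_id in ["query_a", "query_b", "query_c"]:
    db_pool.submit(task_id)
running_before = set(db_pool.running)
db_pool.finish("query_a")
running_after = set(db_pool.running)
```

With two slots, the third query only starts once one of the first two finishes, which is how a pool shields a database from too many simultaneous connections.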

12. What is the difference between TriggerDagRunOperator and

SubDagOperator?

 Answer:

o TriggerDagRunOperator: This operator is used to trigger another

DAG as part of the current DAG execution. The triggered DAG
runs independently.
o SubDagOperator: This operator is used to define a sub-DAG
within a parent DAG. It allows for nesting DAGs and reusing
workflows.

13. What is the Airflow web server, and how does it interact with
other components?

 Answer: The Airflow web server provides a web-based UI to interact

with Airflow components, monitor DAG runs, check task statuses, and
manage workflows. It interacts with the metadata database to display
information about DAGs, tasks, and logs. It also allows triggering DAGs,
checking logs, and scheduling tasks.

14. Explain the concept of "Backfilling" in Airflow.

 Answer: Backfilling is the process of executing missed or delayed tasks

for past dates in a DAG. If a DAG run is skipped (due to downtime or
other reasons), Airflow can be configured to "backfill" the missed tasks
when the DAG runs again, ensuring the tasks are not missed.

15. What is the significance of start_date and end_date in a DAG

definition?

 Answer: The start_date parameter defines when the DAG should start
running. It doesn't mean the DAG will start at that exact time, but it
indicates the first date the scheduler should consider for scheduling
the DAG. The end_date specifies the date when the DAG should stop
running.

16. What are some common performance optimizations you can

apply to Airflow?

 Answer: Common optimizations include:

o Task Parallelism: Adjusting the parallelism and

dag_concurrency settings to allow more tasks to run
concurrently.

o Executor Selection: Using a more scalable executor like the

CeleryExecutor or KubernetesExecutor for distributed execution,
especially for larger workloads.

o Database Optimization: Tuning the Airflow metadata database,

such as optimizing the database indexes and increasing its
performance to handle large volumes of task metadata.
o Task Size Management: Breaking larger tasks into smaller
ones to improve execution times and reduce contention.

o Efficient XCom Usage: Avoiding large data exchanges via

XComs to prevent excessive database storage usage.

17. What is the difference between airflow.trigger_dag() and

TriggerDagRunOperator?

 Answer:

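o airflow.trigger_dag(): This refers to triggering a DAG run

directly from a script or Python code, outside of any DAG.
Airflow does not expose a top-level function by this exact name;
in Airflow 2.x the usual entry points are the airflow dags trigger
CLI command and the REST API.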

o TriggerDagRunOperator: This is an Airflow operator used

within a DAG to trigger another DAG as part of its execution. It's
typically used when you want to include DAG triggering as part of
a task’s workflow.

18. What are task states in Airflow, and what do they mean?

 Answer: Common task states include:

o queued: The task is waiting to be executed.

o running: The task is currently being executed.

o failed: The task has failed.

o success: The task has successfully completed.

o up_for_retry: The task is eligible for retry based on the defined

retry logic.

o skipped: The task has been skipped, typically due to branching

logic.

o upstream_failed: The task has not run due to the failure of an

upstream task.

19. What is Airflow's on_failure_callback and how do you use it?

 Answer: The on_failure_callback is a callback function that gets

executed when a task fails. It can be used to send alerts, trigger
compensation logic, or log additional information. For example, it might
send an email or notify a monitoring system when a task fails.
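The sketch below imitates the behavior in plain Python so the control flow is visible. It is an assumption-laden model, not Airflow internals: the helper run_with_retries and the context keys are hypothetical, although Airflow does pass a context dictionary to the callback and invokes it only once the task has finally failed.

```python
def run_with_retries(task_id, work, retries, on_failure_callback=None):
    """Run `work`; after the last allowed attempt fails, call the failure callback."""
    max_tries = retries + 1  # the first attempt plus the configured retries
    for try_number in range(1, max_tries + 1):
        try:
            return work()
        except Exception as exc:
            if try_number < max_tries:
                continue  # a retry is still available, so no failure callback yet
            if on_failure_callback is not None:
                context = {"task_id": task_id, "try_number": try_number, "exception": exc}
                on_failure_callback(context)
            raise


alerts = []  # stands in for an email or chat notification


def always_fails():
    raise RuntimeError("source table is missing")


try:
    run_with_retries("load_table", always_fails, retries=2, on_failure_callback=alerts.append)
except RuntimeError:
    pass
```

Only one alert is recorded even though the work was attempted three times, which keeps notifications from firing on every intermediate retry.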

20. How can you avoid running a task twice in Airflow?

 Answer: Several ways to prevent task duplication include:

o Unique Task IDs: Ensure that task IDs are unique within each
DAG.

o Use of depends_on_past: This parameter makes a task instance

run only if the same task succeeded (or was skipped) in the
previous DAG run.

o Use of wait_for_downstream: This makes a task instance wait

until the tasks immediately downstream of its previous instance
have completed successfully.

21. What are Airflow Connections and how do you define them?

 Answer: Connections in Airflow store information required for external

systems like databases, APIs, and message queues. They contain
credentials, hostnames, and other details required to authenticate and
connect to these systems. Connections can be defined through the
Airflow UI, via the airflow connections CLI, or through environment
variables.
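For the environment-variable route, Airflow reads variables named AIRFLOW_CONN_<CONN_ID>, and the value can be written as a URI. The sketch below uses only the standard library to show how such a URI breaks down into connection fields; the URI, credentials, and field mapping shown are illustrative assumptions, not output from Airflow itself.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical value of an environment variable such as AIRFLOW_CONN_MY_POSTGRES.
uri = "postgres://etl_user:s3cret%21@db.example.com:5432/analytics"

parts = urlsplit(uri)
connection = {
    "conn_type": parts.scheme,
    "login": parts.username,
    "password": unquote(parts.password or ""),  # special characters are percent-encoded
    "host": parts.hostname,
    "port": parts.port,
    "schema": parts.path.lstrip("/"),
}
```

Percent-encoding matters in practice: a password containing characters such as ! or @ must be encoded in the URI or the host portion will be parsed incorrectly.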

22. What is an Airflow "sensor" and how is it used?

 Answer: A sensor is a special type of operator in Airflow that waits for

a certain condition to be met (e.g., a file being available, a database
record being inserted, etc.). Sensors are typically long-running tasks
that poll for the condition and do not complete until the condition is
true.
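The polling behavior can be sketched as a simple loop. This is a plain-Python illustration rather than a real sensor: the function wait_for and the example condition are hypothetical, while poke_interval and timeout are named after the corresponding sensor parameters. One deliberate difference is that a real sensor fails the task on timeout, whereas this sketch just returns False.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], poke_interval: float, timeout: float) -> bool:
    """Poll `condition` every `poke_interval` seconds until it holds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


checks = {"count": 0}


def file_has_arrived() -> bool:
    # Stand-in for a real check such as os.path.exists(path); true on the third poll.
    checks["count"] += 1
    return checks["count"] >= 3


arrived = wait_for(file_has_arrived, poke_interval=0.01, timeout=1.0)
timed_out = wait_for(lambda: False, poke_interval=0.01, timeout=0.05)
```

Choosing a longer poke_interval reduces load on the system being checked at the cost of reacting more slowly once the condition becomes true.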

23. How do you perform a "rolling upgrade" on an Airflow cluster?

 Answer: A rolling upgrade involves upgrading Airflow components one

at a time, ensuring that the rest of the cluster continues to run while
the upgrade takes place. This involves:

o Stopping one worker or component at a time.

o Upgrading the component.

o Restarting it and ensuring it's healthy.

o Moving to the next component. This ensures that the system

remains operational throughout the upgrade process.

24. Explain the use of retry_delay and max_retry_delay in task

retries.

 Answer:
o retry_delay: Specifies the fixed amount of time to wait between
retry attempts.

o max_retry_delay: Defines the maximum delay between retries,

used when exponential backoff is enabled. The delay between
retries will not exceed this value.
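How the two parameters interact is easiest to see with numbers. The sketch below is a simplified model that assumes plain doubling of the delay on each retry (Airflow's own backoff calculation also adds jitter); the function name retry_delays is hypothetical.

```python
from datetime import timedelta


def retry_delays(retries, retry_delay, exponential_backoff=False, max_retry_delay=None):
    """Return the wait before each retry, optionally doubling and capping the delay."""
    delays = []
    for attempt in range(retries):
        delay = retry_delay * (2 ** attempt) if exponential_backoff else retry_delay
        if max_retry_delay is not None and delay > max_retry_delay:
            delay = max_retry_delay  # never wait longer than the cap
        delays.append(delay)
    return delays


fixed = retry_delays(3, timedelta(minutes=5))
capped = retry_delays(
    4,
    timedelta(minutes=5),
    exponential_backoff=True,
    max_retry_delay=timedelta(minutes=15),
)
```

With a 5-minute base delay and a 15-minute cap, the waits grow to 5, 10, and 15 minutes and then stay at 15, instead of continuing to double.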

25. How would you schedule a DAG to run every 10 minutes, but
only on weekdays?

 Answer: You can define the schedule in the schedule_interval

parameter using cron expressions. For example:

python

dag = DAG(
    'my_dag',
    schedule_interval='*/10 * * * 1-5',  # every 10 minutes, Monday through Friday
)

The cron expression */10 * * * 1-5 runs the DAG every 10 minutes,
Monday through Friday.
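To check what the expression */10 * * * 1-5 selects, the two relevant cron fields can be evaluated by hand. This sketch is hard-coded to that one expression and is not a general cron parser; the function name is hypothetical.

```python
from datetime import datetime


def matches_every_10_min_on_weekdays(moment: datetime) -> bool:
    """Return True if `moment` matches the cron expression */10 * * * 1-5."""
    # Cron numbers weekdays 0 (Sunday) to 6 (Saturday); isoweekday() gives Monday=1 .. Sunday=7.
    cron_weekday = moment.isoweekday() % 7
    minute_matches = moment.minute % 10 == 0  # */10 means minutes 0, 10, 20, ...
    weekday_matches = 1 <= cron_weekday <= 5  # 1-5 means Monday through Friday
    return minute_matches and weekday_matches


monday_on_the_mark = matches_every_10_min_on_weekdays(datetime(2024, 1, 1, 9, 20))
monday_off_the_mark = matches_every_10_min_on_weekdays(datetime(2024, 1, 1, 9, 25))
saturday = matches_every_10_min_on_weekdays(datetime(2024, 1, 6, 9, 20))
```

The hour, day-of-month, and month fields are all *, so they place no restriction and are omitted from the check.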

26. What is the purpose of the catchup parameter in Airflow?

 Answer: The catchup parameter controls whether or not Airflow

should backfill tasks for all the missing intervals between the
start_date and the current date. By default, Airflow will try to run all
missed DAG runs (backfilling). If you set catchup=False, it will only run
for the current and future intervals, skipping the backfilling.
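The effect of catchup can be illustrated by listing the intervals a scheduler would create runs for. This is a hedged sketch under simplifying assumptions (a fixed-length interval, and a run becoming due only once its interval has fully elapsed); the function name runs_to_create is hypothetical.

```python
from datetime import datetime, timedelta


def runs_to_create(start_date, now, interval, catchup):
    """Return the interval start times for which a run would be created."""
    completed = []
    interval_start = start_date
    # A run is due only after its whole interval has passed.
    while interval_start + interval <= now:
        completed.append(interval_start)
        interval_start += interval
    if catchup:
        return completed  # every missed interval gets its own run
    return completed[-1:]  # only the most recent completed interval


start = datetime(2024, 1, 1)
now = datetime(2024, 1, 4, 12, 0)
with_catchup = runs_to_create(start, now, timedelta(days=1), catchup=True)
without_catchup = runs_to_create(start, now, timedelta(days=1), catchup=False)
```

In this example three daily intervals have completed, so catchup=True yields three runs while catchup=False yields only the run for January 3.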

27. How would you implement dynamic task generation in Airflow?

 Answer: Dynamic task generation in Airflow can be achieved by

iterating over a list of items (e.g., a list of parameters or files) and
creating tasks programmatically. For example:

python

for item in items:
    task = PythonOperator(
        task_id=f'task_{item}',
        python_callable=my_function,
        op_args=[item],  # positional arguments passed to my_function
        dag=dag,
    )

This would dynamically create a task for each item in the list.

28. What is the difference between the LocalExecutor and the

CeleryExecutor?

 Answer:

o LocalExecutor: Executes tasks in parallel within a single

machine. It is simpler and doesn’t require external components,
but it’s less scalable.

o CeleryExecutor: Distributes task execution across multiple

worker nodes, providing horizontal scalability. It requires setting
up a message broker like Redis or RabbitMQ and is suitable for
handling large workloads in a distributed environment.

29. How would you handle secrets management in Airflow?

 Answer: Airflow can integrate with secrets management solutions like

HashiCorp Vault, AWS Secrets Manager, or Google Cloud Secret
Manager. You can use the Secrets Backend in Airflow to pull sensitive
information like API keys, passwords, and other secrets at runtime,
instead of storing them directly in the code or Airflow metadata.

30. What is a “task timeout” and how do you configure it in Airflow?

 Answer: A task timeout specifies the maximum amount of time a task

is allowed to run before being terminated. You can set it using the
execution_timeout parameter in the task definition:

python

task = PythonOperator(
    task_id='my_task',
    python_callable=my_function,
    execution_timeout=timedelta(minutes=30),
    dag=dag,
)

If the task exceeds the specified time, Airflow will terminate it and mark it as
failed.

31. Explain the concept of "task instance lifecycle" in Airflow.

 Answer: A task instance in Airflow has a lifecycle defined by the

following states:

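o None (no status): Task instance exists, but its dependencies are not yet met, so it has not been scheduled.

o Scheduled: The scheduler has determined that the task's dependencies are met and it should run.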

o Queued: Task is waiting to be picked up by a worker.

o Running: Task is currently executing.

o Success: Task has successfully completed.

o Failed: Task execution failed.

o Up for Retry: Task has failed but is awaiting retry based on retry
parameters.

o Skipped: Task was skipped due to conditional logic (e.g.,

branching).

Advanced Level Questions

1. How would you optimize the performance of an Airflow system

handling high-volume workloads?

 Answer: Optimizing Airflow for high-volume workloads involves:

o Executor choice: Using a distributed executor such as

CeleryExecutor or KubernetesExecutor to handle large-scale
parallelism.

o Parallelism and concurrency tuning: Adjusting the parallelism,

dag_concurrency, and task_concurrency settings to balance the
load across resources.

o Database optimization: Using a high-performance backend

database (e.g., PostgreSQL or MySQL), tuning database queries,
optimizing indexes, and ensuring efficient transaction handling.

o Task splitting: Breaking large tasks into smaller tasks to increase

parallelism and reduce task execution time.
o Resource allocation: Assigning sufficient resources (e.g., CPU,
memory) to worker nodes and scaling horizontally to distribute
the load.

o Caching and task dependency management: Using cached

results or intermediate outputs to avoid redundant work and
optimizing task dependencies to reduce unnecessary reruns.

2. What strategies would you employ to ensure fault tolerance and

high availability in an Airflow deployment?

 Answer: Strategies for ensuring fault tolerance and high availability in

Airflow include:

o Database high availability: Deploying a highly available metadata

database with failover mechanisms (e.g., using replication or
clustering in PostgreSQL).

o Executor failover: Using distributed executors like CeleryExecutor

with multiple worker nodes to ensure redundancy. If one worker
fails, others can take over.

o Web server redundancy: Deploying multiple instances of the

Airflow web server behind a load balancer to ensure availability.

o Health checks and monitoring: Setting up monitoring for Airflow

components (scheduler, workers, web server) to ensure they are
running correctly and to receive alerts in case of failures.

o Task retries and alerting: Configuring task retries appropriately

and setting up callback functions (e.g., on_failure_callback) for
alerting and recovery actions.

o Backups: Regularly backing up the Airflow metadata database

and other critical components to prevent data loss.

3. How does Airflow handle dynamic scaling in a cloud environment

(e.g., Kubernetes)?

 Answer: In a cloud environment like Kubernetes, Airflow can

dynamically scale the number of worker pods based on the workload:

o KubernetesExecutor: This executor allows Airflow tasks to run on

dynamically provisioned Kubernetes pods. The number of pods
can scale up or down based on the number of tasks in the queue,
allowing for efficient resource allocation and workload
distribution.

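o Horizontal Pod Autoscaling (HPA): Kubernetes supports

autoscaling, but HPA applies to long-lived deployments such as
Celery workers running on Kubernetes, scaled on CPU or memory
usage. With KubernetesExecutor there is no standing worker pool
to autoscale, because each task already gets its own pod.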

o Custom Kubernetes Resources: Airflow can specify resource limits

(CPU, memory) for each task and dynamically scale resources to
match the needs of the workload.

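o Pod Restart Policy: Task pods are typically run with a restart

policy of Never, so a failed or interrupted task is retried by
Airflow (which launches a new pod) according to the task's
retries setting. Kubernetes restart policies mainly protect
long-running pods such as the scheduler and web server.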

4. What are the advantages and disadvantages of using

SubDagOperator versus TriggerDagRunOperator for workflow
orchestration?

 Answer:

o SubDagOperator:

 Advantages:

 Ideal for nesting a set of tasks that need to be

logically grouped together within a larger DAG.

 Helps in reusing workflows and managing complex

task dependencies.

 Disadvantages:

 Harder to monitor due to limited visibility in the UI

(the sub-DAG execution appears as a single task).

 Can complicate debugging, as you need to look into

sub-DAG logs separately.

 Risk of increased complexity if sub-DAGs are too

large or nested deeply. SubDagOperator is also
deprecated in Airflow 2.x in favor of TaskGroup.

o TriggerDagRunOperator:

 Advantages:
 Triggers an entirely separate DAG, allowing you to
decouple workflows and run them independently.

 Each triggered DAG can be scheduled, run, and

monitored independently, leading to better isolation.

 Disadvantages:

 Potential overhead from triggering DAGs externally

(could require additional configuration or setup).

 More difficult to track execution status across

different DAGs since the triggered DAG is not
integrated into the parent DAG’s task flow.

5. How do you handle managing secrets and credentials securely in

Airflow?

 Answer: Secrets and credentials management in Airflow can be

handled securely in the following ways:

o Airflow Secrets Backend: Integrating with external secret

management services like HashiCorp Vault, AWS Secrets
Manager, or Google Secret Manager. These services securely
store and retrieve credentials, and Airflow can access them at
runtime through the Secrets Backend.

o Environment Variables: Storing sensitive information in

environment variables, which are then accessed by Airflow.
However, this should be used with caution as it may expose
secrets in certain configurations.

o Encrypted Connections: Storing sensitive connection information

(e.g., database passwords) in the Airflow metadata database,
where it is encrypted with the configured Fernet key.

o Masking Credentials: Masking sensitive credentials in logs.

Airflow masks connection passwords and sensitive Variables
automatically; custom masking functions can cover other values.
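A minimal sketch of the environment-variable approach listed above: Airflow looks up a connection in an environment variable named AIRFLOW_CONN_<CONN_ID> that holds a URI. The parsing and masking helpers below are simplified stand-ins, not Airflow's own code, and the connection id, host, and credentials are made up for illustration.

```python
import os
from urllib.parse import urlsplit


def load_connection(conn_id, environ=os.environ):
    """Read a connection URI from AIRFLOW_CONN_<CONN_ID> and split it into parts."""
    uri = environ[f"AIRFLOW_CONN_{conn_id.upper()}"]
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "login": parts.username,
        "password": parts.password,
        "schema": parts.path.lstrip("/"),
    }


def masked(conn):
    """Return a copy that is safe to log: the password is replaced by ***."""
    safe = dict(conn)
    if safe.get("password"):
        safe["password"] = "***"
    return safe


# Hypothetical connection id and credentials, for illustration only.
fake_env = {
    "AIRFLOW_CONN_MY_DB": "postgresql://etl_user:s3cret@db.example.com:5432/analytics"
}
conn = load_connection("my_db", environ=fake_env)
print(masked(conn))
```

Passing the environment in as a dictionary keeps the sketch testable; in a real deployment the variable would be set on the scheduler and workers, ideally injected from a secrets manager rather than written into shell profiles.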

6. How would you monitor and log Airflow's performance and task
execution?

 Answer: Monitoring and logging can be achieved by:

o Airflow's built-in logs: Using the task logs available in the Airflow
UI. Each task instance records detailed logs of its execution,
including errors, warnings, and standard output.

o External monitoring tools: Integrating with tools like Prometheus,

Grafana, or Datadog to monitor Airflow’s resource usage (CPU,
memory, worker availability) and task performance.

o Custom metrics: Enabling Airflow's StatsD metrics output to

collect performance metrics on task success rates, duration,
retries, and system health.

o Alerting: Configuring alerting systems (e.g., email, Slack, or

PagerDuty) through on_failure_callback or on_success_callback to
receive notifications in case of task failures, retries, or critical
events.

o Airflow's built-in health check: Monitoring the status of Airflow

components (scheduler, workers, web server) and ensuring their
health.

o External log aggregation: Using tools like ELK Stack

(Elasticsearch, Logstash, Kibana) or Splunk to aggregate and
analyze logs for more extensive querying and alerting.
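As a rough illustration of the alerting point above, a failure callback is a plain function that receives the task's context dictionary. The sketch below formats an alert from that context; the `send` argument is a placeholder for whatever Slack, email, or PagerDuty client is in use, and the fake context exists only so the example runs without Airflow installed.

```python
from types import SimpleNamespace


def build_failure_alert(context):
    """Format a one-line alert from the context dict passed to callbacks."""
    ti = context["task_instance"]
    error = context.get("exception")
    return (
        f"[FAILED] dag={ti.dag_id} task={ti.task_id} "
        f"try={ti.try_number} error={error!r}"
    )


def notify_on_failure(context, send=print):
    """Callback body; `send` is a placeholder for a real notification client."""
    send(build_failure_alert(context))


# Stand-in context (hypothetical DAG and task names) for a dry run.
fake_context = {
    "task_instance": SimpleNamespace(dag_id="daily_etl", task_id="load", try_number=2),
    "exception": ValueError("connection refused"),
}
notify_on_failure(fake_context)
```

In a DAG, a function like this would be passed as on_failure_callback=notify_on_failure on a task or in default_args.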

7. Explain the differences between task queues and pools in

Airflow.

 Answer:

o Queue: The queue argument on a task routes it to a named queue

that specific workers listen on. It is primarily used with
distributed executors such as CeleryExecutor, where multiple
workers are available and some are reserved for particular work.

o Pool: A pool limits the number of concurrent executions for the

tasks assigned to it. Each pool has a fixed number of slots, and
a task holds a slot while it runs, so the pool caps concurrency
across every DAG that uses it. This is useful when you want to
restrict the number of tasks accessing limited resources, such as
a database or an API.
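The slot-limiting behaviour described above can be sketched with a small simulation. This is a toy model of a scheduler loop, not Airflow code: task names and durations are invented, time advances in whole ticks, and every task is assumed to need one slot and at least one tick.

```python
def schedule_with_pool(tasks, pool_slots):
    """Simulate a scheduler loop where a pool caps concurrent tasks.

    tasks: list of (name, duration_in_ticks); returns {name: start_tick}.
    """
    waiting = list(tasks)
    running = {}   # name -> remaining ticks
    started = {}
    tick = 0
    while waiting or running:
        # Fill free slots in FIFO order.
        while waiting and len(running) < pool_slots:
            name, duration = waiting.pop(0)
            running[name] = duration
            started[name] = tick
        # Advance one tick and release the slots of finished tasks.
        tick += 1
        for name in list(running):
            running[name] -= 1
            if running[name] == 0:
                del running[name]
    return started


starts = schedule_with_pool([("a", 2), ("b", 2), ("c", 1), ("d", 1)], pool_slots=2)
print(starts)
```

With two slots, tasks c and d cannot start until a and b release their slots; raising pool_slots to 4 lets all four start at tick 0.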

8. How does Airflow handle task scheduling and execution order in a

highly concurrent environment?

 Answer: In a highly concurrent environment, Airflow uses:

o Task Dependencies: Airflow relies on the DAG structure and task
dependencies to determine the execution order. Tasks with
unmet dependencies will not run until their upstream tasks have
finished successfully.

o Task Queues: Tasks are assigned to queues based on the

executor configuration (e.g., Celery or Kubernetes). Workers pull
tasks from these queues and execute them when resources are
available.

o Task Concurrency and Parallelism: You can control the number of

tasks that can run in parallel within a DAG using the
dag_concurrency setting, or limit the number of concurrent
instances of a single task across DAG runs using the
task_concurrency parameter.

o Executor-Specific Scheduling: Different executors (e.g.,

CeleryExecutor, KubernetesExecutor) have different strategies
for managing task scheduling and task execution. The scheduler
distributes tasks across available workers, and each executor
handles concurrency in a distributed manner.

9. What are some best practices for writing production-ready

Airflow DAGs?

 Answer:

o Modular and reusable code: Keep DAG code modular, using

functions and external scripts for reusable logic.

o Error handling and retries: Implement retries, failure callbacks

(on_failure_callback), and proper error handling within tasks.

o Logging: Use structured logging to capture important runtime

information. Leverage Airflow's logging mechanisms to track task
progress and diagnose issues.

o Configuration as code: Store DAG code and non-sensitive Airflow

configuration in version control. Keep credentials out of the
repository and out of the DAG code; load them from Connections
or a secrets backend instead.

o Testing: Use unit tests or integration tests to ensure that the

DAGs work as expected. Test with different parameters and
configurations.
o Clear task dependencies: Ensure that task dependencies are
clearly defined and that tasks only run when their dependencies
are completed.

10. Can you explain Airflow's "backfilling" mechanism? How does it

work in scenarios of missed or delayed executions?

 Answer: Backfilling is the process of running a DAG for past schedule

intervals that do not have a run yet. If a DAG did not execute for
some intervals (e.g., because the scheduler was down or the DAG was
paused) and catchup=True, the scheduler creates a DAG run for each
missed interval between the start_date and the current time. Setting
catchup=False skips the missed intervals, so only the most recent one
is scheduled. In Airflow 2.x, past intervals can also be run on demand
with the airflow dags backfill command. Backfilling can consume
considerable resources if not controlled properly, especially for
large DAGs with many tasks, so it should be carefully managed in
production systems.
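To make the effect of catchup concrete, here is a simplified model of which intervals would get a run. It assumes a fixed-length schedule and ignores runs that already exist; the function name and dates are illustrative, not part of Airflow.

```python
from datetime import datetime, timedelta


def runs_to_create(start_date, now, interval, catchup):
    """Return the interval start of every run that would be created.

    A run covering [t, t + interval) only becomes due once t + interval <= now.
    """
    due = []
    t = start_date
    while t + interval <= now:
        due.append(t)
        t += interval
    if not catchup:
        return due[-1:]  # only the most recent completed interval
    return due


start = datetime(2024, 1, 1)
now = datetime(2024, 1, 5, 12)
daily = timedelta(days=1)
print(len(runs_to_create(start, now, daily, catchup=True)))
print(runs_to_create(start, now, daily, catchup=False))
```

With catchup enabled the model yields four daily runs (January 1 through 4); with catchup disabled only the January 4 interval is returned.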

11. How do you handle DAG concurrency and task parallelism in a

large-scale Airflow deployment?

 Answer: To manage concurrency and parallelism effectively in a large-

scale Airflow deployment:

o dag_concurrency: This parameter controls the maximum number

of task instances that can run simultaneously within a DAG. If
you have a large number of tasks, you can fine-tune this to
prevent overwhelming the system.

o Task-level concurrency: Using task_concurrency, you can restrict

the number of parallel task instances for a specific task. For
instance, if you're dealing with limited resources like an external
database or API, you can use this to avoid overloading the
service.

o Resource-based queues: Set up different task queues for

different resource types (e.g., database-heavy tasks, CPU-heavy
tasks). This allows workers to prioritize and pull tasks that fit
their available resources.

o parallelism setting: The global setting parallelism limits the total

number of task instances that can run concurrently across all
DAGs.
o Executor choice: For larger scale deployments, choose
distributed executors like CeleryExecutor or KubernetesExecutor
that can scale horizontally across multiple workers.
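The way these limits interact can be sketched as taking the tightest of several caps. This is a deliberately simplified model with made-up numbers; the real scheduler evaluates more conditions (per-task limits, priorities, executor slots) than shown here.

```python
def startable_tasks(ready, running_total, running_in_dag,
                    parallelism, max_per_dag, pool_free):
    """How many ready task instances can start now under all caps."""
    global_room = max(parallelism - running_total, 0)   # installation-wide cap
    dag_room = max(max_per_dag - running_in_dag, 0)      # per-DAG cap
    return min(ready, global_room, dag_room, pool_free)  # tightest limit wins


n = startable_tasks(
    ready=50,
    running_total=28, parallelism=32,   # 4 global slots left
    running_in_dag=10, max_per_dag=16,  # 6 DAG slots left
    pool_free=5,                        # 5 pool slots left
)
print(n)
```

Here the global parallelism limit is the bottleneck, so raising the per-DAG limit or the pool size alone would not let more tasks start.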

12. Explain the internal workings of Airflow’s scheduler and how it

determines when a task should be executed.

 Answer: The Airflow scheduler is responsible for determining which

tasks should be executed and when:

o DAG Parsing: The scheduler continuously parses the DAG files to

determine the schedule and dependencies. It checks whether
tasks are eligible to run based on the execution date,
dependencies, and the start_date.

o Triggering Tasks: The scheduler uses the schedule_interval and

the start_date to compute the next scheduled run of the DAG. For
each task, the scheduler checks that the upstream tasks are
complete, that any retry_delay has elapsed for a task awaiting
retry, and that time constraints (like end_date) are met.

o Task Queueing: Once the task is ready to run, it is queued for

execution on a worker. The worker then executes the task and
reports back to the scheduler when completed.

o Task State Management: The scheduler updates the task's state

to queued once it’s ready for execution, and to running once the
task has been picked up by a worker.

13. How does Airflow handle DAG and task dependency resolution in
case of failure or retries?

 Answer:

o Task Dependencies: Airflow ensures that tasks only run if their

upstream dependencies have been successfully completed. If a
task fails, its downstream tasks will not be executed unless
certain conditions are met (e.g., trigger_rule is set to all_failed).

o Retries: If a task fails, Airflow will retry it based on the configured

retry logic (retries, retry_delay, max_retry_delay). During retries,
Airflow will attempt to run the task again, and each retry will
follow the same task dependency rules.
o Failure Handling: When a task exhausts its retries it is marked
failed and its on_failure_callback, if set, is called; the
callback can alert the user or start recovery actions. Running a
different task after a failure (e.g., compensation logic) is done
with trigger rules such as one_failed, not through the callback.

o depends_on_past: If depends_on_past is set to True, Airflow

ensures that a task can only run if its previous run succeeded.
This prevents running tasks if their previous iterations failed,
maintaining the task execution flow.
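The dependency rules above can be illustrated with a small evaluator for a few trigger rules. It is a simplified model that assumes all upstream tasks have already finished; it covers only five rules and is not Airflow's implementation.

```python
FAILED_STATES = {"failed", "upstream_failed"}


def should_run(trigger_rule, upstream_states):
    """Decide whether a task may run once every upstream task has finished."""
    states = list(upstream_states)
    if trigger_rule == "all_success":
        return all(s == "success" for s in states)
    if trigger_rule == "all_failed":
        return all(s in FAILED_STATES for s in states)
    if trigger_rule == "one_failed":
        return any(s in FAILED_STATES for s in states)
    if trigger_rule == "none_failed":
        return not any(s in FAILED_STATES for s in states)
    if trigger_rule == "all_done":
        return True
    raise ValueError(f"unsupported trigger rule: {trigger_rule}")


upstream = ["success", "failed"]
for rule in ("all_success", "one_failed", "all_done"):
    print(rule, should_run(rule, upstream))
```

A cleanup or alerting task would typically use one_failed or all_done, while ordinary processing tasks keep the default all_success.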

14. How would you implement and manage complex branching logic
in Airflow?

 Answer: Airflow provides several ways to implement complex

branching logic:

o BranchPythonOperator: This operator allows you to conditionally

decide which path to take based on the output of a Python
function. The function should return the task ID of the next task
that should execute, and the others will be skipped.

o ShortCircuitOperator: This operator can short-circuit the

execution of downstream tasks based on a condition, preventing
them from running if the condition evaluates to False.

o TriggerRule: Task execution in Airflow can be controlled using

different trigger rules. By default, tasks only run if all upstream
tasks are successful, but you can change this with trigger rules
like one_failed, none_failed, or all_failed to implement more
complex logic.

o PythonOperator with branching logic: You can implement

conditional logic in a custom Python function that executes and
dynamically decides the next task to run.
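A branching callable is an ordinary Python function that returns the task_id (or a list of task_ids) to follow. The sketch below shows one such function on its own so it can be tested without Airflow; the task ids and the weekend rule are hypothetical.

```python
from datetime import date


def choose_branch(run_date, row_count):
    """Return the task_id (or list of task_ids) that should run next."""
    if row_count == 0:
        return "skip_load"
    if run_date.weekday() >= 5:  # Saturday or Sunday
        return ["load_data", "weekend_report"]
    return "load_data"


print(choose_branch(date(2024, 1, 6), 100))
```

In a DAG, a function of this shape would be supplied as the python_callable of a BranchPythonOperator, with its inputs taken from the task context or XCom; tasks whose ids are not returned are skipped.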

15. How do you configure and use custom operators in Airflow?

 Answer: To configure and use custom operators in Airflow:

o Defining a Custom Operator: You can create a custom operator

by subclassing BaseOperator. Override the execute method to
implement your custom logic:

python
from airflow.models import BaseOperator

# apply_defaults is no longer needed in Airflow 2.x: BaseOperator
# applies default_args automatically.
class MyCustomOperator(BaseOperator):

    def __init__(self, param1, param2, **kwargs):
        super().__init__(**kwargs)
        self.param1 = param1
        self.param2 = param2

    def execute(self, context):
        # Implement your logic here
        print(f"Running custom operator with {self.param1} and {self.param2}")

o Using the Custom Operator: Once defined, the custom operator

can be used like any other operator within a DAG:

python

custom_task = MyCustomOperator(
    task_id='my_custom_task',
    param1='value1',
    param2='value2',
    dag=dag,
)

16. What are some common pitfalls in Airflow when scaling for high
throughput, and how can they be avoided?

 Answer: Some common pitfalls and solutions include:

o Metadata Database Bottleneck: In large-scale Airflow

deployments, the metadata database can become a bottleneck
as it stores information on task statuses, logs, and more. To avoid
this:

 Use a highly available and horizontally scalable database

(e.g., PostgreSQL or MySQL with replication).

 Consider putting a connection pooler (e.g., PgBouncer) in

front of the database and regularly purging old task and
run records to keep tables small.

o Task Queue Overload: If too many tasks are queued up and there
aren’t enough workers, task execution can be delayed. This can
be mitigated by:

 Using multiple queues (e.g., for resource-heavy tasks).

 Scaling up or scaling out the workers using Kubernetes or

Celery.

o Long-running Tasks: Tasks that run for a long time may

monopolize worker resources. Mitigate this by:

 Breaking tasks into smaller, more granular units of work.

 Using task timeouts and retries to manage long-running

operations.

o Task Retry Storms: If many tasks are retried simultaneously, it

can overwhelm the system. This can be managed by:

 Using exponential backoff for retries

(retry_exponential_backoff, retry_delay, max_retry_delay).

 Limiting the number of retries (retries).

17. How does Airflow handle time zones, and how would you ensure
consistency across different environments?

 Answer:

o Time Zone Handling: Airflow stores datetimes in UTC internally.

A DAG becomes time zone aware when its start_date is an aware
datetime (e.g., in 'America/New_York'); naive datetimes are
interpreted in the default_timezone configured in airflow.cfg.

o Consistent Time Zones: To ensure consistency across

environments:
 Always use UTC in production environments to avoid
daylight saving time (DST) issues.

 Set default_timezone = utc in the [core] section of the

airflow.cfg file and ensure all DAGs use the same time
zone settings.

 Ensure that the worker and scheduler servers are

synchronized with a reliable time source (e.g., NTP).

18. What is the difference between the AirflowExecutor and the

KubernetesExecutor, and when would you choose one over the
other?

 Answer:

o Traditional executors (e.g., LocalExecutor, CeleryExecutor; Airflow has no executor literally named AirflowExecutor):

 Best for non-containerized or legacy environments.

 Suitable for smaller to medium-scale workflows where

distributed execution on worker machines is needed.

 Less overhead in terms of setup and configuration

compared to KubernetesExecutor.

o KubernetesExecutor:

 Best for cloud-native or containerized environments.

 Dynamically provisions Kubernetes pods for each task,

providing isolated environments for task execution.

 Ideal for large-scale deployments with fluctuating

workloads that need dynamic resource allocation.

 KubernetesExecutor is more scalable and allows for greater

resource isolation and efficient resource utilization.

19. How do you deal with DAGs that have many tasks and complex
interdependencies in terms of maintainability and performance?

 Answer:

o DAG Modularity: Split large DAGs into smaller, more manageable

sub-DAGs or trigger other DAGs using TriggerDagRunOperator.
This improves readability and reduces complexity.
o Use of Pools: Define task pools for resource-heavy operations to
limit concurrency on specific tasks and avoid overloading the
system.

o Task Grouping: Use TaskGroup to logically group related tasks in

a DAG.

20. Explain Airflow's "Executor" mechanism and how to choose the

right executor for your use case.

 Answer:

o Executor Types: Airflow supports several executors that

determine how and where tasks are executed.

 SequentialExecutor: This is the default executor and

runs tasks sequentially (useful for testing).

 LocalExecutor: Allows parallel task execution on a single

machine, suitable for small to medium-sized deployments.

 CeleryExecutor: Uses Celery to distribute tasks across

multiple worker nodes, making it suitable for large
distributed systems.

 KubernetesExecutor: Launches each task in a separate

Kubernetes pod. This is highly scalable and ideal for cloud-
native deployments.

 DaskExecutor: Runs tasks on a Dask distributed cluster. It

is an older, rarely used option that was moved out of core
Airflow into a separate provider package in later 2.x
releases, so check that your version still supports it.

o Choosing the Right Executor: The choice of executor depends

on the scale of the system and the architecture:

 For smaller systems, LocalExecutor or SequentialExecutor

may suffice.

 For larger, distributed systems, CeleryExecutor or

KubernetesExecutor is recommended.

 For cloud-native applications with scalable infrastructure,

KubernetesExecutor is preferred.

 For teams that already operate a Dask cluster,

DaskExecutor may be an option where it is still supported.
21. How would you handle a situation where DAG execution is being
delayed due to insufficient worker capacity?

 Answer:

o Scaling Workers: Ensure that the number of workers is

dynamically adjustable. For a distributed executor like
CeleryExecutor, you can add more worker nodes or scale up
resources to meet demand.

o Resource Allocation and Prioritization: Use the queue

argument and pools to better distribute tasks across workers and
limit concurrency for specific tasks. This can prevent task
bottlenecks when certain tasks require more resources.

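o Dynamic Scaling with Kubernetes: If using

KubernetesExecutor, each task runs in its own pod, so capacity
grows with the cluster (e.g., through the Kubernetes cluster
autoscaler). For Celery workers deployed on Kubernetes, the
Horizontal Pod Autoscaler (HPA) can scale the number of worker
pods based on resource usage (e.g., CPU or memory utilization).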

o Task Prioritization: Use priority weights (priority_weight) to

prioritize certain tasks over others, ensuring that critical tasks
are picked up first.
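The effect of priority weights when capacity is short can be sketched with a priority queue. This is a toy model, assuming that higher weights are picked first and that ties keep their queue order; the task names and weights are invented.

```python
import heapq


def pick_order(queued, slots):
    """Return the names of the `slots` highest-priority tasks, ties in queue order."""
    # Negate the weight so the largest weight is popped first from the min-heap.
    heap = [(-weight, position, name) for position, (name, weight) in enumerate(queued)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(min(slots, len(heap)))]


queued = [
    ("refresh_cache", 1),
    ("bill_customers", 10),
    ("send_digest", 1),
    ("load_orders", 5),
]
print(pick_order(queued, slots=2))
```

With only two free slots, the billing and order-loading tasks run first and the low-priority tasks wait for the next free worker.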

22. What is the role of the "DAGRun" in Airflow, and how does it
relate to task execution?

 Answer:

o DAGRun: A DAGRun represents an instantiation of a DAG for a

specific execution time (execution_date). It contains information
such as the execution date, the status of tasks, and the
configuration for that specific run.

o Relationship to Task Execution: Tasks within a DAG are linked

to the DAGRun. When a DAG is triggered (via schedule or manual
run), a new DAGRun is created, and tasks within the DAG are
executed based on the run's context (such as execution_date).

o Handling Multiple Runs: Multiple DAGRun instances can be

active concurrently, for example while catchup=True is working
through missed intervals. Cap this with max_active_runs and
manage task concurrency carefully to prevent overloading the
system.
23. Explain how you would implement and manage retries for tasks
in Airflow, especially in high-failure environments.

 Answer:

o Task Retries: Each task in Airflow has a retries parameter that

defines the maximum number of retry attempts after a failure.
The retry_delay parameter defines the time between retries, and
max_retry_delay controls the maximum time between retries
(helpful in preventing long delays).

o Exponential Backoff: To avoid retry storms, set

retry_exponential_backoff=True so that the delay grows after
each retry, up to max_retry_delay.

o Custom Retry Logic: You can implement custom retry logic

using the on_retry_callback parameter, which can trigger an alert
or logging function every time a task is retried.

o Failure Handling: Use the on_failure_callback to trigger

notifications or compensation logic when a task fails.
Additionally, with the default trigger rule a failure propagates
to downstream tasks (they are marked upstream_failed), which
halts the rest of that branch.
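The growth of retry delays under exponential backoff can be sketched as follows. This is a simplified deterministic model (doubling per attempt, capped by a maximum); Airflow's own calculation also adds jitter, so real delays will differ somewhat. The function name is illustrative.

```python
from datetime import timedelta


def backoff_delay(try_number, retry_delay, max_retry_delay=None):
    """Delay before retry number `try_number` (1-based), doubling each time."""
    delay = retry_delay * (2 ** (try_number - 1))
    if max_retry_delay is not None:
        delay = min(delay, max_retry_delay)  # never wait longer than the cap
    return delay


base = timedelta(minutes=1)
cap = timedelta(minutes=10)
print([backoff_delay(n, base, cap).total_seconds() / 60 for n in range(1, 6)])
```

With a one-minute base and a ten-minute cap, the waits are 1, 2, 4, 8, and 10 minutes, which spreads retries out instead of hammering a struggling dependency.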

24. How would you handle DAG scheduling when a DAG has a large
number of tasks with complex dependencies, and you need to
ensure that it does not overwhelm the scheduler?

 Answer:

o DAG and Task Modularity: Break up the DAG into smaller DAGs
linked with TriggerDagRunOperator, or group tasks with TaskGroup,
to reduce complexity and manage dependencies more easily.
SubDagOperator is deprecated and best avoided in new DAGs.

o catchup=False: When setting up DAGs with a large number of

historical data points (e.g., hourly, daily), setting catchup=False
can help prevent backfilling unnecessary past runs, which can
overwhelm the scheduler.

o Task Grouping: Use TaskGroup to logically group tasks in the UI,

making complex DAGs more manageable and readable.
o DAG Concurrency: Use dag_concurrency to limit the number of
concurrent task instances within a single DAG, reducing load on
the scheduler and workers.

o Resource Pools: Create resource pools for specific tasks that

require shared resources (e.g., database connections) to limit the
number of concurrent executions of these tasks.

25. How does Airflow handle backfilling, and what happens if a DAG
run is missed or delayed?

 Answer:

o Backfilling: Backfilling in Airflow happens when a scheduled run

is missed or delayed, and Airflow automatically triggers the
execution of tasks for the missed time periods based on the
start_date and catchup setting.

o Missed or Delayed DAG Runs: If catchup=True, Airflow will

backfill for all the missed intervals between the start_date and
the current date. This can put strain on the scheduler and
workers, so it should be used judiciously.

o Controlling Backfilling: You can set catchup=False to disable

backfilling, ensuring that only the latest scheduled DAG run is
executed. If backfilling is necessary, breaking the DAG into
smaller units or using the TriggerDagRunOperator can help
control the flow.

26. What are Airflow’s mechanisms for task state management, and
how would you handle task failures in production?

 Answer:

o Task States: Airflow maintains several states for tasks such as

queued, running, success, failed, up_for_retry, and skipped.
These states are critical for determining the task's execution flow
and are stored in the metadata database.

o Task Failure Management: In a production environment:

 Use retries and retry_delay to automatically retry failed

tasks.

 Set on_failure_callback to notify stakeholders (e.g., via

email, Slack, or PagerDuty) when a task fails.
 If tasks frequently fail, investigate and address root causes
(e.g., database connection issues, resource limits).

 Use trigger_rule to control downstream tasks when tasks

fail (e.g., using all_failed to execute tasks only if all
upstream tasks fail).

o Task Recovery: Tasks that fail due to external systems (e.g.,

APIs, databases) can be retried with exponential backoff or
compensation logic (e.g., send an alert and run alternative
recovery tasks).
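The relationship between attempts, the retries setting, and the resulting task state can be sketched as a small function. It is a simplified model of the state names listed above, assuming try numbers start at 1 and that retries counts the additional attempts allowed after the first.

```python
def state_after_attempt(succeeded, try_number, retries):
    """State a task instance lands in after attempt `try_number` finishes."""
    if succeeded:
        return "success"
    if try_number <= retries:
        return "up_for_retry"  # retries remain, so the task will be rescheduled
    return "failed"            # retries exhausted


# A hypothetical flaky task that fails twice and then succeeds, with retries=2.
outcomes = [False, False, True]
history = [
    state_after_attempt(ok, n, retries=2)
    for n, ok in enumerate(outcomes, start=1)
]
print(history)
```

Had the third attempt also failed, the task would end in failed, at which point the failure callback and downstream trigger rules take over.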

27. What are some advanced strategies for dealing with data skew
when running Airflow in distributed environments?

 Answer:

o Task Partitioning: Break tasks into smaller, more manageable

pieces. For example, when processing large datasets, partition
the data based on logical splits (e.g., by time or categories) to
ensure tasks run in parallel and are not resource-intensive.

o Dynamic Task Generation: Instead of hardcoding tasks, generate

them dynamically, either with a loop in the DAG file or with
dynamic task mapping (expand(), Airflow 2.3+), so that one task
instance is created for each partition.

o Resource Pools: In distributed environments, use resource

pools to allocate a limited number of resources to specific tasks.
This can help prevent certain tasks from overwhelming available
resources.

o Caching or Preprocessing: Preprocess or cache data in

smaller, consistent chunks to avoid having to process large
amounts of data in one task.

o Adjusting Task Parallelism: Tune task_concurrency and

dag_concurrency for specific tasks that are heavily skewed,
ensuring that the system does not become overwhelmed when
too many tasks run simultaneously.
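One way to counter skew when partitioning is to pack partitions into a fixed number of tasks so that each task receives a similar amount of data. The sketch below uses a greedy largest-first heuristic; the partition names and sizes are invented, and this is one possible balancing approach rather than anything built into Airflow.

```python
import heapq


def balance_partitions(sizes, n_tasks):
    """Greedy largest-first assignment of partitions to n_tasks buckets."""
    # Each heap entry is (total_size, bucket_id, members); bucket_id breaks ties.
    buckets = [(0, i, []) for i in range(n_tasks)]
    heapq.heapify(buckets)
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        total, i, members = heapq.heappop(buckets)  # lightest bucket so far
        members.append(name)
        heapq.heappush(buckets, (total + size, i, members))
    return {i: (total, members) for total, i, members in sorted(buckets, key=lambda b: b[1])}


# Hypothetical partition sizes (e.g., millions of rows per region).
sizes = {"us": 90, "eu": 40, "apac": 30, "latam": 15, "mea": 5}
plan = balance_partitions(sizes, n_tasks=3)
print(plan)
```

The oversized "us" partition still dominates one task, which signals that it should be split further (for example by date) before the tasks are generated.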

28. How do you ensure security and data privacy in a production-

level Airflow deployment?

 Answer:
o Role-based Access Control (RBAC): Airflow provides RBAC to
restrict access to DAGs and their components. This ensures only
authorized users can trigger, pause, or edit DAGs.

o Airflow Connections and Secrets Management: Store

sensitive information such as database credentials, API keys, and
passwords using Airflow's connection UI or external secret
backends like AWS Secrets Manager, HashiCorp Vault, or Google
Secret Manager.

o TLS/SSL: Ensure that the Airflow web server and any

communication between workers, schedulers, and databases are
encrypted using TLS/SSL.

o Auditing: Airflow’s logging mechanism helps track and audit all

DAG runs, task executions, and errors, making it easier to detect
unauthorized or suspicious activity.

o Environment Isolation: Run Airflow in isolated environments

(e.g., Kubernetes, Docker) to limit access to external services
and resources, ensuring that only necessary components can
interact with sensitive data.

Apache Airflow 1741977651
No ratings yet
Apache Airflow 1741977651
83 pages
Microsoft Excel Formulas and Functions (Office 2021 and Microsoft 365) 1st Edition - Ebook PDFPDF Download
100% (2)
Microsoft Excel Formulas and Functions (Office 2021 and Microsoft 365) 1st Edition - Ebook PDFPDF Download
35 pages
Airflow Notes
No ratings yet
Airflow Notes
10 pages
The Ultimate Guide To Apache Airflow DAGs
No ratings yet
The Ultimate Guide To Apache Airflow DAGs
135 pages
Apache Airflow
No ratings yet
Apache Airflow
24 pages
2.airflow 2
No ratings yet
2.airflow 2
17 pages
Apache Airflow Fundamentals Study Guide
No ratings yet
Apache Airflow Fundamentals Study Guide
7 pages
Airflow - Notes
No ratings yet
Airflow - Notes
82 pages
Dags Definitive Guide Mobile
No ratings yet
Dags Definitive Guide Mobile
176 pages
Airflow
No ratings yet
Airflow
97 pages
Apache Airflow 50
100% (1)
Apache Airflow 50
50 pages
CH1 Path D&R Agam
100% (1)
CH1 Path D&R Agam
34 pages
Intro To Apache Airflow
No ratings yet
Intro To Apache Airflow
14 pages
Abundance Meditation
75% (12)
Abundance Meditation
15 pages
Apache Airflow - A Python Hands-On Guide
No ratings yet
Apache Airflow - A Python Hands-On Guide
9 pages
Apache Airflow Fundamentals Study Guide
No ratings yet
Apache Airflow Fundamentals Study Guide
7 pages
Apache Airflow Documentation
No ratings yet
Apache Airflow Documentation
101 pages
GuideToApacheAirflow PDF
100% (1)
GuideToApacheAirflow PDF
6 pages
Apacheairflow 160827123852
No ratings yet
Apacheairflow 160827123852
25 pages
Airflow Documentation
No ratings yet
Airflow Documentation
3 pages
Study Guide For Apache Airflow Fundamentals Certification
No ratings yet
Study Guide For Apache Airflow Fundamentals Certification
6 pages
98 Exploring DAG Design Patterns in Apache Airflow
No ratings yet
98 Exploring DAG Design Patterns in Apache Airflow
32 pages
Aws Ques
No ratings yet
Aws Ques
62 pages
Airflow Web UI and CLI
No ratings yet
Airflow Web UI and CLI
51 pages
Airflow Interview Questions
No ratings yet
Airflow Interview Questions
4 pages
Group Work Project: Mscfe 660 Case Studies in Risk Management
100% (1)
Group Work Project: Mscfe 660 Case Studies in Risk Management
7 pages
Best Practices Apache Airflow
100% (1)
Best Practices Apache Airflow
28 pages
Interview Questions Apache Spark Kafka Airflow Druid
No ratings yet
Interview Questions Apache Spark Kafka Airflow Druid
4 pages
Running Airflow Reliably With Kubernetes
100% (1)
Running Airflow Reliably With Kubernetes
47 pages
Apache Airflow Certification - Study Guide For DAG Authoring
No ratings yet
Apache Airflow Certification - Study Guide For DAG Authoring
17 pages
Developing Elegant Workflows in Python Code With Apache Airflow
100% (1)
Developing Elegant Workflows in Python Code With Apache Airflow
35 pages
Palm 11
No ratings yet
Palm 11
8 pages
1) Housing Estates in The Baltic Countries, The Legady of Central Planning in Estonia, Latvia, Lithuania
No ratings yet
1) Housing Estates in The Baltic Countries, The Legady of Central Planning in Estonia, Latvia, Lithuania
383 pages
At Oz Mentors
No ratings yet
At Oz Mentors
2 pages
Tooooo
No ratings yet
Tooooo
92 pages
Week 6. Airflow Overview
No ratings yet
Week 6. Airflow Overview
71 pages
What Is Apache Airflow
No ratings yet
What Is Apache Airflow
22 pages
Apache Airflow For Data Science
No ratings yet
Apache Airflow For Data Science
23 pages
Etalab Talk Apache Airflow Embulk
No ratings yet
Etalab Talk Apache Airflow Embulk
29 pages
2 - Apache Airflow
No ratings yet
2 - Apache Airflow
5 pages
Airflow Best Practices
No ratings yet
Airflow Best Practices
34 pages
Scenario Based Airflow Interview Questions
No ratings yet
Scenario Based Airflow Interview Questions
4 pages
Extrema and Average Rates of Change+
No ratings yet
Extrema and Average Rates of Change+
63 pages
Airflow
No ratings yet
Airflow
7 pages
Airflow Notes
No ratings yet
Airflow Notes
5 pages
Airflow DAG - Best Practices: DAG As Configuration File
100% (1)
Airflow DAG - Best Practices: DAG As Configuration File
6 pages
DUO CONE SEALS-install, Caterpillar
No ratings yet
DUO CONE SEALS-install, Caterpillar
16 pages
Airflow Introduction
No ratings yet
Airflow Introduction
9 pages
Section N Notes With Answers