
Understanding DAG and Lazy Evaluation in Spark


Let's simplify the concept of DAG and lazy evaluation in Spark for
data engineers and developers new to distributed computing.
Directed Acyclic Graph (DAG)
A Directed Acyclic Graph (DAG) is like a to-do list for Spark. Each task follows a specific order and depends on
the previous task. There are no loops or repetitions, so the tasks move forward without going back. This helps
Spark optimize data processing, making it faster and more efficient.

Directed: Tasks follow a definite order, each dependent on the previous one.
Acyclic: No loops or repetitions; tasks proceed without going back.
Optimization: Helps Spark optimize data processing, making it faster and more efficient.
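
To make the to-do list concrete, here is a minimal PySpark sketch (assuming a SparkSession named `spark` is already available); `toDebugString()` prints the lineage, which is the DAG Spark has recorded:

```python
# Assumes a SparkSession named `spark` is already available.
rdd = spark.sparkContext.parallelize(range(10))
doubled = rdd.map(lambda x: x * 2)             # depends on rdd
evens = doubled.filter(lambda x: x % 4 == 0)   # depends on doubled

# toDebugString() prints the lineage Spark has recorded: a chain of
# steps with no cycles, i.e. the DAG.
print(evens.toDebugString().decode())
```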
Lazy Evaluation Explained
Lazy Evaluation is like being smart and efficient with your work. Instead of doing
everything right away, Spark plans tasks in a logical order without executing them
immediately. It's like postponing the actual work until it's absolutely necessary.

Smart Work: Spark plans tasks in a logical order without executing them immediately.
Deferred Execution: Actual work is postponed until it's absolutely necessary.
Benefits of DAG and Lazy Evaluation in Spark
DAG and lazy evaluation provide several advantages for data processing in
Spark. They enable efficient resource utilization, reduce unnecessary
computations, and improve overall performance. By optimizing the execution
flow, Spark can handle large-scale data processing tasks with ease.
Analogy: Road Trip Planning
A DAG is like drawing the route on a map before you leave; lazy evaluation is like not
starting the car until all your friends are ready to go.

Efficient Data Processing with DAG and Lazy Evaluation
DAG and lazy evaluation work together to make data processing more efficient in Spark.

1. Organization: DAG organizes tasks and sets the precedence for computation steps.

2. Efficiency: Lazy evaluation ensures that work is executed efficiently when needed.
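
Because nothing has run yet, we can ask Spark to show the plan it has organized before any data is touched. A minimal sketch (the DataFrame and expressions are illustrative):

```python
# Because evaluation is lazy, Spark has already organized a full plan
# before doing any work; explain() lets us inspect it.
df = spark.range(100)                          # DataFrame with a single "id" column
planned = df.filter("id > 50").groupBy().sum("id")

planned.explain()  # prints the physical plan derived from the DAG; no job has run yet
```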
1. This is where our CSV file is first read, using the `spark.read.format` command.
2. Next, we specify whether the file has a header row and whether to use inferSchema,
by setting each option to true or false in our code.
3. Then, we load the CSV file by specifying the path to the file from the upload section
(a sketch of these three steps follows).
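
The original slide shows this step as a screenshot; a minimal PySpark sketch of the same read might look like the following. The file name and path are assumptions, so substitute the path from your own upload section:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dag-lazy-demo").getOrCreate()

# 1. Start the read with spark.read.format("csv").
# 2. header=true treats the first row as column names;
#    inferSchema=true asks Spark to sample the file and guess column types.
# 3. load() takes the path to the uploaded file (hypothetical path below).
flight_data = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/FileStore/tables/flight_data.csv")
)
```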
1. Here, `flight_data.repartition` is counted as a WIDE DEPENDENCY, because it shuffles rows across partitions.
2. After importing the required modules, we implement a TRANSFORMATION with `flight_data.filter`, which runs as a narrow dependency.
3. Here, flight_data is a DataFrame (see the sketch below).
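
A sketch of these two steps (the "count" column name is an assumption based on the common flight-data example):

```python
# repartition() is a WIDE DEPENDENCY: rows are shuffled across the
# cluster, so each output partition may depend on every input partition.
flight_data = flight_data.repartition(5)

# filter() is a narrow transformation: each output partition depends on
# exactly one input partition, so no shuffle is required. The "count"
# column is an assumption from the flight-data example.
filtered = flight_data.filter(flight_data["count"] > 10)
```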
1. We again use a WIDE DEPENDENCY to group the data, and then implement the action task that triggers execution (sketched below).
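
A sketch of the grouping and the action (the "DEST_COUNTRY_NAME" column name is an assumption from the same flight-data example):

```python
# groupBy() adds another WIDE DEPENDENCY (a shuffle). Nothing has run
# so far; the action below finally triggers the whole DAG:
# read -> repartition -> filter -> groupBy -> count.
grouped = filtered.groupBy("DEST_COUNTRY_NAME").count()
grouped.show(5)  # action: Spark now executes the plan
```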
Figure: DAG evaluation for the read statement.
Figure: DAG evaluation for the WIDE DEPENDENCY.
