Understanding DAG and Lazy Evaluation in Spark
Spark plans tasks in a logical order without executing them immediately. Actual work is postponed until it is absolutely necessary.
Benefits of DAG and Lazy Evaluation in Spark
DAG and lazy evaluation provide several advantages for data processing in
Spark. They enable efficient resource utilization, reduce unnecessary
computations, and improve overall performance. By optimizing the execution
flow, Spark can handle large-scale data processing tasks with ease.
Analogy: Road Trip Planning
The DAG is like drawing the route map before the trip, and lazy evaluation is like not starting the car until all your friends are ready to go.
1 Organization
DAG organizes tasks and sets the precedence for computation steps.
2 Efficiency
Lazy evaluation ensures that work is executed only when its result is actually needed.
1. The CSV file is first read with the `spark.read.format("csv")` command.
2. Next, we specify whether the file has a header row and whether to infer the schema, by setting the `header` and `inferSchema` options to true or false in our code.
3. Then, we load the CSV file by passing the path where it was placed in the upload section.
1. Here, `flight_data.repartition` counts as a WIDE DEPENDENCY, because it shuffles rows across partitions.
2. After importing the required modules, we apply a TRANSFORMATION with `flight_data.filter`; like all transformations, it is lazy.
3. Here, `flight_data` is a DataFrame.
1. We again use a WIDE DEPENDENCY in order to group the data, and then perform the action step that triggers execution.
DAG evaluation for the read statement.
DAG evaluation for the WIDE DEPENDENCY.