
Understanding DAG and Lazy Evaluation in Spark


Let's simplify the concept of DAG and lazy evaluation in Spark for
data engineers and developers new to distributed computing.
Directed Acyclic Graph (DAG)
A Directed Acyclic Graph (DAG) is like a to-do list for Spark. Each task follows a specific order and depends on
the previous task. There are no loops or repetitions, so the tasks move forward without going back. This helps
Spark optimize data processing, making it faster and more efficient.

Directed: Tasks follow a definite order, each dependent on the previous one.
Acyclic: No loops or repetitions; tasks proceed without going back.
Optimization: Helps Spark optimize data processing, making it faster and more efficient.
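
To make the to-do list concrete, here is a minimal PySpark sketch (assuming a SparkSession named `spark` is already available); `toDebugString()` prints the lineage, which is the DAG Spark has recorded:

```python
# Assumes a SparkSession named `spark` is already available.
rdd = spark.sparkContext.parallelize(range(10))
doubled = rdd.map(lambda x: x * 2)             # depends on rdd
evens = doubled.filter(lambda x: x % 4 == 0)   # depends on doubled

# toDebugString() prints the lineage Spark has recorded: a chain of
# steps with no cycles, i.e. the DAG.
print(evens.toDebugString().decode())
```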
Lazy Evaluation Explained
Lazy Evaluation is like being smart and efficient with your work. Instead of doing
everything right away, Spark plans tasks in a logical order without executing them
immediately. It's like postponing the actual work until it's absolutely necessary.

Smart Work: Spark plans tasks in a logical order without executing them immediately.
Deferred Execution: Actual work is postponed until it's absolutely necessary.
Benefits of DAG and Lazy Evaluation in Spark
DAG and lazy evaluation provide several advantages for data processing in
Spark. They enable efficient resource utilization, reduce unnecessary
computations, and improve overall performance. By optimizing the execution
flow, Spark can handle large-scale data processing tasks with ease.
Analogy: Road Trip Planning
A DAG is like drawing the route on a map before you leave; lazy evaluation is like not
starting the car until all your friends are ready to go.

Efficient Data Processing with DAG and Lazy Evaluation
DAG and lazy evaluation work together to make data processing more efficient in Spark.

1. Organization: DAG organizes tasks and sets the precedence for computation steps.

2. Efficiency: Lazy evaluation ensures that work is executed efficiently when needed.
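
Because nothing has run yet, we can ask Spark to show the plan it has organized before any data is touched. A minimal sketch (the DataFrame and expressions are illustrative):

```python
# Because evaluation is lazy, Spark has already organized a full plan
# before doing any work; explain() lets us inspect it.
df = spark.range(100)                          # DataFrame with a single "id" column
planned = df.filter("id > 50").groupBy().sum("id")

planned.explain()  # prints the physical plan derived from the DAG; no job has run yet
```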
1. This is where our CSV file is first read, using the `spark.read.format` command.
2. Next, we specify whether the file has a header row and whether to use inferSchema,
by setting each option to true or false in our code.
3. Then, we load the CSV file by specifying the path to the file from the upload section
(a sketch of these three steps follows).
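
The original slide shows this step as a screenshot; a minimal PySpark sketch of the same read might look like the following. The file name and path are assumptions, so substitute the path from your own upload section:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dag-lazy-demo").getOrCreate()

# 1. Start the read with spark.read.format("csv").
# 2. header=true treats the first row as column names;
#    inferSchema=true asks Spark to sample the file and guess column types.
# 3. load() takes the path to the uploaded file (hypothetical path below).
flight_data = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/FileStore/tables/flight_data.csv")
)
```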
1. Here, `flight_data.repartition` is counted as a WIDE DEPENDENCY, because it shuffles rows across partitions.
2. After importing the required modules, we implement a TRANSFORMATION with `flight_data.filter`, which runs as a narrow dependency.
3. Here, flight_data is a DataFrame (see the sketch below).
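
A sketch of these two steps (the "count" column name is an assumption based on the common flight-data example):

```python
# repartition() is a WIDE DEPENDENCY: rows are shuffled across the
# cluster, so each output partition may depend on every input partition.
flight_data = flight_data.repartition(5)

# filter() is a narrow transformation: each output partition depends on
# exactly one input partition, so no shuffle is required. The "count"
# column is an assumption from the flight-data example.
filtered = flight_data.filter(flight_data["count"] > 10)
```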
1. We again use a WIDE DEPENDENCY to group the data, and then implement the action task that triggers execution (sketched below).
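
A sketch of the grouping and the action (the "DEST_COUNTRY_NAME" column name is an assumption from the same flight-data example):

```python
# groupBy() adds another WIDE DEPENDENCY (a shuffle). Nothing has run
# so far; the action below finally triggers the whole DAG:
# read -> repartition -> filter -> groupBy -> count.
grouped = filtered.groupBy("DEST_COUNTRY_NAME").count()
grouped.show(5)  # action: Spark now executes the plan
```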
Figure: DAG evaluation for the read statement.
Figure: DAG evaluation for the WIDE DEPENDENCY.
