Azure Fabric

The document discusses the options for performing data transformations on data stored in Azure Data Lake Storage Gen2: a Synapse dedicated SQL pool or serverless SQL pool, Apache Spark in Azure Synapse Analytics, and the Integration Runtime. It poses a series of audience polls asking which option should be used in different transformation scenarios, then closes with a decision matrix recommending an option for each requirement, such as initial data exploration, transforming data on-premises, handling different file formats, and dealing with JSON or badly formatted data.

Activity 02: Data Engineering Discussion

Audience Poll 1
Q: You need to perform some data preparation on data stored in ADLS Gen 2. Which
option should you use to run the transformations (pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Audience Poll 2
Q: You are performing initial exploration of the data and experimenting with the
necessary transformations. Which option should you use to run the transformations
(pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Audience Poll 3
Q: You want to process a subset of files in a folder filled with CSV files, all
having the same schema. Which option should you use to run the transformations
(pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Audience Poll 4
Q: You need to transform the data on-premises or within a specific VNET before
loading it. Which option should you use to run the transformations (pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Audience Poll 5
Q: You want to flatten hierarchical fields in JSON to a tabular structure. Which
option should you use to run the transformations (pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Audience Poll 6
Q: You are handling file formats other than delimited (CSV), JSON or Parquet. Which
option should you use to run the transformations (pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool


B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Audience Poll 7
Q: The delimited data is badly formatted. Which option should you use to run the
transformations (pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Decision Matrix Summary


Each decision point below is mapped to the recommended option (dedicated SQL pool / serverless SQL pool, Apache Spark pool, or Integration Runtime), together with a discussion comment.

Initial exploration of the data and experimenting with the necessary transformations: dedicated SQL pool / serverless SQL pool. Start with T-SQL, generally.

Process a folder filled with CSV files of the same schema: dedicated SQL pool / serverless SQL pool. Use the T-SQL OPENROWSET statement.

Process a subset of files in a folder filled with CSV files of the same schema: dedicated SQL pool / serverless SQL pool. Use the T-SQL OPENROWSET statement with wildcards (*) in the path.

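The wildcard idea can be sketched with Python's standard-library fnmatch; the folder listing below is hypothetical, and the pattern stands in for what a wildcard BULK path would select in serverless SQL:

```python
from fnmatch import fnmatch

# Hypothetical listing of a data-lake folder (illustration only).
files = [
    "sales/2023/sales-2023-01.csv",
    "sales/2023/sales-2023-02.csv",
    "sales/2023/readme.txt",
]

# Keep only the CSV subset, mirroring a wildcard path such as
# 'sales/2023/sales-*.csv' in an OPENROWSET BULK argument.
subset = [f for f in files if fnmatch(f, "sales/2023/sales-*.csv")]
print(subset)
```

The wildcard in the OPENROWSET path performs this same pattern-based selection server-side, without you listing files yourself.
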
Transform the data on-premises or within a specific VNET before loading it: Integration Runtime. Use a self-hosted integration runtime on-premises.

Transform the data in a code-free way: Integration Runtime. Use an Azure integration runtime.

Flatten hierarchical fields in JSON to a tabular structure: dedicated SQL pool / serverless SQL pool. Use Azure Synapse dedicated or serverless SQL pools with the T-SQL OPENJSON, JSON_VALUE, and JSON_QUERY functions.

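The flattening these T-SQL functions perform declaratively can be illustrated with a small pure-Python sketch; the nested record and the flatten helper are hypothetical, for illustration only:

```python
import json

# Hypothetical nested JSON record (illustration only).
raw = '{"order": {"id": 1, "customer": {"name": "Ada"}}, "total": 9.5}'

def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted column names, mimicking what
    OPENJSON / JSON_VALUE achieve declaratively in T-SQL."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, name + "."))
        else:
            row[name] = value
    return row

row = flatten(json.loads(raw))
print(row)  # {'order.id': 1, 'order.customer.name': 'Ada', 'total': 9.5}
```

In T-SQL you would instead project JSON_VALUE paths (e.g., '$.order.customer.name') into columns; the shape of the result is the same.
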
Unpack or flatten deeply nested JSON: Apache Spark pool. Use Spark to deal with very complex JSON.

Handle file formats other than delimited (CSV), JSON, or Parquet: Apache Spark pool. Use Spark to handle the broadest set of file formats (e.g., ORC, Avro, and others).

Handle ZIP-archived data files: Apache Spark pool. Use Spark to unzip the files to storage before processing.

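A minimal sketch of the unzip-then-process step, using an in-memory ZIP built with Python's standard library (the file name and contents are hypothetical):

```python
import io
import zipfile

# Build a small in-memory ZIP to stand in for an archived data file
# landing in the lake (illustration only).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "id,amount\n1,10\n2,20\n")

# Extract the members to plain files first, as the matrix suggests
# doing with Spark, then point the query engine at the extracted CSVs.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    extracted = {name: zf.read(name).decode() for name in zf.namelist()}
print(sorted(extracted))  # ['data.csv']
```

The key point is the ordering: neither SQL pools nor OPENROWSET read inside ZIP archives, so extraction to storage has to happen before querying.
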
The delimited data is badly formatted: Apache Spark pool. Use Spark to handle particularly poorly formatted files.

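The kind of row-level tolerance a code-based reader gives you can be illustrated with a pure-Python sketch that routes malformed rows aside instead of failing the whole load (the sample data is hypothetical):

```python
import csv
import io

# Hypothetical badly formatted delimited data: the second data row is
# missing a field.
raw = "id,name,amount\n1,widget,10\n2,gadget\n3,sprocket,30\n"

good, bad = [], []
for record in csv.reader(io.StringIO(raw)):
    # Quarantine malformed rows rather than aborting, similar in
    # spirit to a permissive CSV parsing mode in Spark.
    (good if len(record) == 3 else bad).append(record)

print(len(good), len(bad))  # header plus two data rows pass; one is quarantined
```

A strict declarative reader would typically reject the whole file; with code you decide per row what to keep, repair, or set aside.
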
Leverage open source libraries to help with data cleansing: Apache Spark pool. Use Python or Scala open source libraries with Spark.
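As a tiny illustration of code-driven cleansing (in practice you would drive open source libraries such as pandas from Spark), a standard-library-only sketch with hypothetical values:

```python
import unicodedata

# Hypothetical messy source values (illustration only).
values = ["  Contoso\u00a0Ltd ", "FABRIKAM  inc."]

def clean(text):
    # Normalize unicode (e.g., non-breaking spaces), collapse
    # whitespace, and standardize casing.
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split()).title()

print([clean(v) for v in values])  # ['Contoso Ltd', 'Fabrikam Inc.']
```

This kind of row-level string logic is awkward in pure T-SQL but trivial in Python or Scala, which is why the matrix points cleansing-heavy work at Spark.
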
