DATA ARCHITECTURE
By default, a Blob Storage account is created – to create a data lake (ADLS Gen2), go to the Advanced tab and tick Enable hierarchical namespace
Data Factory Studio tabs – Home, Author, Monitor, Manage, Learning center
Containers – the data lake storage
File Share – one-stop solution to store all the files throughout the org
Queues – can hold JSON data – helps with streaming data, like messages
Tables – NoSQL database – for semi-structured data
Connections are called 'Linked Services'
Hit Debug to check the pipeline
Then click the Publish all button
Up to now we have built a static pipeline. Since we have 8 different files, one option is to copy and duplicate the same pipeline, but instead we will build a dynamic pipeline:
1. Relative URL will change – the base URL stays the same
2. Folder will be different
3. File will change
We will create 3 different parameters.
Do not set default values, as the ForEach loop will pass them in
Do the same with the sink
We need to pass a JSON array here in Items – in order: relative URL, sink folder, and sink file
In VS Code, create an array of dictionaries, each listing the 3 parameters to pass in the loop (see the sketch below)
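A minimal sketch of that JSON array – the parameter keys (p_rel_url, p_sink_folder, p_sink_file) and the file names are hypothetical and only need to match what the pipeline expects:

[
    {
        "p_rel_url": "Sales_2025.csv",
        "p_sink_folder": "sales",
        "p_sink_file": "sales_2025.csv"
    },
    {
        "p_rel_url": "Products.csv",
        "p_sink_folder": "products",
        "p_sink_file": "products.csv"
    }
]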
Uncheck the 'First row only' box, otherwise only one row will be passed to the loop
Deactivate the other activities and run only the Lookup activity – the 'value' key in the output below is what we need to pass into the parameters (see the sketch below)
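Assuming the Lookup activity is named 'LookupJson' (a hypothetical name), the Items field of the ForEach would reference its output array roughly like this:

@activity('LookupJson').output.value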
Cut the Copy Data activity and paste it within the ForEach activity
Add the value of each parameter using dynamic content
Do the same with the sink (see the sketch below)
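Inside the ForEach, each dataset parameter picks its value from the current item – a sketch, assuming the hypothetical JSON keys above:

Relative URL : @item().p_rel_url
Sink folder  : @item().p_sink_folder
Sink file    : @item().p_sink_file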
Launch Databricks
A Service Principal application is the key
App Registration
Save the Application (client) ID and Directory (tenant) ID and create a secret – used in Databricks
Copy the Value (and Secret ID) of the secret
Now assign a role to this application so it can access the data lake (typically Storage Blob Data Contributor)
Creating a notebook in Databricks
Heading
Replace the placeholders in < > (see the sketch below)
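A sketch of the Spark configuration that authenticates to the data lake with the service principal – the placeholders in < > come from the app registration above, and the config keys are the standard ABFS OAuth settings:

# Authenticate to ADLS Gen2 with the service principal (fill in the placeholders)
spark.conf.set("fs.azure.account.auth.type.<storage_acc_name>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage_acc_name>.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage_acc_name>.dfs.core.windows.net", "<application_client_id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage_acc_name>.dfs.core.windows.net", "<secret_value>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage_acc_name>.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant_id>/oauth2/token")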
APACHE SPARK
spark.read.format('csv') --- gives the format of the file
option('header', True) --- tells Spark that we have a header row
option('inferSchema', True) --- by default when data is saved as CSV, Spark reads all columns as text, so we want Spark to infer the schema, i.e. decide the data types on its own
load('abfss://<container_name>@<storage_acc_name>.dfs.core.windows.net/<folder_name>')
abfss --- Azure Blob File System Secure
df.display() --- to display the data
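Putting the pieces above together, the full read looks roughly like this (placeholders as before):

df = (spark.read.format('csv')
      .option('header', True)
      .option('inferSchema', True)
      .load('abfss://<container_name>@<storage_acc_name>.dfs.core.windows.net/<folder_name>'))
df.display()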
PYSPARK – TRANSFORMATIONS
df.withColumn() --- create a new column or modify an existing one
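For example, with a hypothetical 'country' column:

from pyspark.sql.functions import upper, col

df = df.withColumn('country_upper', upper(col('country')))   # create a new column
df = df.withColumn('country', upper(col('country')))         # modify the existing one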
Now push the data to the silver layer
There are 2 file formats – Parquet and Delta; Delta is built on top of the Parquet format
There are 4 write modes – append, overwrite, error (errorifexists), ignore – passed to .mode()
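A sketch of the push to the silver layer, assuming a 'silver' container and Parquet format:

(df.write.format('parquet')
   .mode('overwrite')     # or 'append', 'error' / 'errorifexists', 'ignore'
   .save('abfss://silver@<storage_acc_name>.dfs.core.windows.net/orders'))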
Concat function
lit() function is used to define constants
1st way
2nd way – using concat_ws(), which works like COMBINEVALUES in DAX (see the sketch below)
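Both ways, sketched with hypothetical name columns:

from pyspark.sql.functions import concat, concat_ws, lit, col

# 1st way – concat() with lit(' ') as the constant separator
df = df.withColumn('full_name', concat(col('first_name'), lit(' '), col('last_name')))

# 2nd way – concat_ws() takes the separator first (like COMBINEVALUES in DAX)
df = df.withColumn('full_name', concat_ws(' ', col('first_name'), col('last_name')))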
split() function – this time transforming the same column rather than creating a new one
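A sketch, assuming a hyphen-separated 'category' column:

from pyspark.sql.functions import split, col

# Overwrite the same column with the array produced by split()
df = df.withColumn('category', split(col('category'), '-'))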
Converting a date column to timestamp format
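For example, with a hypothetical 'order_date' column:

from pyspark.sql.functions import to_timestamp, col

df = df.withColumn('order_date', to_timestamp(col('order_date')))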
Replacing a specific character (or pattern) with another ----- regexp_replace() function
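A sketch replacing hyphens with underscores in a hypothetical 'product_code' column:

from pyspark.sql.functions import regexp_replace, col

df = df.withColumn('product_code', regexp_replace(col('product_code'), '-', '_'))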
Multiplication
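For example, multiplying two hypothetical columns:

from pyspark.sql.functions import col

df = df.withColumn('total_amount', col('quantity') * col('unit_price'))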
Aggregate the data using groupBy() & agg() to get the number of orders per day
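A sketch of the aggregation, with hypothetical column names:

from pyspark.sql.functions import count

orders_per_day = (df.groupBy('order_date')
                    .agg(count('order_id').alias('num_orders')))
orders_per_day.display()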
Azure Synapse
Unified Platform – Synapse analytics
Below is the Apache Spark pool – with the same functionality as Databricks – both manage Spark clusters
Data warehousing solution – create tables
Allowing Synapse Analytics to access data in the data lake – no app registration (key) is required, as both are Azure products; access is granted via the SYSTEM-ASSIGNED MANAGED IDENTITY (managed identity)
Select Members
Next step – go back to Synapse Analytics
Dedicated SQL pool – traditional way of storing – the data actually resides in the database (like MS SQL Server) – a traditional database but on the cloud – optimized for query reads, big data, and data warehousing
Serverless SQL pool – data lake & lakehouse concept – the data resides in the data lake, not in a database (to save cost)
One step in between – assign a role to yourself
Using OPENROWSET() – helps us apply an abstraction layer over data residing in the data lake – returns the result in tabular format
Change 'blob' to 'dfs' in the URL --- by default 'blob' is written (see the sketch below)
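A sketch of an OPENROWSET query against the silver layer (account name and path are placeholders):

SELECT TOP 100 *
FROM OPENROWSET(
        BULK 'https://<storage_acc_name>.dfs.core.windows.net/silver/orders/',
        FORMAT = 'PARQUET'
     ) AS result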
GOLD layer – creating a schema
Creating VIEWS for all the other tables and clicking Publish all (see the sketch below)
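A sketch of the gold schema and one of the views (names and paths are placeholders):

CREATE SCHEMA gold;
GO

CREATE VIEW gold.orders AS
SELECT *
FROM OPENROWSET(
        BULK 'https://<storage_acc_name>.dfs.core.windows.net/silver/orders/',
        FORMAT = 'PARQUET'
     ) AS result;
GO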
Creating external tables --- there are 3 steps:
1. Create credentials – we tell Synapse Analytics to pick up the data using the managed identity
2. Create external data source
3. Create external file format
Step 1 – creating the credential
Step 2 – creating the external data source
Step 3 – creating the external file format (see the sketch below for all three steps)
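A sketch of the three steps in T-SQL – the object names are hypothetical, and a master key is usually needed before the credential can be created:

-- Step 1: credential based on the managed identity
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong_password>';

CREATE DATABASE SCOPED CREDENTIAL cred_managed_identity
WITH IDENTITY = 'Managed Identity';

-- Step 2: external data source pointing at the gold container
CREATE EXTERNAL DATA SOURCE source_gold
WITH (
    LOCATION = 'https://<storage_acc_name>.dfs.core.windows.net/gold',
    CREDENTIAL = cred_managed_identity
);

-- Step 3: external file format (Parquet with Snappy compression)
CREATE EXTERNAL FILE FORMAT format_parquet
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);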
CETAS – Create External Table As Select ---- directly using VIEWS
External tables save the data but a VIEW doesn't (see the sketch below)
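A CETAS sketch that materialises one of the views as an external table (names are hypothetical):

CREATE EXTERNAL TABLE gold.ext_orders
WITH (
    LOCATION    = 'ext_orders',
    DATA_SOURCE = source_gold,
    FILE_FORMAT = format_parquet
)
AS
SELECT * FROM gold.orders;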
Connecting Synapse to Power BI using SQL Endpoints
Get Data → Azure Synapse Analytics