
Azure Data Engineer

Pre-requisites

• Azure fundamentals
• Good overview of Azure Event Hub, IoT Hub, and IoT Edge
• Implement and use Azure Key Vault
• Very good knowledge of Azure AD
• In several places the Azure documentation lacks links to working code; in those cases, write your own where possible
Skills Measured

• Services in Scope
• SQL Databases
• Azure Synapse Analytics
• Data Lake Gen2
• Cosmos DB
• Azure Databricks
• Azure Data Factory
• Stream Analytics
• Horizontals
• Azure monitor
• Diagnostics and Log Analytics
• Optimization
• Security
• High Availability
• Disaster Recovery
SQL Databases
Implementation Models

Deployment Models

• Managed Instance
• High compatibility with SQL Server
• Size up to 8 TB
• Supports private IP in a VNET
• Supports BYOL
• SQL Databases – Single and Elastic Pools
• Lower compatibility with SQL Server
• Size up to 100 TB
• Does not support private IP
• SQL Virtual Machines
• Azure Doc – Feature comparison

Purchasing Models

• vCore
• DTU – a blended measure of compute, storage, and IO
• Azure Doc – Purchasing models, Service Tiers
Azure SQL Databases

Elastic Pool

• Provides a pool of resources shared by multiple single databases. Large cost benefits can be derived if the databases' peak loads are scattered in time (see the sketch below)
• How scaling works:
• A new compute instance is created
• Connections are switched over to the new instance
• < 1 min of disruption
• This page explains what elastic pools are, where to use them, and includes an exercise to create and use them - Azure Doc - Elastic Pool overview
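As a hedged illustration, a database can also be moved into an elastic pool with T-SQL run against the logical server's master database. The server, database, and pool names below are placeholders, not from the source.

    import pyodbc

    # Connect to the logical server's master database (connection details are placeholders).
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=myserver.database.windows.net;DATABASE=master;"
        "UID=sqladmin;PWD=<password>",
        autocommit=True,  # ALTER DATABASE cannot run inside a user transaction
    )
    cursor = conn.cursor()
    # Move an existing single database into an elastic pool.
    cursor.execute("ALTER DATABASE [mydb] MODIFY (SERVICE_OBJECTIVE = ELASTIC_POOL(name = [mypool]));")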
Azure SQL Databases

Business Continuity

• Geo-Replication
• Replicates data to the same or another region
• Supports reads at the secondary
• Supports multiple replicas
• Requires a connection string update on failover
• Supports only SQL databases
• Azure Docs - Overview
• Failover Groups
• Fail over multiple databases simultaneously; use with pooled databases
• Supports both SQL databases and managed instances
• Does not support same-region replication
• No need to change the connection string
• Azure Docs – Overview and failover group tutorials (5 tutorials at the time of writing)
• Backup and Recovery
• Azure Docs - Overview and configure long-term retention policy
Azure SQL Databases

Data Security

• Advanced Data Security
• Provides data discovery and classification
• Provides vulnerability assessment and threat protection
• Azure Docs - Overview
• Audit
• An audit policy can be defined at the server level or database level; the DB inherits the server-level policy
• Rules defined at both levels cause duplicate event capture at the DB
• Three action groups are audited by default at the server:
• BATCH_COMPLETED_GROUP
• SUCCESSFUL_DATABASE_AUTHENTICATION_GROUP
• FAILED_DATABASE_AUTHENTICATION_GROUP
• Azure Doc - Overview
• Firewalls and Virtual Networks
• Use the firewall to restrict database access to a single IP or IP ranges
• These rules are at the server level, not the DB level
• If using an IP rule, make sure the client IP is static
• Azure Docs – Overview
• Private Endpoint
• Create a private endpoint in the VNET, then create a network rule on the server
• The private endpoint must exist in the same region as the server
• Azure Docs - Overview
Azure SQL Databases

Data Security

• Private Link
• Provides a private IP address for the configured Azure service, in this case Azure SQL
• A private endpoint requires a network/firewall rule for SQL access; Private Link does not
• Azure Docs - Overview
• TDE
• Customer-managed key, which uses Key Vault integration
• Azure Docs - Encryption with own key
• Service-managed key
• Azure Docs - Overview
• Always Encrypted
• Encrypts data at rest in the database and also as it moves between client and server
• Azure Docs – Always Encrypted
• Azure AD Authentication
• Permissions can be managed using external / AD groups
• Link an admin account to the server
• Create contained users mapped to AAD accounts (see the sketch below)
• Azure Docs - Overview, Configure AAD
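A minimal sketch of those steps in T-SQL, run while connected to the user database as the AAD admin. All server, database, and account names are placeholders.

    import pyodbc

    # Connect as the AAD admin account linked to the server (interactive AAD login).
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=myserver.database.windows.net;DATABASE=mydb;"
        "Authentication=ActiveDirectoryInteractive;UID=dbadmin@contoso.com",
        autocommit=True,
    )
    cursor = conn.cursor()

    # Create a contained user mapped to an AAD account (or group), then grant read access.
    cursor.execute("CREATE USER [alice@contoso.com] FROM EXTERNAL PROVIDER;")
    cursor.execute("ALTER ROLE db_datareader ADD MEMBER [alice@contoso.com];")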
Azure SQL Databases

Data Masking

• Data masking functions
• Credit Card – shows only the last 4 digits: xxxx-xxxx-xxxx-1234
• Default – fixed value: 'X' for strings, 0 for numbers, 01-01-1900 for dates
• Email – exposes the first letter and masks the rest, e.g. aXX@XXXX.com
• Random Number – randomizes based on a From:To range of numbers
• Custom Text – [Exposed prefix][mask][Exposed suffix]
• To expose the first 3 characters and the last two characters: [3][XXXXX][2]
• Data masking policy
• SQL users can be excluded from masking, i.e. they see unmasked data
• Admin users are always in the exclusion list
• A masking rule maps DB columns to masking functions (see the sketch below)
• Azure Doc - Overview
• This is applicable to Synapse as well
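A minimal sketch of defining masking rules in T-SQL. The table, column, and user names are placeholders.

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=myserver.database.windows.net;DATABASE=mydb;"
        "UID=sqladmin;PWD=<password>",
        autocommit=True,
    )
    cursor = conn.cursor()

    # Built-in email() mask, and partial(): expose the first 3 and last 2 characters.
    cursor.execute("ALTER TABLE dbo.Customers ALTER COLUMN Email "
                   "ADD MASKED WITH (FUNCTION = 'email()');")
    cursor.execute("ALTER TABLE dbo.Customers ALTER COLUMN Phone "
                   "ADD MASKED WITH (FUNCTION = 'partial(3,\"XXXXX\",2)');")
    # Exclude a specific user from masking so they see unmasked data.
    cursor.execute("GRANT UNMASK TO analyst_user;")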
Azure SQL Databases

Optimize / Performance Tuning

• Azure Diagnostics: types of telemetry available and how to export them. Important ones are Basic, Automated Tuning, and SQLInsights.
• Azure Docs - Overview
• Intelligent Insights – Azure Doc
• Automated Tuning
• Three actions are available: Create Index, Drop Index, and Force Last Good Plan
• Drop Index is disabled by default
• Servers can inherit Azure defaults, and databases can inherit from the server
• Azure Doc - Overview and how to implement
Azure SQL Databases

Monitor

• There is overlap between this topic and Optimize
• Azure Doc - Monitoring Overview
• Azure Doc - Azure monitor logs for Azure SQL
• Azure Doc - Monitor performance of pools
Azure Synapse
Azure Synapse

Get Started

• What is Azure Synapse?
• Azure Doc - Overview
• Synapse Architecture
• Azure Doc - Overview
• Implement Azure Synapse
• Azure Doc – Create and connect
Azure Synapse

Data Distribution
• Data is stored across compute nodes in 60 distributions, i.e. with 60 nodes each node gets one distribution (most expensive); with a single node, all 60 distributions sit on that node (cheapest)
• The control node splits a query into 60 smaller queries that run against the 60 distributions
• Choosing a distribution column
• This column cannot be updated
• It must have many unique values and distribute the data evenly across the 60 distributions; data skew can lead to performance issues
• Use a column from the GROUP BY clause, not from the WHERE clause
• To optimize JOIN performance, join columns should be hash-distributed, joined with the equals operator, and have the same data type
• To change the distribution column, re-create the table with CTAS using the new distribution column, then collect fresh statistics on the table (see the sketch below)
• Azure Doc – Table Distribution and replicated tables
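A hedged sketch of that CTAS pattern against a dedicated SQL pool. The connection string, table, and column names are placeholders.

    import pyodbc

    conn = pyodbc.connect("<dedicated-sql-pool-connection-string>", autocommit=True)
    cursor = conn.cursor()

    # Re-create the table with the new hash-distribution column...
    cursor.execute("""
        CREATE TABLE dbo.FactSales_new
        WITH (DISTRIBUTION = HASH(CustomerId), CLUSTERED COLUMNSTORE INDEX)
        AS SELECT * FROM dbo.FactSales;
    """)
    # ...swap names, then collect fresh statistics on the new distribution column.
    cursor.execute("RENAME OBJECT dbo.FactSales TO FactSales_old;")
    cursor.execute("RENAME OBJECT dbo.FactSales_new TO FactSales;")
    cursor.execute("CREATE STATISTICS stats_CustomerId ON dbo.FactSales (CustomerId);")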
Azure Synapse

Partitioning

• In Synapse, data is already spread across the 1-60 distributions; partitioning further splits the data within each distribution into partitions (diagram: distributions 1..60, each split into Part1..PartN)
• Utilize partition switching to remove old data rather than using DELETE (see the sketch below)
• The partition column should be part of the filter (WHERE) clause to improve query performance
• This is different from distribution, where the column is part of the GROUP BY / aggregate clause
• Azure Doc - Table Partitioning
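A minimal sketch of removing old data by switching a partition out instead of deleting it. Table names are placeholders, and the archive table is assumed to have an identical schema and partition scheme.

    import pyodbc

    conn = pyodbc.connect("<dedicated-sql-pool-connection-string>", autocommit=True)
    cursor = conn.cursor()

    # Switch the oldest partition out to the archive table - a metadata-only
    # operation, far cheaper than DELETE - then discard the old rows.
    cursor.execute("ALTER TABLE dbo.FactSales SWITCH PARTITION 1 "
                   "TO dbo.FactSales_archive PARTITION 1;")
    cursor.execute("TRUNCATE TABLE dbo.FactSales_archive;")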
Azure Synapse

Data Loading

• Data loading best practices
• Load into a staging table, defined as heap or round-robin
• Building a columnstore index requires large resources, so the loading user should be a member of a medium or large resource class
• Each rowgroup compresses up to ~1M rows; anything less than 100K rows is sent to the delta store and is inefficient
• Azure Doc – Data load guidance
• Example exercises
• Using a Polybase external table to load data from Data Lake to Synapse
• Load with optimization
• Using the COPY command to load data (see the sketch below)
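A hedged COPY sketch; the storage URL and staging table are placeholders, and managed-identity access to the storage account is assumed.

    import pyodbc

    conn = pyodbc.connect("<dedicated-sql-pool-connection-string>", autocommit=True)
    cursor = conn.cursor()

    # Bulk-load CSV files from Data Lake into a staging table via COPY.
    cursor.execute("""
        COPY INTO dbo.StageSales
        FROM 'https://mydatalake.dfs.core.windows.net/data/sales/*.csv'
        WITH (
            FILE_TYPE = 'CSV',
            FIRSTROW = 2,  -- skip the header row
            CREDENTIAL = (IDENTITY = 'Managed Identity')
        );
    """)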
Azure Synapse

Security

• The following concepts apply the same as for SQL Databases:


• AD Authentication
• Network Security
• Advanced Data Security & Auditing
• Dynamic Data Masking and TDE
• Column level security – Azure Doc
• Row level security - Azure Doc
Azure Synapse

Optimize

• Hash-distribute large tables
• A columnstore index is only efficient if each partition per distribution gets ~1M rows, i.e. 100 partitions means 60 (distributions) × 100 (partitions) × 1M = a 6-billion-row table!
• Best practices - Azure Doc
Azure Synapse

Monitor, Backup & DR

• Monitor and Logs – Azure Doc


• Monitor using dynamic management views – Azure Doc
• Scaling Compute – Azure doc
• Scale compute with azure functions – Azure Doc
• Backup and Recovery – Azure Doc
• Restore from Geo backup – Azure Doc
• Tuning Recommendation – Azure doc
Azure CosmosDB
What is Cosmos DB?
• Globally distributed, multi-model database. Cosmos DB guarantees single-digit-millisecond latencies at the 99th percentile anywhere in the world, offers multiple well-defined consistency models to fine-tune performance, and guarantees high availability with multi-homing capabilities.
• Azure Cosmos DB is schema-agnostic. It automatically indexes all the data without requiring you to deal with schema and index management. It's also multi-model, natively supporting document, key-value, graph, and column-family data models.
• Azure Doc - Introduction, Implement
Azure CosmosDB

Data Distribution

• A database can be multi-regional
• Azure Doc – Data distribution overview
• Consists of containers, which are partitioned by key
• A replica set is within a data center; a partition set spans multiple data centers or regions and is composed of multiple replica sets
• Azure Doc – Data Distribution in detail
• Changing from single-master to multi-master is non-disruptive
• Azure Doc – Configure multiple regions, Using multi-master in application
• From the portal -> Replicate Data Globally -> enable multiple regions for read/write. Multi-region write needs to be enabled separately
• Automatic failover can be set across multiple regions, with a priority (1..N) assigned to each
Azure CosmosDB

Partitions
• Physical partitions hold one or more logical partitions
• Logical partitions are based on partition keys, e.g. UserId
• In addition to the partition key, each item has an item ID; the two together form the item's index
• Each physical partition provides up to 10,000 RU/s of throughput and 50 GB of storage
• A hot-partition issue can occur if load is not distributed evenly across partition keys
• The partition key should have high cardinality
• For a read-heavy container, choose a key that appears in filters
• Azure Doc - Overview
Azure CosmosDB

Implement CosmosDB with Geo Distribution

• How to scale throughput globally – Azure Doc
• To configure multi-master / multi-region in an application:
• Use PreferredLocations in the ConnectionPolicy to specify the list of regions in preferential order – Azure Doc
• Use UseMultipleWriteLocations with SetCurrentLocation to handle write operations across multiple regions dynamically – Azure Doc (see the Python sketch below)
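The options above are the .NET SDK names; a minimal equivalent sketch with the Python SDK (azure-cosmos v4) is below. The account endpoint, key, region names, and database/container names are placeholders.

    from azure.cosmos import CosmosClient

    client = CosmosClient(
        url="https://myaccount.documents.azure.com:443/",
        credential="<primary-key>",
        preferred_locations=["West US 2", "East US"],  # read regions in preferential order
        multiple_write_locations=True,                 # route writes to the nearest write region
    )
    db = client.get_database_client("appdb")
    container = db.get_container_client("orders")
    container.upsert_item({"id": "1", "customerId": "c42", "total": 19.99})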
Azure CosmosDB

Consistency Levels

• Strong – Reads are guaranteed to return the most recent committed version of an item. A client never sees an uncommitted or partial write; users always read the latest committed write.
• Bounded Staleness – Reads might lag behind writes by at most "K" versions (that is, "updates") of an item or by a "T" time interval. Provides strong consistency for single-master, single-region clients.
• Session – Within a single client session, reads honor the consistent-prefix, monotonic-reads, monotonic-writes, read-your-writes, and write-follows-reads guarantees. Clients outside the session get either consistent prefix or eventual consistency.
• Consistent Prefix – Guarantees that reads never see out-of-order writes.
• Eventual – No ordering guarantee for reads; in the absence of further writes, the replicas eventually converge. The weakest form of consistency: a client may read values older than ones it has read before.

• Azure Doc - Overview, Choose Consistency Level


Azure CosmosDB

Consistency Levels and Latency

• Read and write latency is guaranteed to be under 10 ms at the 99th percentile
• Strong consistency across multiple regions is the exception to this rule
Azure CosmosDB

Selecting CosmosDB API

• Use Core (SQL) for all cases except:
• Teams are using existing Mongo, Cassandra, Table, or graph APIs
• There is a requirement to capture relationships among data; in this case use the Gremlin (graph) API
• In addition, Cassandra is the best fit for fixed-schema use cases and Mongo for flexible-schema ones
• Azure Learn – Select cosmos db api
Azure CosmosDB

Monitor CosmosDB

• Azure monitor – Azure Doc


• Monitor Server side latency – Azure Doc
• If high latency is seen for certain operations, then use diagnostic logs for checking the size of
data returned

• Monitor Request Units – Azure Doc


• Diagnostic Logs – Azure Doc
• Using control plane logs – Azure Doc
Azure CosmosDB

Throughput

• Throughput is measured in RUs and is allocated in increments of 100 RU/s
• A point read of a 1 KB item costs 1 RU; a write of the same item costs more (roughly 5 RUs)
• If a logical partition consumes more RUs than are allocated to the physical partition it lives on, rate throttling will happen
• Throughput can be provisioned at both DB and container level (see the sketch below)
• It is distributed evenly among objects, i.e. DB -> containers -> physical partitions
• Azure Doc - Introduction, autoscale throughput, autoscale vs manual
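A minimal provisioning sketch with the Python SDK (azure-cosmos); the endpoint, key, database, container, and partition key are placeholders.

    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient(url="https://myaccount.documents.azure.com:443/",
                          credential="<primary-key>")

    # Database-level (shared) throughput could be set here via offer_throughput instead.
    db = client.create_database_if_not_exists("appdb")

    # Container-level manual throughput, allocated in increments of 100 RU/s.
    container = db.create_container_if_not_exists(
        id="orders",
        partition_key=PartitionKey(path="/customerId"),  # high-cardinality key avoids hot partitions
        offer_throughput=400,
    )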
Azure CosmosDB

Encryption using Key Vault

• Encryption can be either service-managed or customer-managed
• For customer-managed keys, first register the DocumentDB resource provider, then create an access policy in Key Vault giving Cosmos DB get/wrap/unwrap key permissions
• Finally, create the Cosmos DB account with the key URI in the encryption settings. This is at account level, not DB level
• Using CMK – Azure Doc
Azure CosmosDB

Data Access

• Access can be via AD IAM permissions or via keys and resource tokens
• Account-management activities like master key rotation, global replication, etc. are available via AD only
• Keys and resource tokens allow control of data operations, which AD does not
• Restrict user access to data operations - Azure Doc
• Master keys (primary & secondary) can be regenerated and rotated
• Move the secondary to primary and then generate a new secondary key. Ensure all applications are using the secondary key to connect
• Resource tokens can be generated via a mid-tier service for end devices like mobile
Azure CosmosDB

Secure Data Access

• IAM
• The Cosmos DB Operator role cannot read data, but can administer the account, DBs, and containers. It cannot access the keys either
• Cosmos DB admin changes can be locked down to prevent changes via key-based access - "disableKeyBasedMetadataWriteAccess"
• IP address whitelisting
• Access to Cosmos DB can be limited to specific IP addresses or IP CIDR blocks
• By using a service endpoint, access can be limited to certain subnets in a VNET
• This is similar to how this works for SQL Databases
Azure CosmosDB

Reference Architecture

• CosmosDB
• CosmosDB with IoT
Azure Data Lake
Azure DataLake

Architecture

• Based on blob storage; supports the Hadoop filesystem (accessed via the ABFS driver)
• Hierarchical namespace with the file as the unit of storage
• Both Blob and ADLS APIs are supported
• Supports access tiers and lifecycle policies – Hot, Cool (30 days), Archive (180 days)
• Supports diagnostics and events
• Is supported by Data Factory, Databricks, Event Hub, Logic Apps, ML, Stream Analytics, HDInsight, Azure Data Explorer
Azure DataLake

Unsupported Features in Data Lake

• Custom domains not supported


• Logging to Azure monitor not supported
• Does not support snapshots
Azure DataLake

Data Access

• RBAC
• Shared Key and Shared Access Signature
• ACL on file and directories

• RBAC vs ACL
• RBAC is resolved first and takes precedence; if access is granted based on RBAC, no ACL check is performed
• RBAC does not provide file / directory level access control
• Shared Key vs SAS
• A shared key allows superuser access, while SAS tokens carry granular permissions and a validity duration (see the sketch below)
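A hedged sketch of generating a read-only, time-bound SAS for a single file with the Python SDK (azure-storage-file-datalake). The account, filesystem, path, and key are placeholders.

    from datetime import datetime, timedelta, timezone
    from azure.storage.filedatalake import generate_file_sas, FileSasPermissions

    sas_token = generate_file_sas(
        account_name="mydatalake",
        file_system_name="raw",
        directory_name="sales/2024",
        file_name="orders.csv",
        credential="<account-key>",                              # the shared key signs the SAS
        permission=FileSasPermissions(read=True),                # granular: read-only
        expiry=datetime.now(timezone.utc) + timedelta(hours=1),  # time-bound validity
    )
    url = f"https://mydatalake.dfs.core.windows.net/raw/sales/2024/orders.csv?{sas_token}"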
Azure DataLake

Blob Storage Encryption and Network Access

• Can use Microsoft-managed, customer-managed, or customer-provided keys
• Customer-managed keys live in a Key Vault in the same region as the storage account
• Customer-provided keys are passed along with requests and are managed by the customer, either in Azure Key Vault or any other vault
• Secure transfer settings can be changed (default is enabled) from within Configuration
• Use storage firewalls, network whitelists, and private endpoints to secure your storage access
Azure DataLake

Data Redundancy Options

• LRS – data is replicated 3 times within a single DC/zone, synchronously
• ZRS – data is replicated 3 times across 3 zones within a region, synchronously
• GRS – data is replicated to a secondary region in addition to LRS; the secondary copy is not available for reads and is copied asynchronously
• GZRS – data is replicated to a secondary region in addition to ZRS; the secondary copy is not available for reads and is copied asynchronously
• RA-GRS or RA-GZRS – use these options to make data available for reads in the secondary region
• Azure Doc – Data Redundancy Options
Azure DataLake

Blob Storage – Disaster Recovery and High Availability

• On regional failover, GRS is changed to LRS
• Use Last Sync Time to identify data lost
• You can change LRS back to GRS after failover
• The manual failover option is under Geo-replication in the storage account
• This process updates the DNS entry
• Azure Doc – Disaster Recovery and Failover
• Azure Doc – Design Application for HA
Azure Data Factory
Azure Data Factory

ADF Exercises

• The best way to go through ADF is hands-on; the links below cover the required range of topics:
• ADF Overview
• ADF Create / Implement
• ADF Using CMK
• ADF – COPY Data
• ADF – Mapping Data Flows
• ADF – Use Key Vault secrets
• ADF – e2e LAB
Azure Data Factory

ADF – Integration Runtimes

• IR Overview and when to use which one

• IR – Create Azure IR

• IR – Create Self hosted IR


Azure Data Factory

ADF – Triggers

• There are 3 types of triggers in ADF:
• Event-based
• Only supports Data Lake Gen2, which means we can use it to orchestrate batch but not stream processing
• Scheduled
• Tumbling window
• A recurring trigger that also allows backfill runs and concurrency controls – read here (see the sketch after this list)
• Azure Doc – Event based, Scheduled, Tumbling window
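A hedged sketch of defining a tumbling window trigger, assuming the azure-mgmt-datafactory management SDK; the subscription, resource group, factory, and pipeline names are placeholders.

    from datetime import datetime, timezone
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        PipelineReference, TriggerPipelineReference, TriggerResource, TumblingWindowTrigger,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    trigger = TumblingWindowTrigger(
        pipeline=TriggerPipelineReference(
            pipeline_reference=PipelineReference(reference_name="CopySalesPipeline"),
            parameters={"windowStart": "@trigger().outputs.windowStartTime"},
        ),
        frequency="Hour",
        interval=1,
        start_time=datetime(2024, 1, 1, tzinfo=timezone.utc),  # a past start time produces backfill runs
        max_concurrency=4,  # concurrency control: at most 4 windows run in parallel
    )

    client.triggers.create_or_update(
        "my-resource-group", "my-data-factory", "HourlyTumblingTrigger",
        TriggerResource(properties=trigger),
    )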
Azure Databricks
Azure Databricks

Azure Databricks on Microsoft Learn

Microsoft Learn – Databricks Overview


• Covers overview, create workspace, create notebook, and attach to a Spark cluster
Microsoft Learn – Stream processing with Databricks
• Connect to Eventhub and process streaming data
Microsoft Learn – Security with KeyVault
Azure Databricks

Access Control

• Access Control in Databricks – Azure Doc
• All 6 modules in this chapter cover access control for the various parts of Databricks
• Access works at two levels: first an admin has to enable access controls, then relevant users use them to grant permissions – Enable Access Control
Azure Databricks

When to use what compute

• Azure Doc – Databricks Runtime


• Read through all 4 sub-topics
Azure Stream Analytics
Azure Stream Analytics

Overview and Implement

• Microsoft Learn - Overview and implement Azure Stream Analytics
• Azure Doc – When to use?
• For real-time alerts and dashboards, IoT Edge
• Inputs for Stream Analytics are:
• Event Hub
• IoT Hub
• Blob storage
Azure Stream Analytics

Stream Analytics Solutions

• Build with IoT Edge - Tutorial
• The cloud part is responsible for the job definition – input, output, query
• IoT Edge pushes the job to the device
• Stream Analytics on the edge runs the job

• Process data from Event Hub
• Azure Doc – Process data from Event Hub. For this exercise, write your own code to send data to Event Hub, which Stream Analytics can then process; a simple number streamer would do (see the sketch below).
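A minimal number-streamer sketch with the Python SDK (azure-eventhub v5); the connection string and hub name are placeholders.

    import json
    import time
    from azure.eventhub import EventData, EventHubProducerClient

    producer = EventHubProducerClient.from_connection_string(
        "<event-hub-namespace-connection-string>", eventhub_name="numbers"
    )

    # Stream one numbered event per second for Stream Analytics to pick up.
    with producer:
        for i in range(600):
            batch = producer.create_batch()
            batch.add(EventData(json.dumps({"value": i, "ts": time.time()})))
            producer.send_batch(batch)
            time.sleep(1)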
Solutions
Solution Exercises

• Azure Doc - Azure Data Lake -> Databricks -> Synapse
• Azure Doc - Using ADF to transform Cosmos DB data
• Azure Doc – Processing events with Databricks
