Azure - Implementation Notes

Azure Data Engineer

Azure Storage:

Azure Storage is a Microsoft-managed service providing cloud storage that is highly available, secure,
durable, scalable, and redundant. Azure Storage includes Azure Blobs (objects), Azure Data Lake Storage
Gen2, Azure Files, Azure Queues, and Azure Tables.

An Azure account refers to the Azure billing account ---> mapped to the email ID that you used
to sign up for Azure ---> an account can contain multiple subscriptions; each subscription
can have multiple resource groups, and the resource groups, in turn, can contain
multiple resources.
---> Billing is done at the level of subscriptions.

To Create an Azure Storage Account:

Basics:
1.) Subscription (by default, a subscription can have up to 250 storage accounts per region; the limit
can be raised to 500 by request, as shown in the limits table later in these notes)
2.) Resource group (A Resource group is a container that holds related resources for an Azure
solution)
3.) Storage account name (Globally Unique)
4.) Region (Proximity to Users, Compliance Requirements, Redundancy and Disaster Recovery,
Pricing, Service Availability based Region, Network Performance between your applications and
the chosen region. Review the SLAs for Azure Storage services in different regions)

5.) Performance (Standard and Premium)


6.) Redundancy (LRS, ZRS, GRS, GZRS) --- see the SDK sketch after this list
LRS ---> Locally redundant storage: replicates data three times within a single data center
ZRS ---> Zone-redundant storage: replicates data across availability zones in the primary region
GRS ---> Geo-redundant storage: replicates data to a secondary region for disaster recovery
GZRS ---> Geo-zone-redundant storage: combines ZRS and GRS for maximum redundancy.
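
The Basics choices above map directly onto the management SDK's create call. Here is a minimal sketch using the Azure SDK for Python (azure-mgmt-storage); the subscription ID, resource group, account name, and region are placeholders, and the call shape should be verified against your SDK version:

# pip install azure-identity azure-mgmt-storage
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# Placeholder values -- replace with your own subscription/resource group/name/region
subscription_id = "<subscription-id>"
resource_group = "my-resource-group"
account_name = "mystorageacct123"  # must be globally unique, 3-24 lowercase letters/digits

client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# Performance is chosen via the SKU tier (Standard/Premium) and
# redundancy via the SKU name suffix (LRS, ZRS, GRS, GZRS).
poller = client.storage_accounts.begin_create(
    resource_group,
    account_name,
    {
        "location": "eastus",              # region
        "kind": "StorageV2",               # general-purpose v2
        "sku": {"name": "Standard_GZRS"},  # performance + redundancy
    },
)
account = poller.result()
print(account.id)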

Advanced:

1.) Require secure transfer for REST API operations ---> requests to the storage account must be
made over HTTPS (secured with TLS); requests over plain HTTP are rejected
2.) Allow enabling public access on individual containers-----> By default, containers within a storage
account are private. Enabling this option allows you to grant public access to specific containers if
needed.
3.) Enable storage account key access----> allows you to access the storage account using the
account keys
4.) Default to Azure Active Directory authorization in the Azure portal---> allows you to use Azure
Active Directory (AD) for authentication and authorization instead of storage account keys. It
provides more secure and granular access control to your storage account resources.
5.) Minimum TLS version- Transport Layer Security and Choosing a higher version ensures stronger
encryption and better security.
6.) Enable hierarchical namespace
7.) ACCESS PROTOCOLS - Enable SFTP and network file system v3----> Enabling these protocols
allows you to access your storage account using SFTP (Secure File Transfer Protocol) and NFS
(Network File System) v3.
8.) BLOB STORAGE - Allow cross-tenant replication, and default Access tier (Hot or Cool)
9.) AZURE FILES - Enable Large File Shares
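
Most of these Advanced options are properties on the same create call. Below is a sketch extending the earlier example; the property names are taken from the azure-mgmt-storage models as an assumption, so verify them against your installed SDK version:

# Extending the earlier begin_create sketch with Advanced options.
# Property names below are assumed from the azure-mgmt-storage models.
parameters = {
    "location": "eastus",
    "kind": "StorageV2",
    "sku": {"name": "Standard_GZRS"},
    "enable_https_traffic_only": True,  # require secure transfer (HTTPS)
    "allow_blob_public_access": False,  # keep containers private by default
    "allow_shared_key_access": True,    # storage account key access
    "minimum_tls_version": "TLS1_2",    # minimum TLS version
    "is_hns_enabled": True,             # hierarchical namespace (ADLS Gen2)
}
poller = client.storage_accounts.begin_create(resource_group, account_name, parameters)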

Networking

1.) Network access ------>
1. Enable public access from all networks
2. Enable public access from selected virtual networks and IP addresses
3. Disable public access and use private access

Virtual networks

Network routing

Routing Preferences ------> Microsoft network routing and Internet routing

Microsoft network routing keeps traffic on the Microsoft global network for as long as
possible before handing it off to the client, while Internet routing hands traffic off to the
public internet at a point of presence closer to the client.
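
As a further sketch, network rules and the routing preference can be applied through the management SDK as well (model and property names are assumptions based on azure-mgmt-storage; the IP range is a placeholder):

# Hedged sketch: restricting network access and setting routing preference
# via storage_accounts.update on the account created earlier.
client.storage_accounts.update(
    resource_group,
    account_name,
    {
        "network_rule_set": {
            "default_action": "Deny",  # block public access by default
            "ip_rules": [{"ip_address_or_range": "203.0.113.0/24"}],  # example range
        },
        "routing_preference": {
            "routing_choice": "MicrosoftRouting",  # or "InternetRouting"
        },
    },
)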

Data Protection

1.) Enable point-in-time restore for containers
2.) Enable soft delete for blobs [Days to retain deleted blobs; soft delete enables you to recover
blobs that were previously marked for deletion, including blobs that were overwritten.]
3.) Enable soft delete for containers
4.) Enable soft delete for file shares
(An SDK sketch for the blob-level protections follows the Tracking section below.)

Tracking:

Enable versioning for blobs---> Use versioning to automatically maintain previous versions of your
blobs.

Enable blob change feed ---> Keep track of create, modification, and delete changes to blobs in your
account.
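
The data protection and tracking toggles above correspond to blob service properties. A sketch, again assuming the azure-mgmt-storage operation and model shapes (verify locally):

# Hedged sketch: enabling blob data protection and tracking features on the
# account created earlier. Point-in-time restore requires versioning, change
# feed, and blob soft delete, all of which are enabled here.
client.blob_services.set_service_properties(
    resource_group,
    account_name,
    parameters={
        "delete_retention_policy": {"enabled": True, "days": 7},            # blob soft delete
        "container_delete_retention_policy": {"enabled": True, "days": 7},  # container soft delete
        "is_versioning_enabled": True,                                      # blob versioning
        "change_feed": {"enabled": True},                                   # blob change feed
        "restore_policy": {"enabled": True, "days": 6},                     # point-in-time restore
    },
)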

Access control:

Enable version-level immutability support

- Allows you to set a time-based retention policy at the account level that will apply to all blob
versions. Enable this feature to set a default policy at the account level. Without enabling it, you can still
set a default policy at the container level or set policies for specific blob versions. Versioning is required
for this property to be enabled.

Encryption:

Encryption type -----> 1. Microsoft-managed keys (MMK)
2. Customer-managed keys (CMK)

Customer-managed keys -------> 1. Blob and file service only, or
2. All service types

Customer-managed key (CMK) support can be limited to the blob and file services only, or
extended to all service types. After the storage account is created, this choice cannot be
changed. (A configuration sketch follows below.)
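
A sketch of the customer-managed key settings; the model shape is an assumption based on azure-mgmt-storage, and the Key Vault values are placeholders. A managed identity with access to the vault is also required (omitted here for brevity):

# Hedged sketch: customer-managed keys (CMK) from Azure Key Vault.
# Model shape assumed from azure-mgmt-storage; Key Vault values are placeholders.
encryption_settings = {
    "key_source": "Microsoft.Keyvault",
    "key_vault_properties": {
        "key_name": "my-cmk-key",                             # placeholder key name
        "key_vault_uri": "https://myvault.vault.azure.net/",  # placeholder vault URI
    },
    "services": {
        "blob": {"enabled": True},  # CMK scope: blob and file only, or
        "file": {"enabled": True},  # all service types (cannot change later)
    },
}
# Passed as the "encryption" field of the begin_create parameters shown earlier.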

Designing a partition strategy for files in Azure:

1. Choose a partition key: Determine a partition key based on the characteristics of your data, such
as customer ID, date, or geographical location. This key will be used to distribute your data across
different partitions.
2. Select a partitioning scheme: Azure provides two partitioning schemes: partition by range and
partition by hash. Partition by range is suitable when you have sequential or time-based data.
Partition by hash is useful when you want to distribute data uniformly across partitions.

3. Define the partitioning strategy: Implement the chosen partitioning scheme by creating a
partition map. This map specifies the partition key, the partition boundaries (in the case of range
partitioning), and the number of partitions (in the case of hash partitioning).
4. Distribute the data: When writing data to Azure, include the partition key in the data. Azure will
use this key to determine the appropriate partition for storing the data.

A file partition strategy has two parts that depend on each other: the partition key and the partition
logic. For example, if the partition key is the file's create date, the partition logic must use that date to
route each file to its exact partition.

Example for partition by range:

def get_partition_key(date):
    if "2020-01-01" <= date <= "2020-06-30":
        return "Partition A"
    elif "2020-07-01" <= date <= "2020-12-31":
        return "Partition B"
    else:
        return "Invalid Date Range"

# Example usage
file_date = "2020-05-15"
partition_key = get_partition_key(file_date)
print(partition_key)  # Output: Partition A

Example for partition by hash:

import hashlib

def get_partition_key(file_name):
    # Generate a hash value for the file name
    hash_value = hashlib.md5(file_name.encode()).hexdigest()

    # Extract a portion of the hash value to use as the partition key
    partition_key = hash_value[:2]

    return partition_key

def store_file(file_name, file_content):
    partition_key = get_partition_key(file_name)

    # Logic to store the file in the appropriate partition based on the partition key.
    # For example, you can use Azure Blob Storage and create a container per partition.
    # Using the Azure Blob Storage SDK:
    # blob_service_client = BlobServiceClient.from_connection_string(connection_string)
    # container_client = blob_service_client.get_container_client(partition_key)
    # blob_client = container_client.get_blob_client(file_name)
    # blob_client.upload_blob(file_content)

def access_file(file_name):
    partition_key = get_partition_key(file_name)

    # Logic to access the file based on the partition key: retrieve the file
    # from the corresponding partition container.
    # Using the Azure Blob Storage SDK:
    # blob_service_client = BlobServiceClient.from_connection_string(connection_string)
    # container_client = blob_service_client.get_container_client(partition_key)
    # blob_client = container_client.get_blob_client(file_name)
    # file_content = blob_client.download_blob().readall()
    # return file_content

Azure Storage uses <account name + container name + blob name> as the partition key.

Designing a partition strategy for analytical workloads

There are three main types of partition strategies for analytical workloads. These are listed here:

 Horizontal partitioning, which is also known as sharding
 Vertical partitioning
 Functional partitioning

Horizontal partitioning

In a horizontal partition, we divide the table data horizontally, and subsets of rows are stored in
different data stores. Each subset of rows (with the same schema as the parent table) is called a
shard, and each shard is stored in a different database instance. (A minimal shard-routing sketch
follows the note below.)

NOTE

Don't try to balance the data evenly across partitions unless your use case specifically
requires it: the most recent data is usually accessed far more than older data, so the
partitions holding recent data can become bottlenecks due to high access rates regardless
of how evenly the data is sized.
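
To make the sharding idea concrete, here is a minimal self-contained sketch that routes rows to shards by hashing a customer-ID partition key; the shard names and connection strings are purely illustrative:

# Minimal shard-routing sketch. The shard connection strings are hypothetical;
# in practice each shard would be a separate database instance.
SHARDS = {
    0: "connection-string-for-shard-0",
    1: "connection-string-for-shard-1",
    2: "connection-string-for-shard-2",
}

def shard_for_customer(customer_id: int) -> str:
    # Every shard holds the same schema; rows are routed by the partition key.
    return SHARDS[customer_id % len(SHARDS)]

print(shard_for_customer(101))  # routes customer 101 to one of the three shards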

Vertical partitioning

In a vertical partition, we divide the data vertically, and each subset of the columns is stored
separately in a different data store. This is ideal for column-oriented data stores such as HBase,
Cosmos DB, and so on.
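
As an illustrative sketch, vertical partitioning can be pictured as splitting one record's columns between a frequently accessed store and a rarely accessed one (both stores here are stand-in dictionaries, not real data stores):

# Illustrative sketch: splitting one logical record's columns across two stores.
hot_store = {}   # frequently accessed columns (e.g. name, price)
cold_store = {}  # rarely accessed, bulky columns (e.g. long descriptions, blobs)

def save_product(product_id, name, price, description):
    hot_store[product_id] = {"name": name, "price": price}
    cold_store[product_id] = {"description": description}

save_product(1, "widget", 9.99, "A long marketing description ...")
print(hot_store[1])  # reads touch only the hot column subset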

Functional partitioning

Functional partitions are similar to vertical partitions, except that here, we store entire tables or
entities in different data stores. They can be used to segregate data belonging to different
organizations, frequently used tables from infrequently used ones, read-write tables from read-
only ones, sensitive data from general data, and so on.

Designing a partition strategy for efficiency/performance

 Design effective folder structures to improve the efficiency of data reads and writes.
 Partition data such that a significant amount of data can be pruned while running
queries.
 File sizes in the range of 256 megabytes (MB) to 100 gigabytes (GB) perform really
well with analytical engines such as HDInsight and Azure Synapse. So, aggregate
the files into this range before running the analytical engines on them (see the
compaction sketch after this list).
 For I/O-intensive jobs, try to keep the optimal I/O buffer sizes in the range of 4 to 16
MB; anything too big or too small will become inefficient.
 Run more containers or executors per virtual machine (VM) (such as Apache Spark
executors or Apache Yet Another Resource Negotiator (YARN) containers).
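
As mentioned in the list above, small files should be compacted before analytics. A sketch using PySpark; the paths are placeholders, and the partition count should be chosen so output files land in the recommended size range for your data volume:

# Hedged sketch: compacting many small files into fewer large ones with PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-small-files").getOrCreate()

# Placeholder ADLS Gen2 paths -- replace <account> with your storage account.
df = spark.read.parquet("abfss://raw@<account>.dfs.core.windows.net/events/")
df.coalesce(8).write.mode("overwrite").parquet(
    "abfss://curated@<account>.dfs.core.windows.net/events_compacted/"
)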

Iterative query performance improvement process

1. List business-critical queries, the most frequently run queries, and the slowest queries.
2. Check the query plans for each of these queries using the EXPLAIN keyword and see the
amount of data being used at each stage (we will be learning about how to view query
plans in the later chapters; a short sketch also follows this list).
3. Identify the joins or filters that are taking the most time. Identify the corresponding data
partitions.
4. Try to split the corresponding input data partitions into smaller partitions, or change the
application logic to perform isolated processing on top of each partition and later merge
only the filtered data.
5. You could also try to see if other partitioning keys would work better and if you need to
repartition the data to get better job performance for each partition.
6. If any particular partitioning technique doesn't work, you can explore having more than
one piece of partitioning logic; for example, you could apply horizontal partitioning
within functional partitioning, and so on.
7. Monitor the partitioning regularly to check if the application access patterns are balanced
and well distributed. Try to identify hot spots early on.
8. Iterate this process until you hit the preferred query execution time.
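
For step 2, Spark exposes query plans through the EXPLAIN keyword in SQL and through DataFrame.explain(). A sketch, where the path and the event_date column are placeholders:

# Hedged sketch: inspecting a query plan in PySpark (step 2 above).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("query-plans").getOrCreate()
df = spark.read.parquet("abfss://curated@<account>.dfs.core.windows.net/events_compacted/")

# SQL-style EXPLAIN:
df.createOrReplaceTempView("events")
spark.sql("EXPLAIN SELECT count(*) FROM events WHERE event_date = '2020-05-15'").show(truncate=False)

# DataFrame equivalent; True prints the full logical and physical plans.
df.filter(df.event_date == "2020-05-15").explain(True)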

Designing a partition strategy for Azure Synapse Analytics

A dedicated SQL pool is a massively parallel processing (MPP) system that splits the queries
into 60 parallel queries and executes them in parallel. Each of these smaller queries runs on
something called a distribution. A distribution is a basic unit of processing and storage for a
dedicated SQL pool. There are three different ways to distribute (shard) data among
distributions, as listed here:

 Round-robin tables
 Hash tables

 Replicated tables

Partitioning is supported on all the distribution types in the preceding list. Apart from the
distribution types, a dedicated SQL pool also supports three types of tables: clustered
columnstore, clustered index, and heap tables. Partitioning is supported in all of these table
types, too.

In a dedicated SQL pool, data is already distributed across its 60 distributions, so we need to be
careful in deciding if we need to further partition the data. The clustered columnstore tables work
optimally when the number of rows per table in a distribution is around 1 million.

For example, if we plan to partition the data further by the months of a year, we are talking about
12 partitions x 60 distributions = 720 sub-divisions. Each of these divisions needs to have at least
1 million rows; in other words, the table (usually a fact table) will need more than 720 million
rows. So, we have to be careful not to over-partition the data when it comes to dedicated SQL
pools. (An illustrative DDL sketch follows.)
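
A sketch of a hash-distributed, monthly-partitioned clustered columnstore fact table, with the DDL submitted from Python via pyodbc; the connection string, table, and column names are placeholders, and the boundary list is truncated for brevity:

# Hedged sketch: distributed + partitioned table in a dedicated SQL pool.
import pyodbc

ddl = """
CREATE TABLE dbo.FactSales
(
    SaleId   BIGINT NOT NULL,
    SaleDate DATE   NOT NULL,
    Amount   DECIMAL(18, 2)
)
WITH
(
    DISTRIBUTION = HASH(SaleId),  -- shard rows across the 60 distributions
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (SaleDate RANGE RIGHT FOR VALUES
        ('2020-02-01', '2020-03-01', '2020-04-01'))  -- monthly boundaries (truncated)
)
"""

# autocommit so the DDL takes effect without an explicit commit
with pyodbc.connect("<dedicated-sql-pool-connection-string>", autocommit=True) as conn:
    conn.execute(ddl)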

Identifying when partitioning is needed in ADLS Gen2

As we have learned in the previous chapter, we can partition data according to our requirements
—such as performance, scalability, security, operational overhead, and so on—but there is
another reason why we might end up partitioning our data, and that is the various I/O bandwidth
limits that are imposed at subscription levels by Azure. These limits apply to both Blob storage
and ADLS Gen2.

The rate at which we ingest data into an Azure Storage system is called the ingress rate, and
the rate at which we move the data out of the Azure Storage system is called the egress rate.

Resource ---> Limit

Maximum number of storage accounts with standard endpoints per region per subscription
(including standard and premium storage accounts) ---> 250 by default, 500 by request

Maximum number of storage accounts with Azure DNS zone endpoints (preview) per region per
subscription (including standard and premium storage accounts) ---> 5000 (preview)

Default maximum storage account capacity ---> 5 PiB

Maximum number of blob containers, blobs, file shares, tables, queues, entities, or messages
per storage account ---> No limit

Default maximum request rate per storage account ---> 20,000 requests per second

Default maximum ingress per general-purpose v2 and Blob storage account (LRS/GRS) in the
following regions ---> 60 Gbps
 Australia East
 Central US
 East Asia
 East US 2
 Japan East
 Korea Central
 North Europe
 South Central US
 Southeast Asia
 UK South
 West Europe
 West US

Default maximum ingress per general-purpose v2 and Blob storage account (ZRS) in the
following regions ---> 60 Gbps
 Australia East
 Central US
 East US
 East US 2
 Japan East
 North Europe
 South Central US
 Southeast Asia
 UK South
 West Europe
 West US 2

Default maximum ingress per general-purpose v2 and Blob storage account in regions not
listed above ---> 25 Gbps

Default maximum ingress for general-purpose v1 storage accounts (all regions) ---> 10 Gbps

Default maximum egress for general-purpose v2 and Blob storage accounts (LRS/GRS) in the
following regions ---> 120 Gbps
 Australia East
 Central US
 East Asia
 East US 2
 Japan East
 Korea Central
 North Europe
 South Central US
 Southeast Asia
 UK South
 West Europe
 West US

Default maximum egress for general-purpose v2 and Blob storage accounts (ZRS) in the
following regions ---> 120 Gbps
 Australia East
 Central US
 East US
 East US 2
 Japan East
 North Europe
 South Central US
 Southeast Asia
 UK South
 West Europe
 West US 2

Default maximum egress for general-purpose v2 and Blob storage accounts in regions not
listed above ---> 50 Gbps

Maximum number of IP address rules per storage account ---> 200
Maximum number of virtual network rules per storage account ---> 200
Maximum number of resource instance rules per storage account ---> 200
Maximum number of private endpoints per storage account ---> 200

Develop data processing (40–45%) (4)



Ingest and transform data (Chapter 8)

Transforming data by using Apache Spark

Apache Spark supports transformations with three different Application Programming
Interfaces (APIs): Resilient Distributed Datasets (RDDs), DataFrames, and Datasets. We will
learn about RDD and DataFrame transformations in this chapter. Datasets are just extensions of
DataFrames, with additional features like being type-safe (where the compiler will strictly check
for data types) and providing an object-oriented (OO) interface.

What are RDDs?

RDDs are immutable, fault-tolerant collections of data objects that can be operated on in
parallel by Spark.
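
A small sketch contrasting an RDD transformation with the equivalent DataFrame transformation (the data and column names are illustrative):

# Hedged sketch: the same filter/projection as an RDD transformation and as a
# DataFrame transformation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-vs-dataframe").getOrCreate()

# RDD API: transformations (filter/map) are lazy; collect() triggers execution.
rdd = spark.sparkContext.parallelize([("a", 1), ("b", 2), ("c", 3)])
doubled = rdd.filter(lambda kv: kv[1] > 1).map(lambda kv: (kv[0], kv[1] * 2))
print(doubled.collect())  # [('b', 4), ('c', 6)]

# DataFrame API: same logic, declarative and optimized by the Catalyst optimizer.
df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["key", "value"])
df.filter(df.value > 1).withColumn("value", df.value * 2).show()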
