Study Guide For Exam DP-203 - Data Engineering On Microsoft Azure - Microsoft Learn

This document provides an overview of the skills needed to pass the Azure Data Engineer Associate certification exam. It covers designing and implementing data storage, developing data processing solutions using various Azure services, and securing, monitoring, and optimizing data storage and processing.


Skills measured as of November 2, 2023

Audience profile

As a candidate for this exam, you should have subject matter expertise in integrating, transforming, and consolidating data from various structured, unstructured, and streaming data systems into a suitable schema for building analytics solutions.

As an Azure data engineer, you help stakeholders understand the data through exploration, and build and maintain secure and compliant data processing pipelines by using different tools and techniques. You use various Azure data services and frameworks to store and produce cleansed and enhanced datasets for analysis. This data store can be designed with different architecture patterns based on business requirements, including:

Modern data warehouse (MDW)
Big data
Lakehouse architecture

As an Azure data engineer, you also help to ensure that the operationalization of data pipelines and data stores is high-performing, efficient, organized, and reliable, given a set of business requirements and constraints. You help to identify and troubleshoot operational and data quality issues. You also design, implement, monitor, and optimize data platforms to meet the needs of the data pipelines.

As a candidate for this exam, you must have solid knowledge of data processing languages, including:

SQL
Python
Scala

You need to understand parallel processing and data architecture patterns. You should be proficient in using the following to create data processing solutions:

Azure Data Factory
Azure Synapse Analytics
Azure Stream Analytics
Azure Event Hubs
Azure Data Lake Storage
Azure Databricks

Note: Most questions cover features that are general availability (GA). The exam may contain questions on Preview features if those features are commonly used.

Skills at a glance

Design and implement data storage (15–20%)
Develop data processing (40–45%)
Secure, monitor, and optimize data storage and data processing (30–35%)

Design and implement data storage (15–20%)

Implement a partition strategy

Implement a partition strategy for files
Implement a partition strategy for analytical workloads
Implement a partition strategy for streaming workloads
Implement a partition strategy for Azure Synapse Analytics
Identify when partitioning is needed in Azure Data Lake Storage Gen2

Design and implement the data exploration layer

Create and execute queries by using a compute solution that leverages SQL serverless and Spark cluster
Recommend and implement Azure Synapse Analytics database templates
Push new or updated data lineage to Microsoft Purview
Browse and search metadata in Microsoft Purview Data Catalog

Develop data processing (40–45%)

Ingest and transform data

Design and implement incremental loads
Transform data by using Apache Spark
Transform data by using Transact-SQL (T-SQL) in Azure Synapse Analytics
Ingest and transform data by using Azure Synapse Pipelines or Azure Data Factory
Transform data by using Azure Stream Analytics
Cleanse data
Handle duplicate data
Avoid duplicate data by using Azure Stream Analytics Exactly Once Delivery
Handle missing data
Handle late-arriving data
Split data
Shred JSON
Encode and decode data
Configure error handling for a transformation
Normalize and denormalize data
Perform data exploratory analysis

Develop a batch processing solution

Develop batch processing solutions by using Azure Data Lake Storage, Azure Databricks, Azure Synapse Analytics, and Azure Data Factory
Use PolyBase to load data to a SQL pool
Implement Azure Synapse Link and query the replicated data
Create data pipelines
Scale resources
Configure the batch size
Create tests for data pipelines
Integrate Jupyter or Python notebooks into a data pipeline
Upsert data
Revert data to a previous state
Configure exception handling
Configure batch retention
Read from and write to a delta lake

Develop a stream processing solution

Create a stream processing solution by using Stream Analytics and Azure Event Hubs
Process data by using Spark structured streaming
Create windowed aggregates
Handle schema drift
Process time series data
Process data across partitions
Process within one partition
Configure checkpoints and watermarking during processing
Scale resources
Create tests for data pipelines
Optimize pipelines for analytical or transactional purposes
Handle interruptions
Configure exception handling
Upsert data
Replay archived stream data

Manage batches and pipelines

Trigger batches
Handle failed batch loads
Validate batch loads
Manage data pipelines in Azure Data Factory or Azure Synapse Pipelines
Schedule data pipelines in Data Factory or Azure Synapse Pipelines
Implement version control for pipeline artifacts
Manage Spark jobs in a pipeline

Secure, monitor, and optimize data storage and data processing (30–35%)

Implement data security

Implement data masking
Encrypt data at rest and in motion
Implement row-level and column-level security
Implement Azure role-based access control (RBAC)
Implement POSIX-like access control lists (ACLs) for Data Lake Storage Gen2
Implement a data retention policy
Implement secure endpoints (private and public)
Implement resource tokens in Azure Databricks
Load a DataFrame with sensitive information
Write encrypted data to tables or Parquet files
Manage sensitive information

Monitor data storage and data processing

Implement logging used by Azure Monitor
Configure monitoring services
Monitor stream processing
Measure performance of data movement
Monitor and update statistics about data across a system
Monitor data pipeline performance
Measure query performance
Schedule and monitor pipeline tests
Interpret Azure Monitor metrics and logs
Implement a pipeline alert strategy

Optimize and troubleshoot data storage and data processing

Compact small files
Handle skew in data
Handle data spill
Optimize resource management
Tune queries by using indexers
Tune queries by using cache
Troubleshoot a failed Spark job
Troubleshoot a failed pipeline run, including activities executed in external services

Study resources

We recommend that you train and get hands-on experience before you take the exam. We offer self-study options and classroom training as well as links to documentation, community sites, and videos.

Study resources | Links to learning and documentation
Get trained | Choose from self-paced learning paths and modules or take an instructor-led course
Find documentation | Azure Data Lake Storage; Azure Synapse Analytics; Azure Databricks; Data Factory; Azure Stream Analytics; Event Hubs; Azure Monitor
Ask a question | Microsoft Q&A | Microsoft Docs
Get community support | Analytics on Azure | TechCommunity; Azure Synapse Analytics | TechCommunity
Follow Microsoft Learn | Microsoft Learn - Microsoft Tech Community
Find a video | Exam Readiness Zone; Data Exposed; Browse other Microsoft Learn shows

Change log

Key to understanding the table: The topic groups (also known as functional groups) are in bold typeface followed by the objectives within each group. The table is a comparison between the two versions of the exam skills measured and the third column describes the extent of the changes.

Skill area prior to November 2, 2023 | Skill area as of November 2, 2023 | Change
Audience profile | Audience profile | No change
Design and implement data storage | Design and implement data storage | No change
Implement a partition strategy | Implement a partition strategy | No change
Design and implement the data exploration layer | Design and implement the data exploration layer | No change
Develop data processing | Develop data processing | No change
Ingest and transform data | Ingest and transform data | Minor
Develop a batch processing solution | Develop a batch processing solution | No change
Develop a stream processing solution | Develop a stream processing solution | No change
Manage batches and pipelines | Manage batches and pipelines | No change
Secure, monitor, and optimize data storage and data processing | Secure, monitor, and optimize data storage and data processing | No change
Implement data security | Implement data security | No change
Monitor data storage and data processing | Monitor data storage and data processing | No change
Optimize and troubleshoot data storage and data processing | Optimize and troubleshoot data storage and data processing | No change

Skills measured prior to November 2, 2023

Audience profile

Candidates for this exam should have subject matter expertise in integrating, transforming, and consolidating data from various structured, unstructured, and streaming data systems into a suitable schema for building analytics solutions.

Azure data engineers help stakeholders understand the data through exploration, and they build and maintain secure and compliant data processing pipelines by using different tools and techniques. These professionals use various Azure data services and frameworks to store and produce cleansed and enhanced datasets for analysis. This data store can be designed with different architecture patterns based on business requirements, including modern data warehouse (MDW), big data, or lakehouse architecture.

Azure data engineers also help to ensure that the operationalization of data pipelines and data stores is high-performing, efficient, organized, and reliable, given a set of business requirements and constraints. These professionals help to identify and troubleshoot operational and data quality issues. They also design, implement, monitor, and optimize data platforms to meet the needs of the data pipelines.

Candidates for this exam must have solid knowledge of data processing languages, including SQL, Python, and Scala, and they need to understand parallel processing and data architecture patterns. They should be proficient in using Azure Data Factory, Azure Synapse Analytics, Azure Stream Analytics, Azure Event Hubs, Azure Data Lake Storage, and Azure Databricks to create data processing solutions.

Skills at a glance

Design and implement data storage (15–20%)
Develop data processing (40–45%)
Secure, monitor, and optimize data storage and data processing (30–35%)

Design and implement data storage (15–20%)

Implement a partition strategy

Implement a partition strategy for files
Implement a partition strategy for analytical workloads
Implement a partition strategy for streaming workloads
Implement a partition strategy for Azure Synapse Analytics
Identify when partitioning is needed in Azure Data Lake Storage Gen2
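The "Implement a partition strategy for files" objective is commonly met with date-hierarchy folder layouts in Azure Data Lake Storage. The sketch below is only an illustration of that layout in plain Python, not a prescribed implementation: the storage account, container, and dataset names are made up, and a real pipeline would write these paths from Spark, Data Factory, or Stream Analytics rather than from a script like this.

```python
from datetime import date

def partition_path(base: str, event_date: date) -> str:
    """Build a Hive-style date partition path, e.g. base/year=2023/month=11/day=02."""
    return (
        f"{base}/year={event_date.year}"
        f"/month={event_date.month:02d}"
        f"/day={event_date.day:02d}"
    )

# Route a few sample records to their target partition folders.
# The abfss:// URI below is a hypothetical ADLS Gen2 path.
records = [
    {"id": 1, "event_date": date(2023, 11, 2)},
    {"id": 2, "event_date": date(2023, 11, 2)},
    {"id": 3, "event_date": date(2023, 12, 15)},
]

by_partition: dict[str, list[dict]] = {}
for rec in records:
    path = partition_path(
        "abfss://sales@exampleaccount.dfs.core.windows.net/orders",
        rec["event_date"],
    )
    by_partition.setdefault(path, []).append(rec)

for path, recs in sorted(by_partition.items()):
    print(path, len(recs))
```

Layouts like this let query engines prune whole folders when a filter on the partition column is present, which is the main reason date columns are a frequent partition key choice.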
Design and implement the data exploration layer

Create and execute queries by using a compute solution that leverages SQL serverless and Spark cluster
Recommend and implement Azure Synapse Analytics database templates
Push new or updated data lineage to Microsoft Purview
Browse and search metadata in Microsoft Purview Data Catalog

Develop data processing (40–45%)

Ingest and transform data

Design and implement incremental loads
Transform data by using Apache Spark
Transform data by using Transact-SQL (T-SQL) in Azure Synapse Analytics
Ingest and transform data by using Azure Synapse Pipelines or Azure Data Factory
Transform data by using Azure Stream Analytics
Cleanse data
Handle duplicate data
Handle missing data
Handle late-arriving data
Split data
Shred JSON
Encode and decode data
Configure error handling for a transformation
Normalize and denormalize data
Perform data exploratory analysis

Develop a batch processing solution

Develop batch processing solutions by using Azure Data Lake Storage, Azure Databricks, Azure Synapse Analytics, and Azure Data Factory
Use PolyBase to load data to a SQL pool
Implement Azure Synapse Link and query the replicated data
Create data pipelines
Scale resources
Configure the batch size
Create tests for data pipelines
Integrate Jupyter or Python notebooks into a data pipeline
Upsert data
Revert data to a previous state
Configure exception handling
Configure batch retention
Read from and write to a delta lake

Develop a stream processing solution

Create a stream processing solution by using Stream Analytics and Azure Event Hubs
Process data by using Spark structured streaming
Create windowed aggregates
Handle schema drift
Process time series data
Process data across partitions
Process within one partition
Configure checkpoints and watermarking during processing
Scale resources
Create tests for data pipelines
Optimize pipelines for analytical or transactional purposes
Handle interruptions
Configure exception handling
Upsert data
Replay archived stream data

Manage batches and pipelines

Trigger batches
Handle failed batch loads
Validate batch loads
Manage data pipelines in Azure Data Factory or Azure Synapse Pipelines
Schedule data pipelines in Data Factory or Azure Synapse Pipelines
Implement version control for pipeline artifacts
Manage Spark jobs in a pipeline

Secure, monitor, and optimize data storage and data processing (30–35%)

Implement data security

Implement data masking
Encrypt data at rest and in motion
Implement row-level and column-level security
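Several of the security objectives above (implement data masking, manage sensitive information) come down to transforming sensitive values before they are exposed to readers. In Azure SQL and Synapse, dynamic data masking is configured on the platform side rather than in application code; the sketch below only illustrates the masking idea itself in plain Python, with made-up sample values.

```python
def mask_email(email: str) -> str:
    """Mask the local part of an email, keeping its first character: a***@contoso.com."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        return "***"  # value is not a well-formed address; mask it entirely
    return local[0] + "***@" + domain

def mask_card(number: str) -> str:
    """Keep only the last four digits of a card number: ************1111."""
    digits = [c for c in number if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])

print(mask_email("alice@contoso.com"))   # a***@contoso.com
print(mask_card("4111-1111-1111-1111"))  # ************1111
```

Masking of this kind protects data on display; it complements, but does not replace, encryption at rest and in motion and row-level or column-level security, which restrict who can read the underlying values at all.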
