Study Guide For Exam DP-203 - Data Engineering On Microsoft Azure - Microsoft Learn
Most questions cover features that are general availability (GA). The exam may contain questions on Preview features if those features are commonly used.

As an Azure data engineer, you help stakeholders understand the data through exploration, and you build and maintain secure and compliant data processing pipelines by using different tools and techniques. You use various Azure data services and frameworks to store and produce cleansed and enhanced datasets for analysis. This data store can be designed with different architecture patterns based on business requirements, including modern data warehouse (MDW), big data, or lakehouse architecture.

As a candidate for this exam, you must have solid knowledge of data processing languages, including:

SQL
Python
Scala

You need to understand parallel processing and data architecture patterns. You should be proficient in using the following to create data processing solutions:

Azure Data Factory
Azure Synapse Analytics
Azure Stream Analytics
Azure Event Hubs
Azure Data Lake Storage
Azure Databricks

Skills measured as of November 2, 2023
As an Azure data engineer, you also help to ensure that the operationalization of data pipelines and data stores is high-performing, efficient, organized, and reliable, given a set of business requirements and constraints. You help to identify and troubleshoot operational and data quality issues, and you design, implement, monitor, and optimize data platforms to meet the needs of the data pipelines.

Skills at a glance

Design and implement data storage (15–20%)
Develop data processing (40–45%)
Secure, monitor, and optimize data storage and data processing (30–35%)

Design and implement data storage (15–20%)

Implement a partition strategy

Implement a partition strategy for files
Implement a partition strategy for analytical workloads
Implement a partition strategy for streaming workloads
Implement a partition strategy for Azure Synapse Analytics
Identify when partitioning is needed in Azure Data Lake Storage Gen2

Design and implement the data exploration layer

Create and execute queries by using a compute solution that leverages SQL serverless and Spark cluster
Recommend and implement Azure Synapse Analytics database templates
Push new or updated data lineage to Microsoft Purview
Browse and search metadata in Microsoft Purview Data Catalog
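The partition-strategy skills above often come down to a date-based folder hierarchy in Azure Data Lake Storage Gen2, which lets engines such as Synapse serverless SQL or Spark prune whole partitions by date. A minimal Python sketch of the idea follows; the `partition_path` helper, the `year=/month=/day=` layout, and the container and column names are illustrative assumptions, not an exam-mandated convention:

```python
from datetime import date

def partition_path(container: str, dataset: str, d: date) -> str:
    """Build a date-partitioned folder path (year=/month=/day= layout),
    so query engines can skip folders that fall outside a date filter."""
    return (f"{container}/{dataset}/"
            f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}/")

# Route each record to its partition folder before writing.
records = [
    {"id": 1, "event_date": date(2023, 11, 2)},
    {"id": 2, "event_date": date(2023, 11, 2)},
    {"id": 3, "event_date": date(2023, 12, 1)},
]
by_partition: dict[str, list[dict]] = {}
for rec in records:
    by_partition.setdefault(
        partition_path("raw", "sales", rec["event_date"]), []).append(rec)

for path, recs in sorted(by_partition.items()):
    print(path, len(recs))
```

The zero-padded segments keep lexicographic and chronological order aligned, which is what makes listing and pruning by prefix cheap.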
Develop data processing (40–45%)

Ingest and transform data by using Azure Synapse Pipelines or Azure Data Factory
Transform data by using Apache Spark
Transform data by using Transact-SQL (T-SQL) in Azure Synapse Analytics
Transform data by using Azure Stream Analytics
Handle duplicate data
Avoid duplicate data by using Azure Stream Analytics Exactly Once Delivery
Handle missing data
Shred JSON
Encode and decode data
Configure error handling for a transformation
Normalize and denormalize data
Use PolyBase to load data to a SQL pool
Implement Azure Synapse Link and query the replicated data
Create data pipelines
Scale resources
Create tests for data pipelines
Integrate Jupyter or Python notebooks into a data pipeline

Develop a stream processing solution

Create a stream processing solution by using Stream Analytics and Azure Event Hubs
Create windowed aggregates
Handle schema drift
Process time series data
Process data across partitions
Optimize pipelines for analytical or transactional purposes
Handle interruptions

Manage batches and pipelines

Handle failed batch loads
Manage data pipelines in Azure Data Factory or Azure Synapse Pipelines
Schedule data pipelines in Data Factory or Azure Synapse Pipelines
Implement version control for pipeline artifacts
Manage Spark jobs in a pipeline
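Among the stream-processing skills above, windowed aggregates group events into time buckets and aggregate each bucket; a tumbling window uses fixed, non-overlapping buckets, which Azure Stream Analytics expresses with TumblingWindow. A small Python sketch of that pattern follows; the `tumbling_window_counts` helper, the event shape, and the 60-second window size are illustrative assumptions:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group (timestamp_seconds, value) events into fixed, non-overlapping
    windows and aggregate each window - the tumbling-window pattern."""
    windows = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for ts, value in events:
        # Each event belongs to exactly one window: the one whose start
        # is the timestamp rounded down to a window boundary.
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start]["count"] += 1
        windows[window_start]["sum"] += value
    return dict(sorted(windows.items()))

events = [(5, 10.0), (42, 20.0), (61, 5.0), (130, 7.5)]
for start, agg in tumbling_window_counts(events).items():
    print(f"window [{start}, {start + 60}): "
          f"count={agg['count']}, sum={agg['sum']}")
```

Because the windows do not overlap, every event is counted exactly once, which is the property that distinguishes tumbling windows from hopping or sliding windows.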
Secure, monitor, and optimize data storage and data processing (30–35%)

Implement data security

Implement data masking
Implement row-level and column-level security
Implement Azure role-based access control (RBAC)
Implement POSIX-like access control lists (ACLs) for Data Lake Storage Gen2

Monitor data storage and data processing

Implement logging used by Azure Monitor
Monitor and update statistics about data across a system
Schedule and monitor pipeline tests
Interpret Azure Monitor metrics and logs

Optimize and troubleshoot data storage and data processing

Tune queries by using cache

Skill area prior to November 2, 2023 | Skill area as of November 2, 2023 | Change
Develop a stream processing solution | Develop a stream processing solution | No change
Manage batches and pipelines | Manage batches and pipelines | No change
Secure, monitor, and optimize data storage and data processing | Secure, monitor, and optimize data storage and data processing | No change
Implement data security | Implement data security | No change
Optimize and troubleshoot data storage and data processing | Optimize and troubleshoot data storage and data processing | No change

Study resources | Links to learning and documentation
Get trained | Choose from self-paced learning paths and modules or take an instructor-led course
Follow Microsoft Learn | Microsoft Learn - Microsoft Tech Community
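Of the data security skills measured earlier in this guide, data masking is the easiest to picture: non-privileged readers see obfuscated values while the stored data is unchanged. A hedged Python sketch of an email-style mask follows; the `mask_email` and `mask_records` helpers and the "keep the first character and the domain" rule are illustrative choices, similar in spirit to (but not identical to) the email masking function in SQL dynamic data masking:

```python
def mask_email(email: str) -> str:
    """Mask an email's local part, keeping the first character and domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "****"          # not a well-formed address; mask fully
    return local[0] + "****@" + domain

def mask_records(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with the 'email' column masked,
    leaving the original rows (the stored data) untouched."""
    return [{**row, "email": mask_email(row["email"])} for row in rows]

rows = [{"id": 1, "email": "alice@contoso.com"},
        {"id": 2, "email": "bob@fabrikam.com"}]
print(mask_records(rows))
```

In a real Synapse dedicated SQL pool this is declared once on the column with a masking rule rather than applied per query, so the policy cannot be bypassed by ad hoc readers.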