Exam DP-203: Data Engineering on Microsoft Azure - Skills Measured
Audience Profile
Candidates for this exam should have subject matter expertise integrating, transforming, and
consolidating data from various structured and unstructured data systems into a structure that is
suitable for building analytics solutions.
Azure Data Engineers help stakeholders understand the data through exploration, and they
build and maintain secure and compliant data processing pipelines by using different tools and
techniques. These professionals use various Azure data services and languages to store and
produce cleansed and enhanced datasets for analysis.
Azure Data Engineers also help ensure that data pipelines and data stores are high-performing,
efficient, organized, and reliable, given a set of business requirements and constraints. They deal
with unanticipated issues swiftly, and they minimize data loss. They also design, implement,
monitor, and optimize data platforms to meet data pipeline needs.
A candidate for this exam must have strong knowledge of data processing languages such as
SQL, Python, or Scala, and they need to understand parallel processing and data architecture
patterns.
Skills Measured
NOTE: The bullets that follow each of the skills measured are intended to illustrate how we
assess that skill. This list is not definitive or exhaustive.
NOTE: Most questions cover features that are General Availability (GA). The exam may contain
questions on Preview features if those features are commonly used.
Implement physical data storage structures (a short PySpark sketch follows this list)
implement compression
implement partitioning
implement sharding
implement different table geometries with Azure Synapse Analytics pools
implement data redundancy
implement distributions
implement data archiving
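Table geometries and distributions are defined in Azure Synapse dedicated SQL pool DDL, but compression and partitioning can be shown with a minimal PySpark sketch. This is illustrative only; the abfss:// paths and the sale_date column are hypothetical placeholders, not part of the exam outline.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-structures").getOrCreate()

# Read a raw table from the data lake (hypothetical path).
df = spark.read.parquet("abfss://raw@mydatalake.dfs.core.windows.net/sales")

# Persist it partitioned by date with Snappy-compressed Parquet files:
# "implement partitioning" and "implement compression" in one write.
(df.write
   .mode("overwrite")
   .partitionBy("sale_date")
   .option("compression", "snappy")
   .parquet("abfss://curated@mydatalake.dfs.core.windows.net/sales"))

Partitioning on a date column plus compressed Parquet is a common default for analytical workloads; in a Synapse dedicated SQL pool, the analogous choices are distribution (hash, round-robin, replicated) and index geometry (clustered columnstore, heap).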
Design and develop a batch processing solution (see the merge sketch after this list)
develop batch processing solutions by using Data Factory, Data Lake, Spark, Azure Synapse Pipelines, PolyBase, and Azure Databricks
create data pipelines
design and implement incremental data loads
design and develop slowly changing dimensions
handle security and compliance requirements
scale resources
configure the batch size
design and create tests for data pipelines
integrate Jupyter/IPython notebooks into a data pipeline
handle duplicate data
handle missing data
handle late-arriving data
upsert data
regress to a previous state
design and configure exception handling
configure batch retention
design a batch processing solution
debug Spark jobs by using the Spark UI
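Several bullets above (handle duplicate data, upsert data, incremental loads) come together in a merge-based batch load. A minimal sketch, assuming the curated table is stored in Delta Lake format (as on Azure Databricks); the paths and the customer_id key are hypothetical:

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("batch-upsert").getOrCreate()

# Hypothetical incremental batch staged by an upstream pipeline activity.
updates = (spark.read
                .parquet("abfss://staging@mydatalake.dfs.core.windows.net/customers")
                .dropDuplicates(["customer_id"]))  # handle duplicate data

target = DeltaTable.forPath(
    spark, "abfss://curated@mydatalake.dfs.core.windows.net/customers")

# Upsert: update rows whose key already exists, insert the rest.
(target.alias("t")
       .merge(updates.alias("s"), "t.customer_id = s.customer_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

Because the merge is keyed, re-running the same batch is idempotent, which is what makes incremental loads and retries after failed loads safe.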
Design and develop a stream processing solution (see the streaming sketch after this list)
develop a stream processing solution by using Stream Analytics, Azure Databricks, and Azure Event Hubs
process data by using Spark structured streaming
monitor for performance and functional regressions
design and create windowed aggregates
handle schema drift
process time series data
process across partitions
process within one partition
configure checkpoints/watermarking during processing
scale resources
design and create tests for data pipelines
optimize pipelines for analytical or transactional purposes
handle interruptions
design and configure exception handling
upsert data
replay archived stream data
design a stream processing solution
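A minimal Spark Structured Streaming sketch covering windowed aggregates, watermarking for late-arriving data, and checkpointing. It uses Spark's built-in rate source so it is self-contained; a real pipeline would read from Azure Event Hubs or Kafka instead, and the output and checkpoint paths are hypothetical:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("stream-agg").getOrCreate()

# Self-contained demo source; swap in an Event Hubs/Kafka reader in practice.
events = spark.readStream.format("rate").option("rowsPerSecond", 100).load()

# Windowed aggregate with a watermark that bounds late-arriving data.
counts = (events
          .withWatermark("timestamp", "10 minutes")
          .groupBy(window(col("timestamp"), "5 minutes"))
          .count())

# Checkpointing lets the query recover and replay after an interruption.
query = (counts.writeStream
               .outputMode("append")
               .format("parquet")
               .option("path", "abfss://curated@mydatalake.dfs.core.windows.net/counts")
               .option("checkpointLocation", "/tmp/checkpoints/stream-agg")
               .start())
query.awaitTermination()

The watermark tells the engine how long to wait for late events before finalizing a window; events arriving later than the watermark allows are dropped rather than reprocessed.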
Manage batches and pipelines (see the SDK sketch after this list)
trigger batches
handle failed batch loads
validate batch loads
manage data pipelines in Data Factory/Synapse Pipelines
schedule data pipelines in Data Factory/Synapse Pipelines
implement version control for pipeline artifacts
manage Spark jobs in a pipeline
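A minimal sketch of triggering and validating a batch load with the azure-mgmt-datafactory SDK, as referenced after the list above. It assumes the pipeline already exists; the subscription, resource group, factory, and pipeline names are hypothetical placeholders:

import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Trigger a batch: start an on-demand pipeline run.
run = client.pipelines.create_run(
    "my-resource-group", "my-data-factory", "CopySalesPipeline", parameters={})

# Poll until the run leaves its transient states, then validate the outcome.
status = "Queued"
while status in ("Queued", "InProgress"):
    time.sleep(30)
    status = client.pipeline_runs.get(
        "my-resource-group", "my-data-factory", run.run_id).status

print(f"Pipeline run {run.run_id} finished with status: {status}")

A "Failed" status here is the hook for handling failed batch loads, whether that means alerting, retrying, or regressing the target to a previous state.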