Practice Assessment for Exam DP-203:

Data Engineering on Microsoft Azure


Question: 1

You need to add permissions to an Azure Data Lake Storage Gen2 account that
allows assigning POSIX access controls.

Which role should you use?

Select only one answer.


Storage Blob Data Contributor
Storage Blob Data Delegator
Storage Blob Data Owner
Storage Blob Data Reader

Answer:
Storage Blob Data Contributor
Storage Blob Data Delegator
Storage Blob Data Owner
This answer is correct.
Storage Blob Data Reader

Storage Blob Data Owner allows for full access to Azure Blob storage containers and data,
including assigning POSIX access control.
Storage Blob Data Contributor allows for read, write, and delete access to Blob storage
containers and data.
Storage Blob Data Reader allows for read access to Blob storage containers and data.
Storage Blob Data Delegator allows for the generation of a user delegation key that can be
used to sign SAS tokens.
Azure built-in roles - Azure RBAC | Microsoft Learn
Secure your Azure Storage account - Training | Microsoft Learn

Question: 2

You need to grant an Azure Active Directory user access to write data to an Azure
Data Lake Storage Gen2 account.

Which security technology should you use to grant the access?

Select only one answer.


ACL
NTFS
OAuth 2.0 Bearer Tokens
RBAC

Answer:
ACL
NTFS
OAuth 2.0 Bearer Tokens
RBAC
This answer is correct.

Granting access to Data Lake Storage Gen2 is done through RBAC.


Explore Azure Data Lake Storage security features - Training | Microsoft Learn

Question: 3

You have an Azure subscription that contains the following resources:

 An Azure Synapse Analytics workspace named workspace1


 A virtual network named VNet1 that has two subnets named sn1 and sn2
 Five virtual machines that are connected to sn1

You need to ensure that the virtual machines can connect to workspace1. The
solution must prevent traffic from the virtual machines to workspace1 from
traversing the public internet.

What should you create?

Select only one answer.


a private endpoint from Azure Synapse Analytics in sn1
a private endpoint from Azure Synapse Analytics in sn2
a service endpoint from Azure Synapse Analytics in sn1
a service endpoint from Azure Synapse Analytics in sn2

Answer:
a private endpoint from Azure Synapse Analytics in sn1
This answer is incorrect.
a private endpoint from Azure Synapse Analytics in sn2
This answer is correct.
a service endpoint from Azure Synapse Analytics in sn1
a service endpoint from Azure Synapse Analytics in sn2
Private endpoints are created in a subnet that does not contain virtual machines. As sn1 already
contains virtual machines, the private endpoint cannot be created in that subnet and must be
created in sn2.
Service endpoints are unavailable for Azure Synapse Analytics.
Understand network security options for Azure Synapse Analytics - Training | Microsoft
Learn

Question: 4

You have an Azure Synapse Analytics workspace.

You need to measure the performance of SQL queries running on the dedicated SQL
pool.

Which two actions achieve the goal? Each correct answer presents a complete
solution.

Select all answers that apply.


From the Monitor page of Azure Synapse Studio, review the Pipeline runs tab.
From the Monitor page of Azure Synapse Studio, review the SQL requests tab.
Query the sys.dm_pdw_exec_requests view.
Query the sys.dm_pdw_exec_sessions view.

Answer:
From the Monitor page of Azure Synapse Studio, review the Pipeline runs tab.
From the Monitor page of Azure Synapse Studio, review the SQL requests tab.
This answer is correct.
Query the sys.dm_pdw_exec_requests view.
This answer is correct.
Query the sys.dm_pdw_exec_sessions view.
You should open the Monitor page and review the SQL requests tab, where you will find all
the queries running on the dedicated SQL pools.
You should query the sys.dm_pdw_exec_requests dynamic management view, as it contains
information about the queries, including their duration.
The sys.dm_pdw_exec_sessions dynamic management view contains information about
connections to the database.
Opening the Monitor page and reviewing the Pipeline runs tab displays information about the
pipelines.
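A minimal T-SQL sketch of this check against the dedicated SQL pool (ordering by elapsed time to surface the slowest requests):
-- List recent requests on the dedicated SQL pool, slowest first
SELECT request_id, [status], submit_time, total_elapsed_time, command
FROM sys.dm_pdw_exec_requests
ORDER BY total_elapsed_time DESC;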
How to monitor SQL requests in Synapse Studio - Azure Synapse Analytics | Microsoft
Learn
Operationalize your Azure Data Factory or Azure Synapse Pipeline - Training | Microsoft
Learn

Question: 5
You have an Azure Synapse Analytics workspace.

You need to configure the diagnostics settings for pipeline runs. You must retain the
data for auditing purposes indefinitely and minimize costs associated with retaining
the data.

Which destination should you use?

Select only one answer.


Archive to a storage account.
Send to a Log Analytics workspace.
Send to a partner solution.
Stream to an Azure event hub.

Answer:
Archive to a storage account.
This answer is correct.
Send to a Log Analytics workspace.
Send to a partner solution.
Stream to an Azure event hub.
You should choose to archive to a storage account as it is useful for audit, static analysis, or
backup. Compared to using Azure Monitor logs or a Log Analytics workspace, this storage is
less expensive, and logs can be kept there indefinitely.
You should not choose to stream to an event hub since the data can be sent to external
systems, such as third-party SIEMs and other Log Analytics solutions.
You should not choose to send the data to a Log Analytics workspace, as this option is used
to help you to integrate the data into queries, alerts, and visualizations with existing log data.
You should not send the data to a partner solution, as this is only useful when using a partner.
Diagnostic settings in Azure Monitor - Azure Monitor | Microsoft Learn
Operationalize your Azure Data Factory or Azure Synapse Pipeline - Training | Microsoft
Learn

Question: 6

You have an Azure Synapse Analytics workspace.

You need to monitor bottlenecks related to the SQL Server OS state on each node of
the dedicated SQL pools.

Which view should you use?

Select only one answer.


sys.dm_pdw_nodes
sys.dm_pdw_os_threads
sys.dm_pdw_wait_stats
sys.dm_pdw_waits
Answer:
sys.dm_pdw_nodes
sys.dm_pdw_os_threads
sys.dm_pdw_wait_stats
This answer is correct.
sys.dm_pdw_waits
You should use the sys.dm_pdw_wait_stats view, as it holds information related to the SQL
Server OS state for the instances running on the different nodes.
The sys.dm_pdw_waits view holds information about all wait stats encountered during the
execution of a request or query, including locks and waits on a transmission queue.
The sys.dm_pdw_nodes view holds information about the nodes in the dedicated SQL pool,
listing one row per node; it does not hold wait state information.
The sys.dm_pdw_os_threads view holds information about the running threads. It can display
information about the current waiting type if the thread requires access to a particular
resource.
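A minimal T-SQL sketch of this check (the columns shown are documented for the view):
-- Review SQL Server OS wait statistics per node of the dedicated SQL pool
SELECT pdw_node_id, wait_name, max_wait_time, request_count
FROM sys.dm_pdw_wait_stats
ORDER BY pdw_node_id, request_count DESC;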
sys.dm_pdw_wait_stats (Transact-SQL) - SQL Server | Microsoft Learn
Operationalize your Azure Data Factory or Azure Synapse Pipeline - Training | Microsoft
Learn

Question: 7

You have an Azure Data Factory named ADF1.

You need to ensure that you can analyze pipeline runtimes for ADF1 for the last 90
days.

What should you use?

Select only one answer.


Azure Data Factory
Azure Monitor
Azure Stream Analytics
Azure App Insights

Answer:
Azure Data Factory
Azure Monitor
This answer is correct.
Azure Stream Analytics
Azure App Insights
Data Factory only stores pipeline runtimes for 45 days. To view the data for a longer period,
that data must be sent to Azure Monitor, where the information can then be retrieved and
viewed.
Monitor Azure Data Factory pipelines - Training | Microsoft Learn
Question: 8

You have an Azure Data Factory pipeline named S3toDataLake1 that copies data
between Amazon S3 storage and Azure Data Lake storage.

You need to use the Azure Data Factory Pipeline runs dashboard to view the history
of runs over a specific time range and group them by tags for S3toDataLake1.

Which view should you use?

Select only one answer.


Activity
Debug
Gantt
List

Answer:
Activity
Debug
Gantt
This answer is correct.
List
Gantt view allows you to see all the pipeline runs grouped by name, annotation, or
tag created in the pipeline, and it also displays bars relative to how long the run took.
Monitor Azure Data Factory pipelines - Training | Microsoft Learn

Question: 9

You have an Azure Data Factory named ADF1.

You need to review Data Factory pipeline runtimes for the last seven days. The
solution must provide a graphical view of the data.

What should you use?

Select only one answer.


the Dashboard view of the pipeline runs
the List view of the pipeline runs
the Gantt view of the pipeline runs
the Overview tab of Azure Data Factory Studio

Answer:
the Dashboard view of the pipeline runs
the List view of the pipeline runs
the Gantt view of the pipeline runs
This answer is correct.
the Overview tab of Azure Data Factory Studio
The Gantt view of the pipeline runs shows you a graphical view of the runtime data so that
you can see which pipelines are running at the same time, and which runs are running at
different times.
Monitor Azure Data Factory pipelines - Training | Microsoft Learn

Question: 10

You have an Apache Spark pool in Azure Synapse Analytics.

You run a notebook that creates a DataFrame containing a large amount of data.

You need to preserve the DataFrame in memory.

Which two transformations can you use? Each correct answer presents a complete
solution.

Select all answers that apply.


cache()
persist()
take()
write()

Answer:
cache()
This answer is correct.
persist()
This answer is correct.
take()
write()
The cache() transformation preserves data in the memory. Caching is done when the next
operation, such as count() or take(), is triggered.
The persist() transformation preserves data in the memory. You can optionally specify a
storage option, such as MEMORY_ONLY or MEMORY_AND_DISK. Caching is done when the next
operation, such as count() or take(), is triggered.
The take() and write() operations work on both cached and uncached DataFrames, but they do
not preserve the DataFrame in memory.
Best practice for cache(), count(), and take() - Azure Databricks | Microsoft Learn
Use Apache Spark in Azure Databricks - Training | Microsoft Learn

Question: 11
You monitor an Apache Spark job that has been slower than usual during the last
two days. The job runs a single SQL statement in which two tables are joined.

You discover that one of the tables has significant data skew.

You need to improve job performance.

Which hint should you use in the query?

Select only one answer.


COALESCE
REBALANCE
REPARTITION
SKEW

Answer:
COALESCE
REBALANCE
REPARTITION
SKEW
This answer is correct.
You should use the SKEW hint in the query.
The COALESCE hint reduces the number of partitions to the specified number of partitions.
The REPARTITION hint is used to specify the number of partitions using the specified
partitioning expressions.
The REBALANCE hint can be used to rebalance the query result output partitions, so that every
partition is a reasonable size (not too small and not too big).
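For illustration, a hedged Spark SQL sketch of the hint (fact_orders and dim_customers are hypothetical table names; the SKEW hint is specific to Azure Databricks):
-- Tell the optimizer that fact_orders is skewed on the join key
SELECT /*+ SKEW('fact_orders') */ fact_orders.order_id, dim_customers.customer_name
FROM fact_orders
JOIN dim_customers
  ON fact_orders.customer_id = dim_customers.customer_id;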
Skew join optimization - Azure Databricks | Microsoft Learn
Use Apache Spark in Azure Databricks - Training | Microsoft Learn

Question: 12

You have an Azure Databricks cluster that uses Databricks Runtime 10.1.

You need to automatically compact small files for creating new tables, so that the
target file size is appropriate to the use case.

What should you set?

Select only one answer.


delta.autoOptimize.autoCompact = auto
delta.autoOptimize.autoCompact = false
delta.autoOptimize.autoCompact = legacy
delta.autoOptimize.autoCompact = true

Answer:
delta.autoOptimize.autoCompact = auto
This answer is correct.
delta.autoOptimize.autoCompact = false
delta.autoOptimize.autoCompact = legacy
delta.autoOptimize.autoCompact = true
You should use delta.autoOptimize.autoCompact = auto because it compacts the files to
the size that is appropriate to the use case.
delta.autoOptimize.autoCompact = true and delta.autoOptimize.autoCompact =
legacy compact the files to 128 MB.
delta.autoOptimize.autoCompact = false disables automated file compaction.
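A hedged Spark SQL sketch that sets the property when creating a new Delta table (the table and column names are hypothetical):
-- Create a Delta table with auto compaction targeting a workload-appropriate file size
CREATE TABLE sales_raw (
  id INT,
  amount DOUBLE
)
USING DELTA
TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'auto');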
Auto optimize on Azure Databricks - Azure Databricks | Microsoft Learn
Explore Azure Databricks - Training | Microsoft Learn

Question: 13

You have an Azure Synapse Analytics workspace.

Users report that queries that have a label of ‘query1’ are slow to complete.

You need to identify all the queries that have a label of ‘query1’.

Which query should you run?

Select only one answer.


SELECT * FROM sys.dm_pdw_dms_workers WHERE label = 'query1'
SELECT * FROM sys.dm_pdw_exec_requests WHERE label = 'query1'
SELECT * FROM sys.dm_pdw_request_steps WHERE label = 'query1'
SELECT * FROM sys.dm_pdw_sql_requests WHERE label = 'query1'

Answer:
SELECT * FROM sys.dm_pdw_dms_workers WHERE label = 'query1'
SELECT * FROM sys.dm_pdw_exec_requests WHERE label = 'query1'
This answer is correct.
SELECT * FROM sys.dm_pdw_request_steps WHERE label = 'query1'
SELECT * FROM sys.dm_pdw_sql_requests WHERE label = 'query1'
This answer is incorrect.
Labels for queries are available from sys.dm_pdw_exec_requests. Once the request IDs for
the queries are identified, the request IDs can be used for the other dynamic management
views.
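A minimal T-SQL sketch of the two-step approach (the request ID value QID1234 is hypothetical):
-- Step 1: find the request IDs for queries labeled 'query1'
SELECT request_id, [status], total_elapsed_time
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'query1';
-- Step 2: drill into a specific request by using its request ID
SELECT *
FROM sys.dm_pdw_request_steps
WHERE request_id = 'QID1234'
ORDER BY step_index;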
Use dynamic management views to identify and troubleshoot query performance - Training |
Microsoft Learn

Question: 14

You have an Azure Synapse Analytics workspace that includes an Azure Synapse
Analytics cluster named Cluster1.
You need to review the estimated execution plan for a query on a specific node of
Cluster1. The query has a spid of 94 and a distribution ID of 5.

Which command should you run?

Select only one answer.


DBCC PDW_SHOWEXECUTIONPLAN (5, 94)
DBCC SHOWEXECUTIONPLAN (5,94)
SELECT * FROM sys.dm_exec_query_plan WHERE spid = 94 AND distribution_id = 5
SELECT * FROM sys.pdw_nodes_exec_query_plan WHERE spid = 94 AND
distribution_id = 5

Answer:
DBCC PDW_SHOWEXECUTIONPLAN (5, 94)
This answer is correct.
DBCC SHOWEXECUTIONPLAN (5,94)
SELECT * FROM sys.dm_exec_query_plan WHERE spid = 94 AND distribution_id = 5
This answer is incorrect.
SELECT * FROM sys.pdw_nodes_exec_query_plan WHERE spid = 94 AND
distribution_id = 5
The execution plan for the specific distribution is available by using the DBCC
PDW_SHOWEXECUTIONPLAN command.
Use dynamic management views to identify and troubleshoot query performance - Training |
Microsoft Learn

Question: 15

You have an Azure Synapse Analytics workspace that includes a table named Table1.

You are evaluating the use of a clustered columnstore index.

What is the minimum recommended number of rows for clustered columnstore indexes?

Select only one answer.


600,000
6 million
60 million
600 million

Answer:
600,000
6 million
60 million
This answer is correct.
600 million
Clustered columnstore indexes work on rowgroups of up to 1,048,576 rows. As an Azure
Synapse Analytics dedicated SQL pool spreads data across 60 distributions, the minimum
recommended number of rows for a clustered columnstore index is 60,000,000.
Use indexes to improve query performance - Training | Microsoft Learn

Question: 16

You have an Azure Synapse Analytics workspace.

You need to build a materialized view.

Which two items should be included in the SELECT clause of the view? Each correct
answer presents part of the solution.

Select all answers that apply.


a subquery
an aggregate function
the GROUP BY clause
the HAVING clause
the OPTION clause

Answer:
a subquery
an aggregate function
This answer is correct.
the GROUP BY clause
This answer is correct.
the HAVING clause
the OPTION clause
This answer is incorrect.
When writing a materialized view in Azure Synapse Analytics, the SELECT clause of the query
must include at least one aggregate function as well as the corresponding GROUP BY clause of
the query.
A HAVING clause and an OPTION clause are both optional as is a subquery.
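A minimal T-SQL sketch of such a view in a dedicated SQL pool (table, column, and view names are hypothetical; the WITH (DISTRIBUTION = ...) option is required for materialized views in Azure Synapse Analytics):
-- Materialized view with an aggregate function and a matching GROUP BY clause
CREATE MATERIALIZED VIEW dbo.mvSalesByRegion
WITH (DISTRIBUTION = HASH(Region))
AS
SELECT Region, SUM(Amount) AS TotalAmount, COUNT_BIG(*) AS RowCnt
FROM dbo.FactSales
GROUP BY Region;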
Improve query performance with materialized views - Training | Microsoft Learn

Question: 17

You have a job that aggregates data over a five-second tumbling window.

You are monitoring the job and notice that the SU (Memory) % Utilization metric is
more than 80 percent, and the Backlogged Input Events metric shows values greater
than 0.
What should you do to resolve the performance issue?

Select only one answer.


Change the compatibility level.
Change the tumbling window to a snapshot window.
Create a user-defined aggregate to perform the aggregation.
Increase the number of the Streaming Units (SU).

Answer:
Change the compatibility level.
Change the tumbling window to a snapshot window.
Create a user-defined aggregate to perform the aggregation.
Increase the number of the Streaming Units (SU).
This answer is correct.
You should increase the number of SUs because the job is running out of resources. When
the Backlogged Input Events metric is greater than zero, the job is not able to process all
incoming events.
You should not change the compatibility level, as this option is not responsible for faster
event processing.
You should not write a user-defined aggregate, as that is only useful when the required
aggregation function is not available in the SQL dialect of the query.
You should not change the tumbling window to a snapshot window, as this can lead to data
loss.
Azure Stream Analytics job metrics | Microsoft Learn
Get started with Azure Stream Analytics - Training | Microsoft Learn

Question: 18

You are developing an Azure app named App1 that will store job candidate data.
App1 will be deployed to three Azure regions and store a resume and five photos for
each candidate.

You need to design a partition solution for App1. The solution must meet the
following requirements:

 The time it takes to retrieve the resume files must be minimized.


 Candidate data must be stored in the same region as the candidate.

What should you include in the solution?

Select only one answer.


multiple storage accounts with a single container per account
multiple storage accounts with two containers per account
a single storage account with one container
a single storage account with two containers
Answer:
multiple storage accounts with a single container per account
This answer is incorrect.
multiple storage accounts with two containers per account
This answer is correct.
a single storage account with one container
a single storage account with two containers
A single storage account with one container stores all files in the same geography.
A single storage account with two containers provides a separation for resumes and images but
stores all files in the same geography.
Multiple storage accounts with a single container each provide the ability to store data in the
same geography as the candidate but do not allow quick retrieval of just the resume files.
Multiple storage accounts with two containers each provide geography partitioning and a
separate container for resumes, allowing quicker retrieval of just the resume files.
Storage account overview - Azure Storage | Microsoft Learn
Create an Azure Storage account - Training | Microsoft Learn

Question: 19

You have an Azure Synapse Analytics database named DB1.

You plan to import data into DB1.

You need to maximize the performance of the data import.

What should you implement?

Select only one answer.


functional partitioning on the source data
horizontal partitioning on the source data
table partitioning on the target database
vertical partitioning on the source data

Answer:
functional partitioning on the source data
horizontal partitioning on the source data
This answer is correct.
table partitioning on the target database
vertical partitioning on the source data
This answer is incorrect.
By using horizontal partitioning, you can improve the performance of the data load. As more
server resources and bandwidth are available to the source files, the import process gets
faster.
Data partitioning guidance - Azure Architecture Center | Microsoft Learn
Store application data with Azure Blob storage - Training | Microsoft Learn

Question: 20

You have an app named App1 that contains two datasets named dataset1 and
dataset2. App1 frequently queries dataset1. App1 infrequently queries dataset2.

You need to prevent queries to dataset2 from affecting the buffer pool and aging
out the data in dataset1.

Which type of partitioning should you use?

Select only one answer.


functional
horizontal
table
vertical

Answer:
functional
horizontal
table
vertical
This answer is correct.
By using vertical partitioning, different parts of the database can be isolated from each other
to improve cache use.
Data partitioning guidance - Azure Architecture Center | Microsoft Learn
Choose a data storage approach in Azure - Training | Microsoft Learn

Question: 21

You have an Azure subscription that contains the following resources:

 An Azure Synapse Analytics workspace named app1-syn


 An Azure Data Lake Storage Gen2 account named app1synstg
 A file system named data in app1synstg

You upload a file named NYCTripSmall.parquet to app1synstg.

You need to query the first 100 rows of NYCTripSmall.parquet by using a serverless SQL pool.

Which query should you run?

Select only one answer.


SELECT TOP 100 * FROM OPENROWSET( BULK
'https://app1synstg.dfs.core.windows.net/data/NYCTripSmall.parquet', FORMAT =
'PARQUET' ) as result
SELECT TOP 100 * FROM OPENROWSET( BULK
'https://app1-syn.dfs.core.windows.net/data/NYCTripSmall.parquet', FORMAT =
'PARQUET' ) as result
SELECT TOP 100 * FROM OPENROWSET( BULK
'https://app1synstg.dfs.core.windows.net/data/NYCTripSmall.parquet', SINGLE_CLOB )
as result
SELECT TOP 100 * FROM OPENROWSET( BULK
'https://app1-syn.dfs.core.windows.net/data/NYCTripSmall.parquet', SINGLE_CLOB )
as result

Answer:
SELECT TOP 100 * FROM OPENROWSET( BULK
'https://app1synstg.dfs.core.windows.net/data/NYCTripSmall.parquet', FORMAT =
'PARQUET' ) as result
This answer is correct.
SELECT TOP 100 * FROM OPENROWSET( BULK
'https://app1-syn.dfs.core.windows.net/data/NYCTripSmall.parquet', FORMAT =
'PARQUET' ) as result
SELECT TOP 100 * FROM OPENROWSET( BULK
'https://app1synstg.dfs.core.windows.net/data/NYCTripSmall.parquet', SINGLE_CLOB )
as result
SELECT TOP 100 * FROM OPENROWSET( BULK
'https://app1-syn.dfs.core.windows.net/data/NYCTripSmall.parquet', SINGLE_CLOB )
as result
This item tests the candidate’s ability to explore data in Azure Synapse Analytics. The correct
answer is the following query.
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://app1synstg.dfs.core.windows.net/data/NYCTripSmall.parquet',
FORMAT = 'PARQUET'
) as result
The remaining answers are incorrect because they either reference the workspace name instead
of the storage account name or use SINGLE_CLOB instead of the Parquet format.
Tutorial: Get started analyze data with a serverless SQL pool - Azure Synapse Analytics |
Microsoft Learn
Survey the Components of Azure Synapse Analytics - Training | Microsoft Learn

Question: 22

You are designing an Azure Synapse Analytics solution that will be used to analyze
patient outcome data from a hospital.

Which database template should you use?

Select only one answer.


Healthcare Insurance
Healthcare Provider
Life Insurance & Annuities
Pharmaceuticals
Answer:
Healthcare Insurance
Healthcare Provider
This answer is correct.
Life Insurance & Annuities
Pharmaceuticals
Healthcare Provider is used for companies that provide healthcare to others.
Healthcare Insurance is used for insurance companies that sell health insurance.
Life Insurance & Annuities is used for companies that sell life insurance.
Pharmaceuticals is used for companies that create pharmaceutical products.
Overview of Azure Synapse database templates - Azure Synapse Analytics | Microsoft Learn
Survey the Components of Azure Synapse Analytics - Training | Microsoft Learn

Question: 23

You plan to deploy an Azure Synapse Analytics solution that will use the Retail
database template and include three tables from the Business Metrics category.

You need to create a one-to-many relationship from a table in Retail to a table in Business
Metrics.

What should you do first?

Select only one answer.


Create a database.
Publish the database.
Select the table in Business Metrics.
Select the table in Retail.

Answer:
Create a database.
This answer is correct.
Publish the database.
Select the table in Business Metrics.
Select the table in Retail.
You cannot add relationships until a database is created. You can only view relationships
before a database is created. You can only publish the database after the database has been
created.
Designing tables - Azure Synapse Analytics | Microsoft Learn
Many-to-many relationship guidance - Power BI | Microsoft Learn
Survey the Components of Azure Synapse Analytics - Training | Microsoft Learn

Question: 24
You create a Microsoft Purview account and add an Azure SQL Database data source
that has data lineage scan enabled.

You assign a managed identity for the Microsoft Purview account and the db_owner
role for the database.

After scanning the data source, you are unable to obtain any lineage data for the
tables in the database.

You need to create lineage data for the tables.

What should you do?

Select only one answer.


Create a certificate in the database.
Create a master key in the database.
Use a user-managed service principal.
Use SQL authentication.

Answer:
Create a certificate in the database.
Create a master key in the database.
This answer is correct.
Use a user-managed service principal.
This answer is incorrect.
Use SQL authentication.
You need a master key in the Azure SQL database for lineage to work.
Using SQL authentication will just change the way data lineage scan enables Microsoft
Purview to authenticate to the data source.
Using a user-managed service principal just changes the way Microsoft Purview
authenticates to the data source. You do not need a certificate, but a master key.
Data lineage in Microsoft Purview - Microsoft Purview | Microsoft Learn

Question: 25

You have a data solution that includes an Azure SQL database named SQL1 and an
Azure Synapse database named SYN1. SQL1 contains a table named Table1. Data is
loaded from SQL1 to SYN1.

You need to ensure that Table1 supports incremental loading.

What should you do?

Select only one answer.


Add a new column to track lineage in Table1.
Define a new foreign key in Table1.
Enable data classification in Microsoft Purview.
Enable data lineage in Microsoft Purview.

Answer:
Add a new column to track lineage in Table1.
This answer is correct.
Define a new foreign key in Table1.
Enable data classification in Microsoft Purview.
Enable data lineage in Microsoft Purview.
A new column of type date or int can be used to track lineage in a table and be used for
filtering during an incremental load.
Data lineage in Microsoft Purview cannot be used to assist an incremental load. It is just used
for tracing lineage.
Data classification cannot be used for incremental loading.
Foreign keys are used for relationship between tables, not lineage.
Automated enterprise BI - Azure Architecture Center | Microsoft Learn
Survey the Components of Azure Synapse Analytics - Training | Microsoft Learn

Question: 26

You plan to implement a data storage solution for a healthcare provider.

You need to ensure that the solution follows industry best practices and is designed
in the minimum amount of time.

What should you use?

Select only one answer.


Azure Data Factory
Azure Quickstart guides
Azure Resource Manager (ARM) templates
Azure Synapse Analytics database templates

Answer:
Azure Data Factory
Azure Quickstart guides
Azure Resource Manager (ARM) templates
This answer is incorrect.
Azure Synapse Analytics database templates
This answer is correct.
Azure Synapse Analytics database templates give you a starting point with many of the
commonly needed tables and columns as part of the database design.
Overview of Azure Synapse database templates - Azure Synapse Analytics | Microsoft Learn
Survey the Components of Azure Synapse Analytics - Training | Microsoft Learn

Question: 27

You create a data flow activity in an Azure Synapse Analytics pipeline.

You plan to use the data flow to read data from a fixed-length text file.

You need to create the columns from each line of the text file. The solution must
ensure that the data flow only writes three of the columns to a CSV file.

Which three types of tasks should you add to the data flow activity? Each correct
answer presents part of the solution.

Select all answers that apply.


aggregate
derived column
flatten
select
sink

Answer:
aggregate
derived column
This answer is correct.
flatten
select
This answer is correct.
sink
This answer is correct.
You need to use a derived column task to extract the columns from the line of text. Select
takes just the value of the three columns you want to write to the CSV file. You need a sink
to write the data to a CSV file.
There is no data to aggregate.
There is no need to flatten the data.
Process fixed-length text files with mapping data flows in Azure Data Factory - Azure Data
Factory | Microsoft Learn
Data integration with Azure Data Factory - Training | Microsoft Learn

Question: 28

You create a data flow activity in an Azure Data Factory pipeline.


You need to execute a select task and write the output to multiple columns across
multiple sinks.

Which type of task should you add from the select task?

Select only one answer.


conditional split
new branch
pivot
unpivot

Answer:
conditional split
new branch
This answer is correct.
pivot
unpivot
A new branch allows for the same data to flow to a different lane, and then you can select just
the columns you need for each branch to a new sink.
Conditional split creates a split based on a condition. You want to create a separation of
columns, not filter some rows. Unpivot creates new rows based on names of columns. Pivot
creates new columns based on values of rows.
Branching and chaining activities in a pipeline using Azure portal - Azure Data Factory |
Microsoft Learn
Data integration with Azure Data Factory - Training | Microsoft Learn

Question: 29

You have source data that contains an array of JSON objects. Each JSON object has a
child array of JSON objects.

You create a data flow activity in an Azure Synapse Analytics pipeline.

You need to transform the source so that it can be written to an Azure SQL Database
table where each row represents an element of the child array, along with the values
of its parent element.

Which type of task should you add to the data flow activity?

Select only one answer.


flatten
parse
pivot
unpivot
Answer:
flatten
This answer is correct.
parse
pivot
unpivot
Flatten flattens JSON arrays.
Parse parses data.
Unpivot creates new rows based on the names of columns.
Pivot creates new columns based on the values of rows.
Flatten transformation in mapping data flow - Azure Data Factory & Azure Synapse |
Microsoft Learn
Data integration with Azure Data Factory - Training | Microsoft Learn

Question: 30

You have an Azure Data Factory pipeline that uses Apache Spark to transform data.

You need to run the pipeline.

Which PowerShell cmdlet should you run?

Select only one answer.


Invoke-AzDataFactoryV2Pipeline
Invoke-AzureDataFactoryV2Pipeline
Start-AzDataFactoryV2Pipeline
Start-AzureDataFactoryV2Pipeline

Answer:
Invoke-AzDataFactoryV2Pipeline
This answer is correct.
Invoke-AzureDataFactoryV2Pipeline
Start-AzDataFactoryV2Pipeline
Start-AzureDataFactoryV2Pipeline
The Invoke-AzDataFactoryV2Pipeline cmdlet is used to start a Data Factory pipeline.
Transform data using Hive in Azure Virtual Network - Azure Data Factory | Microsoft Learn
Data integration with Azure Data Factory - Training | Microsoft Learn

Question: 31

You have an Azure subscription that contains an Azure Synapse Analytics Dedicated
SQL pool named Pool1. Pool1 hosts a table named Table1.

You receive JSON data from an external data source.

You need to store the external data in Table1.


Which T-SQL element should you use?

Select only one answer.


ConvertFrom-Json
FROM JSON
FOR JSON
OPENJSON
Answer:
ConvertFrom-Json
FROM JSON
FOR JSON
OPENJSON
This answer is correct.
The OPENJSON table-valued function converts JSON text into rows and columns that can be inserted into a table.
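A minimal T-SQL sketch of loading JSON into the table (the JSON shape and the column names are hypothetical):
DECLARE @json NVARCHAR(MAX) = N'[{"Id":1,"Name":"Contoso"},{"Id":2,"Name":"Fabrikam"}]';
-- Shred the JSON array into rows and insert them into Table1
INSERT INTO dbo.Table1 (Id, Name)
SELECT Id, Name
FROM OPENJSON(@json)
WITH (
    Id INT '$.Id',
    Name NVARCHAR(100) '$.Name'
);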
Parse and Transform JSON Data with OPENJSON - SQL Server | Microsoft Learn
Data integration with Azure Data Factory - Training | Microsoft Learn

Question: 32

You have an Azure Synapse Analytics workspace named workspace1.

You plan to write new data and update existing rows in workspace1.

You create an Azure Synapse Analytics sink to write the processed data to
workspace1.

You need to configure the writeBehavior parameter for the sink. The solution must
minimize the number of pipelines required.

What should you use?

Select only one answer.


Change
Insert
Update
Upsert

Answer:
Change
Insert
Update
Upsert
This answer is correct.
Updates and inserts can be done with a single pipeline by using the Upsert syntax.
Copy and transform data in Azure Synapse Analytics - Azure Data Factory & Azure Synapse
| Microsoft Learn
Data integration with Azure Data Factory - Training | Microsoft Learn

Question: 33
You have an Azure subscription that contains an Azure Synapse Analytics workspace.

You use the workspace to perform ELT activities that can take up to 30 minutes to
complete.

You develop an Azure function to stop the compute resources used by Azure
Synapse Analytics during periods of zero activity.

You notice that it can take more than 20 minutes for the compute resources to stop.

You need to minimize the time it takes to stop the compute resources. The solution
must minimize the impact on running transactions.

How should you change the function?

Select only one answer.


Add a timer to wait 20 minutes before stopping the compute resources.
Check the sys.dm_operation_status dynamic management view until no transactions
are active in the database before stopping the compute resources.
Close all connections to the database before stopping the compute resources.
Set the database to READ_ONLY before stopping the compute resources.

Answer:
Add a timer to wait 20 minutes before stopping the compute resources.
Check the sys.dm_operation_status dynamic management view until no transactions
are active in the database before stopping the compute resources.
This answer is correct.
Close all connections to the database before stopping the compute resources.
Set the database to READ_ONLY before stopping the compute resources.
Checking the sys.dm_operation_status dynamic management view until no transactions are
active in the database before stopping the compute resources ensures that any running
transaction will finish before stopping the compute nodes. If you stop a node while a
transaction is running, the transaction will be rolled back, which can take time to occur.
Adding a timer to wait 20 minutes before stopping the compute resources might still incur
time to shut down if a new transaction starts during the wait time.
Closing all connections to the database before stopping the compute resources will start a
rollback and take time.
Setting the database to READ_ONLY before stopping the compute resources will start a rollback
and take time.
Quickstart: Scale compute in dedicated SQL pool (formerly SQL DW) - T-SQL - Azure
Synapse Analytics | Microsoft Learn
Manage compute resource for dedicated SQL pool (formerly SQL DW) - Azure Synapse
Analytics | Microsoft Learn
Optimize data warehouse query performance in Azure Synapse Analytics - Training |
Microsoft Learn
Question: 34

You are developing an Azure Databricks solution.

You need to ensure that workloads support PyTorch code. The solution must
minimize costs.

Which workload persona should you use?

Select only one answer.


Data Science and Engineering
Machine Learning
SQL

Answer:
Data Science and Engineering
This answer is incorrect.
Machine Learning
This answer is correct.
SQL
The PyTorch support is only available in the Machine Learning persona.
Identify Azure Databricks workloads - Training | Microsoft Learn

Question: 35

You have an Azure Databricks cluster.

You need to stage files into the shared cluster storage by using a third-party tool.

Which file system should the tool support?

Select only one answer.


DBFS
NTFS
ReFS
OraFS

Answer:
DBFS
This answer is correct.
NTFS
ReFS
OraFS
Databricks shared storage, which all the nodes of the cluster can access, is built and formatted
by using DBFS.
Understand key concepts - Training | Microsoft Learn

Question: 36

You have an Azure Stream Analytics solution that receives data from multiple
thermostats in a building.

You need to write a query that returns the average temperature per device every five
minutes for readings within that same five-minute period.

Which two windowing functions could you use?

Select all answers that apply.


HoppingWindow
SessionWindow
SlidingWindow
TumblingWindow

Answer:
HoppingWindow
This answer is correct.
SessionWindow
SlidingWindow
TumblingWindow
This answer is correct.
Tumbling windows have a defined period and can aggregate all events for that same time
period. A tumbling window is essentially a specific case of a hopping window in which the
hop size equals the window size.
Hopping windows have a defined period and can aggregate the events for a potentially
different time period.
Sliding windows produce output only when an event enters or exits the window, rather than
at fixed intervals.
Snapshot windows aggregate all events with the same timestamp.
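A hedged Stream Analytics query sketch for the tumbling-window option (the input name and column names are hypothetical):
-- Average temperature per device over non-overlapping five-minute windows
SELECT
    DeviceId,
    AVG(Temperature) AS AvgTemperature,
    System.Timestamp() AS WindowEnd
FROM ThermostatInput TIMESTAMP BY EventTime
GROUP BY DeviceId, TumblingWindow(minute, 5)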
Introduction to Azure Stream Analytics windowing functions | Microsoft Learn
Implement a Data Streaming Solution with Azure Stream Analytics - Training | Microsoft
Learn

Question: 37

You create an Azure Stream Analytics job. You run the job for five hours.

You review the logs and notice multiple instances of the following message.

{"message Time":"2019-02-04 17:11:52Z","error":null, "message":"First Occurred:


02/04/2019 17:11:48 | Resource Name: ASAjob | Message: Source 'ASAjob' had 24 data
errors of kind 'LateInputEvent' between processing times '2019-02-
04T17:10:49.7250696Z' and '2019-02-04T17:11:48.7563961Z'. Input event with
application timestamp '2019-02-04T17:05:51.6050000' and arrival time '2019-02-
04T17:10:44.3090000' was sent later than configured
tolerance.","type":"DiagnosticMessage","correlation ID":"49efa148-4asd-4fe0-869d-
a40ba4d7ef3b"}

You need to ensure that these events are not dropped.

What should you do?

Select only one answer.


Decrease the number of Streaming Units (SUs) to 3.
Increase the number of Streaming Unit (SUs) for the job to 12.
Increase the tolerance for late arrivals.
Increase the tolerance for out-of-order events.

Answer:
Decrease the number of Streaming Units (SUs) to 3.
Increase the number of Streaming Unit (SUs) for the job to 12.
This answer is incorrect.
Increase the tolerance for late arrivals.
This answer is correct.
Increase the tolerance for out-of-order events.
Increasing the tolerance for late arrivals ensures that late arrivals are not dropped.
The error is about late arrivals, not out-of-order events.
Increasing the number of SUs to 12 will not change how late arrivals are handled.
Decreasing the number of SUs to 3 will not change how late arrivals are handled.
Configuring event ordering policies for Azure Stream Analytics | Microsoft Learn
Get started with Azure Stream Analytics - Training | Microsoft Learn

Question: 38

You have an Azure Stream Analytics job named Job1.

Job1 runs continuously and executes non-parallelized queries.

You need to minimize the impact of Azure node updates on Job1. The solution must
minimize costs.

To what should you increase the number of Streaming Units (SUs)?

Select only one answer.


2
3
6
12

Answer:
2
3
6
12
This answer is correct.
Increasing the SUs to 12 spreads the job across two nodes, so one node can continue
processing while the other is updated.
The other options use a single node, which will stop for maintenance.
Avoid service interruptions in Azure Stream Analytics jobs | Microsoft Learn
Get started with Azure Stream Analytics - Training | Microsoft Learn

Question: 39

Which Azure Data Factory component should you use to connect to a data source?

Select only one answer.


a dataset
a linked service
a pipeline
an activity
an aggregation

Answer:
a dataset
a linked service
This answer is correct.
a pipeline
an activity
an aggregation
Linked services allow you to connect to your data source.
Datasets represent the data structures that are available via the linked service.
Activities contain the transformations or analysis of data factories.
Pipelines are groups of activities.
Understand Azure Data Factory components - Training | Microsoft Learn

Question: 40

You have an Azure Data Factory pipeline named Pipeline1. Pipeline1 executes many
API write operations every time it runs. Pipeline1 is scheduled to run every five
minutes.

After executing Pipeline1 10 times, you notice the following entry in the logs.
Type=Microsoft.DataTransfer.Execution.Core.ExecutionException,Message=There are
substantial concurrent MappingDataflow executions which is causing failures due to
throttling under Integration Runtime 'AutoResolveIntegrationRuntime'.

You need to ensure that you can run Pipeline1 every five minutes.

What should you do?

Select only one answer.


Change the compute size to large.
Create a new integration runtime and a new Pipeline as a copy of Pipeline1.
Configure both pipelines to run every 10 minutes, five minutes apart.
Create a second trigger and set each trigger to run every 10 minutes, five minutes
apart.
Create another pipeline in the data factory and schedule each pipeline to run every
10 minutes, five minutes apart.

Answer:
Change the compute size to large.
Create a new integration runtime and a new Pipeline as a copy of Pipeline1.
Configure both pipelines to run every 10 minutes, five minutes apart.
This answer is correct.
Create a second trigger and set each trigger to run every 10 minutes, five minutes
apart.
Create another pipeline in the data factory and schedule each pipeline to run every
10 minutes, five minutes apart.
There is a limit on the number of simultaneous pipeline runs in an integration runtime. You
need to split the executions across multiple integration runtimes.
Compute size will not affect integration runtime limits.
Creating another pipeline in the data factory and scheduling each pipeline to run every 10
minutes, five minutes apart, will cause the same limitation.
Creating a second trigger and setting each trigger to run every 10 minutes, five minutes apart,
still uses the same integration runtime.
Troubleshoot pipeline orchestration and triggers in Azure Data Factory - Azure Data Factory |
Microsoft Learn
Operationalize your Azure Data Factory or Azure Synapse Pipeline - Training | Microsoft
Learn

Question: 41

You have an Azure Data Factory named datafactory1.

You configure datafactory1 to use Git for source control.

You make changes to an existing pipeline.


When you try to publish the changes, you notice the following message displayed
when you hover over the Publish All button.

Publish from ADF Studio is disabled to avoid overwriting automated deployments. If
required you can change publish setting in Git configuration.

You need to allow publishing from the portal.

What should you do?

Select only one answer.


Change the Automated publish config setting.
Select Override live mode in the Git configuration.
Use a Git client to merge the collaboration branch into the live branch.
Use the browser to create a pull request.

Answer:
Change the Automated publish config setting.
This answer is correct.
Select Override live mode in the Git configuration.
This answer is incorrect.
Use a Git client to merge the collaboration branch into the live branch.
Use the browser to create a pull request.
The Automated publish config setting defaults to disabling publish from ADF Studio when
Git is configured; changing the setting allows publishing from the portal again.
Selecting Override live mode in the Git configuration copies the data from the collaboration
branch to the live branch.
Using a Git client to merge the collaboration branch into the live branch does the same thing
as Override live mode.
Using the browser to create a pull request creates a pull request that must be approved, but
still does not publish from Data Factory.
Source control - Azure Data Factory | Microsoft Learn
Operationalize your Azure Data Factory or Azure Synapse Pipeline - Training | Microsoft
Learn

Question: 42

You have an Azure Data Factory pipeline named Pipeline1. Pipeline1 includes a data
flow activity named Dataflow1. Dataflow1 uses a source named source1. Source1
contains 1.5 million rows.

Dataflow1 takes 20 minutes to complete.

You need to debug Pipeline1. The solution must reduce the number of rows that flow
through the activities in Dataflow1.
What should you do?

Select only one answer.


Create a new integration runtime for Pipeline1.
Enable sampling in source1.
Enable staging in Pipeline1.
Set the Filter by last modified setting in source1.
Answer:
Create a new integration runtime for Pipeline1.
Enable sampling in source1.
This answer is correct.
Enable staging in Pipeline1.
Set the Filter by last modified setting in source1.
Enabling sampling in source1 allows you to specify how many rows to retrieve.
Enabling staging in Pipeline1 just uses a staging area in Azure Synapse.
Creating a new integration runtime for Pipeline1 can increase performance, but it will not
reduce the number of rows retrieved from source1.
Setting the Filter by last modified setting in source1 filters files by date.
Mapping data flow Debug Mode - Azure Data Factory & Azure Synapse | Microsoft Learn
Operationalize your Azure Data Factory or Azure Synapse Pipeline - Training | Microsoft
Learn

Question: 43

You are creating an Azure Data Factory pipeline.

You need to store the passwords used to connect to resources.

Where should you store the passwords?

Select only one answer.


Azure Key Vault
Azure Repos
Azure SQL Database
Data Factory

Answer:
Azure Key Vault
This answer is correct.
Azure Repos
Azure SQL Database
Data Factory
Passwords for resources are not stored in the Data Factory pipeline. It is recommended that
the passwords be stored in Key Vault so they are kept secure.
Manage source control of Azure Data Factory solutions - Training | Microsoft Learn
Question: 44

You are configuring Azure Data Factory to be used in a CI/CD deployment process.

You need to minimize the administrative tasks required by using global parameters.

Which global parameters should you configure?

Select only one answer.


execution schedule
server names within a connection object
sink task name
target database version number

Answer:
execution schedule
server names within a connection object
This answer is correct.
sink task name
target database version number
Within a CI/CD pipeline, global parameters can be used to configure server names. This
allows for the names to be changed once within the parameter definition, and those changes
will be used all through the pipeline.
Global parameters cannot be used to set the execution schedule of the pipeline and cannot set
the target database version number. Changing the sink task name will not reduce
administrative burden when deploying the pipeline.
Add parameters to data factory components - Training | Microsoft Learn

Question: 45

Your company has a branch office that contains a point of sale (POS) system.

You have an Azure subscription that contains a Microsoft SQL Server database
named DB1 and an Azure Synapse Analytics workspace.

You plan to use an Azure Synapse pipeline to copy CSV files from the branch office,
perform complex transformations on their content, and then load them to DB1.

You need to pass a subset of data to test whether the CSV columns are mapped
correctly.

What can you use to perform the test?

Select only one answer.


Data Flow Debug
datasets
integration runtime
linked service

Answer:
Data Flow Debug
This answer is correct.
datasets
integration runtime
linked service
Correct: The Data Flow Debug option is available inside of a data flow activity and allows
you to pass a subset of data through the flow, which can be useful to test whether columns are
mapped correctly.
Incorrect: Integration runtime is a pipeline concept referring to the compute resources
required to execute the pipeline.
Incorrect: A linked service is required when an activity needs or depends on an external
service.
Incorrect: Datasets refers to the specific data consumed and produced by activities in a
pipeline.
Build a data pipeline in Azure Synapse Analytics - Training | Microsoft Learn

Question: 46

You have an Azure Synapse Analytics data pipeline.

You need to run the pipeline at scheduled intervals.

What should you configure?

Select only one answer.


a control flow
a sink
a trigger
an activity

Answer:
a control flow
a sink
a trigger
This answer is correct.
an activity
A trigger is needed to initiate a pipeline run. Control flow is an activity that implements
processing logic. Activities are tasks within a pipeline that cannot trigger the pipeline. A sink
represents a target in a data flow but does not provide trigger capability.
Build a data pipeline in Azure Synapse Analytics - Training | Microsoft Learn
Run a pipeline - Training | Microsoft Learn

Question: 47

You are developing an Apache Spark pipeline to transform data from a source to a
target.

You need to filter the data in a column named Category where the category is cars.

Which command should you run?

Select only one answer.


df.select("ProductName", "ListPrice").where((df["Category"] == "Cars")
df.select("ProductName", "ListPrice") | where((df["Category"] == "Cars")
df.select("ProductName", "ListPrice").where((df["Category"] -eq "Cars")
df.select("ProductName", "ListPrice") | where((df["Category"] -eq "Cars")

Answer:
df.select("ProductName", "ListPrice").where((df["Category"] == "Cars")
This answer is correct.
df.select("ProductName", "ListPrice") | where((df["Category"] == "Cars")
df.select("ProductName", "ListPrice").where((df["Category"] -eq "Cars")
df.select("ProductName", "ListPrice") | where((df["Category"] -eq "Cars")
The correct syntax chains .where() after the select() statement and uses the == comparison operator.
Analyze data with Spark – Training | Microsoft Learn

Question: 48

You have a database named DB1 and a data warehouse named DW1.

You need to ensure that all changes to DB1 are stored in DW1. The solution must
meet the following requirements:

 Identify each row that has changed.


 Minimize the performance impact on the source system.

What should you include in the solution?

Select only one answer.


change data capture
change tracking
merge replication
snapshot replication
Answer:
change data capture
This answer is incorrect.
change tracking
This answer is correct.
merge replication
snapshot replication
Change tracking captures the fact that a row was changed without tracking the data that was
changed. Change tracking requires less server resources than change data capture.
Perform an Incremental Load of Multiple Tables - SQL Server Integration Services (SSIS) |
Microsoft Learn
Use data loading best practices in Azure Synapse Analytics - Training | Microsoft Learn

Question: 49

You use an Azure Databricks pipeline to process a stateful streaming operation.

You need to reduce the amount of state data to improve latency during a long-
running streaming operation.

What should you use in the streaming DataFrame?

Select only one answer.


a partition
a tumbling window
a watermark
RocksDB state management

Answer:
a partition
a tumbling window
a watermark
This answer is correct.
RocksDB state management
Watermarks interact with output modes to control when data is written to the sink. Because
watermarks reduce the total amount of state information to be processed, effective use of
watermarks is essential for efficient stateful streaming throughput. Partitions are useful to
improve performance but not to reduce state information. A tumbling window segments a
data stream into a contiguous series of fixed-size time segments. RocksDB state management
helps manage large amounts of state data efficiently but does not reduce the amount of state.
Implement a Data Streaming Solution with Azure Stream Analytics - Training | Microsoft
Learn
Apply watermarks to control data processing thresholds - Azure Databricks | Microsoft Learn

Question: 50
You are building an Azure Stream Analytics pipeline.

You need to configure the pipeline to analyze events that occur during a five-minute
window after an event fires.

Which windowing function should you use?

Select only one answer.


HoppingWindow
SessionWindow
SlidingWindow
TumblingWindow

Answer:
HoppingWindow
SessionWindow
SlidingWindow
This answer is correct.
TumblingWindow
Sliding windows generate events for points in time during which the contents of the window
changed. To limit the number of windows it needs to consider, Stream Analytics outputs
events for only those points in time when an event entered or exited the window. As such,
every window contains a minimum of one event. As with hopping windows, events can
belong to more than one sliding window.
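A hedged Stream Analytics query sketch using a sliding window (the input name and column names are hypothetical):
-- Count events per device over the preceding five minutes, evaluated whenever an event enters or exits the window
SELECT
    DeviceId,
    COUNT(*) AS EventCount,
    System.Timestamp() AS WindowEnd
FROM EventInput TIMESTAMP BY EventTime
GROUP BY DeviceId, SlidingWindow(minute, 5)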
Implement a Data Streaming Solution with Azure Stream Analytics - Training | Microsoft
Learn

Question: 51

You have 100 retail stores distributed across Asia, Europe, and North America.

You are developing an analytical workload that contains sales data for stores in
different regions. The workload contains a fact table with the following columns:

 Date: Contains the order date


 Customer: Contains the customer ID
 Store: Contains the store ID
 Region: Contains the region ID
 Product: Contains the product ID
 Price: Contains the unit price per product
 Quantity: Contains the quantity sold
 Amount: Contains the price multiplied by quantity

You need to design a partition solution for the fact table. The solution must meet the
following requirements:
 Optimize read performance when querying sales data for a single region
in a given month.
 Optimize read performance when querying sales data for all regions in a
given month.
 Minimize the number of partitions.

Which column should you use for partitioning?

Select only one answer.


Date partitioned by month
Product
Region
Store

Answer:
Date partitioned by month
Product
This answer is correct.
Region
Store

Product ensures parallelism when querying data from a given month within the same region,
or multiple regions.
Using date and partitioning by month, all sales for a month will be in the same partition, not
providing parallelism.
All sales for a given region will be in the same partition, not providing parallelism.
Since a store is in a single region, it will still not provide parallelism for the same region.
Use query parallelization and scale in Azure Stream Analytics | Microsoft Learn
Choose a data storage approach in Azure - Training | Microsoft Learn

Question: 52

You are evaluating the use of Azure Data Lake Storage Gen2.

What should you consider when choosing a partitioning strategy?

Select only one answer.


access policies
data residency
file size
geo-replication requirements

Question: 53
You have an Azure Synapse Analytics database named DB1.

You need to import data into DB1. The solution must minimize Azure Data Lake
Storage transaction costs.

Which design pattern should you use?

Select only one answer.


Store the data in 500-MB files.
Store the data in 2,000-byte files.
Use a read-access geo-redundant storage (RA-GRS) storage account.
Use the Avro file format.
Use the ORC file format.

Answer:
Store the data in 500-MB files.
This answer is correct.
Store the data in 2,000-byte files.
Use a read-access geo-redundant storage (RA-GRS) storage account.
Use the Avro file format.
This answer is incorrect.
Use the ORC file format.

By using larger files when importing data, transaction costs can be reduced. This is because
the reading of files is billed with a 4-MB operation, even if the file is less than 4 MB. To
reduce costs, the entire 4 MB should be used per read.
Best practices for using Azure Data Lake Storage Gen2 - Azure Storage | Microsoft Learn
Store application data with Azure Blob storage - Training | Microsoft Learn

Question: 54

You are designing a database solution that will host data for multiple business units.

You need to ensure that queries from one business unit do not affect the other
business units.

Which type of partitioning should you use?

Select only one answer.


functional
horizontal
table
vertical
functional

This answer is correct.

horizontal

table

vertical

This answer is incorrect.

By using functional partitioning, different users of the database can be isolated from each
other to ensure that one business unit does not affect another business unit.
Data partitioning guidance - Azure Architecture Center | Microsoft Learn
Choose a data storage approach in Azure - Training | Microsoft Learn

Question: 55

You have an Azure subscription that contains a Delta Lake solution. The solution
contains a table named employees.

You need to view the contents of the employees table from 24 hours ago. You must
minimize the time it takes to retrieve the data.

What should you do?

Select only one answer.


Query the table by using TIMESTAMP AS OF.
Query the table by using VERSION AS OF.
Restore the database from a backup and query the table.
Restore the table from a backup and query the table.

Query the table by using TIMESTAMP AS OF.

This answer is correct.

Query the table by using VERSION AS OF.

Restore the database from a backup and query the table.

Restore the table from a backup and query the table.

Querying the table by using TIMESTAMP AS OF is the correct way to view historical data in a
Delta Lake table and is the fastest of the listed options.
Restoring the database from a backup and querying the table requires a backup and takes time.
Restoring the table from a backup and querying the table requires a backup and takes time.
Querying the table by using VERSION AS OF is slower because you first need to query the table
history to find the version you are interested in, which takes more time than using
TIMESTAMP AS OF.
What is Delta Lake? - Azure Databricks | Microsoft Learn
Use Delta Lake in Azure Databricks - Training | Microsoft Learn
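As a minimal sketch, assuming the employees table is queried with Databricks SQL, time travel
with TIMESTAMP AS OF could look like the following; the interval expression is just one of the
accepted forms of a timestamp expression.

-- Read the employees table as it existed 24 hours ago.
SELECT *
FROM employees TIMESTAMP AS OF current_timestamp() - INTERVAL 24 HOURS;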

Question: 56

You design an Azure Data Factory pipeline that has a data flow activity named Move
to Synapse and an append variable activity named Upon Failure. Upon Failure runs
upon the failure of Move to Synapse.

You notice that if the Move to Synapse activity fails, the pipeline status is successful.

You need to ensure that if Move to Synapse fails, the pipeline status is failed. The
solution must ensure that Upon Failure executes when Move to Synapse fails.

What should you do?

Select only one answer.

Add a new activity with a Failure predecessor to Upon Failure.


Add a new activity with a Success predecessor to Move to Synapse.
Change the precedence for Upon Failure to Completion.
Change the precedence for Upon Failure to Success.

Add a new activity with a Failure predecessor to Upon Failure.

This answer is incorrect.

Add a new activity with a Success predecessor to Move to Synapse.

This answer is correct.

Change the precedence for Upon Failure to Completion.

Change the precedence for Upon Failure to Success.

Adding a new activity with a Success predecessor to Move to Synapse ensures that the
pipeline is marked as failed when the data flow fails.
Changing the precedence for Upon Failure to Completion causes the pipeline to succeed
regardless of whether Move to Synapse fails.
Changing the precedence for Upon Failure to Success means that Upon Failure is not triggered
when the data flow fails.
Adding a new activity with a Failure predecessor to Upon Failure does not change the result.
Pipeline failure and error message - Azure Data Factory | Microsoft Learn
Orchestrating data movement and transformation in Azure Data Factory - Training |
Microsoft Learn
Question: 57

You have a Delta Lake solution that contains a table named table1.

You need to roll back the contents of table1 to 24 hours ago.

Which command should you run?

Select only one answer.


ALTER TABLE employee
COPY INTO employee1
RESTORE TABLE employee TO TIMESTAMP AS OF current_timestamp() - INTERVAL '24' HOUR;
VACUUM employee RETAIN 24;

ALTER TABLE employee

COPY INTO employee1

RESTORE TABLE employee TO TIMESTAMP AS OF current_timestamp() - INTERVAL '24' HOUR;

This answer is correct.

VACUUM employee RETAIN 24;

This answer is incorrect.


RESTORE TABLE employee TO TIMESTAMP AS OF current_timestamp() - INTERVAL '24' HOUR; restores
the table to its state 24 hours ago.
VACUUM employee RETAIN 24; removes unused files from the Delta table folder.
COPY INTO employee1 copies data to a new table.
RESTORE - Azure Databricks - Databricks SQL | Microsoft Learn
Use Delta Lake in Azure Databricks - Training | Microsoft Learn
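A minimal sketch using the table1 name from the question; DESCRIBE HISTORY is optional here, but
it is a quick way to confirm what the rollback will restore before running RESTORE TABLE.

-- Optional: inspect recent versions and their timestamps before rolling back.
DESCRIBE HISTORY table1;

-- Roll the table back to its state 24 hours ago.
RESTORE TABLE table1 TO TIMESTAMP AS OF current_timestamp() - INTERVAL 24 HOURS;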

Question: 58

You are testing a change to an Azure Data Factory pipeline.

You need to check the change into source control without affecting other users’ work
in the data factory.

What should you do?

Select only one answer.


Save the change to a forked branch in the source control project.
Save the change to the master branch of the source control project.
Save the changed pipeline to your workstation.
Save the change to a forked branch in the source control project.

This answer is correct.

Save the change to the master branch of the source control project.

Save the changed pipeline to your workstation.

Saving changes to source control without affecting other users’ ability to use the pipeline
requires creating a branch (fork) of the project in source control.
Manage source control of Azure Data Factory solutions - Training | Microsoft Learn
Question: 59

You need to implement encryption at rest by using transparent data encryption (TDE).

You implement a master key.

What should you do next?

Select only one answer.


Back up the master database.
Create a certificate that is protected by the master key.
Create a database encryption key.
Turn on the database encryption process.

Back up the master database.

This answer is incorrect.

Create a certificate that is protected by the master key.

This answer is correct.

Create a database encryption key.

Turn on the database encryption process.

You need to create a certificate that is protected by the master key. A database encryption key
can be created only after that certificate exists in the master database, and the database
encryption process can be turned on only after the database encryption key exists.
You do not need to back up the master database; only a backup of the master key is required,
and that can be done at any time.
Transparent data encryption (TDE) - SQL Server | Microsoft Learn
Secure your Azure Storage account - Training | Microsoft Learn
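The order of operations can be summarized in a minimal SQL Server-style sketch; the certificate
name TdeCert and the use of DB1 as the database name are hypothetical.

USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
-- Next step after the master key: a certificate protected by the master key.
CREATE CERTIFICATE TdeCert WITH SUBJECT = 'TDE certificate';
GO
USE DB1;
-- The database encryption key can be created only after the certificate exists.
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TdeCert;
-- Only then can the encryption process be turned on.
ALTER DATABASE DB1 SET ENCRYPTION ON;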

Question: 60
You are implementing an application that queries a table named Purchase in an
Azure Synapse Analytics Dedicated SQL pool.

The application must show data only for the currently signed-in user.

You use row-level security (RLS), implement a security policy, and implement a
function that uses a filter predicate.

Users in the marketing department report that they cannot see their data.

What should you do to ensure that the marketing department users can see their
data?

Select only one answer.


Grant the SELECT permission on the function to the Marketing users.
Grant the SELECT permission on the Purchase table to the Marketing users.
Implement a blocking predicate.
Rebuild the function with `SCHEMABINDING=OFF`.

Grant the SELECT permission on the function to the Marketing users.

Grant the SELECT permission on the Purchase table to the Marketing users.

This answer is correct.

Implement a blocking predicate.

Rebuild the function with `SCHEMABINDING=OFF`.

This answer is incorrect.

The SELECT permission on the Purchase table is needed to query the data.
Adding a blocking predicate does not help because block predicates apply to write operations.
Granting the SELECT permission on the function is not a solution because users query the table,
not the function; the function is only responsible for filtering rows. Rebuilding the function
with SCHEMABINDING=OFF does not help for the same reason: SELECT access to the table is still
required.
Row-Level Security - SQL Server | Microsoft Learn
Secure your Azure Storage account - Training | Microsoft Learn
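A minimal sketch of the setup the question describes, with hypothetical names (a
Security.fn_purchase_predicate function, a PurchaseFilter policy, a UserName column on the
Purchase table, and a MarketingUsers role); the final GRANT is the missing piece.

CREATE FUNCTION Security.fn_purchase_predicate (@UserName AS nvarchar(128))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS fn_result WHERE @UserName = USER_NAME();  -- filter predicate
GO
CREATE SECURITY POLICY PurchaseFilter
    ADD FILTER PREDICATE Security.fn_purchase_predicate(UserName) ON dbo.Purchase
    WITH (STATE = ON);
GO
-- Without SELECT on the table itself, the filter never gets a chance to return any rows.
GRANT SELECT ON dbo.Purchase TO MarketingUsers;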

Question: 61

You have an Azure Data Factory named ADF1.

You configure ADF1 to send data to Log Analytics in Azure-Diagnostics mode.

You need to review the data.


Which table should you query?

Select only one answer.


ADFActivityRun
ADFPipelineRun
ADFSSISIntegrationRuntimeLogs
ADFSSISPackageExecutableStatistics
AzureDiagnostics

ADFActivityRun

ADFPipelineRun

ADFSSISIntegrationRuntimeLogs

ADFSSISPackageExecutableStatistics

AzureDiagnostics

This answer is correct.

When Data Factory is configured to send logging data to Log Analytics and is in Azure-
Diagnostics mode, the data will be sent to the AzureDiagnostics table in Log Analytics.
The ADFActivityRun, ADFPipelineRun, ADFSSISIntegrationRuntimeLogs,
and ADFSSISPackageExecutableStatistics tables are used when the Data Factory is in
Resource-Specific mode.
Monitor Azure Data Factory pipelines - Training | Microsoft Learn

Question: 62

You need to store information about failed Azure Data Factory pipelines for three
months.

Which three actions should you perform? Each correct answer presents part of the
solution.

Select all answers that apply.


Add diagnostic settings and add Azure Event Hubs as a target.
Add diagnostic settings and add Log Analytics as a target.
Create a Log Analytics workspace.
Create a storage account that has a lifecycle policy.
From the Monitor page of Azure Synapse Studio, review the Pipeline runs tab.

Add diagnostic settings and add Azure Event Hubs as a target.

Add diagnostic settings and add Log Analytics as a target.

This answer is correct.

Create a Log Analytics workspace.


This answer is correct.

Create a storage account that has a lifecycle policy.

This answer is correct.

From the Monitor page of Azure Synapse Studio, review the Pipeline runs tab.

This answer is incorrect.

A Data Factory pipeline stores monitoring data for only 45 days. To keep the data for longer,
create a Log Analytics workspace, add diagnostic settings that send the data to the workspace,
and archive the data in an Azure Storage account that has a lifecycle policy.
Configure diagnostic settings and a workspace - Azure Data Factory | Microsoft Learn
Operationalize your Azure Data Factory or Azure Synapse Pipeline - Training | Microsoft
Learn

Question: 63

You have an Azure Synapse Analytics workspace.

You need to identify running sessions in the workspace.

Which dynamic management view should you use?

Select only one answer.


sys.dm_exec_requests

sys.dm_exec_sessions

sys.dm_pdw_exec_requests

This answer is incorrect.


sys.dm_pdw_exec_sessions

This answer is correct.

sys.dm_pdw_exec_sessions shows the status of sessions, not the running requests, so it is the
view to use to identify running sessions.
sys.dm_pdw_exec_requests shows the requests that are in progress, completed, failed, or closed.
sys.dm_exec_requests and sys.dm_exec_sessions are used by Microsoft SQL Server.
Use dynamic management views to identify and troubleshoot query performance - Training |
Microsoft Learn
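A minimal sketch of querying the view; filtering out closed sessions is an assumption about how
session status values are labeled.

-- List sessions in the dedicated SQL pool that are not yet closed.
SELECT session_id, login_name, status, query_count
FROM sys.dm_pdw_exec_sessions
WHERE status <> 'Closed';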

Question: 64

You have an Azure subscription that contains an Azure SQL database named DB1.

You need to implement row-level security (RLS) for DB1. The solution must block
users from updating rows with values that violate RLS.

Which block predicate should you use?


Select only one answer.
AFTER INSERT
AFTER UPDATE
BEFORE DELETE
BEFORE UPDATE

AFTER INSERT

AFTER UPDATE

This answer is correct.


BEFORE DELETE

BEFORE UPDATE

This answer is incorrect.

AFTER UPDATE prevents users from updating rows to values that violate the predicate.
AFTER INSERT prevents users from inserting rows that violate the predicate. BEFORE UPDATE
prevents users from updating rows that currently violate the predicate. BEFORE DELETE blocks
delete operations if the row violates the predicate.
Implement Row Level security - Training | Microsoft Learn
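A minimal sketch with hypothetical names (dbo.Orders, a TenantId column, and a
Security.fn_tenant_predicate function like the filter predicate sketched for question 60); the
AFTER UPDATE block predicate is what stops rows from being updated to values that violate RLS.

CREATE SECURITY POLICY dbo.TenantPolicy
    ADD FILTER PREDICATE Security.fn_tenant_predicate(TenantId) ON dbo.Orders,
    -- Block updates whose resulting row values would violate the predicate.
    ADD BLOCK PREDICATE Security.fn_tenant_predicate(TenantId) ON dbo.Orders AFTER UPDATE
    WITH (STATE = ON);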

Question: 65

Your source data contains normalized data that represents items sold.

You create a data flow activity in an Azure Synapse Analytics pipeline.

You need to denormalize the source data. The solution must substitute product IDs
for the actual product name before writing data to a sink.

Which type of task should you add to the data flow activity?

Select only one answer.


aggregate
derived column
flatten
lookup

aggregate

derived column

flatten

lookup

This answer is correct.


Lookup allows you to take a value and search for it in a different source to retrieve another
value or set of values. Derived column can create a column based on the input, but it cannot
retrieve a new value from a different data source. There is no data to aggregate, and there is
no need to flatten data.
Lookup transformations in mapping data flow - Azure Data Factory & Azure Synapse |
Microsoft Learn
Data integration with Azure Data Factory - Training | Microsoft Learn

Question: 66

You design an Azure Data Factory data flow activity to move large amounts of data
from text files to an Azure Synapse Analytics database. You add a data flow script to
your data flow. The data flow in the designer has the following tasks:

 distinctRows1: Aggregate data by using myCols that produce columns.
 source1: Import data from DelimitedText1.
 derivedColumn1: Create and update the C1 columns.
 select1: Rename derivedColumn1 as select1 with columns C1.
 sink1: Add a sink dataset.

You need to ensure that all the rows in source1 are deduplicated.

What should you do?

Select only one answer.


Change the incoming stream for derivedColumn1 to distinctRows1.
Change the incoming stream for distinctRows1 to source1.
Create a new aggregate task after source1 and copy the script to the aggregate task.
Create a new flowlet task after source1.

Change the incoming stream for derivedColumn1 to distinctRows1.

Change the incoming stream for distinctRows1 to source1.

This answer is correct.

Create a new aggregate task after source1 and copy the script to the aggregate task.

Create a new flowlet task after source1.

Changing the incoming stream for distinctRows1 to source1 moves the dedupe step so that it runs
right after source1, so only distinct rows are retrieved.
Creating a new aggregate task after source1 and copying the script to the aggregate task will
not work and will cause errors in the flow.
Changing the incoming stream for derivedColumn1 to distinctRows1 breaks the flow because no
data will flow into distinctRows1.
Creating a new flowlet task after source1 only adds a subflow to the task.
Dedupe rows and find nulls by using data flow snippets - Azure Data Factory | Microsoft
Learn
Orchestrating data movement and transformation in Azure Data Factory - Training |
Microsoft Learn

Question: 67

You have an Azure Data Factory pipeline named Pipeline1.

You need to ensure that Pipeline1 runs when an email is received.

What should you use to create the trigger?

Select only one answer.


an Azure logic app
the Azure Synapse Analytics pipeline designer
the Data Factory pipeline designer

an Azure logic app

This answer is correct.

the Azure Synapse Analytics pipeline designer

the Data Factory pipeline designer

A logic app can be triggered by an email, and then run a pipeline.


Only timer, event hub, and storage triggers can be added from the designer.
Pipeline execution and triggers - Azure Data Factory & Azure Synapse | Microsoft Learn
Data integration with Azure Data Factory - Training | Microsoft Learn

Question: 68

You have an Azure Storage account named account1.

You need to ensure that requests to account1 can only be made from specific
domains.

What should you configure?

Select only one answer.


blob public access
CDN
CORS
secure transfer
blob public access

CDN

CORS

This answer is correct.

secure transfer

By using CORS, you can specify the domains from which web requests to the storage account are
allowed. If the requesting domain is not listed as an approved domain, the request is rejected.
Explore Azure Storage security features - Training | Microsoft Learn

Question: 69

You have a pipeline in an Azure Synapse Analytics workspace. The pipeline runs a
stored procedure against the dedicated SQL pool.

The pipeline throws errors occasionally.

You need to check the error information by using the minimum amount of
administrative effort.

What should you do?

Select only one answer.


Configure diagnostic settings in the workspace.
Configure the Activity run ended metric in the workspace.
From the Monitor page of Azure Synapse Studio, review the Pipeline runs tab.
From the Monitor page of Azure Synapse Studio, review the SQL requests tab.

Configure diagnostic settings in the workspace.

Configure the Activity run ended metric in the workspace.

From the Monitor page of Azure Synapse Studio, review the Pipeline runs tab.

This answer is correct.

From the Monitor page of Azure Synapse Studio, review the SQL requests tab.

This answer is incorrect.

You should open the Monitor page and review the Pipeline runs tab because the status and error
information for each pipeline run is displayed on this tab.
The SQL requests tab contains information that relates only to SQL requests; it contains no
information about the pipeline.
The Activity run ended metric does not provide information about the status of individual
pipeline runs; it presents aggregated information about the number of runs over time.
Using the diagnostic settings would require writing KQL queries to retrieve the information,
which is more administrative effort.
How to monitor Synapse Analytics using Azure Monitor - Azure Synapse Analytics |
Microsoft Learn
Operationalize your Azure Data Factory or Azure Synapse Pipeline - Training | Microsoft
Learn

Question: 70

You have an Azure Data Lake Storage Gen2 account.

You grant developers Read and Write permissions by using ACLs to the files in the
path \root\input\cleaned.

The developers report that they cannot open the files.

How should you modify the permissions to ensure that the developers can open the
files?

Select only one answer.

Add Contributor permission to the developers.

Add Execute permissions to the files.

This answer is incorrect.

Grant Execute permissions to all folders.

This answer is correct.

Grant Execute permissions to the root folder only.

If you are granting permissions by using only ACLs (not Azure RBAC), then to grant a
security principal read or write access to a file, you will need to grant the security principal
Execute permissions to the root folder of the container and to each folder in the hierarchy of
folders that lead to the file.
Adding Contributor permissions to the developers will not help as this type of permission
does not provide access to the data.
Access control lists in Azure Data Lake Storage Gen2 - Azure Storage | Microsoft Learn
Secure your Azure Storage account - Training | Microsoft Learn

Question: 71

You plan to deploy an app that will distribute files across multiple Azure Storage
accounts.
You need to recommend a partitioning strategy that meets the following
requirements:

 Optimizes the data distribution balance.
 Minimizes the creation of extra tables.

What should you recommend?

Select only one answer.


hash
lookup
range

hash

This answer is correct.

lookup

range

This answer is incorrect.

Lookup partitioning requires a lookup table to identify which partition data should reside in.
Range partitioning does not provide optimization for balancing.
Hash partitioning is optimized for data distribution and uses a hash function to eliminate the
need for a lookup table.
Sharding pattern - Azure Architecture Center | Microsoft Learn
Data integration with Azure Data Factory - Training | Microsoft Learn

Question: 72

You need to limit sensitive data exposure to non-privileged users. You must be able
to grant and revoke access to the sensitive data.

What should you implement?

Select only one answer.

Always Encrypted

dynamic data masking

This answer is correct.

row-level security (RLS)

transparent data encryption (TDE)

This answer is incorrect.


Dynamic data masking helps prevent unauthorized access to sensitive data by enabling
customers to designate how much of the sensitive data to reveal with minimal impact on the
application layer. It is a policy-based security feature that hides the sensitive data in the result
set of a query over designated database fields, while the data in the database is not changed.
Dynamic data masking - Azure SQL Database | Microsoft Learn
Secure your Azure Storage account - Training | Microsoft Learn
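A minimal sketch with hypothetical table, column, and role names: the mask hides the values from
non-privileged users, and the UNMASK permission is how access to the underlying data is granted
and revoked.

-- Mask an email column for users who lack the UNMASK permission.
ALTER TABLE dbo.Customers
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

GRANT UNMASK TO DataAnalysts;     -- grant access to the unmasked values
REVOKE UNMASK FROM DataAnalysts;  -- revoke that access when it is no longer needed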

Question: 73

Question: 74
Question: 75
