DP-200.prepaway - Premium.14222.exam.201q
201q
Number: DP-200
Passing Score: 800
Time Limit: 120 min
File Version: 9.0
DP-200
Version 9.0
Implement data storage solutions
Question Set 1
QUESTION 1
You are a data engineer implementing a lambda architecture on Microsoft Azure. You use an open-source
big data solution to collect, process, and maintain data. The analytical data store performs poorly.
A. Interactive Query
B. Apache Hadoop
C. Apache HBase
D. Apache Spark
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Lambda Architecture with Azure:
Azure offers you a combination of the following technologies to accelerate real-time big data analytics:
1. Azure Cosmos DB, a globally distributed and multi-model database service.
2. Apache Spark for Azure HDInsight, a processing framework that runs large-scale data analytics
applications.
3. Azure Cosmos DB change feed, which streams new data to the batch layer for HDInsight to process.
4. The Spark to Azure Cosmos DB Connector
Note: Lambda architecture is a data-processing architecture designed to handle massive quantities of data
by taking advantage of both batch processing and stream processing methods, and minimizing the latency
involved in querying big data.
References:
https://sqlwithmanoj.com/2018/02/16/what-is-lambda-architecture-and-what-azure-offers-with-its-new-cosmos-db/
QUESTION 2
DRAG DROP
You develop data engineering solutions for a company. You must migrate data from Microsoft Azure Blob
storage to an Azure SQL Data Warehouse for further transformation. You need to implement the solution.
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 2: Connect to the Azure SQL Data warehouse by using SQL Server Management Studio
Connect to the data warehouse with SSMS (SQL Server Management Studio)
Step 3: Build external tables by using the SQL Server Management Studio
Create external tables for data in Azure blob storage.
You are ready to begin the process of loading data into your new data warehouse. You use external tables
to load data from the Azure storage blob.
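For illustration only (the object names, storage account, key, and column list below are placeholders and assumptions, not part of the original answer), the external table and load steps could look similar to the following T-SQL:
-- Create a credential that stores the storage account key (placeholder values).
CREATE MASTER KEY;
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';
-- External data source that points at the blob container.
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (TYPE = HADOOP,
      LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
      CREDENTIAL = AzureStorageCredential);
-- File format for the delimited text files.
CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));
-- External table over the files in blob storage.
CREATE EXTERNAL TABLE dbo.Stage_Sales
(SaleId INT, SaleDate DATE, Amount DECIMAL(18, 2))
WITH (LOCATION = '/sales/',
      DATA_SOURCE = AzureStorage,
      FILE_FORMAT = TextFileFormat);
-- Load the data warehouse table with CTAS.
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM dbo.Stage_Sales;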
References:
https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/sql-data-warehouse/load-data-from-azure-blob-storage-using-polybase.md
QUESTION 3
You develop data engineering solutions for a company. The company has on-premises Microsoft SQL
Server databases at multiple locations.
The company must integrate data with Microsoft Power BI and Microsoft Azure Logic Apps. The solution
must avoid single points of failure during connection and transfer to the cloud. The solution must also
minimize latency.
You need to secure the transfer of data between on-premises databases and Microsoft Azure.
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You can create high availability clusters of On-premises data gateway installations, to ensure your
organization can access on-premises data resources used in Power BI reports and dashboards. Such
clusters allow gateway administrators to group gateways to avoid single points of failure in accessing on-
premises data resources. The Power BI service always uses the primary gateway in the cluster, unless it’s
not available. In that case, the service switches to the next gateway in the cluster, and so on.
References:
https://docs.microsoft.com/en-us/power-bi/service-gateway-high-availability-clusters
QUESTION 4
You are a data architect. The data engineering team needs to configure a synchronization of data between
an on-premises Microsoft SQL Server database to Azure SQL Database.
Ad-hoc and reporting queries are overutilizing the on-premises production instance. The
synchronization process must:
Perform an initial data synchronization to Azure SQL Database with minimal downtime
Perform bi-directional data synchronization after initial synchronization
A. transactional replication
B. Data Migration Assistant (DMA)
C. backup and restore
D. SQL Server Agent job
E. Azure SQL Data Sync
Correct Answer: E
Section: (none)
Explanation
Explanation/Reference:
Explanation:
SQL Data Sync is a service built on Azure SQL Database that lets you synchronize the data you select bi-
directionally across multiple SQL databases and SQL Server instances.
With Data Sync, you can keep data synchronized between your on-premises databases and Azure SQL
databases to enable hybrid applications.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-sync-data
QUESTION 5
An application will use Microsoft Azure Cosmos DB as its data solution. The application will use the
Cassandra API to support a column-based database type that uses containers to store items.
You need to provision Azure Cosmos DB. Which container name and item name should you use? Each
correct answer presents part of the solution.
A. collection
B. rows
C. graph
D. entities
E. table
Correct Answer: BE
Section: (none)
Explanation
Explanation/Reference:
Explanation:
B: Depending on the choice of the API, an Azure Cosmos item can represent either a document in a
collection, a row in a table or a node/edge in a graph. The following table shows the mapping between API-
specific entities to an Azure Cosmos item:
E: An Azure Cosmos container is specialized into API-specific entities as follows:
References:
https://docs.microsoft.com/en-us/azure/cosmos-db/databases-containers-items
QUESTION 6
A company has a SaaS solution that uses Azure SQL Database with elastic pools. The solution contains a
dedicated database for each customer organization. Customer organizations have peak usage at different
periods during the year.
You need to implement the Azure SQL Database elastic pool to minimize cost.
Correct Answer: E
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The best size for a pool depends on the aggregate resources needed for all databases in the pool. This
involves determining the following:
Maximum resources utilized by all databases in the pool (either maximum DTUs or maximum vCores
depending on your choice of resourcing model).
Maximum storage bytes utilized by all databases in the pool.
Note: Elastic pools enable the developer to purchase resources for a pool shared by multiple databases to
accommodate unpredictable periods of usage by individual databases. You can configure resources for the
pool based either on the DTU-based purchasing model or the vCore-based purchasing model.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-elastic-pool
QUESTION 7
HOTSPOT
You are a data engineer. You are designing a Hadoop Distributed File System (HDFS) architecture. You
plan to use Microsoft Azure Data Lake as a data storage repository.
You must provision the repository with a resilient data schema. You need to ensure the resiliency of the
Azure Data Lake Storage. What should you use? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: NameNode
An HDFS cluster consists of a single NameNode, a master server that manages the file system
namespace and regulates access to files by clients.
Box 2: DataNode
The DataNodes are responsible for serving read and write requests from the file system’s clients.
Box 3: DataNode
The DataNodes perform block creation, deletion, and replication upon instruction from the NameNode.
Note: HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master
server that manages the file system namespace and regulates access to files by clients. In addition, there
are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the
nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files.
Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The
NameNode executes file system namespace operations like opening, closing, and renaming files and
directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for
serving read and write requests from the file system’s clients. The DataNodes also perform block creation,
deletion, and replication upon instruction from the NameNode.
References:
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#NameNode+and+DataNodes
QUESTION 8
DRAG DROP
You are developing the data platform for a global retail company. The company operates during normal
working hours in each region. The analytical database is used once a week for building sales projections.
Building the sales projections is very resource intensive and generates upwards of 20 terabytes (TB) of
data.
How should you provision the database instances? To answer, drag the appropriate Azure SQL products to
the correct databases. Each Azure SQL product may be used once, more than once, or not at all. You may
need to drag the split bar between panes or scroll to view content.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Incorrect Answers:
Azure SQL Database Managed Instance: The managed instance deployment model is designed for customers looking to migrate a large number of apps from an on-premises or IaaS, self-built, or ISV-provided environment to a fully managed PaaS cloud environment, with as low a migration effort as possible.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-elastic-pool
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tier-hyperscale-faq
QUESTION 9
A company manages several on-premises Microsoft SQL Server databases.
You need to migrate the databases to Microsoft Azure by using a backup process of Microsoft SQL Server.
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Managed instance is a new deployment option of Azure SQL Database, providing near 100% compatibility
with the latest SQL Server on-premises (Enterprise Edition) Database Engine, providing a native virtual
network (VNet) implementation that addresses common security concerns, and a business model
favorable for on-premises SQL Server customers. The managed instance deployment model allows
existing SQL Server customers to lift and shift their on-premises applications to the cloud with minimal
application and database changes.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-managed-instance
QUESTION 10
The data engineering team manages Azure HDInsight clusters. The team spends a large amount of time
creating and destroying clusters daily because most of the data pipeline process runs in minutes.
You need to implement a solution that deploys multiple HDInsight clusters with minimal effort.
A. Azure Databricks
B. Azure Traffic Manager
C. Azure Resource Manager templates
D. Ambari web user interface
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
A Resource Manager template makes it easy to create the following resources for your application in a
single, coordinated operation:
HDInsight clusters and their dependent resources (such as the default storage account).
Other resources (such as Azure SQL Database to use Apache Sqoop).
In the template, you define the resources that are needed for the application. You also specify deployment
parameters to input values for different environments. The template consists of JSON and expressions that
you use to construct values for your deployment.
References:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-arm-templates
QUESTION 11
You are the data engineer for your company. An application uses a NoSQL database to store data. The
database uses the key-value and wide-column NoSQL database type.
You need to determine which API to use for the database model and type.
Which two APIs should you use? Each correct answer presents a complete solution.
A. Table API
B. MongoDB API
C. Gremlin API
D. SQL API
E. Cassandra API
Correct Answer: BE
Section: (none)
Explanation
Explanation/Reference:
Explanation:
B: Azure Cosmos DB is the globally distributed, multimodel database service from Microsoft for mission-
critical applications. It is a multimodel database and supports document, key-value, graph, and columnar
data models.
E: Wide-column stores store data together as columns instead of rows and are optimized for queries over
large datasets. The most popular are Cassandra and HBase.
References:
https://docs.microsoft.com/en-us/azure/cosmos-db/graph-introduction
https://www.mongodb.com/scale/types-of-nosql-databases
QUESTION 12
A company is designing a hybrid solution to synchronize data from an on-premises Microsoft SQL Server database to Azure SQL Database.
You must perform an assessment of databases to determine whether data will move without compatibility
issues. You need to perform the assessment.
Correct Answer: E
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The Data Migration Assistant (DMA) helps you upgrade to a modern data platform by detecting
compatibility issues that can impact database functionality in your new version of SQL Server or Azure SQL
Database. DMA recommends performance and reliability improvements for your target environment and
allows you to move your schema, data, and uncontained objects from your source server to your target
server.
References:
https://docs.microsoft.com/en-us/sql/dma/dma-overview
QUESTION 13
DRAG DROP
You manage a financial computation data analysis process. Microsoft Azure virtual machines (VMs) run the
process in daily jobs, and store the results in virtual hard drives (VHDs).
The VMs produce results using data from the previous day and store the results in a snapshot of the VHD.
When a new month begins, a process creates a new VHD.
You need to enforce the data retention requirements while minimizing cost.
How should you configure the lifecycle policy? To answer, drag the appropriate JSON segments to the correct locations. Each JSON segment may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Example: Create or update the management policy of a Storage account with ManagementPolicy rule
objects.
PS C:\>$action1 = Add-AzStorageAccountManagementPolicyAction -BaseBlobAction Delete -daysAfterModificationGreaterThan 100
PS C:\>$action1 = Add-AzStorageAccountManagementPolicyAction -InputObject $action1 -BaseBlobAction
TierToArchive -daysAfterModificationGreaterThan 50
PS C:\>$action1 = Add-AzStorageAccountManagementPolicyAction -InputObject $action1 -BaseBlobAction
TierToCool -daysAfterModificationGreaterThan 30
PS C:\>$action1 = Add-AzStorageAccountManagementPolicyAction -InputObject $action1 -SnapshotAction
Delete -daysAfterCreationGreaterThan 100
PS C:\>$filter1 = New-AzStorageAccountManagementPolicyFilter -PrefixMatch ab,cd
PS C:\>$rule1 = New-AzStorageAccountManagementPolicyRule -Name Test -Action $action1 -Filter
$filter1
References:
https://docs.microsoft.com/en-us/powershell/module/az.storage/set-azstorageaccountmanagementpolicy
QUESTION 14
A company plans to use Azure SQL Database to support a mission-critical application.
The application must be highly available without performance degradation during maintenance windows.
Which three technologies should you implement? Each correct answer presents part of the solution.
Explanation/Reference:
Explanation:
A: Premium/business critical service tier model that is based on a cluster of database engine processes.
This architectural model relies on the fact that there is always a quorum of available database engine nodes
and has minimal performance impact on your workload even during maintenance activities.
E: In the premium model, Azure SQL database integrates compute and storage on the single node. High
availability in this architectural model is achieved by replication of compute (SQL Server Database Engine
process) and storage (locally attached SSD) deployed in 4-node cluster, using technology similar to SQL
Server Always On Availability Groups.
F: Zone redundant configuration
By default, the quorum-set replicas for the local storage configurations are created in the same datacenter.
With the introduction of Azure Availability Zones, you have the ability to place the different replicas in the
quorum-sets to different availability zones in the same region. To eliminate a single point of failure, the
control ring is also duplicated across multiple zones as three gateway rings (GW).
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-high-availability
QUESTION 15
A company plans to use Azure Storage for file storage purposes. Compliance rules require:
A single storage account to store all operations including reads, writes and deletes
Retention of an on-premises copy of historical operations
Which two actions should you perform? Each correct answer presents part of the solution.
A. Configure the storage account to log read, write and delete operations for service type Blob
B. Use the AzCopy tool to download log data from $logs/blob
C. Configure the storage account to log read, write and delete operations for service-type table
D. Use the storage client to download log data from $logs/table
E. Configure the storage account to log read, write and delete operations for service type queue
Correct Answer: AB
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Storage Logging logs request data in a set of blobs in a blob container named $logs in your storage
account. This container does not show up if you list all the blob containers in your account but you can see
its contents if you access it directly.
To view and analyze your log data, you should download the blobs that contain the log data you are
interested in to a local machine. Many storage-browsing tools enable you to download blobs from your
storage account; you can also use the Azure Storage team provided command-line Azure Copy Tool
(AzCopy) to download your log data.
References:
https://docs.microsoft.com/en-us/rest/api/storageservices/enabling-storage-logging-and-accessing-log-data
QUESTION 16
DRAG DROP
Which four actions should you perform in sequence? To answer, move the appropriate action from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Create a new Azure Data Lake Storage account with Azure Data Lake managed encryption keys
For Azure services, Azure Key Vault is the recommended key storage solution and provides a common
management experience across services. Keys are stored and managed in key vaults, and access to a key
vault can be given to users or services. Azure Key Vault supports customer creation of keys or import of
customer keys for use in customer-managed encryption key scenarios.
Note: Data Lake Storage Gen1 account Encryption Settings. There are three options:
Do not enable encryption.
Use keys managed by Data Lake Storage Gen1, if you want Data Lake Storage Gen1 to manage your
encryption keys.
Use keys from your own Key Vault. You can select an existing Azure Key Vault or create a new Key
Vault. To use the keys from a Key Vault, you must assign permissions for the Data Lake Storage Gen1
account to access the Azure Key Vault.
References:
https://docs.microsoft.com/en-us/azure/security/fundamentals/encryption-atrest
QUESTION 17
You are developing a data engineering solution for a company. The solution will store a large set of key-
value pair data by using Microsoft Azure Cosmos DB.
Which three actions should you perform? Each correct answer presents part of the solution.
B. Provision an Azure Cosmos DB account with the Azure Table API. Enable geo-redundancy.
C. Configure table-level throughput.
D. Replicate the data globally by manually adding regions to the Azure Cosmos DB account.
E. Provision an Azure Cosmos DB account with the Azure Table API. Enable multi-region writes.
Correct Answer: E
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scale read and write throughput globally. You can enable every region to be writable and elastically scale
reads and writes all around the world. The throughput that your application configures on an Azure Cosmos
database or a container is guaranteed to be delivered across all regions associated with your Azure
Cosmos account. The provisioned throughput is guaranteed by financially backed SLAs.
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/distribute-data-globally
QUESTION 18
A company has a SaaS solution that uses Azure SQL Database with elastic pools. The solution will have a
dedicated database for each customer organization. Customer organizations have peak usage at different
periods during the year.
Which two factors affect your costs when sizing the Azure SQL Database elastic pools? Each correct
answer presents a complete solution.
Correct Answer: AC
Section: (none)
Explanation
Explanation/Reference:
Explanation:
A: With the vCore purchase model, in the General Purpose tier, you are charged for Premium blob storage
that you provision for your database or elastic pool. Storage can be configured between 5 GB and 4 TB
with 1 GB increments. Storage is priced at GB/month.
C: In the DTU purchase model, elastic pools are available in basic, standard and premium service tiers.
Each tier is distinguished primarily by its overall performance, which is measured in elastic Database
Transaction Units (eDTUs).
References:
https://azure.microsoft.com/en-in/pricing/details/sql-database/elastic/
QUESTION 19
HOTSPOT
Data storage:
Implement optimized storage for big data analytics workloads.
Ensure that data can be organized using a hierarchical structure.
Batch processing:
You need to identify the correct technologies to build the Lambda architecture.
Which technologies should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch
processing, you can use Spark, Hive, Hive LLAP, MapReduce.
Azure Synapse Analytics is a cloud-based Enterprise Data Warehouse (EDW) that uses
Massively Parallel Processing (MPP).
Azure Synapse Analytics stores data into relational tables with columnar storage.
Note: As of November 2019, Azure SQL Data Warehouse is now Azure Synapse Analytics.
References:
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-overview-what-is
QUESTION 20
DRAG DROP
The data engineering team plans to implement a process that copies data from the SQL Server instance to
Azure Blob storage. The process must orchestrate and manage the data lifecycle.
You need to configure Azure Data Factory to connect to the SQL Server instance.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 2: From the on-premises network, install and configure a self-hosted runtime.
To copy data from a SQL Server database that isn't publicly accessible, you need to set up a self-
hosted integration runtime.
References:
https://docs.microsoft.com/en-us/azure/data-factory/connector-sql-server
QUESTION 21
A company runs Microsoft SQL Server in an on-premises virtual machine (VM).
You must migrate the database to Azure SQL Database. You synchronize users from Active Directory to
Azure Active Directory (Azure AD).
You need to configure Azure SQL Database to use an Azure AD user as administrator.
A. For each Azure SQL Database, set the Access Control to administrator.
B. For each Azure SQL Database server, set the Active Directory to administrator.
C. For each Azure SQL Database, set the Active Directory administrator role.
D. For each Azure SQL Database server, set the Access Control to administrator.
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
There are two administrative accounts (Server admin and Active Directory admin) that act as
administrators.
One Azure Active Directory account, either an individual or security group account, can also be configured
as an administrator. It is optional to configure an Azure AD administrator, but an Azure AD administrator
must be configured if you want to use Azure AD accounts to connect to SQL Database.
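As an illustrative sketch (the user principal name below is hypothetical), once the Azure AD administrator is configured, it can create contained database users for other Azure AD identities and grant them access:
-- Run in the user database while connected as the Azure AD administrator.
-- The user principal name below is a placeholder.
CREATE USER [user1@contoso.com] FROM EXTERNAL PROVIDER;
-- Grant only the access the user needs, for example:
ALTER ROLE db_datareader ADD MEMBER [user1@contoso.com];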
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-manage-logins
QUESTION 22
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure SQL database named DB1 that contains a table named Table1. Table1 has a field
named Customer_ID that is varchar(22).
You need to implement masking for the Customer_ID field to meet the following requirements:
All other characters must be masked.
Solution: You implement data masking and use a credit card function mask.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Must use Custom Text data masking, which exposes the first and last characters and adds a custom
padding string in the middle.
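For illustration only (the exact number of exposed leading and trailing characters is not shown here, so the values below are assumptions), a custom text mask on Customer_ID could be defined as follows:
-- partial(prefix, padding, suffix): expose the assumed 2 leading and 4 trailing
-- characters and mask everything in between with a custom padding string.
ALTER TABLE dbo.Table1
ALTER COLUMN Customer_ID ADD MASKED WITH (FUNCTION = 'partial(2, "XXXXXXXX", 4)');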
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started
QUESTION 23
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure SQL database named DB1 that contains a table named Table1. Table1 has a field
named Customer_ID that is varchar(22).
You need to implement masking for the Customer_ID field to meet the following requirements:
Solution: You implement data masking and use an email function mask.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Must use Custom Text data masking, which exposes the first and last characters and adds a custom
padding string in the middle.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started
QUESTION 24
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure SQL database named DB1 that contains a table named Table1. Table1 has a field
named Customer_ID that is varchar(22).
You need to implement masking for the Customer_ID field to meet the following requirements:
Solution: You implement data masking and use a random number function mask.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Must use Custom Text data masking, which exposes the first and last characters and adds a custom
padding string in the middle.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started
QUESTION 25
DRAG DROP
You are responsible for providing access to an Azure Data Lake Storage Gen2 account.
Your user account has contributor access to the storage account, and you have the application ID and
access key.
You plan to use PolyBase to load data into an enterprise data warehouse in Azure Synapse Analytics.
You need to configure PolyBase to connect the data warehouse to the storage account.
Which three components should you create in sequence? To answer, move the appropriate components
from the list of components to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: a database scoped credential
To access your Data Lake Storage account, you will need to create a Database Master Key to encrypt your
credential secret used in the next step. You then create a database scoped credential.
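A hedged T-SQL sketch of these components, using placeholder values for the application ID, key, tenant, and storage location:
-- 1. Master key that protects the credential secret.
CREATE MASTER KEY;
-- 2. Database scoped credential that holds the application (service principal) ID and key.
--    The IDENTITY combines the application ID with the OAuth 2.0 token endpoint of the tenant.
CREATE DATABASE SCOPED CREDENTIAL ADLSCredential
WITH IDENTITY = '<application-id>@https://login.microsoftonline.com/<tenant-id>/oauth2/token',
     SECRET = '<application-key>';
-- 3. External data source that points PolyBase at the Data Lake Storage Gen2 account.
CREATE EXTERNAL DATA SOURCE AzureDataLakeStorage
WITH (TYPE = HADOOP,
      LOCATION = 'abfss://<filesystem>@<account>.dfs.core.windows.net',
      CREDENTIAL = ADLSCredential);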
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-lake-store
QUESTION 26
You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.
A. hash distributed
B. heap
C. replicated
D. round-robin
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Usually, common dimension tables or tables that don't distribute evenly are good candidates for round-robin distributed tables.
Note: Dimension tables or other lookup tables in a schema can usually be stored as round-robin tables. Usually these tables connect to more than one fact table, and optimizing for one join may not be the best idea. Also, dimension tables are usually smaller, which can leave some distributions empty when hash distributed. Round-robin by definition guarantees a uniform data distribution.
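A minimal sketch of a round-robin distributed dimension table (the column list is hypothetical):
CREATE TABLE dbo.DimProduct
(
    ProductKey  INT NOT NULL,
    ProductName NVARCHAR(100) NOT NULL,
    Category    NVARCHAR(50) NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,      -- rows are spread evenly across all distributions
    CLUSTERED COLUMNSTORE INDEX
);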
Reference:
https://blogs.msdn.microsoft.com/sqlcat/2015/08/11/choosing-hash-distributed-table-vs-round-robin-distributed-table-in-azure-sql-dw-service/
QUESTION 27
You have an enterprise data warehouse in Azure Synapse Analytics.
Using PolyBase, you create a table named [Ext].[Items] to query Parquet files stored in Azure Data Lake
Storage Gen2 without importing the data to the data warehouse.
You discover that the Parquet files have a fourth column named ItemID.
Which command should you run to add the ItemID column to the external table?
A. Option A
B. Option B
C. Option C
D. Option D
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Incorrect Answers:
B, D: Only these Data Definition Language (DDL) statements are allowed on external tables: CREATE TABLE and DROP TABLE, CREATE STATISTICS and DROP STATISTICS, CREATE VIEW and DROP VIEW.
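Because columns cannot be added to an external table in place, the table is dropped and recreated with the new ItemID column. The sketch below assumes the column types and the names of the existing external data source and file format, which are not shown in the question:
DROP EXTERNAL TABLE [Ext].[Items];
CREATE EXTERNAL TABLE [Ext].[Items]
(
    [ItemID] INT,                        -- the newly discovered fourth column
    [ItemName] NVARCHAR(50),
    [ItemType] NVARCHAR(20),
    [ItemDescription] NVARCHAR(250)
)
WITH
(
    LOCATION = '/Items/',
    DATA_SOURCE = AzureDataLakeStore,    -- name of the existing external data source (assumed)
    FILE_FORMAT = ParquetFileFormat      -- name of the existing Parquet file format (assumed)
);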
Reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql
QUESTION 28
DRAG DROP
You have a table named SalesFact in Azure Synapse Analytics. SalesFact contains sales data from the
past 36 months and has the following characteristics:
Is partitioned by month
Contains one billion rows
Has clustered columnstore indexes
At the beginning of each month, you need to remove data from SalesFact that is older than 36 months as
quickly as possible.
Which three actions should you perform in sequence in a stored procedure? To answer, move the
appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: Create an empty table named SalesFact_work that has the same schema as SalesFact.
Step 2: Switch the partition containing the stale data from SalesFact to SalesFact_Work.
SQL Data Warehouse supports partition splitting, merging, and switching. To switch partitions between two
tables, you must ensure that the partitions align on their respective boundaries and that the table definitions
match.
Loading data into partitions with partition switching is a convenient way to stage new data in a table that is not visible to users before you switch in the new data.
Step 3: Drop the SalesFact_Work table.
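A sketch of the stored procedure logic; the distribution column, partition boundaries, and partition number are assumptions, and the work table must align with SalesFact on schema, distribution, and partition boundaries:
-- Step 1: empty work table with the same schema, distribution, and partition scheme as SalesFact.
CREATE TABLE dbo.SalesFact_Work
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20170101, 20170201 /* ...one boundary per month... */))
)
AS SELECT * FROM dbo.SalesFact WHERE 1 = 2;
-- Step 2: metadata-only switch of the partition that holds data older than 36 months.
ALTER TABLE dbo.SalesFact SWITCH PARTITION 1 TO dbo.SalesFact_Work PARTITION 1;
-- Step 3: drop the work table, removing the stale data with it.
DROP TABLE dbo.SalesFact_Work;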
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-partition
QUESTION 29
You plan to implement an Azure Cosmos DB database that will write 100,000,000 JSON documents every 24 hours.
The database will be replicated to three regions. Only one region will be writable.
You need to select a consistency level for the database to meet the following requirements:
A. Strong
B. Bounded Staleness
C. Eventual
D. Session
E. Consistent Prefix
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Session: Within a single client session reads are guaranteed to honor the consistent-prefix (assuming a
single “writer” session), monotonic reads, monotonic writes, read-your-writes, and write-follows-reads
guarantees. Clients outside of the session performing writes will see eventual consistency.
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels
QUESTION 30
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure SQL database named DB1 that contains a table named Table1. Table1 has a field
named Customer_ID that is varchar(22).
You need to implement masking for the Customer_ID field to meet the following requirements:
Solution: You implement data masking and use a custom text mask.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We must use Custom Text data masking, which exposes the first and last characters and adds a custom
padding string in the middle.
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started
QUESTION 31
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical
values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse
Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is less than 1 MB.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead convert the files to compressed delimited text files.
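To illustrate what a compressed delimited text source looks like to PolyBase, an external file format could declare Gzip compression as follows (a sketch, not part of the original answer):
CREATE EXTERNAL FILE FORMAT CompressedTextFormat
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', STRING_DELIMITER = '"'),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'   -- source files are Gzip compressed
);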
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
QUESTION 32
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical
values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an Azure SQL data warehouse.
You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is more than 1 MB.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead modify the files to ensure that each row is less than 1 MB.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
QUESTION 33
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical
values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse
Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You copy the files to a table that has a columnstore index.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead convert the files to compressed delimited text files.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
QUESTION 34
You plan to deploy an Azure Cosmos DB database that supports multi-master replication.
You need to select a consistency level for the database to meet the following requirements:
What are three possible consistency levels that you can select? Each correct answer presents a complete
solution.
A. Strong
B. Bounded Staleness
C. Eventual
D. Session
E. Consistent Prefix
Explanation/Reference:
Explanation:
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels-choosing
QUESTION 35
SIMULATION
Use the following login credentials as needed:
You need to ensure that you can recover any blob data from an Azure Storage account named storage10277521 up to 30 days after the data is deleted.
Explanation/Reference:
Explanation:
1. Open the Azure portal and open the Azure Blob storage account named storage10277521.
2. Under Blob service, select Soft delete.
3. Select Enabled and set the retention period to 30 days.
4. Click Save.
Note: Soft delete protects blob data from being accidentally or erroneously deleted or overwritten. When soft delete is enabled, deleted blobs and snapshots are retained for the specified retention period (between 1 and 365 days) and can be undeleted during that time.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-soft-delete
QUESTION 36
SIMULATION
Use the following login credentials as needed:
You need to replicate db1 to a new Azure SQL server named REPL10277521 in the Central Canada
region.
NOTE: This task might take several minutes to complete. You can perform other tasks while the task
completes or ends this section of the exam.
Explanation/Reference:
Explanation:
1. In the Azure portal, browse to the database that you want to set up for geo-replication.
2. On the SQL database page, select geo-replication, and then select the region to create the secondary
database.
3. Select or configure the server and pricing tier for the secondary database.
Region: Central Canada
Target server: REPL10277521
4. Click Create to add the secondary.
6. When the seeding process is complete, the secondary database displays its status.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-active-geo-replication-portal
QUESTION 37
SIMULATION
Use the following login credentials as needed:
You need to create an Azure SQL database named db3 on an Azure SQL server named SQL10277521.
Db3 must use the Sample (AdventureWorksLT) source.
Explanation/Reference:
Explanation:
1. Click Create a resource in the upper left-hand corner of the Azure portal.
2. On the New page, select Databases in the Azure Marketplace section, and then click SQL Database in
the Featured section.
3. Fill out the SQL Database form with the following information, as shown below:
Database name: Db3
Select source: Sample (AdventureWorksLT)
Server: SQL10277521
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-design-first-database
QUESTION 38
SIMULATION
Use the following login credentials as needed:
You plan to query db3 to retrieve a list of sales customers. The query will retrieve several columns that
include the email address of each sales customer.
You need to modify db3 to ensure that a portion of the email addresses is hidden in the query results.
Explanation/Reference:
Explanation:
1. Launch the Azure portal.
2. Navigate to the settings page of the database db3 that includes the sensitive data you want to mask.
3. Click the Dynamic Data Masking tile that launches the Dynamic Data Masking configuration page.
Note: Alternatively, you can scroll down to the Operations section and click Dynamic Data Masking.
4. In the Dynamic Data Masking configuration page, you may see some database columns that the
recommendations engine has flagged for masking.
5. Click ADD MASK for the EmailAddress column
6. Click Save in the data masking rule page to update the set of masking rules in the dynamic data
masking policy.
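Equivalently, the mask can be applied with T-SQL. A sketch that assumes the AdventureWorksLT sample table SalesLT.Customer:
-- email(): exposes the first letter of the address and the .com suffix, for example [email protected].
ALTER TABLE SalesLT.Customer
ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION = 'email()');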
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started-portal
QUESTION 39
SIMULATION
Use the following login credentials as needed:
Explanation/Reference:
Explanation:
1. In the Azure portal, navigate to the SQL databases page, select the db2 database, and choose Configure
performance
2. Click on Standard and Adjust the Storage size to 250 GB
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-single-databases-manage
QUESTION 40
HOTSPOT
You have an enterprise data warehouse in Azure Synapse Analytics that contains a table named
FactOnlineSales. The table contains data from the start of 2009 to the end of 2012.
You need to improve the performance of queries against FactOnlineSales by using table partitions. The
solution must meet the following requirements:
How should you complete the T-SQL command? To answer, select the appropriate options in the answer
area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: LEFT
RANGE LEFT: Specifies the boundary value belongs to the partition on the left (lower values). The default
is LEFT.
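A sketch of the partitioned table definition; the column list, distribution column, and exact boundary values are assumptions based on the 2009 to 2012 date range:
CREATE TABLE dbo.FactOnlineSales
(
    OnlineSalesKey INT NOT NULL,
    DateKey        INT NOT NULL,     -- e.g. 20091231 represents December 31, 2009
    SalesAmount    MONEY NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(OnlineSalesKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (DateKey RANGE LEFT FOR VALUES (20091231, 20101231, 20111231, 20121231))
    -- RANGE LEFT: each boundary value belongs to the partition on its left (the lower values)
);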
Reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-azure-sql-data-warehouse
QUESTION 41
SIMULATION
You need to create an elastic pool that contains an Azure SQL database named db2 and a new SQL
database named db3.
Explanation/Reference:
Explanation:
Step 1: Create a new SQL database named db3
1. Select SQL in the left-hand menu of the Azure portal. If SQL is not in the list, select All services, then
type SQL in the search box.
2. Select + Add to open the Select SQL deployment option page. Select Single Database. You can view
additional information about the different databases by selecting Show details on the Databases tile.
3. Select Create:
4. Enter the required fields if necessary.
5. Leave the rest of the values as default and select Review + Create at the bottom of the form.
6. Review the final settings and select Create. Use Db3 as database name.
On the SQL Database form, select Create to deploy and provision the resource group, server, and
database.
Step 2: Create your elastic pool using the Azure portal.
1. Select Azure SQL in the left-hand menu of the Azure portal. If Azure SQL is not in the list, select All
services, then type Azure SQL in the search box.
3. Select Elastic pool from the Resource type drop-down in the SQL Databases tile. Select Create to create
your elastic pool.
5. Select Configure elastic pool
6. On the Configure page, select the Databases tab, and then choose to Add database.
7. Add the Azure SQL database named db2, and the new SQL database named db3 that you created in
Step 1.
8. Select Review + create to review your elastic pool settings and then select Create to create your elastic
pool.
Reference:
https://docs.microsoft.com/bs-latn-ba/azure/sql-database/sql-database-elastic-pool-failover-group-tutorial
QUESTION 42
SIMULATION
You need to create an Azure Storage account named account10543936. The solution must meet the
following requirements:
Minimize storage costs.
Ensure that account10543936 can store many image files.
Ensure that account10543936 can quickly retrieve stored image files.
Explanation/Reference:
Explanation:
Create a general-purpose v2 storage account, which provides access to all of the Azure Storage services:
blobs, files, queues, tables, and disks.
1. On the Azure portal menu, select All services. In the list of resources, type Storage Accounts. As you
begin typing, the list filters based on your input. Select Storage Accounts.
4. Under the Resource group field, select Create new. Enter the name for your new resource group, as
shown in the following image.
6. Select a location for your storage account, or use the default location.
8. Select Review + Create to review your storage account settings and create the account.
9. Select Create.
Reference:
https://docs.microsoft.com/en-us/azure/storage/common/storage-account-create
QUESTION 43
SIMULATION
Azure Username: xxxxx
Azure Password: xxxxx
You need to ensure that users in the West US region can read data from a local copy of an Azure Cosmos
DB database named cosmos10543936.
NOTE: This task might take several minutes to complete. You can perform other tasks while the
task completes or end this section of the exam.
Explanation/Reference:
Explanation:
You can enable Availability Zones by using Azure portal when creating an Azure Cosmos account.
1. Locate the Cosmos DB database named cosmos10543936.
2. To add regions, select the hexagons on the map with the + label that correspond to your desired region(s). Alternatively, to add a region, select the + Add region option and choose a region from the drop-down menu.
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/high-availability
https://docs.microsoft.com/en-us/azure/cosmos-db/how-to-manage-database-account
QUESTION 44
SIMULATION
Use the following login credentials as needed:
You need to ensure that [email protected] can manage any databases hosted on an
Azure SQL server named SQL10543936 by signing in using his Azure Active Directory (Azure AD) user
account.
Explanation/Reference:
Explanation:
Provision an Azure Active Directory administrator for your Azure SQL Database server.
Each Azure SQL server (which hosts a SQL Database or SQL Data Warehouse) starts with a single server administrator account that is the administrator of the entire Azure SQL server. A second administrator, which is an Azure AD account, must then be created. This principal is created as a contained database user in the master database.
1. In the Azure portal, in the upper-right corner, select your connection to drop down a list of possible Active
Directories. Choose the correct Active Directory as the default Azure AD. This step links the subscription-
associated Active Directory with Azure SQL server making sure that the same subscription is used for both
Azure AD and SQL Server. (The Azure SQL server can be hosting either Azure SQL Database or Azure
SQL Data Warehouse.)
5. In the Add admin page, search for user [email protected], select it, and then select
Select. (The Active Directory admin page shows all members and groups of your Active Directory. Users or
groups that are grayed out cannot be selected because they are not supported as Azure AD administrators.)
6. At the top of the Active Directory admin page, select SAVE.
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-aad-authentication-configure?
QUESTION 45
HOTSPOT
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: Yes
You can now use a new extension of Azure Stream Analytics SQL to specify the number of partitions of a
stream when reshuffling the data.
The outcome is a stream that has the same partition scheme. Please see below for an example:
SELECT * INTO [output] FROM step1 PARTITION BY DeviceID UNION step2 PARTITION BY DeviceID
Note: The new extension of Azure Stream Analytics SQL includes a keyword INTO that allows you to
specify the number of partitions for a stream when performing reshuffling using a PARTITION BY
statement.
Box 2: Yes
When joining two streams of data explicitly repartitioned, these streams must have the same partition key
and partition count.
Box 3: Yes
Streaming Units (SUs) represents the computing resources that are allocated to execute a Stream
Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your
job.
In general, the best practice is to start with 6 SUs for queries that don't use PARTITION BY.
Here there are 10 partitions, so 6x10 = 60 SUs is good.
Note: Remember, Streaming Unit (SU) count, which is the unit of scale for Azure Stream Analytics, must
be adjusted so the number of physical resources available to the job can fit the partitioned flow. In general,
six SUs is a good number to assign to each partition. In case there are insufficient resources assigned to
the job, the system will only apply the repartition if it benefits the job.
Reference:
https://azure.microsoft.com/en-in/blog/maximize-throughput-with-repartitioning-in-azure-stream-analytics/
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-streaming-unit-consumption
QUESTION 46
DRAG DROP
You have an Azure SQL database named DB1 in the East US 2 region.
You need to build a secondary geo-replicated copy of DB1 in the West US region on a new server.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
1. In the Azure portal, browse to the database that you want to set up for geo-replication.
2. (Step 1) On the SQL database page, select geo-replication, and then select the region to create the
secondary database.
3. (Step 2-3) Select or configure the server and pricing tier for the secondary database.
Step 3: On the secondary server, create logins that match the SIDs on the primary server.
Incorrect Answers:
Not log shipping: Replication is used.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-active-geo-replication-portal
QUESTION 47
HOTSPOT
You have an Azure SQL database that contains a table named Employee. Employee contains sensitive
data in a decimal (10,2) column named Salary.
You need to ensure that nonprivileged users can view the table data, but Salary must display a number
from 0 to 100.
What should you configure? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: SELECT
Users with SELECT permission on a table can view the table data. Columns that are defined as masked,
will display the masked data.
Incorrect:
Grant the UNMASK permission to a user to enable them to retrieve unmasked data from the columns for
which masking is defined.
The CONTROL permission on the database includes both the ALTER ANY MASK and UNMASK
permission.
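A T-SQL sketch of the corresponding masking rule; the dbo schema and the NonPrivilegedUser principal are assumptions:
-- random(low, high): replaces the numeric value with a random number in the given range.
ALTER TABLE dbo.Employee
ALTER COLUMN Salary ADD MASKED WITH (FUNCTION = 'random(0, 100)');
-- Nonprivileged users only need SELECT; without the UNMASK permission they see the masked value.
GRANT SELECT ON dbo.Employee TO NonPrivilegedUser;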
QUESTION 48
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to implement changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100 days.
Solution: You apply an Azure policy that tags the storage account.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead apply an Azure Blob storage lifecycle policy.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-portal
QUESTION 49
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to implement changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100 days.
Solution: You apply an expired tag to the blobs in the storage account.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead apply an Azure Blob storage lifecycle policy.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-
portal
QUESTION 50
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to implement changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100 days.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Azure Blob storage lifecycle management offers a rich, rule-based policy for GPv2 and Blob storage accounts. Use the policy to transition your data to the appropriate access tiers, or to expire data at the end of its lifecycle.
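A minimal lifecycle policy sketch that deletes base blobs not modified for 100 days; the rule name and blob type filter are assumptions:
{
  "rules": [
    {
      "name": "delete-stale-blobs",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": [ "blockBlob" ] },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 100 }
          }
        }
      }
    }
  ]
}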
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-
portal
QUESTION 51
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure SQL database named DB1 that contains a table named Table1. Table1 has a field
named Customer_ID that is varchar(22).
You need to implement masking for the Customer_ID field to meet the following requirements:
Solution: You implement data masking and use a custom string function mask.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Must use Custom Text data masking, which exposes the first and last characters and adds a custom
padding string in the middle.
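For reference, a sketch of the Custom Text (partial) function applied to Customer_ID; the exposed prefix and suffix lengths and the padding string are illustrative assumptions, since the exact requirements are not reproduced here:
-- Expose the first 2 and last 4 characters; pad the middle (illustrative values).
ALTER TABLE dbo.Table1
ALTER COLUMN Customer_ID ADD MASKED WITH (FUNCTION = 'partial(2, "XXXXXXXXXXXXXXXX", 4)');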
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started
QUESTION 52
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to implement changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100 days.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead apply an Azure Blob storage lifecycle policy.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-
portal
QUESTION 53
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical
values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse
Analytics.
You need to prepare the files to ensure that the data copies quickly.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
All file formats have different performance characteristics. For the fastest load, use compressed delimited
text files.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
Implement data storage solutions
Testlet 2
Background
Proseware, Inc. develops and manages a product named Poll Taker. The product is used for delivering
public opinion polling and analysis.
Polling data comes from a variety of sources, including online surveys, house-to-house interviews, and
booths at public events.
Polling data
Polling data is stored in one of the two locations:
Poll metadata
Each poll has associated metadata with information about the poll including the date and number of
respondents. The data is stored as JSON.
Phone-based polling
Security
Phone-based poll data must only be uploaded by authorized users from authorized devices
Contractors must not have access to any polling data other than their own
Access to polling data must be set on a per-Active Directory user basis
Performance
After six months, raw polling data should be moved to a storage account. The storage must be available in
the event of a regional disaster. The solution must minimize costs.
Deployments
All deployments must be performed by using Azure DevOps. Deployments must use templates that can be reused across multiple environments.
No credentials or secrets should be used during deployments
Reliability
All services and processes must be resilient to a regional Azure outage.
Monitoring
All Azure services must be monitored by using Azure Monitor. On-premises SQL Server performance must
be monitored.
QUESTION 1
DRAG DROP
You need to ensure that phone-based polling data can be analyzed in the PollingData database.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Select and Place:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario:
All deployments must be performed by using Azure DevOps. Deployments must use templates that can be reused across multiple environments.
No credentials or secrets should be used during deployments
QUESTION 2
DRAG DROP
How should you configure the storage account? To answer, drag the appropriate Configuration Value to the
correct Setting. Each Configuration Value may be used once, more than once, or not at all. You may need
to drag the split bar between panes or scroll to view content.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Geo-redundant storage (GRS) is designed to provide at least 99.99999999999999% (16 9's) durability of
objects over a given year by replicating your data to a secondary region that is hundreds of miles away
from the primary region. If your storage account has GRS enabled, then your data is durable even in the
case of a complete regional outage or a disaster in which the primary region isn't recoverable.
If you opt for GRS, you have two related options to choose from:
GRS replicates your data to another data center in a secondary region, but that data is available to be
read only if Microsoft initiates a failover from the primary to secondary region.
Read-access geo-redundant storage (RA-GRS) is based on GRS. RA-GRS replicates your data to
another data center in a secondary region, and also provides you with the option to read from the
secondary region. With RA-GRS, you can read from the secondary region regardless of whether
Microsoft initiates a failover from the primary to secondary region.
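A minimal ARM template resource sketch showing where the replication option is set; the account name, API version, and the hierarchical namespace setting are assumptions:
{
  "type": "Microsoft.Storage/storageAccounts",
  "apiVersion": "2019-06-01",
  "name": "pollingdatastore",
  "location": "[resourceGroup().location]",
  "kind": "StorageV2",
  "sku": { "name": "Standard_GRS" },
  "properties": { "isHnsEnabled": true }
}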
References:
https://docs.microsoft.com/bs-cyrl-ba/azure/storage/blobs/data-lake-storage-quickstart-create-account
https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy-grs
Implement data storage solutions
Testlet 3
Case Study
This is a case study. Case studies are not timed separately. You can use as much exam time as you
would like to complete each case. However, there may be additional case studies and sections on this
exam. You must manage your time to ensure that you are able to complete all questions included on this
exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in
the case study. Case studies might contain exhibits and other resources that provide more information
about the scenario that is described in the case study. Each question is independent of the other questions
in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers
and to make changes before you move to the next section of the exam. After you begin a new section, you
cannot return to this section.
Overview
General Overview
Litware, Inc. is an international car racing and manufacturing company that has 1,000 employees. Most
employees are located in Europe. The company supports racing teams that compete in a worldwide racing
series.
Physical Locations
Litware has two main locations: a main office in London, England, and a manufacturing plant in Berlin,
Germany.
During each race weekend, 100 engineers set up a remote portable office by using a VPN to connect to the
datacenter in the London office. The portable office is set up and torn down in approximately 20 different
countries each year.
Existing environment
Race Central
During race weekends, Litware uses a primary application named Race Central. Each car has several
sensors that send real-time telemetry data to the London datacentre. The data is used for real-time tracking
of the cars.
Race Central also sends batch updates to an application named Mechanical Workflow by using Microsoft
SQL Server Integration Services (SSIS).
The telemetry data is sent to a MongoDB database. A custom application then moves the data to
databases in SQL Server 2017. The telemetry data in MongoDB has more than 500 attributes. The
application changes the attribute names when the data is moved to SQL Server 2017.
Mechanical Workflow
Mechanical Workflow is used to track changes and improvements made to the cars during their lifetime.
Currently, Mechanical Workflow runs on SQL Server 2017 as an OLAP system.
Mechanical Workflow has a table named Table1 that is 1 TB. Large aggregations are performed on a
single column of Table1.
Requirements
Planned Changes
Litware is in the process of rearchitecting its data estate to be hosted in Azure. The company plans to
decommission the London datacentre and move all its applications to an Azure datacenter.
Technical Requirements
Data collection for Race Central must be moved to Azure Cosmos DB and Azure SQL Database. The
data must be written to the Azure datacenter closest to each race and must converge in the least
amount of time.
The query performance of Race Central must be stable, and the administrative time it takes to perform
optimizations must be minimized.
The database for Mechanical Workflow must be moved to Azure SQL Data Warehouse.
Transparent data encryption (TDE) must be enabled on all data stores, whenever possible.
An Azure Data Factory pipeline must be used to move data from Cosmos DB to SQL Database for
Race Central. If the data load takes longer than 20 minutes, configuration changes must be made to
Data Factory.
The telemetry data must migrate toward a solution that is native to Azure.
The telemetry data must be monitored for performance issues. You must adjust the Cosmos DB
Request Units per second (RU/s) to maintain a performance SLA while minimizing the cost of the RU/s.
During race weekends, visitors will be able to enter the remote portable offices. Litware is concerned that
some proprietary information might be exposed. The company identifies the following data masking
requirements for the Race Central data that will be stored in SQL Database:
Only show the last four digits of the values in a column named SuspensionSprings.
Only show a zero value for the values in a column named ShockOilWeight.
QUESTION 1
HOTSPOT
You need to build a solution to collect the telemetry data for Race Central.
What should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
API: Table
Azure Cosmos DB provides native support for wire protocol-compatible APIs for popular databases. These
include MongoDB, Apache Cassandra, Gremlin, and Azure Table storage.
Scenario: The telemetry data must migrate toward a solution that is native to Azure.
Use the strongest consistency level, Strong, to minimize convergence time.
Scenario: The data must be written to the Azure datacenter closest to each race and must converge in the
least amount of time.
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels
QUESTION 2
On which data store should you configure TDE to meet the technical requirements?
A. Cosmos DB
B. Azure Synapse Analytics
C. SQL Database
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: Transparent data encryption (TDE) must be enabled on all data stores, whenever possible.
The database for Mechanical Workflow must be moved to Azure Synapse Analytics.
Incorrect Answers:
A: Cosmos DB does not support TDE.
QUESTION 3
HOTSPOT
You are building the data store solution for Mechanical Workflow.
How should you configure Table1? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Table Type: Hash distributed.
Hash-distributed tables improve query performance on large fact tables.
Scenario:
Mechanical Workflow has a table named Table1 that is 1 TB. Large aggregations are performed on a single column of Table1.
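A minimal dedicated SQL pool sketch of such a table; the column names and the hash key are assumptions:
-- Hash-distribute on a column used by the large aggregations; columnstore suits the 1 TB fact table.
CREATE TABLE dbo.Table1
(
    CarId        INT NOT NULL,
    MeasureValue DECIMAL(18, 4) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CarId),
    CLUSTERED COLUMNSTORE INDEX
);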
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute
QUESTION 4
HOTSPOT
Which masking functions should you implement for each column to meet the data masking requirements?
To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 2: Default
Default uses a zero value for numeric data types (bigint, bit, decimal, int, money, numeric, smallint,
smallmoney, tinyint, float, real).
Only show a zero value for the values in a column named ShockOilWeight.
Scenario:
The company identifies the following data masking requirements for the Race Central data that will be
stored in SQL Database:
Only show a zero value for the values in a column named ShockOilWeight.
Only show the last four digits of the values in a column named SuspensionSprings.
Reference: https://docs.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview
QUESTION 5
HOTSPOT
Which masking functions should you implement for each column to meet the data masking requirements?
To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: Custom text/string: A masking method, which exposes the first and/or last characters and adds a
custom padding string in the middle.
Only show the last four digits of the values in a column named SuspensionSprings.
Box 2: Default
Default uses a zero value for numeric data types (bigint, bit, decimal, int, money, numeric, smallint,
smallmoney, tinyint, float, real).
Scenario: Only show a zero value for the values in a column named ShockOilWeight.
Scenario:
The company identifies the following data masking requirements for the Race Central data that will be
stored in SQL Database:
Only show a zero value for the values in a column named ShockOilWeight.
Only show the last four digits of the values in a column named SuspensionSprings.
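A T-SQL sketch that applies the two masks; the schema and table name are assumptions, and SuspensionSprings is assumed to be a string column:
-- Expose only the last four characters of SuspensionSprings.
ALTER TABLE dbo.RaceCentral
ALTER COLUMN SuspensionSprings ADD MASKED WITH (FUNCTION = 'partial(0, "XXXX", 4)');
-- Default mask: numeric columns such as ShockOilWeight display 0.
ALTER TABLE dbo.RaceCentral
ALTER COLUMN ShockOilWeight ADD MASKED WITH (FUNCTION = 'default()');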
Reference:
https://docs.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview
Implement data storage solutions
Testlet 4
Case study
Overview
ADatum Corporation is a retailer that sells products through two sales channels: retail stores and a website.
Existing Environment
ADatum has one database server that has Microsoft SQL Server 2016 installed. The server hosts three
mission-critical databases named SALESDB, DOCDB, and REPORTINGDB.
DOCDB stores documents that connect to the sales data in SALESDB. The documents are stored in two
different JSON formats based on the sales channel.
REPORTINGDB stores reporting data and contains several columnstore indexes. A daily process creates
reporting data in REPORTINGDB from the data in SALESDB. The process is implemented as a SQL
Server Integration Services (SSIS) package that runs a stored procedure from SALESDB.
Requirements
Planned Changes
ADatum plans to move the current data infrastructure to Azure. The new infrastructure has the following
requirements:
Technical Requirements
The new Azure data infrastructure must meet the following technical requirements:
Data in SALESDB must be encrypted by using Transparent Data Encryption (TDE). The encryption must
use your own key.
SALESDB must be restorable to any given minute within the past three weeks.
Real-time processing must be monitored to ensure that workloads are sized properly based on actual
usage patterns.
Missing indexes must be created automatically for REPORTINGDB.
Disk IO, CPU, and memory usage must be monitored for SALESDB.
QUESTION 1
You need to configure a disaster recovery solution for SALESDB to meet the technical requirements.
A. weekly long-term retention backups that are retained for three weeks
B. failover groups
C. a point-in-time restore
D. geo-replication
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: SALESDB must be restorable to any given minute within the past three weeks.
The Azure SQL Database service protects all databases with an automated backup system. These
backups are retained for 7 days for Basic, 35 days for Standard and 35 days for Premium. Point-in-time
restore is a self-service capability, allowing customers to restore a Basic, Standard or Premium database
from these backups to any point within the retention period.
References:
https://azure.microsoft.com/en-us/blog/azure-sql-database-point-in-time-restore/
QUESTION 2
You need to implement event processing by using Stream Analytics to produce consistent JSON
documents.
Which three actions should you perform? Each correct answer presents part of the solution.
Explanation/Reference:
Explanation:
DOCDB stores documents that connect to the sales data in SALESDB. The documents are stored in
two different JSON formats based on the sales channel.
The sales data including the documents in JSON format, must be gathered as it arrives and analyzed
online by using Azure Stream Analytics. The analytic process will perform aggregations that must be
done continuously, without gaps, and without overlapping.
As they arrive, all the sales documents in JSON format must be transformed into one consistent format.
Manage and develop data processing
Question Set 1
QUESTION 1
You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to
an Azure Blob storage account.
You need to output the count of tweets during the last five minutes every five minutes.
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them. The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot belong to more than one tumbling window.
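A minimal Stream Analytics query sketch for this requirement; the input, output, and timestamp column names are assumptions:
SELECT COUNT(*) AS TweetCount, System.Timestamp() AS WindowEnd
INTO BlobOutput
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY TumblingWindow(minute, 5)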
Incorrect Answers:
D: Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as
Tumbling windows that can overlap, so events can belong to more than one Hopping window result set. To
make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the
window size.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
QUESTION 2
You are developing a solution that will stream to Azure Stream Analytics. The solution will have both
streaming data and reference data.
Which input type should you use for the reference data?
A. Azure Cosmos DB
B. Azure Event Hubs
C. Azure Blob storage
D. Azure IoT Hub
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Stream Analytics supports Azure Blob storage and Azure SQL Database as the storage layer for Reference
Data.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-use-reference-data
QUESTION 3
HOTSPOT
Which windowing function should you use for each requirement? To answer, select the appropriate options
in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Box 1: Tumbling
Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them. The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot belong to more than one tumbling window.
Box 2: Hopping
Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as
Tumbling windows that can overlap, so events can belong to more than one Hopping window result set. To
make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the
window size.
Box 3: Sliding
Sliding window functions, unlike Tumbling or Hopping windows, produce an output only when an event
occurs. Every window will have at least one event and the window continuously moves forward by an ε
(epsilon). Like hopping windows, events can belong to more than one sliding window.
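Minimal query sketches of the three window types; the input name, timestamp column, and Topic column are assumptions:
-- Tumbling: one non-overlapping result every 10 seconds.
SELECT Topic, COUNT(*) AS MentionCount
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, TumblingWindow(second, 10)
-- Hopping: a 10-second window that hops forward every 5 seconds, so windows overlap.
SELECT Topic, COUNT(*) AS MentionCount
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, HoppingWindow(second, 10, 5)
-- Sliding: output is produced only when an event enters or exits the 10-second window.
SELECT Topic, COUNT(*) AS MentionCount
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, SlidingWindow(second, 10)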
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
QUESTION 4
DRAG DROP
You have an Azure Data Lake Storage Gen2 account that contains JSON files for customers. The files
contain two attributes named FirstName and LastName.
You need to copy the data from the JSON files to an Azure Synapse Analytics table by using Azure
Databricks. A new column must be created that concatenates the FirstName and LastName values.
Which five actions should you perform in sequence next in a Databricks notebook? To answer, move the
appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 4: Write the results to a table in Azure Synapse.
You upload the transformed data frame into Azure Synapse. You use the Azure Synapse connector for Azure Databricks to directly upload a dataframe as a table in Azure Synapse.
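A minimal PySpark sketch of this final step, assuming a transformed DataFrame named transformed_df already exists in the notebook; the JDBC URL, staging folder, and table name are placeholders:
# Write the DataFrame to an Azure Synapse table through the Databricks Synapse connector.
(transformed_df.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<dw>")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.Customers")
    .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tempDirs")
    .save())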
Reference:
https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-extract-load-sql-data-warehouse
QUESTION 5
You have an Azure Storage account and a data warehouse in Azure Synapse Analytics in the UK South
region.
You need to copy blob data from the storage account to the data warehouse by using Azure Data Factory.
The solution must ensure that the data remains in the UK South region at all times and must minimize administrative effort.
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Incorrect Answers:
B: A self-hosted integration runtime is intended for on-premises data sources.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime
QUESTION 6
You plan to perform batch processing in Azure Databricks once daily.
A. automated
B. interactive
C. High Concurrency
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Azure Databricks has two types of clusters: interactive and automated. You use interactive clusters to
analyze data collaboratively with interactive notebooks. You use automated clusters to run fast and robust
automated jobs.
The suggested best practice is to launch a new cluster for each run of critical jobs. This helps avoid any
issues (failures, missing SLA, and so on) due to an existing workload (noisy neighbor) on a shared cluster.
Reference:
https://docs.databricks.com/administration-guide/cloud-configurations/aws/cmbp.html#scenario-3-
scheduled-batch-workloads-data-engineers-running-etl-jobs
QUESTION 7
HOTSPOT
You need to implement an Azure Databricks cluster that automatically connects to Azure Data Lake
Storage Gen2 by using Azure Active Directory (Azure AD) integration.
How should you configure the new cluster? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Box 1: High Concurrency
Enable Azure Data Lake Storage credential passthrough for a high-concurrency cluster.
Incorrect:
Support for Azure Data Lake Storage credential passthrough on standard clusters is in Public Preview.
Standard clusters with credential passthrough are supported on Databricks Runtime 5.5 and above and are
limited to a single user.
References:
https://docs.azuredatabricks.net/spark/latest/data-sources/azure/adls-passthrough.html
QUESTION 8
Note: This question is a part of a series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data Warehouse. The
data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2 storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL Data
Warehouse.
Solution:
1. Use Azure Data Factory to convert the parquet files to CSV files
2. Create an external data source pointing to the Azure storage account
3. Create an external file format and external table using the external data source
4. Load the data using the INSERT…SELECT statement
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
There is no need to convert the parquet files to CSV files.
You load the data using the CREATE TABLE AS SELECT statement.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 9
Note: This question is a part of a series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data Warehouse. The
data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2 storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL Data
Warehouse.
Solution:
1. Create an external data source pointing to the Azure storage account
2. Create an external file format and external table using the external data source
3. Load the data using the INSERT…SELECT statement
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You load the data using the CREATE TABLE AS SELECT statement.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 10
Note: This question is a part of a series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data Warehouse. The
data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2 storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL Data
Warehouse.
Solution:
1. Create an external data source pointing to the Azure storage account
2. Create a workload group using the Azure storage account name as the pool name
3. Load the data using the INSERT…SELECT statement
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You need to create an external file format and external table using the external data source.
You then load the data using the CREATE TABLE AS SELECT statement.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 11
You develop data engineering solutions for a company.
You must integrate the company’s on-premises Microsoft SQL Server data with Microsoft Azure SQL
Database. Data must be transformed incrementally.
A. Use the Copy Data tool with Blob storage linked service as the source
B. Use Azure PowerShell with SQL Server linked service as a source
C. Use Azure Data Factory UI with Blob storage linked service as a source
D. Use the .NET Data Factory API with Blob storage linked service as the source
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The Integration Runtime is a customer managed data integration infrastructure used by Azure Data Factory
to provide data integration capabilities across different network environments.
A linked service defines the information needed for Azure Data Factory to connect to a data resource. We
have three resources in this scenario for which linked services are needed:
On-premises SQL Server
Azure Blob Storage
Azure SQL database
Note: Azure Data Factory is a fully managed cloud-based data integration service that orchestrates and
automates the movement and transformation of data. The key concept in the ADF model is pipeline. A
pipeline is a logical grouping of Activities, each of which defines the actions to perform on the data
contained in Datasets. Linked services are used to define the information needed for Data Factory to
connect to the data resources.
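For illustration, a minimal linked service definition sketch for the on-premises SQL Server; the names, connection string, and integration runtime reference are assumptions:
{
  "name": "OnPremSqlServerLinkedService",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Data Source=onpremsql01;Initial Catalog=SourceDb;User ID=etl_user;Password=<placeholder>"
    },
    "connectVia": {
      "referenceName": "SelfHostedIR1",
      "type": "IntegrationRuntimeReference"
    }
  }
}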
References:
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/move-sql-azure-adf
QUESTION 12
HOTSPOT
A company runs Microsoft Dynamics CRM with Microsoft SQL Server on-premises. SQL Server Integration
Services (SSIS) packages extract data from Dynamics CRM APIs, and load the data into a SQL Server
data warehouse.
The datacenter is running out of capacity. Because of the network configuration, you must extract on-premises data to the cloud over HTTPS. You cannot open any additional ports. The solution must require the least amount of effort.
Which component should you use? To answer, select the appropriate technology in the dialog box in the
answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: Source
The Copy activity requires source and sink linked services to define the direction of data flow.
Copying between a cloud data source and a data source in private network: if either source or sink linked
service points to a self-hosted IR, the copy activity is executed on that self-hosted Integration Runtime.
References:
https://docs.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime
QUESTION 13
DRAG DROP
A project requires analysis of real-time Twitter feeds. Posts that contain specific keywords must be stored
and processed on Microsoft Azure and then displayed by using Microsoft Power BI. You need to implement
the solution.
Which five actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 2: Create a Jupyter Notebook
Step 4: Run a job that uses the Spark Streaming API to ingest data from Twitter
References:
https://acadgild.com/blog/streaming-twitter-data-using-spark
https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-use-with-data-lake-store
QUESTION 14
DRAG DROP
Your company manages on-premises Microsoft SQL Server pipelines by using a custom solution.
The data engineering team must implement a process to pull data from SQL Server and migrate it to Azure
Blob storage. The process must orchestrate and manage the data lifecycle.
You need to configure Azure Data Factory to connect to the on-premises SQL Server database.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: Create a virtual private network (VPN) connection from on-premises to Microsoft Azure.
You can also use IPSec VPN or Azure ExpressRoute to further secure the communication channel
between your on-premises network and Azure.
Azure Virtual Network is a logical representation of your network in the cloud. You can connect an on-
premises network to your virtual network by setting up IPSec VPN (site-to-site) or ExpressRoute (private
peering).
Note: A self-hosted integration runtime can run copy activities between a cloud data store and a data store
in a private network, and it can dispatch transform activities against compute resources in an on-premises
network or an Azure virtual network. A self-hosted integration runtime needs to be installed on an on-premises machine or a virtual machine (VM) inside a private network.
References:
https://docs.microsoft.com/en-us/azure/data-factory/tutorial-hybrid-copy-powershell
QUESTION 15
HOTSPOT
Ingestion:
Stream processing:
You need to identify the correct technologies to build the Lambda architecture using minimal effort. Which
technologies should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Reference:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/
QUESTION 16
You develop data engineering solutions for a company.
You need to ingest and visualize real-time Twitter data by using Microsoft Azure.
Which three technologies should you use? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
Explanation/Reference:
Explanation:
You can use Azure Logic Apps to send tweets to an event hub and then use a Stream Analytics job to read from the event hub and send them to Power BI.
References:
https://community.powerbi.com/t5/Integrations-with-Files-and/Twitter-streaming-analytics-step-by-step/td-
p/9594
QUESTION 17
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain
the following three workloads:
A workload for data engineers who will use Python and SQL
A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team at your company identifies the following standards for Databricks
environments:
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data
engineers, and a Standard cluster for the jobs.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We would need a High Concurrency cluster for the jobs.
Note:
Standard clusters are recommended for a single user. Standard can run workloads developed in any
language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are
that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum
query latencies.
References:
https://docs.azuredatabricks.net/clusters/configure.html
QUESTION 18
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain
the following three workloads:
A workload for data engineers who will use Python and SQL
A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team at your company identifies the following standards for Databricks
environments:
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data
engineers, and a High Concurrency cluster for the jobs.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We need a High Concurrency cluster for the data engineers and the jobs.
Note:
Standard clusters are recommended for a single user. Standard can run workloads developed in any
language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are
that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum
query latencies.
References:
https://docs.azuredatabricks.net/clusters/configure.html
QUESTION 19
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain
the following three workloads:
A workload for data engineers who will use Python and SQL
A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team at your company identifies the following standards for Databricks
environments:
Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster for the
data engineers, and a Standard cluster for the jobs.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
No need for a High Concurrency cluster for each data scientist.
Standard clusters are recommended for a single user. Standard can run workloads developed in any
language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are
that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum
query latencies.
References:
https://docs.azuredatabricks.net/clusters/configure.html
QUESTION 20
You have an Azure Stream Analytics query. The query returns a result set that contains 10,000 distinct
values for a column named clusterID.
You monitor the Stream Analytics job and discover high latency.
Which two actions should you perform? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
Correct Answer: CE
Section: (none)
Explanation
Explanation/Reference:
Explanation:
C: Scaling a Stream Analytics job takes advantage of partitions in the input or output. Partitioning lets you
divide data into subsets based on a partition key. A process that consumes the data (such as a Streaming
Analytics job) can consume and write different partitions in parallel, which increases throughput.
E: Streaming Units (SUs) represents the computing resources that are allocated to execute a Stream
Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your
job. This capacity lets you focus on the query logic and abstracts the need to manage the hardware to run
your Stream Analytics job in a timely manner.
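A minimal sketch of a partitioned query for C; the input name and the use of PartitionId as the partition key are assumptions:
SELECT ClusterID, COUNT(*) AS EventCount
FROM Input PARTITION BY PartitionId
GROUP BY PartitionId, ClusterID, TumblingWindow(minute, 1)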
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-streaming-unit-consumption
QUESTION 21
SIMULATION
Azure Username: xxxxx
Azure Password: xxxxx
You plan to generate large amounts of real-time data that will be copied to Azure Blob storage.
You plan to create reports that will read the data from an Azure Cosmos DB database.
You need to create an Azure Stream Analytics job that will input the data from a blob storage named
storage10277521 to the Cosmos DB database.
Explanation/Reference:
Explanation:
Step 1: Create a Stream Analytics job
1. Sign in to the Azure portal.
2. Select Create a resource in the upper left-hand corner of the Azure portal.
3. Select Analytics > Stream Analytics job from the results list.
5. Check the Pin to dashboard box to place your job on your dashboard and then select Create.
6. You should see a Deployment in progress... notification displayed in the top right of your browser
window.
2. Select Inputs > Add Stream input > Azure Blob storage
3. In the Azure Blob storage setting choose: storage10277521. Leave other options to default values and
select Save to save the settings.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-quick-create-portal
QUESTION 22
SIMULATION
Use the following login credentials as needed:
You plan to deploy an integration runtime named Runtime1 to an Azure virtual machine.
You need to create an Azure Data Factory V2, and then prepare the required Data Factory resources for
App1.
Explanation/Reference:
Explanation:
2. Select Create a resource on the left menu, select Analytics, and then select Data Factory.
4. On the New data factory page, enter a name.
5. For Subscription, select your Azure subscription in which you want to create the data factory.
6. For Resource Group, use one of the following steps:
Select Use existing, and select an existing resource group from the list.
Select Create new, and enter the name of a resource group.
7. For Version, select V2.
8. For Location, select the location for the data factory.
9. Select Create.
10. After the creation is complete, you see the Data Factory page.
1. In the self-hosted IR to be shared, click Connections, and then click Grant permission to another Data Factory.
2. Select the data factory you just created.
3. In the data factory to which the permissions were granted, create a new self-hosted IR (linked) and enter
the resource ID.
References:
https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal
https://docs.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime#sharing-the-
self-hosted-integration-runtime-ir-with-multiple-data-factories
QUESTION 23
SIMULATION
Use the following login credentials as needed:
You plan to create multiple pipelines in a new Azure Data Factory V2.
You need to create the data factory, and then create a scheduled trigger for the planned pipelines. The
trigger must execute every two hours starting at 24:00:00.
Explanation/Reference:
Explanation:
2. Select Create a resource on the left menu, select Analytics, and then select Data Factory.
4. On the New data factory page, enter a name.
5. For Subscription, select your Azure subscription in which you want to create the data factory.
6. For Resource Group, use one of the following steps:
Select Use existing, and select an existing resource group from the list.
Select Create new, and enter the name of a resource group.
7. For Version, select V2.
8. For Location, select the location for the data factory.
9. Select Create.
10. After the creation is complete, you see the Data Factory page.
1. Select the Data Factory you created, and switch to the Edit tab.
2. Click Trigger on the menu, and click New/Edit.
3. In the Add Triggers page, click Choose trigger..., and click New.
4. In the New Trigger page, do the following steps:
a. Confirm that Schedule is selected for Type.
b. Specify the start datetime of the trigger for Start Date (UTC) to: 24:00:00
c. Specify Recurrence for the trigger. Select Every Hour, and enter 2 in the text box.
5. In the New Trigger window, check the Activated option, and click Next.
6. In the New Trigger page, review the warning message, and click Finish.
7. Click Publish to publish changes to Data Factory. Until you publish changes to Data Factory, the trigger
does not start triggering the pipeline runs.
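The equivalent trigger definition as JSON, shown here as a minimal sketch; the trigger name, start time value, and pipeline reference are placeholders:
{
  "name": "Trigger1",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Hour",
        "interval": 2,
        "startTime": "2019-06-01T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "Pipeline1",
          "type": "PipelineReference"
        }
      }
    ]
  }
}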
References:
https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal
https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-schedule-trigger
QUESTION 24
Each day, a company plans to store hundreds of files in Azure Blob Storage and Azure Data Lake Storage.
The company uses the parquet format.
You need to select the appropriate data technology to implement the pipeline.
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Storm runs topologies instead of the Apache Hadoop MapReduce jobs that you might be familiar with.
Storm topologies are composed of multiple components that are arranged in a directed acyclic graph
(DAG). Data flows between the components in the graph. Each component consumes one or more data
streams, and can optionally emit one or more streams.
References:
https://docs.microsoft.com/en-us/azure/hdinsight/storm/apache-storm-overview
QUESTION 25
HOTSPOT
A company is deploying a service-based data environment. You are developing a solution to process this
data.
Use an Azure HDInsight cluster for data ingestion from a relational database in a different cloud service
Use an Azure Data Lake Storage account to store processed data
Allow users to download processed data
Which technologies should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Azure HDInsight is a cloud distribution of the Hadoop components from the Hortonworks Data Platform
(HDP).
Incorrect Answers:
DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses MapReduce to effect its
distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to
map tasks, each of which will copy a partition of the files specified in the source list. Its MapReduce
pedigree has endowed it with some quirks in both its semantics and execution.
RevoScaleR is a collection of proprietary functions in Machine Learning Server used for practicing data
science at scale. For data scientists, RevoScaleR gives you data-related functions for import,
transformation and manipulation, summarization, visualization, and analysis.
Apache Kafka is used for building real-time streaming applications that transform or react to streams of data.
References:
https://sqoop.apache.org/
https://kafka.apache.org/intro
https://docs.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-ambari-view
QUESTION 26
A company uses Azure SQL Database to store sales transaction data. Field sales employees need an
offline copy of the database that includes last year’s sales on their laptops when there is no internet
connection available.
Which three options can you use? Each correct answer presents a complete solution.
A. Export to a BACPAC file by using Azure Cloud Shell, and save the file to an Azure storage account
B. Export to a BACPAC file by using SQL Server Management Studio. Save the file to an Azure storage
account
C. Export to a BACPAC file by using the Azure portal
D. Export to a BACPAC file by using Azure PowerShell and save the file locally
E. Export to a BACPAC file by using the SqlPackage utility
Explanation/Reference:
Explanation:
You can export to a BACPAC file using the Azure portal.
You can export to a BACPAC file using SQL Server Management Studio (SSMS). The newest versions of
SQL Server Management Studio provide a wizard to export an Azure SQL database to a BACPAC file.
You can export to a BACPAC file using the SQLPackage utility.
Incorrect Answers:
D: You can export to a BACPAC file using PowerShell. Use the New-AzSqlDatabaseExport cmdlet to
submit an export database request to the Azure SQL Database service. Depending on the size of your
database, the export operation may take some time to complete. However, the file is not stored locally.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-export
QUESTION 27
Note: This question is a part of a series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data Warehouse. The
data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2 storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL Data
Warehouse.
Solution:
1. Create an external data source pointing to the Azure Data Lake Gen 2 storage account
2. Create an external file format and external table using the external data source
3. Load the data using the CREATE TABLE AS SELECT statement
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You need to create an external file format and external table using the external data source.
You load the data using the CREATE TABLE AS SELECT statement.
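A minimal T-SQL sketch of these steps for a dedicated SQL pool; all object names, the storage location, and the credential are placeholders:
-- 1. External data source over the Data Lake Storage Gen2 account.
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://<filesystem>@<account>.dfs.core.windows.net',
    CREDENTIAL = ADLSCredential
);
-- 2. External file format and external table for the parquet files.
CREATE EXTERNAL FILE FORMAT ParquetFileFormat
WITH (FORMAT_TYPE = PARQUET);
CREATE EXTERNAL TABLE ext.SalesData
(
    SaleId INT,
    Amount DECIMAL(18, 2)
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = AzureDataLakeStore,
    FILE_FORMAT = ParquetFileFormat
);
-- 3. Load with CREATE TABLE AS SELECT.
CREATE TABLE dbo.SalesData
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM ext.SalesData;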
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 28
Note: This question is a part of a series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to an enterprise data warehouse in Azure
Synapse Analytics. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2
storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Data Warehouse.
Solution:
1. Create a remote service binding pointing to the Azure Data Lake Gen 2 storage account
2. Create an external file format and external table using the external data source
3. Load the data using the CREATE TABLE AS SELECT statement
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You need to create an external file format and an external table from an external data source, instead of from a remote service binding.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 29
Note: This question is a part of a series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to an enterprise data warehouse in Azure
Synapse Analytics. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2
storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Data Warehouse.
Solution:
1. Create an external data source pointing to the Azure storage account
2. Create a workload group using the Azure storage account name as the pool name
3. Load the data using the CREATE TABLE AS SELECT statement
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Use the Azure Data Lake Gen 2 storage account.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 30
You need to develop a pipeline for processing data. The pipeline must meet the following requirements:
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Apache Spark is an open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analysis applications.
HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch processing, you can use Spark, Hive, Hive LLAP, or MapReduce.
You can create an HDInsight Spark cluster using an Azure Resource Manager template. The template can
be found in GitHub.
References:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
QUESTION 31
DRAG DROP
You implement an event processing solution using Microsoft Azure Stream Analytics.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: Configure Blob storage as input; select items with the TIMESTAMP BY clause
The default timestamp of Blob storage events in Stream Analytics is the timestamp that the blob was last
modified, which is BlobLastModifiedUtcTime. To process the data as a stream using a timestamp in the
event payload, you must use the TIMESTAMP BY keyword.
Example:
The following is a TIMESTAMP BY example which uses the EntryTime column as the application time for
events:
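The example query itself is not reproduced here; the following is a minimal sketch, assuming an input alias of TollTagEntry, an output alias of Output, and illustrative column names:

SELECT
    TollId,
    EntryTime AS VehicleEntryTime,
    LicensePlate
INTO
    Output
FROM
    TollTagEntry TIMESTAMP BY EntryTime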
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-inputs
QUESTION 32
HOTSPOT
A company plans to use Platform-as-a-Service (PaaS) to create the new data pipeline process. The
process must meet the following requirements:
Ingest:
Store:
Which technologies should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Note: Data at rest includes information that resides in persistent storage on physical media, in any digital
format. Microsoft Azure offers a variety of data storage solutions to meet different needs, including file,
disk, blob, and table storage. Microsoft also provides encryption to protect Azure SQL Database, Azure
Cosmos DB, and Azure Data Lake.
Prepare and Train: Azure Databricks
Azure Databricks provides enterprise-grade Azure security, including Azure Active Directory integration.
With Azure Databricks, you can set up your Apache Spark environment in minutes, autoscale and
collaborate on shared projects in an interactive workspace. Azure Databricks supports Python, Scala, R,
Java and SQL, as well as data science frameworks and libraries including TensorFlow, PyTorch and scikit-
learn.
Note: As of November 2019, Azure SQL Data Warehouse is now Azure Synapse Analytics.
References:
https://docs.microsoft.com/bs-latn-ba/azure/architecture/data-guide/technology-choices/pipeline-
orchestration-data-movement
https://docs.microsoft.com/en-us/azure/azure-databricks/what-is-azure-databricks
QUESTION 33
HOTSPOT
A company plans to analyze a continuous flow of data from a social media platform by using Microsoft
Azure Stream Analytics. The incoming data is formatted as one record per row.
How should you complete the REST API segment? To answer, select the appropriate configuration in the
answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: CSV
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. A CSV
file stores tabular data (numbers and text) in plain text. Each line of the file is a data record.
JSON and AVRO are not formatted as one record per row.
Box 2: "type":"Microsoft.ServiceBus/EventHub",
Properties include "EventHubName"
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-inputs
https://en.wikipedia.org/wiki/Comma-separated_values
QUESTION 34
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob
storage file named Customers. The file will contain both in-store and online customer details. The online
customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location.
The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to
Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has one streaming input, one query, and two outputs.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We need one reference data input for LocationIncomes, which rarely changes.
Note: Stream Analytics also supports input known as reference data. Reference data is either completely
static or changes slowly.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-add-inputs#stream-and-
reference-inputs
QUESTION 35
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob
storage file named Customers. The file will contain both in-store and online customer details. The online
customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location.
The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to
Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has one streaming input, one reference input, one
query, and two outputs.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We need one reference data input for LocationIncomes, which rarely changes.
We need two queries, one for in-store customers and one for online customers.
For each query, two outputs are needed.
Note: Stream Analytics also supports input known as reference data. Reference data is either completely
static or changes slowly.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-add-inputs#stream-and-
reference-inputs
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs
QUESTION 36
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob
storage file named Customers. The file will contain both in-store and online customer details. The online
customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location.
The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to
Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has one streaming input, one reference input, two
queries, and four outputs.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We need one reference data input for LocationIncomes, which rarely changes.
We need two queries, one for in-store customers and one for online customers.
For each query, two outputs are needed.
Note: Stream Analytics also supports input known as reference data. Reference data is either completely
static or changes slowly.
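As an illustrative sketch only (the input aliases Customers and LocationIncomes, the output aliases, and the column names are assumptions, not values from the scenario), the online-customer branch of such a job can be written as two SELECT ... INTO statements over the same reference-data join, one per sink; the in-store branch is analogous without the join:

SELECT c.CustomerId, c.MailingAddress, li.MedianIncome
INTO SqlDatabaseOnline
FROM Customers c
JOIN LocationIncomes li ON c.MailingAddress = li.Location

SELECT c.CustomerId, c.MailingAddress, li.MedianIncome
INTO DataLakeOnline
FROM Customers c
JOIN LocationIncomes li ON c.MailingAddress = li.Location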
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-add-inputs#stream-and-
reference-inputs
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs
QUESTION 37
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain
the following three workloads:
A workload for data engineers who will use Python and SQL
A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team at your company identifies the following standards for Databricks
environments:
Solution: You create a Standard cluster for each data scientist, a Standard cluster for the data engineers,
and a High Concurrency cluster for the jobs.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We need a High Concurrency cluster for the data engineers and the jobs.
Note:
Standard clusters are recommended for a single user. Standard can run workloads developed in any
language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are
that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum
query latencies.
References:
https://docs.azuredatabricks.net/clusters/configure.html
QUESTION 38
Note: This question is a part of series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to an enterprise data warehouse in Azure
Synapse Analytics. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2
storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Data Warehouse.
Solution:
1. Use Azure Data Factory to convert the parquet files to CSV files
2. Create an external data source pointing to the Azure Data Lake Gen 2 storage account
3. Create an external file format and external table using the external data source
4. Load the data using the CREATE TABLE AS SELECT statement
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
It is not necessary to convert the parquet files to CSV files.
You need to create an external file format and external table using the external data source.
You load the data using the CREATE TABLE AS SELECT statement.
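A hedged T-SQL sketch of this load pattern follows; every object name, the storage location, and the credential secret are placeholders for illustration, not values from the scenario:

-- Assumes a database master key already exists in the data warehouse
CREATE DATABASE SCOPED CREDENTIAL AdlsCredential
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

CREATE EXTERNAL DATA SOURCE AdlsGen2Source
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://<container>@<account>.dfs.core.windows.net',
    CREDENTIAL = AdlsCredential
);

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE dbo.StagingSales (
    SaleId INT,
    Amount DECIMAL(18, 2)
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = AdlsGen2Source,
    FILE_FORMAT = ParquetFormat
);

-- Load into the data warehouse with CREATE TABLE AS SELECT
CREATE TABLE dbo.FactSales
WITH (DISTRIBUTION = HASH(SaleId), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT * FROM dbo.StagingSales;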
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 39
You need to implement complex stateful business logic within an Azure Stream Analytics service.
Which type of function should you create in the Stream Analytics topology?
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Azure Stream Analytics supports user-defined aggregates (UDA) written in JavaScript, which enable you to
implement complex stateful business logic. Within a UDA you have full control of the state data structure,
state accumulation, state decumulation, and aggregate result computation.
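For illustration only, a sketch of how a registered JavaScript UDA is invoked from the job query (the function alias SquaredSum, input alias Input, output alias Output, and column Reading are assumptions):

SELECT
    uda.SquaredSum(Reading) AS SumOfSquares
INTO
    Output
FROM
    Input
GROUP BY
    TumblingWindow(minute, 1)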
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-javascript-user-defined-
aggregates
QUESTION 40
You have an Azure virtual machine that has Microsoft SQL Server installed. The server contains a table
named Table1.
You need to copy the data from Table1 to an Azure Data Lake Storage Gen2 account by using an Azure
Data Factory V2 copy activity.
Which type of integration runtime should you use?
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Copying between a cloud data source and a data source in a private network: if either the source or sink linked
service points to a self-hosted IR, the copy activity is executed on that self-hosted Integration Runtime.
References:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime#determining-which-ir-to-
use
QUESTION 41
DRAG DROP
Your company plans to create an event processing engine to handle streaming data from Twitter.
The data engineering team uses Azure Event Hubs to ingest the streaming data.
You need to implement a solution that uses Azure Databricks to receive the streaming data from the Azure
Event Hubs.
Which three actions should you recommend be performed in sequence? To answer, move the appropriate
actions from the list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 2: Deploy a Spark cluster and then attach the required libraries to the cluster.
To create a Spark cluster in Databricks, in the Azure portal, go to the Databricks workspace that you
created, and then select Launch Workspace.
Attach libraries to Spark cluster: you use the Twitter APIs to send tweets to Event Hubs. You also use the
Apache Spark Event Hubs connector to read and write data into Azure Event Hubs. To use these APIs as
part of your cluster, add them as libraries to Azure Databricks and associate them with your Spark cluster.
Step 3: Create and configure a Notebook that consumes the streaming data.
You create a notebook named ReadTweetsFromEventhub in Databricks workspace.
ReadTweetsFromEventHub is a consumer notebook you use to read the tweets from Event Hubs.
References:
https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-stream-from-eventhubs
QUESTION 42
HOTSPOT
You need to provision an HDInsight cluster for batch processing of data on Microsoft Azure.
How should you complete the PowerShell segment? To answer, select the appropriate options in the
answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: New-AzStorageContainer
# Example: Create a blob container. This holds the default data store for the cluster.
New-AzStorageContainer `
-Name $clusterName `
-Context $defaultStorageContext
Box 2: Spark
Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data into
memory and query it repeatedly. In-memory computing is much faster than disk-based applications, such as
Hadoop, which shares data through the Hadoop Distributed File System (HDFS).
Box 3: New-AzureRMHDInsightCluster
# Create the HDInsight cluster. Example:
New-AzHDInsightCluster `
-ResourceGroupName $resourceGroupName `
-ClusterName $clusterName `
-Location $location `
-ClusterSizeInNodes $clusterSizeInNodes `
-ClusterType "Spark" `
-OSType "Linux"
Box 4: Spark
HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch
processing, you can use Spark, Hive, Hive LLAP, MapReduce.
References:
https://docs.microsoft.com/bs-latn-ba/azure/hdinsight/spark/apache-spark-jupyter-spark-sql-use-powershell
https://docs.microsoft.com/bs-latn-ba/azure/hdinsight/spark/apache-spark-overview
QUESTION 43
HOTSPOT
A company plans to develop solutions to perform batch processing of multiple sets of geospatial data.
Which Azure services should you use? To answer, select the appropriate configuration in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
References:
https://visualstudiomagazine.com/articles/2019/01/25/vscode-hdinsight.aspx
https://docs.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-ambari-view
https://docs.microsoft.com/en-us/rest/api/hdinsight/
QUESTION 44
DRAG DROP
You must use PolyBase to retrieve data from Azure Blob storage that resides in parquet format and load
the data into a large table called FactSalesOrderDetails.
You need to configure Azure Synapse Analytics to receive the data.
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Note: PolyBase is a technology that accesses and combines both non-relational and relational data, all
from within SQL Server. It allows you to run queries on external data in Hadoop or Azure blob storage.
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-configure-azure-blob-storage
QUESTION 45
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob
storage file named Customers. The file will contain both in-store and online customer details. The online
customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location.
The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to
Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has two streaming inputs, one query, and two outputs.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We need one reference data input for LocationIncomes, which rarely changes
Note: Stream Analytics also supports input known as reference data. Reference data is either completely
static or changes slowly.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-add-inputs#stream-and-
reference-inputs
QUESTION 46
DRAG DROP
You need to deploy a Microsoft Azure Stream Analytics job for an IoT solution. The solution must:
Minimize latency.
Minimize bandwidth usage between the job and IoT device.
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: Create an IoT hub and add the Azure Stream Analytics module to the IoT Hub namespace
An IoT Hub in Azure is required.
Step 2: Create an Azure Blob Storage container
To prepare your Stream Analytics job to be deployed on an IoT Edge device, you need to associate the job
with a container in a storage account. When you go to deploy your job, the job definition is exported to the
storage container.
Stream Analytics accepts data incoming from several kinds of event sources including Event Hubs, IoT
Hub, and Blob storage.
Step 3: Create an Azure Stream Analytics edge job and configure job definition save location
When you create an Azure Stream Analytics job to run on an IoT Edge device, it needs to be stored in a
way that can be called from the device.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-add-inputs
https://docs.microsoft.com/en-us/azure/iot-edge/tutorial-deploy-stream-analytics
QUESTION 47
DRAG DROP
You have data stored in thousands of CSV files in Azure Data Lake Storage Gen2. Each file has a header
row followed by a properly formatted carriage return (\r) and line feed (\n).
You are implementing a pattern that batch loads the files daily into an enterprise data warehouse in Azure
Synapse Analytics by using PolyBase.
You need to skip the header row when you import the files into the data warehouse.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: Create an external file format and set the First_Row option.
Creates an External File Format object defining external data stored in Hadoop, Azure Blob Storage, or
Azure Data Lake Store. Creating an external file format is a prerequisite for creating an External Table.
FIRST_ROW = First_row_int
Specifies the row number that is read first in all files during a PolyBase load. This parameter can take
values 1-15. If the value is set to two, the first row in every file (header row) is skipped when the data is
loaded. Rows are skipped based on the existence of row terminators (\r\n, \r, \n).
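A hedged T-SQL sketch of such a file format (the format name and the delimiter options are assumptions):

CREATE EXTERNAL FILE FORMAT SkipHeaderCsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2,
        USE_TYPE_DEFAULT = TRUE
    )
);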
Step 2: Create an external data source that uses the abfs location
The hadoop-azure module provides support for the Azure Data Lake Storage Gen2 storage layer through
the “abfs” connector
Step 3: Use CREATE EXTERNAL TABLE AS SELECT (CETAS) and create a view that removes the empty
row.
Reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql
https://hadoop.apache.org/docs/r3.2.0/hadoop-azure/abfs.html
QUESTION 48
You are creating a new notebook in Azure Databricks that will support R as the primary language but will
also support Scala and SQL.
A. %<language>
B. \\[<language>]
C. \\(<language>)
D. @<Language>
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You can override the primary language by specifying the language magic command %<language> at the
beginning of a cell. The supported magic commands are: %python, %r, %scala, and %sql.
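For example, in a notebook whose default language is R, a single cell can be switched to SQL as sketched below (the table name is hypothetical):

%sql
SELECT COUNT(*) AS row_count FROM trips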
References:
https://docs.databricks.com/user-guide/notebooks/notebook-use.html#mix-languages
QUESTION 49
HOTSPOT
You are implementing mapping data flows in Azure Data Factory to convert daily logs of taxi records into
aggregated datasets.
You configure a data flow and receive the error shown in the following exhibit.
Which setting should you configure? To answer, select the appropriate setting in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The Inspect tab provides a view into the metadata of the data stream that you're transforming. You can see
column counts, the columns changed, the columns added, data types, the column order, and column
references. Inspect is a read-only view of your metadata. You don't need to have debug mode enabled to
see metadata in the Inspect pane.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview
QUESTION 50
HOTSPOT
You have an Azure SQL database named Database1 and two Azure event hubs named HubA and HubB.
The data consumed from each source is shown in the following table.
You need to implement Azure Stream Analytics to calculate the average fare per mile by driver.
How should you configure the Stream Analytics input for each source? To answer, select the appropriate
options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
HubA: Stream
HubB: Stream
Database1: Reference
Reference data (also known as a lookup table) is a finite data set that is static or slowly changing in nature,
used to perform a lookup or to augment your data streams. For example, in an IoT scenario, you could
store metadata about sensors (which don’t change often) in reference data and join it with real time IoT
data streams. Azure Stream Analytics loads reference data in memory to achieve low latency stream
processing
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-use-reference-data
Manage and develop data processing
Testlet 2
Background
Proseware, Inc., develops and manages a product named Poll Taker. The product is used for delivering
public opinion polling and analysis.
Polling data comes from a variety of sources, including online surveys, house-to-house interviews, and
booths at public events.
Polling data
Polling data is stored in one of the two locations:
Poll metadata
Each poll has associated metadata with information about the poll including the date and number of
respondents. The data is stored as JSON.
Phone-based polling
Security
Phone-based poll data must only be uploaded by authorized users from authorized devices
Contractors must not have access to any polling data other than their own
Access to polling data must be set on a per-Active Directory user basis
Performance
After six months, raw polling data should be moved to a storage account. The storage must be available in
the event of a regional disaster. The solution must minimize costs.
Deployments
All deployments must be performed by using Azure DevOps. Deployments must use templates used in
multiple environments
No credentials or secrets should be used during deployments
Reliability
All services and processes must be resilient to a regional Azure outage.
Monitoring
All Azure services must be monitored by using Azure Monitor. On-premises SQL Server performance must
be monitored.
QUESTION 1
You need to ensure that phone-based polling data can be analyzed in the PollingData database.
C. Use a schedule trigger
D. Use manual execution
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
When creating a schedule trigger, you specify a schedule (start date, recurrence, end date etc.) for the
trigger, and associate with a Data Factory pipeline.
Scenario:
All data migration processes must use Azure Data Factory
All data migrations must run automatically during non-business hours
References:
https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-schedule-trigger
QUESTION 2
HOTSPOT
You need to ensure that Azure Data Factory pipelines can be deployed. How should you configure
authentication and authorization for deployments? To answer, select the appropriate options in the answer
choices.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The way you control access to resources using RBAC is to create role assignments. This is a key concept
to understand – it’s how permissions are enforced. A role assignment consists of three elements: security
principal, role definition, and scope.
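For illustration only, a role assignment can be created with PowerShell; the object ID, role name, and scope below are placeholders rather than values from the scenario:

New-AzRoleAssignment -ObjectId "<service-principal-object-id>" `
    -RoleDefinitionName "Data Factory Contributor" `
    -Scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>"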
Scenario:
No credentials or secrets should be used during deployments
Phone-based poll data must only be uploaded by authorized users from authorized devices
Contractors must not have access to any polling data other than their own
Access to polling data must be set on a per-Active Directory user basis
References:
https://docs.microsoft.com/en-us/azure/role-based-access-control/overview
Manage and develop data processing
Testlet 3
Overview
Current environment
Contoso relies on an extensive partner network for marketing, sales, and distribution. Contoso uses
external companies that manufacture everything from the actual pharmaceutical to the packaging.
The majority of the company’s data reside in Microsoft SQL Server database. Application databases fall
into one of the following tiers:
The company has a reporting infrastructure that ingests data from local databases and partner services.
Partner services consist of distributors, wholesalers, and retailers across the world. The company
performs daily, weekly, and monthly reporting.
Requirements
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Applications must still have access to data from both internal and external applications keeping the data
encrypted and secure at rest and in transit.
A disaster recovery strategy must be implemented for Tier 3 and Tier 6 through 8 allowing for failover in the
case of a server going offline.
Selected internal applications must have the data hosted in single Microsoft Azure SQL Databases.
The solution must support migrating databases that support external and internal application to Azure SQL
Database. The migrated databases will be supported by Azure Data Factory pipelines for the continued
movement, migration and updating of data both in the cloud and from local core business systems and
repositories.
Tier 7 and Tier 8 partner access must be restricted to the database only.
In addition to default Azure backup behavior, Tier 4 and 5 databases must be on a backup strategy that
performs a transaction log backup every hour, a differential backup of databases every day, and a full
backup every week.
Backup strategies must be put in place for all other standalone Azure SQL Databases using Azure SQL-
provided backup storage and capabilities.
Databases
Contoso requires their data estate to be designed and implemented in the Azure Cloud. Moving to the
cloud must not inhibit access to or availability of data.
Databases:
Tier 1 Database must implement data masking using the following masking logic:
Tier 2 databases must sync between branches and cloud databases and in the event of conflicts must be
set up for conflicts to be won by on-premises databases.
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Applications must still have access to data from both internal and external applications keeping the data
encrypted and secure at rest and in transit.
A disaster recovery strategy must be implemented for Tier 3 and Tier 6 through 8 allowing for failover in the
case of a server going offline.
Selected internal applications must have the data hosted in single Microsoft Azure SQL Databases.
Reporting
Security
A method of managing multiple databases in the cloud at the same time must be implemented to
streamline data management and limit management access to only those requiring access.
Monitoring
Monitoring must be set up on every database. Contoso and partners must receive performance reports as
part of contractual agreements.
Tiers 6 through 8 must have unexpected resource storage usage immediately reported to data engineers.
The Azure SQL Data Warehouse cache must be monitored when the database is being used. A dashboard
monitoring key performance indicators (KPIs) indicated by traffic lights must be created and displayed
based on the following metrics:
Existing Data Protection and Security compliances require that all certificates and keys are internally
managed in an on-premises storage.
Azure Data Warehouse must be used to gather and query data from multiple internal and external
databases
Azure Data Warehouse must be optimized to use data from a cache
Reporting data aggregated for external partners must be stored in Azure Storage and be made
available during regular business hours in the connecting regions
Reporting strategies must be improved to real time or near real time reporting cadence to improve
competitiveness and the general supply chain
Tier 9 reporting must be moved to Event Hubs, queried, and persisted in the same Azure region as the
company’s main office
Tier 10 reporting data must be stored in Azure Blobs
Issues
Team members identify the following issues:
Both internal and external client applications run complex joins, equality searches, and group-by clauses.
Because some systems are managed externally, the queries will not be changed or optimized by
Contoso
External partner organization data formats, types and schemas are controlled by the partner companies
Internal and external database development staff resources are primarily SQL developers familiar with
the Transact-SQL language.
The size and amount of data has led to applications and reporting solutions not performing at required
speeds
Tier 7 and 8 data access is constrained to single endpoints managed by partners for access
The company maintains several legacy client applications. Data for these applications remains isolated
from other applications. This has led to hundreds of databases being provisioned on a per-application
basis
QUESTION 1
You need to process and query ingested Tier 9 data.
Which two options should you use? Each correct answer presents part of the solution.
Correct Answer: EF
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Event Hubs provides a Kafka endpoint that can be used by your existing Kafka based applications as an
alternative to running your own Kafka cluster.
You can stream data into Kafka-enabled Event Hubs and process it with Azure Stream Analytics, in the
following steps:
Create a Kafka enabled Event Hubs namespace.
Create a Kafka client that sends messages to the event hub.
Create a Stream Analytics job that copies data from the event hub into an Azure blob storage.
Scenario:
Tier 9 reporting must be moved to Event Hubs, queried, and persisted in the same Azure region as the
company’s main office
References:
https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-kafka-stream-analytics
QUESTION 2
HOTSPOT
You need set up the Azure Data Factory JSON definition for Tier 10 data.
What should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
To use storage account key authentication, you use the connectionString property, which specifies the
information needed to connect to Blob Storage.
Mark this field as a SecureString to store it securely in Data Factory. You can also put the account key in
Azure Key Vault and pull the accountKey configuration out of the connection string.
References:
https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage
QUESTION 3
You need to set up Azure Data Factory pipelines to meet data movement requirements.
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The following table describes the capabilities and network support for each of the integration runtime types:
Scenario: The solution must support migrating databases that support external and internal application to
Azure SQL Database. The migrated databases will be supported by Azure Data Factory pipelines for the
continued movement, migration and updating of data both in the cloud and from local core business
systems and repositories.
References:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime
Manage and develop data processing
Testlet 4
Case Study
This is a case study. Case studies are not timed separately. You can use as much exam time as you
would like to complete each case. However, there may be additional case studies and sections on this
exam. You must manage your time to ensure that you are able to complete all questions included on this
exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in
the case study. Case studies might contain exhibits and other resources that provide more information
about the scenario that is described in the case study. Each question is independent of the other question
in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers
and to make changes before you move to the next section of the exam. After you begin a new section, you
cannot return to this section.
Overview
General Overview
Litware, Inc. is an international car racing and manufacturing company that has 1,000 employees. Most
employees are located in Europe. The company supports racing teams that compete in a worldwide racing
series.
Physical Locations
Litware has two main locations: a main office in London, England, and a manufacturing plant in Berlin,
Germany.
During each race weekend, 100 engineers set up a remote portable office by using a VPN to connect to the
datacenter in the London office. The portable office is set up and torn down in approximately 20 different
countries each year.
Existing environment
Race Central
During race weekends, Litware uses a primary application named Race Central. Each car has several
sensors that send real-time telemetry data to the London datacentre. The data is used for real-time tracking
of the cars.
Race Central also sends batch updates to an application named Mechanical Workflow by using Microsoft
SQL Server Integration Services (SSIS).
The telemetry data is sent to a MongoDB database. A custom application then moves the data to
databases in SQL Server 2017. The telemetry data in MongoDB has more than 500 attributes. The
application changes the attribute names when the data is moved to SQL Server 2017.
Mechanical Workflow
Mechanical Workflow is used to track changes and improvements made to the cars during their lifetime.
Currently, Mechanical Workflow runs on SQL Server 2017 as an OLAP system.
Mechanical Workflow has a table named Table1 that is 1 TB. Large aggregations are performed on a
single column of Table1.
Requirements
Planned Changes
Litware is in the process of rearchitecting its data estate to be hosted in Azure. The company plans to
decommission the London datacentre and move all its applications to an Azure datacenter.
Technical Requirements
Data collection for Race Central must be moved to Azure Cosmos DB and Azure SQL Database. The
data must be written to the Azure datacenter closest to each race and must converge in the least
amount of time.
The query performance of Race Central must be stable, and the administrative time it takes to perform
optimizations must be minimized.
The database for Mechanical Workflow must be moved to Azure SQL Data Warehouse.
Transparent data encryption (TDE) must be enabled on all data stores, whenever possible.
An Azure Data Factory pipeline must be used to move data from Cosmos DB to SQL Database for
Race Central. If the data load takes longer than 20 minutes, configuration changes must be made to
Data Factory.
The telemetry data must migrate toward a solution that is native to Azure.
The telemetry data must be monitored for performance issues. You must adjust the Cosmos DB
Request Units per second (RU/s) to maintain a performance SLA while minimizing the cost of the RU/s.
During race weekends, visitors will be able to enter the remote portable offices. Litware is concerned that
some proprietary information might be exposed. The company identifies the following data masking
requirements for the Race Central data that will be stored in SQL Database:
Only show the last four digits of the values in a column named SuspensionSprings.
Only show a zero value for the values in a column named ShockOilWeight.
QUESTION 1
What should you include in the Data Factory pipeline for Race Central?
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario:
An Azure Data Factory pipeline must be used to move data from Cosmos DB to SQL Database for Race
Central. If the data load takes longer than 20 minutes, configuration changes must be made to Data
Factory.
The telemetry data is sent to a MongoDB database. A custom application then moves the data to
databases in SQL Server 2017. The telemetry data in MongoDB has more than 500 attributes. The
application changes the attribute names when the data is moved to SQL Server 2017.
You can copy data to or from Azure Cosmos DB (SQL API) by using Azure Data Factory pipeline.
Column mapping applies when copying data from source to sink. By default, the copy activity maps source
data to sink by column names. You can specify an explicit mapping to customize the column mapping based
on your needs. More specifically, the copy activity:
1. Reads the data from the source and determines the source schema.
2. Uses default column mapping to map columns by name, or applies the explicit column mapping if specified.
3. Writes the data to the sink.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping
Manage and develop data processing
Testlet 5
Case study
Overview
ADatum Corporation is a retailer that sells products through two sales channels: retail stores and a website.
Existing Environment
ADatum has one database server that has Microsoft SQL Server 2016 installed. The server hosts three
mission-critical databases named SALESDB, DOCDB, and REPORTINGDB.
DOCDB stores documents that connect to the sales data in SALESDB. The documents are stored in two
different JSON formats based on the sales channel.
REPORTINGDB stores reporting data and contains several columnstore indexes. A daily process creates
reporting data in REPORTINGDB from the data in SALESDB. The process is implemented as a SQL
Server Integration Services (SSIS) package that runs a stored procedure from SALESDB.
Requirements
Planned Changes
ADatum plans to move the current data infrastructure to Azure. The new infrastructure has the following
requirements:
Technical Requirements
The new Azure data infrastructure must meet the following technical requirements:
Data in SALESDB must be encrypted by using Transparent Data Encryption (TDE). The encryption must
use your own key.
SALESDB must be restorable to any given minute within the past three weeks.
Real-time processing must be monitored to ensure that workloads are sized properly based on actual
usage patterns.
Missing indexes must be created automatically for REPORTINGDB.
Disk IO, CPU, and memory usage must be monitored for SALESDB.
QUESTION 1
Which windowing function should you use to perform the streaming aggregation of the sales data?
A. Tumbling
B. Hopping
C. Sliding
D. Session
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: The analytic process will perform aggregations that must be done continuously, without gaps,
and without overlapping.
The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot
belong to more than one tumbling window.
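For illustration, a hedged sketch of a tumbling-window aggregation in the Stream Analytics query language (the input, output, and column names are assumptions, not values from the case study):

SELECT
    ProductId,
    SUM(Quantity) AS TotalQuantity
INTO
    Output
FROM
    SalesStream TIMESTAMP BY SaleTime
GROUP BY
    ProductId,
    TumblingWindow(minute, 5)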
Incorrect Answers:
B, C: Like hopping windows, events can belong to more than one sliding window.
D: Session windows can have gaps.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
QUESTION 2
DRAG DROP
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: A daily process creates reporting data in REPORTINGDB from the data in SALESDB. The
process is implemented as a SQL Server Integration Services (SSIS) package that runs a stored procedure
from SALESDB.
References:
https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal
Manage data security
Question Set 1
QUESTION 1
You plan to use Microsoft Azure SQL Database instances with strict user access control. A user object
must:
Which two Transact-SQL commands should you run? Each correct answer presents part of the solution.
Correct Answer: CD
Section: (none)
Explanation
Explanation/Reference:
Explanation:
C: ALTER ROLE adds or removes members to or from a database role, or changes the name of a user-
defined database role.
Members of the db_owner fixed database role can perform all configuration and maintenance activities on
the database, and can also drop the database in SQL Server.
Note: Logins are created at the server level, while users are created at the database level. In other words, a
login allows you to connect to the SQL Server service (also called an instance), and permissions inside the
database are granted to the database users, not the logins. The logins will be assigned to server roles (for
example, serveradmin) and the database users will be assigned to roles within that database (e.g.,
db_datareader, db_backupoperator).
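As a hedged illustration of that pattern (the user name, password, and role below are placeholders, not the values required by the question):

-- Contained database user, created inside the user database rather than from a server login
CREATE USER AppUser WITH PASSWORD = 'S0me$trongP@ssw0rd!';
-- Add the user to a database role
ALTER ROLE db_datareader ADD MEMBER AppUser;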
References:
https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-role-transact-sql
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-user-transact-sql
QUESTION 2
DRAG DROP
You manage security for a database that supports a line of business application.
Private and personal data stored in the database must be protected and encrypted.
You need to configure the database to use Transparent Data Encryption (TDE).
Which five actions should you perform in sequence? To answer, select the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 4: Create a database encryption key and protect it by the certificate
Example code:
USE master;
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<UseStrongPasswordHere>';
go
CREATE CERTIFICATE MyServerCert WITH SUBJECT = 'My DEK Certificate';
go
USE AdventureWorks2012;
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128
ENCRYPTION BY SERVER CERTIFICATE MyServerCert;
GO
ALTER DATABASE AdventureWorks2012
SET ENCRYPTION ON;
GO
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/transparent-data-encryption
QUESTION 3
SIMULATION
You need to ensure that only the resources on a virtual network named VNET1 can access an Azure
Storage account named storage10543936.
Explanation/Reference:
Explanation:
You can use Private Endpoints for your Azure Storage accounts to allow clients on a virtual network (VNet)
to securely access data over a Private Link.
2. Select Networking.
5. Select OK.
6. Select Review + create. You're taken to the Review + create page where Azure validates your
configuration.
Reference:
https://docs.microsoft.com/en-us/azure/private-link/create-private-endpoint-storage-portal
QUESTION 4
SIMULATION
You need to replicate db1 to a new Azure SQL server named db1-copy10543936 in the US West region.
Explanation/Reference:
Explanation:
1. In the Azure portal, browse to the database db1 that you want to set up for geo-replication.
2. On the SQL database page, select geo-replication, and then select the region in which to create the
secondary database: the US West region.
3. Select or configure the target server (db1-copy10543936) and the pricing tier for the secondary database.
4. Click Create to add the secondary.
6. When the seeding process is complete, the secondary database displays its status.
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-active-geo-replication-portal
QUESTION 5
SIMULATION
Azure Username: xxxxx
Azure Password: xxxxx
You need to ensure that you can recover any blob data from an Azure Storage account named
storage10543936 up to 10 days after the data is deleted.
Explanation/Reference:
Explanation:
Enable soft delete for blobs on your storage account by using Azure portal:
4. Under Retention policies, enter the number of days to retain soft-deleted blobs. Here, enter 10.
Note: Azure Storage now offers soft delete for blob objects so that you can more easily recover your data
when it is erroneously modified or deleted by an application or other storage account user. Currently you
can retain soft deleted data for between 1 and 365 days.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-soft-delete
QUESTION 6
DRAG DROP
You plan to create a new single database instance of Microsoft Azure SQL Database.
The database must only allow communication from the data engineer’s workstation. You must connect
directly to the instance by using Microsoft SQL Server Management Studio.
You need to create and configure the Database. Which three Azure PowerShell cmdlets should you use to
develop the solution? To answer, move the appropriate cmdlets from the list of cmdlets to the answer area
and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: New-AzureRmSqlServer
Create a server.
Step 2: New-AzureRmSqlServerFirewallRule
New-AzureRmSqlServerFirewallRule creates a firewall rule for a SQL Database server.
Can be used to create a server firewall rule that allows access from the specified IP range.
Step 3: New-AzureRmSqlDatabase
Example: Create a database on a specified server
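The example itself is not reproduced here; a hedged sketch of the three cmdlets in sequence (the resource names, location, and workstation IP address are placeholders):

$cred = Get-Credential   # SQL administrator credentials

New-AzureRmSqlServer -ResourceGroupName "rg1" -ServerName "sqlserverdev1" `
    -Location "West Europe" -SqlAdministratorCredentials $cred

New-AzureRmSqlServerFirewallRule -ResourceGroupName "rg1" -ServerName "sqlserverdev1" `
    -FirewallRuleName "DataEngineerWorkstation" `
    -StartIpAddress "203.0.113.10" -EndIpAddress "203.0.113.10"

New-AzureRmSqlDatabase -ResourceGroupName "rg1" -ServerName "sqlserverdev1" `
    -DatabaseName "db1" -RequestedServiceObjectiveName "S0"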
References:
https://docs.microsoft.com/en-us/azure/sql-database/scripts/sql-database-create-and-configure-database-
powershell?toc=%2fpowershell%2fmodule%2ftoc.json
QUESTION 7
HOTSPOT
Your company uses Azure SQL Database and Azure Blob storage.
All data at rest must be encrypted by using the company’s own key. The solution must minimize
administrative effort and the impact on applications that use the database.
What should you implement? To answer, select the appropriate option in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Note: Transparent data encryption encrypts the storage of an entire database by using a symmetric key
called the database encryption key. This database encryption key is protected by the transparent data
encryption protector.
Transparent data encryption (TDE) helps protect Azure SQL Database, Azure SQL Managed Instance, and
Azure Data Warehouse against the threat of malicious offline activity by encrypting data at rest. It performs
real-time encryption and decryption of the database, associated backups, and transaction log files at rest
without requiring changes to the application.
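A hedged PowerShell sketch of switching the TDE protector to a customer-managed key in Azure Key Vault (the resource names and key URI are placeholders):

# Register the Key Vault key with the logical SQL server
Add-AzSqlServerKeyVaultKey -ResourceGroupName "rg1" -ServerName "sqlserver1" `
    -KeyId "https://contoso-vault.vault.azure.net/keys/tde-key/<key-version>"

# Use the customer-managed key as the TDE protector
Set-AzSqlServerTransparentDataEncryptionProtector -ResourceGroupName "rg1" -ServerName "sqlserver1" `
    -Type AzureKeyVault -KeyId "https://contoso-vault.vault.azure.net/keys/tde-key/<key-version>"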
References:
https://docs.microsoft.com/en-us/azure/sql-database/transparent-data-encryption-azure-sql
https://docs.microsoft.com/en-us/azure/storage/common/storage-service-encryption
QUESTION 8
You develop data engineering solutions for a company.
You need to implement role-based access control (RBAC) so that project members can manage the Azure
Data Lake Storage resources.
Which three actions should you perform? Each correct answer presents part of the solution.
Explanation/Reference:
Explanation:
AD: Create security groups in Azure Active Directory. Assign users or security groups to Data Lake Storage
Gen1 accounts.
E: Assign users or security groups as ACLs to the Data Lake Storage Gen1 file system
References:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data
QUESTION 9
DRAG DROP
You deploy an Azure SQL database named DB1 to an Azure SQL server named SQL1.
An Azure Active Directory (Azure AD) group named Analysts contains all the users who must have access
to DB1.
The Analysts group must have read-only access to all the views and tables in the Sales schema of DB1.
A manager will decide who can access DB1. The manager will not interact directly with DB1.
Users must not have to manage a separate password solely to access DB1.
Which four actions should you perform in sequence to meet the data security requirements? To answer,
move the appropriate actions from the list of actions to the answer area and arrange them in the correct
order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: From the Azure Portal, set the Active Directory admin for SQL1.
Provision an Azure Active Directory administrator for your Azure SQL Database server.
You can provision an Azure Active Directory administrator for your Azure SQL server in the Azure portal
and by using PowerShell.
Step 2: On DB1, create a contained user for the Analysts group by using Transact-SQL
Create contained database users in your database mapped to Azure AD identities.
To create an Azure AD-based contained database user (other than the server administrator that owns the
database), connect to the database with an Azure AD identity, as a user with at least the ALTER ANY
USER permission. Then use the following Transact-SQL syntax:
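The syntax is not reproduced here; a minimal sketch based on the referenced documentation, with a GRANT added to reflect the read-only requirement on the Sales schema:

CREATE USER [Analysts] FROM EXTERNAL PROVIDER;
GRANT SELECT ON SCHEMA::Sales TO [Analysts];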
Step 3: From Microsoft SQL Server Management Studio (SSMS), sign in to SQL1 by using the account set
as the Active Directory admin.
To confirm the Azure AD administrator is properly set up, connect to the master database using the Azure
AD administrator account. To provision an Azure AD-based contained database user (other than the server
administrator that owns the database), connect to the database with an Azure AD identity that has access
to the database.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-aad-authentication-configure
QUESTION 10
DRAG DROP
You have an Azure subscription that contains an Azure Databricks environment and an Azure Storage
account.
You need to implement secure communication between Databricks and the storage account.
Which four actions should you perform in sequence? To answer, move the actions from the list of actions
to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Managing secrets begins with creating a secret scope.
To reference secrets stored in an Azure Key Vault, you can create a secret scope backed by Azure Key
Vault.
References:
https://docs.microsoft.com/en-us/azure/azure-databricks/store-secrets-azure-key-vault
QUESTION 11
You have an Azure SQL server named Server1 that hosts two development databases named DB1 and
DB2.
You have an administrative workstation that has an IP address of 192.168.8.8. The development team at
your company has an IP addresses in the range of 192.168.8.1 to 192.168.8.5.
Which three actions should you perform? Each correct answer presents part of the solution.
A. Create a firewall rule on DB1 that has a start IP address of 192.168.8.1 and an end IP address of
192.168.8.5.
B. Create a firewall rule on DB1 that has a start and end IP address of 0.0.0.0.
C. Create a firewall rule on Server1 that has a start IP address of 192.168.8.1 and an end IP address of
192.168.8.5.
D. Create a firewall rule on DB1 that has a start and end IP address of 192.168.8.8.
E. Create a firewall rule on Server1 that has a start and end IP address of 192.168.8.8.
Explanation/Reference:
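For context only, server-level and database-level firewall rules can be created in Transact-SQL as sketched below; this is shown to illustrate the difference between the two rule types, with the IP ranges taken from the question:

-- Server-level rule: run in the master database; applies to every database on the server
EXECUTE sp_set_firewall_rule
    @name = N'DevTeamRange',
    @start_ip_address = '192.168.8.1',
    @end_ip_address = '192.168.8.5';

-- Database-level rule: run in the target database; applies only to that database
EXECUTE sp_set_database_firewall_rule
    @name = N'AdminWorkstation',
    @start_ip_address = '192.168.8.8',
    @end_ip_address = '192.168.8.8';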
QUESTION 12
DRAG DROP
You have an ASP.NET web app that uses an Azure SQL database. The database contains a table named
Employee. The table contains sensitive employee information, including a column named DateOfBirth.
You need to ensure that the data in the DateOfBirth column is encrypted both in the database and when
transmitted between a client and Azure. Only authorized clients must be able to view the data in the
column.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-always-encrypted
QUESTION 13
Your company manages a payroll application for its customers worldwide. The application uses an Azure
SQL database named DB1. The database contains a table named Employee and an identity column
named EmployeeId.
Whenever a user queries EmployeeId, you need to return a random value between 1 and 10 instead of the
EmployeeId value.
A. string
B. number
C. default
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started-portal
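Returning a random number in a fixed range is handled by the random() masking function. A hedged Transact-SQL sketch, assuming the Employee table and EmployeeId column from the question:
ALTER TABLE dbo.Employee
ALTER COLUMN EmployeeId ADD MASKED WITH (FUNCTION = 'random(1, 10)');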
QUESTION 14
SIMULATION
Use the following login credentials as needed:
You need to ensure that an email notification is sent to [email protected] if a suspicious login to an
Azure SQL database named db2 is detected.
Explanation/Reference:
Explanation:
Set up Advanced Threat Protection in the Azure portal.
1. From the Azure portal navigate to the configuration page of the Azure SQL Database db2, which you
want to protect. In the security settings, select Advanced Data Security.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-threat-detection
QUESTION 15
SIMULATION
Use the following login credentials as needed:
Explanation/Reference:
Explanation:
2. Select Security > Advanced Data Security, and then click Enable Advanced Data Security.
3. Click the Data Discovery & Classification card.
5. In the context window that opens, select the schema > table > column that you want to classify, and the
information type and sensitivity label. Then click on the blue Add classification button at the bottom of the
context window.
6. To complete your classification and persistently label (tag) the database columns with the new
classification metadata, click on Save in the top menu of the window.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-data-discovery-and-classification
QUESTION 16
DRAG DROP
You manage the Microsoft Azure Databricks environment for a company. You must be able to access a
private Azure Blob Storage account. Data must be available to all Azure Databricks workspaces. You need
to provide the data access.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Note: To mount a Blob Storage container or a folder inside a container, use the following command:
Python
dbutils.fs.mount(
source = "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net",
mount_point = "/mnt/<mount-name>",
extra_configs = {"<conf-key>":dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
where:
dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>") gets the key that has been stored as a
secret in a secret scope.
References:
https://docs.databricks.com/spark/latest/data-sources/azure/azure-storage.html
QUESTION 17
DRAG DROP
A company uses Microsoft Azure SQL Database to store sensitive company data. You encrypt the data and
only allow access to specified users from specified locations.
You must monitor data usage, and data copied from the system to prevent data leakage.
You need to configure Azure SQL Database to email a specific user when data leakage occurs.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
2. Navigate to the configuration page of the Azure SQL Database server you want to protect. In the security
settings, select Advanced Data Security.
Advanced Threat Protection provides alerts upon detection of anomalous database activities.
Security alerts are triggered when anomalies in activity occur: access from an unusual location, anonymous
access, access by an unusual application, data exfiltration, unexpected delete operations, access
permission change, and so on.
Admins can view these alerts via Azure Security Center and can also choose to be notified of each of them
via email.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-threat-detection
https://www.helpnetsecurity.com/2019/04/04/microsoft-azure-security/
QUESTION 18
HOTSPOT
You develop data engineering solutions for a company. An application creates a database on Microsoft
Azure. You have the following code:
Which database and authorization types are used? To answer, select the appropriate option in the answer
area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Master keys provide access to all of the administrative resources for the database account. Master keys:
Provide access to accounts, databases, users, and permissions.
Cannot be used to provide granular access to containers and documents.
Are created during the creation of an account.
Can be regenerated at any time.
Incorrect Answers:
Resource Token: Resource tokens provide access to the application resources within a database.
References:
https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.documents.client.documentclient.createdatabaseasync
https://docs.microsoft.com/en-us/azure/cosmos-db/secure-access-to-data
QUESTION 19
You have an Azure SQL database that contains a table named Customer. Customer contains the columns
shown in the following table.
You apply a masking rule as shown in the following table.
A. Server administrators and all users who are granted the UNMASK permission to the Customer_Email
column only.
B. All users who are granted the UNMASK permission to the Customer_Email column only.
C. Server administrators only.
D. Server administrators and all users who are granted the SELECT permission to the Customer_Email
column only.
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Grant the UNMASK permission to a user to enable them to retrieve unmasked data from the columns for
which masking is defined.
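A hedged Transact-SQL sketch of granting and removing that permission (the principal name is illustrative):
GRANT UNMASK TO DataAnalyst;
-- To return masked results to that user again:
REVOKE UNMASK FROM DataAnalyst;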
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking
Manage data security
Testlet 2
Background
Proseware, Inc, develops and manages a product named Poll Taker. The product is used for delivering
public opinion polling and analysis.
Polling data comes from a variety of sources, including online surveys, house-to-house interviews, and
booths at public events.
Polling data
Polling data is stored in one of the two locations:
Poll metadata
Each poll has associated metadata with information about the poll including the date and number of
respondents. The data is stored as JSON.
Phone-based polling
Security
Phone-based poll data must only be uploaded by authorized users from authorized devices
Contractors must not have access to any polling data other than their own
Access to polling data must be set on a per-Active Directory user basis
Performance
After six months, raw polling data should be moved to a storage account. The storage must be available in
the event of a regional disaster. The solution must minimize costs.
Deployments
All deployments must be performed by using Azure DevOps. Deployments must use templates that can be
reused across multiple environments.
No credentials or secrets should be used during deployments
Reliability
All services and processes must be resilient to a regional Azure outage.
Monitoring
All Azure services must be monitored by using Azure Monitor. On-premises SQL Server performance must
be monitored.
QUESTION 1
HOTSPOT
Which security technologies should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
References:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-database-scoped-credential-transact-sql
Manage data security
Testlet 3
Overview
Current environment
Contoso relies on an extensive partner network for marketing, sales, and distribution. Contoso uses
external companies that manufacture everything from the actual pharmaceutical to the packaging.
The majority of the company’s data resides in Microsoft SQL Server databases. Application databases fall
into one of the following tiers:
The company has a reporting infrastructure that ingests data from local databases and partner services.
Partner services consist of distributors, wholesalers, and retailers across the world. The company
performs daily, weekly, and monthly reporting.
Requirements
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Applications must still have access to data from both internal and external applications keeping the data
encrypted and secure at rest and in transit.
A disaster recovery strategy must be implemented for Tier 3 and Tier 6 through 8 allowing for failover in the
case of a server going offline.
Selected internal applications must have the data hosted in single Microsoft Azure SQL Databases.
The solution must support migrating databases that support external and internal application to Azure SQL
Database. The migrated databases will be supported by Azure Data Factory pipelines for the continued
movement, migration and updating of data both in the cloud and from local core business systems and
repositories.
Tier 7 and Tier 8 partner access must be restricted to the database only.
In addition to default Azure backup behavior, Tier 4 and 5 databases must be on a backup strategy that
performs a transaction log backup every hour, a differential backup of databases every day, and a full backup
up every week.
Backup strategies must be put in place for all other standalone Azure SQL Databases using Azure SQL-
provided backup storage and capabilities.
Databases
Contoso requires their data estate to be designed and implemented in the Azure Cloud. Moving to the
cloud must not inhibit access to or availability of data.
Databases:
Tier 1 Database must implement data masking using the following masking logic:
Tier 2 databases must sync between branches and cloud databases and in the event of conflicts must be
set up for conflicts to be won by on-premises databases.
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Applications must still have access to data from both internal and external applications keeping the data
encrypted and secure at rest and in transit.
A disaster recovery strategy must be implemented for Tier 3 and Tier 6 through 8 allowing for failover in the
case of a server going offline.
Selected internal applications must have the data hosted in single Microsoft Azure SQL Databases.
Reporting
Security
A method of managing multiple databases in the cloud at the same time must be implemented to
streamline data management and limit management access to only those requiring access.
Monitoring
Monitoring must be set up on every database. Contoso and partners must receive performance reports as
part of contractual agreements.
Tiers 6 through 8 must have unexpected resource storage usage immediately reported to data engineers.
The Azure SQL Data Warehouse cache must be monitored when the database is being used. A dashboard
monitoring key performance indicators (KPIs) indicated by traffic lights must be created and displayed
based on the following metrics:
Existing Data Protection and Security compliances require that all certificates and keys are internally
managed in an on-premises storage.
Azure Data Warehouse must be used to gather and query data from multiple internal and external
databases
Azure Data Warehouse must be optimized to use data from a cache
Reporting data aggregated for external partners must be stored in Azure Storage and be made
available during regular business hours in the connecting regions
Reporting strategies must be improved to a real-time or near real-time reporting cadence to improve
competitiveness and the general supply chain
Tier 9 reporting must be moved to Event Hubs, queried, and persisted in the same Azure region as the
company’s main office
Tier 10 reporting data must be stored in Azure Blobs
Issues
Team members identify the following issues:
Both internal and external client applications run complex joins, equality searches and group-by clauses.
Because some systems are managed externally, the queries will not be changed or optimized by
Contoso
External partner organization data formats, types and schemas are controlled by the partner companies
Internal and external database development staff resources are primarily SQL developers familiar with
the Transact-SQL language.
Size and amount of data has led to applications and reporting solutions not performing at required
speeds
Tier 7 and 8 data access is constrained to single endpoints managed by partners for access
The company maintains several legacy client applications. Data for these applications remains isolated
from other applications. This has led to hundreds of databases being provisioned on a per application
basis
QUESTION 1
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
Solution:
1. Access the Always Encrypted Wizard in SQL Server Management Studio
2. Select the column to be encrypted
3. Set the encryption type to Randomized
4. Configure the master key to use the Windows Certificate Store
5. Validate configuration results and deploy the solution
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Use the Azure Key Vault, not the Windows Certificate Store, to store the master key.
Note: The Master Key Configuration page is where you set up your CMK (Column Master Key) and select
the key store provider where the CMK will be stored. Currently, you can store a CMK in the Windows
certificate store, Azure Key Vault, or a hardware security module (HSM).
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-always-encrypted-azure-key-vault
QUESTION 2
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
Solution:
1. Access the Always Encrypted Wizard in SQL Server Management Studio
2. Select the column to be encrypted
3. Set the encryption type to Deterministic
4. Configure the master key to use the Windows Certificate Store
5. Validate configuration results and deploy the solution
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Use the Azure Key Vault, not the Windows Certificate Store, to store the master key.
Note: The Master Key Configuration page is where you set up your CMK (Column Master Key) and select
the key store provider where the CMK will be stored. Currently, you can store a CMK in the Windows
certificate store, Azure Key Vault, or a hardware security module (HSM).
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-always-encrypted-azure-key-vault
QUESTION 3
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
Solution:
1. Access the Always Encrypted Wizard in SQL Server Management Studio
2. Select the column to be encrypted
3. Set the encryption type to Deterministic
4. Configure the master key to use the Azure Key Vault
5. Validate configuration results and deploy the solution
Does the solution meet the goal?
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The solution uses Azure Key Vault, rather than the Windows Certificate Store, to store the master key.
Note: The Master Key Configuration page is where you set up your CMK (Column Master Key) and select
the key store provider where the CMK will be stored. Currently, you can store a CMK in the Windows
certificate store, Azure Key Vault, or a hardware security module (HSM).
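Outside the wizard, the same configuration can be expressed in Transact-SQL. A hedged sketch, assuming the key already exists in Azure Key Vault and that a column encryption key named CEK_AKV has been generated under the column master key (the key path, table, and column names are illustrative):
-- Column master key stored in Azure Key Vault.
CREATE COLUMN MASTER KEY CMK_AKV
WITH (
    KEY_STORE_PROVIDER_NAME = N'AZURE_KEY_VAULT',
    KEY_PATH = N'https://contoso-vault.vault.azure.net/keys/AlwaysEncryptedCMK/0123456789abcdef0123456789abcdef'
);
-- Deterministic encryption keeps equality lookups working; character columns require a BIN2 collation.
CREATE TABLE dbo.EmployeeContact
(
    EmployeeId INT IDENTITY PRIMARY KEY,
    NationalIdNumber NVARCHAR(11) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (
            COLUMN_ENCRYPTION_KEY = CEK_AKV,
            ENCRYPTION_TYPE = DETERMINISTIC,
            ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
        ) NOT NULL
);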
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-always-encrypted-azure-key-vault
QUESTION 4
HOTSPOT
You need to mask tier 1 data. Which functions should you use? To answer, select the appropriate option in
the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
A: Default
Full masking according to the data types of the designated fields.
For string data types, use XXXX or fewer Xs if the size of the field is less than 4 characters (char, nchar,
varchar, nvarchar, text, ntext).
B: email
C: Custom text
Custom string masking method which exposes the first and last letters and adds a custom padding string in
the middle: prefix, [padding], suffix
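A hedged Transact-SQL sketch of applying these three mask types (the table and column names are illustrative):
ALTER TABLE dbo.Respondent
ALTER COLUMN LastName ADD MASKED WITH (FUNCTION = 'default()');
ALTER TABLE dbo.Respondent
ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION = 'email()');
-- Custom string: expose the first character, pad the middle, expose nothing at the end.
ALTER TABLE dbo.Respondent
ALTER COLUMN PhoneNumber ADD MASKED WITH (FUNCTION = 'partial(1,"XXXXXXX",0)');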
Tier 1 Database must implement data masking using the following masking logic:
References:
https://docs.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking
QUESTION 5
DRAG DROP
You need to set up access to Azure SQL Database for Tier 7 and Tier 8 partners.
Which three actions should you perform in sequence? To answer, move the appropriate three actions from
the list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Tier 7 and 8 data access is constrained to single endpoints managed by partners for access
Step 1: Set the Allow Azure Services to Access Server setting to Disabled
Set Allow access to Azure services to OFF for the most secure configuration.
By default, access through the SQL Database firewall is enabled for all Azure services, under Allow access
to Azure services. Choose OFF to disable access for all Azure services.
Note: The firewall pane has an ON/OFF button that is labeled Allow access to Azure services. The ON
setting allows communications from all Azure IP addresses and all Azure subnets. These Azure IPs or
subnets might not be owned by you. This ON setting is probably more open than you want your SQL
Database to be. The virtual network rule feature offers much finer granular control.
Step 3: Connect to the database and use Transact-SQL to create a database firewall rule
Database-level firewall rules can only be configured using Transact-SQL (T-SQL) statements, and only
after you've configured a server-level firewall rule.
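A hedged sketch of such a database-level rule (the rule name and the partner endpoint's IP address are illustrative):
-- Run while connected to the target database itself, not to master.
EXECUTE sp_set_database_firewall_rule
    @name = N'Tier7PartnerEndpoint',
    @start_ip_address = '203.0.113.10',
    @end_ip_address = '203.0.113.10';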
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-security-tutorial
Manage data security
Testlet 4
Case study
Overview
ADatum Corporation is a retailer that sells products through two sales channels: retail stores and a website.
Existing Environment
ADatum has one database server that has Microsoft SQL Server 2016 installed. The server hosts three
mission-critical databases named SALESDB, DOCDB, and REPORTINGDB.
DOCDB stores documents that connect to the sales data in SALESDB. The documents are stored in two
different JSON formats based on the sales channel.
REPORTINGDB stores reporting data and contains several columnstore indexes. A daily process creates
reporting data in REPORTINGDB from the data in SALESDB. The process is implemented as a SQL
Server Integration Services (SSIS) package that runs a stored procedure from SALESDB.
Requirements
Planned Changes
ADatum plans to move the current data infrastructure to Azure. The new infrastructure has the following
requirements:
Technical Requirements
The new Azure data infrastructure must meet the following technical requirements:
Data in SALESDB must be encrypted by using Transparent Data Encryption (TDE). The encryption must
use your own key.
SALESDB must be restorable to any given minute within the past three weeks.
Real-time processing must be monitored to ensure that workloads are sized properly based on actual
usage patterns.
Missing indexes must be created automatically for REPORTINGDB.
Disk IO, CPU, and memory usage must be monitored for SALESDB.
QUESTION 1
DRAG DROP
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Data in SALESDB must be encrypted by using Transparent Data Encryption (TDE). The encryption must use
your own key.
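Assigning the customer-managed key as the server's TDE protector is done in the portal or with PowerShell/CLI; after that, encryption on the database itself can be enabled and verified with Transact-SQL. A minimal sketch:
ALTER DATABASE SALESDB SET ENCRYPTION ON;
-- Check encryption state (3 = encrypted).
SELECT DB_NAME(database_id) AS database_name, encryption_state
FROM sys.dm_database_encryption_keys;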
References:
https://docs.microsoft.com/en-us/azure/sql-database/transparent-data-encryption-byok-azure-sql-configure
Monitor and optimize data solutions
Question Set 1
QUESTION 1
SIMULATION
You need to double the processing resources available to an Azure SQL data warehouse named
datawarehouse.
NOTE: This task might take several minutes to complete. You can perform other tasks while the
task completes or end this section of the exam.
Explanation/Reference:
Explanation:
SQL Data Warehouse compute resources can be scaled by increasing or decreasing data warehouse
units.
1. Click SQL data warehouses in the left page of the Azure portal.
2. Select datawarehouse from the SQL data warehouses page. The data warehouse opens.
3. Click Scale.
4. In the Scale panel, move the slider left or right to change the DWU setting. Double the DWU setting.
6. Click Save. A confirmation message appears. Click yes to confirm or no to cancel.
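The same scale operation can also be issued with Transact-SQL from the master database of the logical server. A hedged sketch that assumes, purely for illustration, doubling from DW200c to DW400c:
ALTER DATABASE datawarehouse
MODIFY (SERVICE_OBJECTIVE = 'DW400c');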
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/quickstart-scale-compute-portal
QUESTION 2
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
A company uses Azure Data Lake Gen 1 Storage to store big data related to consumer behavior.
Solution: Configure Azure Data Lake Storage diagnostics to store logs and metrics in a storage account.
Does the solution meet the goal?
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
From the Azure Storage account that contains log data, open the Azure Storage account blade associated
with Data Lake Storage Gen1 for logging, and then click Blobs. The Blob service blade lists two containers.
References:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-diagnostic-logs
QUESTION 3
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
A company uses Azure Data Lake Gen 1 Storage to store big data related to consumer behavior.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead configure Azure Data Lake Storage diagnostics to store logs and metrics in a storage account.
References:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-diagnostic-logs
QUESTION 4
Your company uses several Azure HDInsight clusters.
The data engineering team reports several errors with some applications using these clusters.
A. Azure Automation
B. Log Analytics
C. Application Insights
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Azure Monitor logs integration. Azure Monitor logs enables data generated by multiple resources such as
HDInsight clusters, to be collected and aggregated in one place to achieve a unified monitoring experience.
As a prerequisite, you will need a Log Analytics Workspace to store the collected data. If you have not
already created one, you can follow the instructions for creating a Log Analytics Workspace.
You can then easily configure an HDInsight cluster to send many workload-specific metrics to Log
Analytics.
References:
https://azure.microsoft.com/sv-se/blog/monitoring-on-azure-hdinsight-part-2-cluster-health-and-availability/
QUESTION 5
DRAG DROP
Your company uses Microsoft Azure SQL Database configured with Elastic pools. You use Elastic
Database jobs to run queries across all databases in the pool.
You need to analyze, troubleshoot, and report on components responsible for running Elastic Database
jobs.
You need to determine the component responsible for running job service tasks.
Which components should you use for each Elastic pool job service task? To answer, drag the
appropriate component to the correct task. Each component may be used once, more than once, or not at
all. You may need to drag the split bar between panes or scroll to view content.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-job-automation-overview
QUESTION 6
Contoso, Ltd. plans to configure existing applications to use Azure SQL Database.
Which three actions should you perform? Each correct answer presents part of the solution.
Explanation/Reference:
References:
https://docs.microsoft.com/en-us/azure/azure-monitor/platform/alerts-action-rules
QUESTION 7
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a container named Sales in an Azure Cosmos DB database. Sales has 120 GB of data. Each
entry in Sales has the following structure.
Users report that when they perform queries that retrieve data by ProductName, the queries take longer
than expected to complete.
You need to reduce the amount of time it takes to execute the problematic queries.
Solution: You create a lookup collection that uses ProductName as a partition key.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
One option is to have a lookup collection “ProductName” for the mapping of “ProductName” to “OrderId”.
References:
https://azure.microsoft.com/sv-se/blog/azure-cosmos-db-partitioning-design-patterns-part-1/
QUESTION 8
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a container named Sales in an Azure Cosmos DB database. Sales has 120 GB of data. Each
entry in Sales has the following structure.
Users report that when they perform queries that retrieve data by ProductName, the queries take longer
than expected to complete.
You need to reduce the amount of time it takes to execute the problematic queries.
Solution: You create a lookup collection that uses ProductName as a partition key and OrderId as a
value.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
One option is to have a lookup collection “ProductName” for the mapping of “ProductName” to “OrderId”.
References:
https://azure.microsoft.com/sv-se/blog/azure-cosmos-db-partitioning-design-patterns-part-1/
QUESTION 9
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a container named Sales in an Azure Cosmos DB database. Sales has 120 GB of data. Each
entry in Sales has the following structure.
Users report that when they perform queries that retrieve data by ProductName, the queries take longer
than expected to complete.
You need to reduce the amount of time it takes to execute the problematic queries.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
One option is to have a lookup collection “ProductName” for the mapping of “ProductName” to “OrderId”.
References:
https://azure.microsoft.com/sv-se/blog/azure-cosmos-db-partitioning-design-patterns-part-1/
QUESTION 10
HOTSPOT
You need to periodically analyze pipeline executions from the last 60 days to identify trends in execution
durations. The solution must use Azure Log Analytics to query the data and create charts.
Which diagnostic settings should you configure in Data Factory? To answer, select the appropriate options
in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Log type: PipelineRuns
A pipeline run in Azure Data Factory defines an instance of a pipeline execution.
Save your diagnostic logs to a storage account for auditing or manual inspection. You can use the
diagnostic settings to specify the retention time in days.
References:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor
QUESTION 11
HOTSPOT
You are implementing automatic tuning mode for Azure SQL databases.
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Automatic tuning options can be independently enabled or disabled per database, or they can be
configured on SQL Database servers and applied on every database that inherits settings from the server.
SQL Database servers can inherit Azure defaults for Automatic tuning settings. Azure defaults at this time
are set to FORCE_LAST_GOOD_PLAN is enabled, CREATE_INDEX is enabled, and DROP_INDEX is
disabled.
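The same options map directly to Transact-SQL. A hedged sketch of inheriting the server settings or overriding them for a single database:
-- Inherit the server (or Azure default) configuration:
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING = INHERIT;
-- Or configure individual options explicitly for this database:
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON, CREATE_INDEX = ON, DROP_INDEX = OFF);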
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning
QUESTION 12
HOTSPOT
You need to receive an alert when Azure Synapse Analytics consumes the maximum allotted resources.
Which resource type and signal should you use to create the alert in Azure Monitor? To answer, select the
appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-insights-alerts-portal
QUESTION 13
You have an Azure SQL database that has masked columns.
You need to identify when a user attempts to infer data from the masked columns.
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Dynamic Data Masking is designed to simplify application development by limiting data exposure in a set of
pre-defined queries used by the application. While Dynamic Data Masking can also be useful to prevent
accidental exposure of sensitive data when accessing a production database directly, it is important to note
that unprivileged users with ad-hoc query permissions can apply techniques to gain access to the actual
data. If there is a need to grant such ad-hoc access, Auditing should be used to monitor all database
activity and mitigate this scenario.
References:
https://docs.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking
QUESTION 14
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
A company uses Azure Data Lake Gen 1 Storage to store big data related to consumer behavior.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead configure Azure Data Lake Storage diagnostics to store logs and metrics in a storage account.
References:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-diagnostic-logs
QUESTION 15
You have an Azure data solution that contains an enterprise data warehouse in Azure Synapse Analytics
named DW1.
You need to ensure that the automated data loads have enough memory available to complete quickly and
successfully when the ad hoc queries run.
A. Hash distribute the large fact tables in DW1 before performing the automated data loads.
B. Assign a larger resource class to the automated data load queries.
C. Create sampled statistics for every column in each table of DW1.
D. Assign a smaller resource class to the automated data load queries.
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
To ensure the loading user has enough memory to achieve maximum compression rates, use loading
users that are a member of a medium or large resource class.
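A hedged sketch of assigning a larger static resource class to the loading user (the user name is illustrative):
-- Run in the data warehouse database.
EXEC sp_addrolemember 'largerc', 'LoadUser';
-- To revert later: EXEC sp_droprolemember 'largerc', 'LoadUser';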
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
QUESTION 16
DRAG DROP
You plan to monitor an Azure data factory by using the Monitor & Manage app.
You need to identify the status and duration of activities that reference a table in a source database.
Which three actions should you perform in sequence? To answer, move the actions from the list of actions
to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: From the Data Factory authoring UI, generate a user property for Source on all activities.
Step 2: From the Data Factory monitoring app, add the Source user property to Activity Runs table.
You can promote any pipeline activity property as a user property so that it becomes an entity that you can
monitor. For example, you can promote the Source and Destination properties of the copy activity in your
pipeline as user properties. You can also select Auto Generate to generate the Source and Destination
user properties for a copy activity.
Step 3: From the Data Factory authoring UI, publish the pipelines
Publish output data to data stores such as Azure SQL Data Warehouse for business intelligence (BI)
applications to consume.
References:
https://docs.microsoft.com/en-us/azure/data-factory/monitor-visually
QUESTION 17
SIMULATION
Use the following login credentials as needed:
You need to generate an email notification to [email protected] if the available storage in an Azure
Cosmos DB database named cosmos10277521 is less than 100,000,000 bytes.
Explanation/Reference:
Explanation:
1. In the Azure portal, click All services, click Azure Cosmos DB, and then click the cosmos10277521
Azure Cosmos DB account.
2. In the resource menu, click Alert Rules to open the Alert rules page.
3. In the Alert rules page, click Add alert.
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/monitor-accounts
QUESTION 18
You have an enterprise data warehouse in Azure Synapse Analytics named DW1 on a server named
Server1.
You need to verify whether the size of the transaction log file for each distribution of DW1 is smaller than
160 GB.
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The following query returns the transaction log size on each distribution. If one of the log files is reaching
160 GB, you should consider scaling up your instance or limiting your transaction size.
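The query itself is not reproduced in this dump; the documented approach reads a PDW performance-counter DMV, roughly as follows:
-- Transaction log space used per distribution, in GB.
SELECT
    instance_name AS distribution_db,
    cntr_value * 1.0 / 1048576 AS log_file_used_size_GB,
    pdw_node_id
FROM sys.dm_pdw_nodes_os_performance_counters
WHERE instance_name LIKE 'Distribution_%'
  AND counter_name = 'Log File(s) Used Size (KB)';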
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-manage-monitor
QUESTION 19
HOTSPOT
You need to collect application metrics, streaming query events, and application log messages for an Azure
Databricks cluster.
Which type of library and workspace should you implement? To answer, select the appropriate options in
the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You can send application logs and metrics from Azure Databricks to a Log Analytics workspace. It uses the
Azure Databricks Monitoring Library, which is available on GitHub.
References:
https://docs.microsoft.com/en-us/azure/architecture/databricks-monitoring/application-logs
QUESTION 20
DRAG DROP
You are implementing an Azure Blob storage account for an application that has the following
requirements:
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 2: Use an Azure Resource Manager template that has a lifecycle management policy
Azure Blob storage lifecycle management offers a rich, rule-based policy for GPv2 and Blob storage
accounts.
Step 3: Create a rule that has the rule actions of TierCool, TierToArchive, and Delete
Each rule definition includes a filter set and an action set. The filter set limits rule actions to a certain set of
objects within a container or objects names. The action set applies the tier or delete actions to the filtered
set of objects.
Incorrect Answers:
Create a rule filter
No need for a rule filter. Rule filters limit rule actions to a subset of blobs within the storage account.
References:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts
QUESTION 21
You have an Azure Cosmos DB database that uses the SQL API.
A. soft delete
B. Low Latency Analytical Processing (LLAP)
C. schema on read
D. Time to Live (TTL)
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
With Time to Live or TTL, Azure Cosmos DB provides the ability to delete items automatically from a
container after a certain time period. By default, you can set time to live at the container level and override
the value on a per-item basis. After you set the TTL at a container or at an item level, Azure Cosmos DB
will automatically remove these items after the time period, since the time they were last modified.
References:
https://docs.microsoft.com/en-us/azure/cosmos-db/time-to-live
QUESTION 22
SIMULATION
Use the following login credentials as needed:
You need to ensure that missing indexes are created automatically by Azure in db2. The solution must
apply ONLY to db2.
Explanation/Reference:
Explanation:
1. To enable automatic tuning on Azure SQL Database logical server, navigate to the server in Azure portal
and then select Automatic tuning in the menu.
2. Select database db2
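A database-scoped alternative to the portal steps is to run Transact-SQL while connected to db2, which keeps the change limited to that database. A minimal sketch:
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (CREATE_INDEX = ON);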
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning-enable
QUESTION 23
Note: This question is a part of series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
A project requires the deployment of resources to Microsoft Azure for batch data processing on Azure
HDInsight. Batch processing will run daily and must:
You need to recommend a tool that will monitor clusters and provide information to suggest how to scale.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Ambari Web UI does not provide information to suggest how to scale.
Instead monitor clusters by using Azure Log Analytics and HDInsight cluster management solutions.
References:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-oms-log-analytics-tutorial
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-manage-ambari
QUESTION 24
Note: This question is a part of series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
A project requires the deployment of resources to Microsoft Azure for batch data processing on Azure
HDInsight. Batch processing will run daily and must:
You need to recommend a tool that will monitor clusters and provide information to suggest how to scale.
Solution: Monitor clusters by using Azure Log Analytics and HDInsight cluster management solutions.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
HDInsight provides cluster-specific management solutions that you can add for Azure Monitor logs.
Management solutions add functionality to Azure Monitor logs, providing additional data and analysis tools.
These solutions collect important performance metrics from your HDInsight clusters and provide the tools
to search the metrics. These solutions also provide visualizations and dashboards for most cluster types
supported in HDInsight. By using the metrics that you collect with the solution, you can create custom
monitoring rules and alerts.
References:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-oms-log-analytics-tutorial
QUESTION 25
Note: This question is a part of series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
A project requires the deployment of resources to Microsoft Azure for batch data processing on Azure
HDInsight. Batch processing will run daily and must:
You need to recommend a tool that will monitor clusters and provide information to suggest how to scale.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead monitor clusters by using Azure Log Analytics and HDInsight cluster management solutions.
References:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-oms-log-analytics-tutorial
QUESTION 26
HOTSPOT
A company is planning to use Microsoft Azure Cosmos DB as the data store for an application. You have
the following Azure CLI command:
az cosmosdb create --name "cosmosdbdev1" --resource-group "rgdev"
You need to minimize latency and expose the SQL API. How should you complete the command? To
answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: Eventual
With Azure Cosmos DB, developers can choose from five well-defined consistency models on the
consistency spectrum. From strongest to more relaxed, the models include strong, bounded staleness,
session, consistent prefix, and eventual consistency.
The following image shows the different consistency levels as a spectrum.
Box 2: GlobalDocumentDB
Select Core(SQL) to create a document database and query by using SQL syntax.
Note: The API determines the type of account to create. Azure Cosmos DB provides five APIs: Core(SQL)
and MongoDB for document databases, Gremlin for graph databases, Azure Table, and Cassandra.
References:
https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels
https://docs.microsoft.com/en-us/azure/cosmos-db/create-sql-api-dotnet
QUESTION 27
A company has a Microsoft Azure HDInsight solution that uses different cluster types to process and
analyze data. Operations are continuous.
You need to determine a monitoring solution to track down the issue in the least amount of time.
What should you use?
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Ambari is the recommended tool for monitoring the health for any given HDInsight cluster.
Note: Azure HDInsight is a high-availability service that has redundant gateway nodes, head nodes, and
ZooKeeper nodes to keep your HDInsight clusters running smoothly. While this ensures that a single failure
will not affect the functionality of a cluster, you may still want to monitor cluster health so you are alerted
when an issue does arise. Monitoring cluster health refers to monitoring whether all nodes in your cluster
and the components that run on them are available and functioning correctly.
Ambari is the recommended tool for monitoring utilization across the whole cluster. The Ambari dashboard
shows easily glanceable widgets that display metrics such as CPU, network, YARN memory, and HDFS
disk usage. The specific metrics shown depend on cluster type. The “Hosts” tab shows metrics for
individual nodes so you can ensure the load on your cluster is evenly distributed.
References:
https://azure.microsoft.com/en-us/blog/monitoring-on-hdinsight-part-1-an-overview/
QUESTION 28
You have the Diagnostics settings of an Azure Storage account as shown in the following exhibit.
How long will the logging data be retained?
A. 7 days
B. 365 days
C. indefinitely
D. 90 days
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Reference:
https://docs.microsoft.com/en-us/azure/storage/common/storage-analytics-metrics
QUESTION 29
Your company uses Azure Stream Analytics to monitor devices.
The company plans to double the number of devices that are monitored.
You need to monitor a Stream Analytics job to ensure that there are enough processing resources to
handle the additional load.
Which metric should you monitor?
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
There are a number of other resource constraints that can cause the streaming pipeline to slow down. The
watermark delay metric can rise due to:
Not enough processing resources in Stream Analytics to handle the volume of input events.
Not enough throughput within the input event brokers, so they are throttled.
Output sinks are not provisioned with enough capacity, so they are throttled. The possible solutions vary
widely based on the flavor of output service being used.
Incorrect Answers:
A: Deserialization issues are caused when the input stream of your Stream Analytics job contains
malformed messages.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-time-handling
QUESTION 30
You have an enterprise data warehouse in Azure Synapse Analytics.
You need to monitor the data warehouse to identify whether you must scale up to a higher service level to
accommodate the current workloads.
More than one answer choice may achieve the goal. Select the BEST answer.
A. CPU percentage
B. DWU used
C. DWU percentage
D. Data IO percentage
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
DWU used, defined as DWU limit * DWU percentage, represents only a high-level representation of usage
across the SQL pool and is not meant to be a comprehensive indicator of utilization. To determine whether
to scale up or down, consider all factors which can be impacted by DWU such as concurrency, memory,
tempdb, and adaptive cache capacity. We recommend running your workload at different DWU settings to
determine what works best to meet your business objectives.
Reference:
https://docs.microsoft.com/bs-latn-ba/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-concept-resource-utilization-query-activity
QUESTION 31
DRAG DROP
Your company analyzes images from security cameras and sends them to security teams that respond to
unusual activity. The solution uses Azure Databricks.
You need to send Apache Spark level events, Spark Structured Streaming metrics, and application metrics
to Azure Monitor.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You can send application logs and metrics from Azure Databricks to a Log Analytics workspace.
Spark uses a configurable metrics system based on the Dropwizard Metrics Library.
Prerequisites: Configure your Azure Databricks cluster to use the monitoring library.
Note: The monitoring library streams Apache Spark level events and Spark Structured Streaming metrics
from your jobs to Azure Monitor.
To send application metrics from Azure Databricks application code to Azure Monitor, follow these steps:
Step 1. Build the spark-listeners-loganalytics-1.0-SNAPSHOT.jar JAR file
Reference:
https://docs.microsoft.com/bs-latn-ba/azure/architecture/databricks-monitoring/application-logs
QUESTION 32
You manage a solution that uses Azure HDInsight clusters.
You need to implement a solution to monitor cluster performance and status.
Correct Answer: E
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Ambari is the recommended tool for monitoring utilization across the whole cluster. The Ambari dashboard
shows easily glanceable widgets that display metrics such as CPU, network, YARN memory, and HDFS
disk usage. The specific metrics shown depend on cluster type. The “Hosts” tab shows metrics for
individual nodes so you can ensure the load on your cluster is evenly distributed.
The Apache Ambari project is aimed at making Hadoop management simpler by developing software for
provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use
Hadoop management web UI backed by its RESTful APIs.
References:
https://azure.microsoft.com/en-us/blog/monitoring-on-hdinsight-part-1-an-overview/
https://ambari.apache.org/
QUESTION 33
You configure monitoring for an Azure Synapse Analytics implementation. The implementation uses
PolyBase to load data from comma-separated value (CSV) files stored in Azure Data Lake Gen 2 using an
external table.
A. EXTERNAL TABLE access failed due to internal error: 'Java exception raised
on call to HdfsBridge_Connect: Error
[com.microsoft.polybase.client.KerberosSecureLogin] occurred while accessing
external file.'
B. EXTERNAL TABLE access failed due to internal error: 'Java exception raised
on call to HdfsBridge_Connect: Error [No FileSystem for scheme: wasbs]
occurred while accessing external file.'
C. Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11"
for linked server "(null)", Query aborted- the maximum reject threshold (0
rows) was reached while reading from an external source: 1 rows rejected out
of total 1 rows processed.
D. EXTERNAL TABLE access failed due to internal error: 'Java exception raised
on call to HdfsBridge_Connect: Error [Unable to instantiate LoginClass]
occurred while accessing external file.'
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Customer Scenario:
SQL Server 2016 or SQL DW connected to Azure blob storage. The CREATE EXTERNAL TABLE DDL
points to a directory (and not a specific file) and the directory contains files with different schemas.
SSMS Error:
Select query on the external table gives the following error:
Msg 7320, Level 16, State 110, Line 14
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)".
Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external
source: 1 rows rejected out of total 1 rows processed.
Possible Reason:
The reason this error happens is because each file has different schema. The PolyBase external table DDL
when pointed to a directory recursively reads all the files in that directory. When a column or data type
mismatch happens, this error could be seen in SSMS.
Possible Solution:
If the data for each table consists of one file, then use the filename in the LOCATION section prepended by
the directory of the external files. If there are multiple files per table, put each set of files into different
directories in Azure Blob Storage and then you can point LOCATION to the directory instead of a particular
file. The latter suggestion is the best practices recommended by SQLCAT even if you have one file per
table.
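A hedged sketch of an external table definition that follows that recommendation (the object names, external data source, and file format are illustrative and assumed to already exist):
CREATE EXTERNAL TABLE dbo.SalesExternal
(
    SaleId INT,
    ProductName NVARCHAR(100),
    Amount DECIMAL(18, 2)
)
WITH
(
    LOCATION = '/sales/2019/sales_2019.csv',  -- point at a specific file, not a mixed-schema directory
    DATA_SOURCE = AzureDataLakeStorage,       -- existing external data source (assumed)
    FILE_FORMAT = CsvFileFormat,              -- existing external file format (assumed)
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0                          -- the reject threshold referenced in the error message
);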
Incorrect Answers:
A: Possible Reason: Kerberos is not enabled in Hadoop Cluster.
References:
https://techcommunity.microsoft.com/t5/DataCAT/PolyBase-Setup-Errors-and-Possible-Solutions/ba-p/305297
QUESTION 34
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
A company uses Azure Data Lake Gen 1 Storage to store big data related to consumer behavior.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead, configure Azure Data Lake Storage diagnostics to store logs and metrics in a storage account.
References:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-diagnostic-logs
QUESTION 35
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a container named Sales in an Azure Cosmos DB database. Sales has 120 GB of data. Each
entry in Sales has the following structure.
Users report that when they perform queries that retrieve data by ProductName, the queries take longer
than expected to complete.
You need to reduce the amount of time it takes to execute the problematic queries.
Solution: You increase the Request Units (RUs) for the database.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
To scale the provisioned throughput for your application, you can increase or decrease the number of RUs
at any time.
Note: The cost of all database operations is normalized by Azure Cosmos DB and is expressed by Request
Units (or RUs, for short). You can think of RUs per second as the currency for throughput. RUs per second
is a rate-based currency. It abstracts the system resources such as CPU, IOPS, and memory that are
required to perform the database operations supported by Azure Cosmos DB.
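As a rough illustration, assuming the documented baseline that a point read of a 1-KB item costs about 1 RU: a workload that must sustain 500 such reads per second needs roughly 500 RU/s provisioned, and larger items or more complex queries consume proportionally more.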
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/request-units
QUESTION 36
You are monitoring an Azure Stream Analytics job.
You discover that the Backlogged Input Events metric is increasing slowly and is consistently non-zero.
You need to ensure that the job can handle all the events.
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Backlogged Input Events: Number of input events that are backlogged. A non-zero value for this metric
implies that your job isn't able to keep up with the number of incoming events. If this value is slowly
increasing or consistently non-zero, you should scale out your job. You should increase the Streaming
Units.
Note: Streaming Units (SUs) represents the computing resources that are allocated to execute a Stream
Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your
job.
Reference:
https://docs.microsoft.com/bs-cyrl-ba/azure/stream-analytics/stream-analytics-monitoring
QUESTION 37
SIMULATION
Your company's compliance policy states that administrators must be able to review a list of the database
object changes that occurred in an Azure SQL database named db2 during the last 100 days.
You need to modify your Azure environment to meet the compliance policy requirements.
Explanation/Reference:
Explanation:
Set up auditing for your database
The following section describes the configuration of auditing using the Azure portal.
2. Navigate to Auditing under the Security heading in your SQL database db2/server pane
3. If you prefer to enable auditing on the database level, switch Auditing to ON.
Note: By default, the audit data retention period is set to 100 days.
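As a rough sketch (assuming auditing writes to Blob storage), the collected audit records can then be reviewed with the sys.fn_get_audit_file function; the storage URL below is a placeholder:
SELECT event_time, action_id, statement, database_principal_name
FROM sys.fn_get_audit_file(
    'https://<storageaccount>.blob.core.windows.net/sqldbauditlogs/<servername>/db2/',
    DEFAULT, DEFAULT)                          -- read all audit files under the db2 folder
ORDER BY event_time DESC;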
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-auditing
QUESTION 38
SIMULATION
Your company's security policy states that administrators must be able to review a list of the failed logins to
an Azure SQL database named db1 during the previous 30 days.
You need to modify your Azure environment to meet the security policy requirements.
Explanation/Reference:
Explanation:
Set up auditing for your database
The following section describes the configuration of auditing using the Azure portal.
2. Navigate to Auditing under the Security heading in your SQL database db1/server pane
3. If you prefer to enable auditing on the database level, switch Auditing to ON.
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-auditing
QUESTION 39
SIMULATION
You need to ensure that all REST API calls to an Azure Storage account named storage10543936 use
HTTPS only.
Explanation/Reference:
Explanation:
You can configure your storage account to accept requests from secure connections only by setting the
Secure transfer required property for the storage account.
Reference:
https://docs.microsoft.com/en-us/azure/storage/common/storage-require-secure-transfer
QUESTION 40
You have an Azure Stream Analytics job.
You need to ensure that the job has enough streaming units provisioned.
Which two additional metrics should you monitor? Each correct answer presents part of the solution.
A. Watermark Delay
B. Late Input Events
C. Out of order Events
D. Backlogged Input Events
E. Function Events
Correct Answer: BD
Section: (none)
Explanation
Explanation/Reference:
Explanation:
B: Late Input Events: events that arrived later than the configured late arrival tolerance window.
Note: While comparing utilization over a period of time, use event rate metrics. InputEvents and
OutputEvents metrics show how many events were read and processed.
D: In the job diagram, there is a per-partition backlog event metric for each input. If the backlog event metric
keeps increasing, it’s also an indicator that the system resource is constrained (either because of output
sink throttling, or high CPU).
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-scale-jobs
Monitor and optimize data solutions
Testlet 2
Background
Proseware, Inc, develops and manages a product named Poll Taker. The product is used for delivering
public opinion polling and analysis.
Polling data comes from a variety of sources, including online surveys, house-to-house interviews, and
booths at public events.
Polling data
Polling data is stored in one of two locations:
Poll metadata
Each poll has associated metadata with information about the poll including the date and number of
respondents. The data is stored as JSON.
Phone-based polling
Security
Phone-based poll data must only be uploaded by authorized users from authorized devices
Contractors must not have access to any polling data other than their own
Access to polling data must be set on a per-Active Directory user basis
Performance
After six months, raw polling data should be moved to a storage account. The storage must be available in
the event of a regional disaster. The solution must minimize costs.
Deployments
All deployments must be performed by using Azure DevOps. Deployments must use templates that can be
reused across multiple environments
No credentials or secrets should be used during deployments
Reliability
All services and processes must be resilient to a regional Azure outage.
Monitoring
All Azure services must be monitored by using Azure Monitor. On-premises SQL Server performance must
be monitored.
QUESTION 1
HOTSPOT
You need to ensure phone-based polling data upload reliability requirements are met. How should you
configure monitoring? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: FileCapacity
FileCapacity is the amount of storage used by the storage account’s File service in bytes.
Box 2: Avg
The aggregation type of the FileCapacity metric is Avg.
Scenario:
All services and processes must be resilient to a regional Azure outage.
All Azure services must be monitored by using Azure Monitor. On-premises SQL Server performance must
be monitored.
References:
https://docs.microsoft.com/en-us/azure/azure-monitor/platform/metrics-supported
Monitor and optimize data solutions
Testlet 3
Overview
Current environment
Contoso relies on an extensive partner network for marketing, sales, and distribution. Contoso uses
external companies that manufacture everything from the actual pharmaceutical to the packaging.
The majority of the company's data resides in Microsoft SQL Server databases. Application databases fall
into one of the following tiers:
The company has a reporting infrastructure that ingests data from local databases and partner services.
Partner services consist of distributors, wholesalers, and retailers across the world. The company
performs daily, weekly, and monthly reporting.
Requirements
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Applications must still have access to data from both internal and external applications keeping the data
encrypted and secure at rest and in transit.
A disaster recovery strategy must be implemented for Tier 3 and Tier 6 through 8 allowing for failover in the
case of a server going offline.
Selected internal applications must have the data hosted in single Microsoft Azure SQL Databases.
The solution must support migrating databases that support external and internal application to Azure SQL
Database. The migrated databases will be supported by Azure Data Factory pipelines for the continued
movement, migration and updating of data both in the cloud and from local core business systems and
repositories.
Tier 7 and Tier 8 partner access must be restricted to the database only.
In addition to default Azure backup behavior, Tier 4 and 5 databases must be on a backup strategy that
performs a transaction log backup every hour, a differential backup every day, and a full backup every
week.
Backup strategies must be put in place for all other standalone Azure SQL databases using Azure SQL-provided
backup storage and capabilities.
Databases
Contoso requires their data estate to be designed and implemented in the Azure Cloud. Moving to the
cloud must not inhibit access to or availability of data.
Databases:
Tier 1 Database must implement data masking using the following masking logic:
Tier 2 databases must sync between branches and cloud databases and in the event of conflicts must be
set up for conflicts to be won by on-premises databases.
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Applications must still have access to data from both internal and external applications keeping the data
encrypted and secure at rest and in transit.
A disaster recovery strategy must be implemented for Tier 3 and Tier 6 through 8 allowing for failover in the
case of a server going offline.
Selected internal applications must have the data hosted in single Microsoft Azure SQL Databases.
Reporting
Security
A method of managing multiple databases in the cloud at the same time must be implemented to
streamline data management and limit management access to only those requiring access.
Monitoring
Monitoring must be set up on every database. Contoso and partners must receive performance reports as
part of contractual agreements.
Tiers 6 through 8 must have unexpected resource storage usage immediately reported to data engineers.
The Azure SQL Data Warehouse cache must be monitored when the database is being used. A dashboard
monitoring key performance indicators (KPIs) indicated by traffic lights must be created and displayed
based on the following metrics:
Existing Data Protection and Security compliances require that all certificates and keys are internally
managed in an on-premises storage.
Azure Data Warehouse must be used to gather and query data from multiple internal and external
databases
Azure Data Warehouse must be optimized to use data from a cache
Reporting data aggregated for external partners must be stored in Azure Storage and be made
available during regular business hours in the connecting regions
Reporting strategies must be improved to a real-time or near real-time cadence to improve
competitiveness and the general supply chain
Tier 9 reporting must be moved to Event Hubs, queried, and persisted in the same Azure region as the
company’s main office
Tier 10 reporting data must be stored in Azure Blobs
Issues
Team members identify the following issues:
Both internal and external client applications run complex joins, equality searches, and group-by clauses.
Because some systems are managed externally, the queries will not be changed or optimized by
Contoso
External partner organization data formats, types and schemas are controlled by the partner companies
Internal and external database development staff resources are primarily SQL developers familiar with
the Transact-SQL language.
The size and amount of data has led to applications and reporting solutions not performing at the
required speeds
Tier 7 and 8 data access is constrained to single endpoints managed by partners
The company maintains several legacy client applications. Data for these applications remains isolated
from other applications. This has led to hundreds of databases being provisioned on a per-application
basis
QUESTION 1
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
A. RequestSteps
B. DmsWorkers
C. SqlRequests
D. ExecRequests
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario:
The Azure SQL Data Warehouse cache must be monitored when the database is being used.
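A minimal sketch of querying this DMV, with the column list shortened for readability:
SELECT request_id, step_index, pdw_node_id, distribution_id, status, total_elapsed_time
FROM sys.dm_pdw_sql_requests
ORDER BY total_elapsed_time DESC;   -- longest-running SQL steps first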
References:
https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-pdw-sql-requests-transact-sql
QUESTION 2
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
A. extended events for average storage percentage that emails data engineers
B. an alert rule to monitor CPU percentage in databases that emails data engineers
C. an alert rule to monitor CPU percentage in elastic pools that emails data engineers
D. an alert rule to monitor storage percentage in databases that emails data engineers
E. an alert rule to monitor storage percentage in elastic pools that emails data engineers
Correct Answer: E
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario:
Tiers 6 through 8 must have unexpected resource storage usage immediately reported to data engineers.
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Monitor and optimize data solutions
Testlet 4
Case Study
This is a case study. Case studies are not timed separately. You can use as much exam time as you
would like to complete each case. However, there may be additional case studies and sections on this
exam. You must manage your time to ensure that you are able to complete all questions included on this
exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in
the case study. Case studies might contain exhibits and other resources that provide more information
about the scenario that is described in the case study. Each question is independent of the other questions
in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers
and to make changes before you move to the next section of the exam. After you begin a new section, you
cannot return to this section.
Overview
General Overview
Litware, Inc. is an international car racing and manufacturing company that has 1,000 employees. Most
employees are located in Europe. The company supports racing teams that compete in a worldwide racing
series.
Physical Locations
Litware has two main locations: a main office in London, England, and a manufacturing plant in Berlin,
Germany.
During each race weekend, 100 engineers set up a remote portable office by using a VPN to connect to the
datacenter in the London office. The portable office is set up and torn down in approximately 20 different
countries each year.
Existing environment
Race Central
During race weekends, Litware uses a primary application named Race Central. Each car has several
sensors that send real-time telemetry data to the London datacenter. The data is used for real-time tracking
of the cars.
Race Central also sends batch updates to an application named Mechanical Workflow by using Microsoft
SQL Server Integration Services (SSIS).
The telemetry data is sent to a MongoDB database. A custom application then moves the data to
databases in SQL Server 2017. The telemetry data in MongoDB has more than 500 attributes. The
application changes the attribute names when the data is moved to SQL Server 2017.
Mechanical Workflow
Mechanical Workflow is used to track changes and improvements made to the cars during their lifetime.
Currently, Mechanical Workflow runs on SQL Server 2017 as an OLAP system.
Mechanical Workflow has a table named Table1 that is 1 TB. Large aggregations are performed on a
single column of Table1.
Requirements
Planned Changes
Litware is in the process of rearchitecting its data estate to be hosted in Azure. The company plans to
decommission the London datacenter and move all its applications to an Azure datacenter.
Technical Requirements
Data collection for Race Central must be moved to Azure Cosmos DB and Azure SQL Database. The
data must be written to the Azure datacenter closest to each race and must converge in the least
amount of time.
The query performance of Race Central must be stable, and the administrative time it takes to perform
optimizations must be minimized.
The database for Mechanical Workflow must be moved to Azure SQL Data Warehouse.
Transparent data encryption (TDE) must be enabled on all data stores, whenever possible.
An Azure Data Factory pipeline must be used to move data from Cosmos DB to SQL Database for
Race Central. If the data load takes longer than 20 minutes, configuration changes must be made to
Data Factory.
The telemetry data must migrate toward a solution that is native to Azure.
The telemetry data must be monitored for performance issues. You must adjust the Cosmos DB
Request Units per second (RU/s) to maintain a performance SLA while minimizing the cost of the RU/s.
During race weekends, visitors will be able to enter the remote portable offices. Litware is concerned that
some proprietary information might be exposed. The company identifies the following data masking
requirements for the Race Central data that will be stored in SQL Database:
Only show the last four digits of the values in a column named SuspensionSprings.
Only show a zero value for the values in a column named ShockOilWeight.
QUESTION 1
You are monitoring the Data Factory pipeline that runs from Cosmos DB to SQL Database for Race
Central.
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Performance tuning tips and optimization features. In some cases, when you run a copy activity in Azure
Data Factory, you see a "Performance tuning tips" message on top of the copy activity monitoring, as
shown in the following example. The message tells you the bottleneck that was identified for the given copy
run. It also guides you on what to change to boost copy throughput. The performance tuning tips currently
provide suggestions like:
Use PolyBase when you copy data into Azure SQL Data Warehouse.
Increase Azure Cosmos DB Request Units or Azure SQL Database DTUs (Database Throughput Units)
when the resource on the data store side is the bottleneck.
Remove the unnecessary staged copy.
References:
https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance
QUESTION 2
What should you implement to optimize SQL Database for Race Central to meet the technical
requirements?
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: The query performance of Race Central must be stable, and the administrative time it takes to
perform optimizations must be minimized.
sp_updatestats updates query optimization statistics on a table or indexed view. By default, the query
optimizer already updates statistics as necessary to improve the query plan; in some cases you can
improve query performance by using UPDATE STATISTICS or the stored procedure sp_updatestats to
update statistics more frequently than the default updates.
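A minimal sketch; the table name in the second statement is a placeholder:
EXEC sp_updatestats;                                   -- refresh out-of-date statistics database-wide
UPDATE STATISTICS dbo.RaceTelemetry WITH FULLSCAN;     -- or refresh a single table more thoroughly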
Incorrect Answers:
D: DBCC CHECKDB checks the logical and physical integrity of all the objects in the specified database.
References:
https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-updatestats-transact-sql?view=sql-server-ver15
QUESTION 3
Which two metrics should you use to identify the appropriate RU/s for the telemetry data? Each correct
answer presents part of the solution.
A. Number of requests
B. Number of requests exceeded capacity
C. End to end observed read latency at the 99th percentile
D. Session consistency
E. Data + Index storage consumed
F. Avg Throughput/s
Correct Answer: AE
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: The telemetry data must be monitored for performance issues. You must adjust the Cosmos DB
Request Units per second (RU/s) to maintain a performance SLA while minimizing the cost of the RU/s.
With Azure Cosmos DB, you pay for the throughput you provision and the storage you consume on an
hourly basis.
While you estimate the number of RUs per second to provision, consider the following factors:
Item size: As the size of an item increases, the number of RUs consumed to read or write the item also
increases.
Monitor and optimize data solutions
Testlet 5
Case study
Overview
ADatum Corporation is a retailer that sells products through two sales channels: retail stores and a website.
Existing Environment
ADatum has one database server that has Microsoft SQL Server 2016 installed. The server hosts three
mission-critical databases named SALESDB, DOCDB, and REPORTINGDB.
DOCDB stores documents that connect to the sales data in SALESDB. The documents are stored in two
different JSON formats based on the sales channel.
REPORTINGDB stores reporting data and contains several columnstore indexes. A daily process creates
reporting data in REPORTINGDB from the data in SALESDB. The process is implemented as a SQL
Server Integration Services (SSIS) package that runs a stored procedure from SALESDB.
Requirements
Planned Changes
ADatum plans to move the current data infrastructure to Azure. The new infrastructure has the following
requirements:
Technical Requirements
The new Azure data infrastructure must meet the following technical requirements:
Data in SALESDB must be encrypted by using Transparent Data Encryption (TDE). The encryption must
use your own key.
SALESDB must be restorable to any given minute within the past three weeks.
Real-time processing must be monitored to ensure that workloads are sized properly based on actual
usage patterns.
Missing indexes must be created automatically for REPORTINGDB.
Disk IO, CPU, and memory usage must be monitored for SALESDB.
QUESTION 1
How should you monitor SALESDB to meet the technical requirements?
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: Disk IO, CPU, and memory usage must be monitored for SALESDB
The sys.resource_stats view in the master database returns historical data for CPU, IO, and DTU
consumption. There is one row per 5-minute interval for a database on an Azure logical SQL server if the metrics changed in that interval.
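A minimal sketch, run against the master database of the logical server, with the column list trimmed:
SELECT start_time, end_time, avg_cpu_percent, avg_data_io_percent, avg_log_write_percent, storage_in_megabytes
FROM sys.resource_stats
WHERE database_name = 'SALESDB'      -- filter to the database of interest
ORDER BY start_time DESC;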
Incorrect Answers:
B: Query Performance Insight helps you to quickly identify what your longest running queries are, how they
change over time, and what waits are affecting them.
C: sys.dm_os_wait_stats: specific types of wait times during query execution can indicate bottlenecks or
stall points within the query. Similarly, high wait times, or wait counts server wide can indicate bottlenecks
or hot spots in interaction query interactions within the server instance. For example, lock waits indicate
data contention by queries; page IO latch waits indicate slow IO response times; page latch update waits
indicate incorrect file layout.
References:
https://dataplatformlabs.com/monitoring-azure-sql-database-with-sys-resource_stats/
QUESTION 2
You need to ensure that the missing indexes for REPORTINGDB are added.
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Automatic tuning options include create index, which identifies indexes that may improve performance of
your workload, creates indexes, and automatically verifies that performance of queries has improved.
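A minimal sketch, assuming the option is set at the database level rather than inherited from the server:
ALTER DATABASE CURRENT
SET AUTOMATIC_TUNING (CREATE_INDEX = ON);   -- let automatic tuning create and verify missing indexes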
Scenario:
REPORTINGDB stores reporting data and contains several columnstore indexes.
Migrate SALESDB and REPORTINGDB to an Azure SQL database.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning
QUESTION 3
Which counter should you monitor for real-time processing to meet the technical requirements?
A. Concurrent users
B. SU% Utilization
C. Data Conversion Errors
D. CPU % utilization
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario:
Real-time processing must be monitored to ensure that workloads are sized properly based on actual
usage patterns.
The sales data, including the documents in JSON format, must be gathered as it arrives and analyzed
online by using Azure Stream Analytics.
Streaming Units (SUs) represents the computing resources that are allocated to execute a Stream
Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your
job. This capacity lets you focus on the query logic and abstracts the need to manage the hardware to run
your Stream Analytics job in a timely manner.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-streaming-unit-consumption
Manage and troubleshoot Azure data solutions
Question Set 1
QUESTION 1
You manage a process that performs analysis of daily web traffic logs on an HDInsight cluster. Each of the
250 web servers generates approximately 10 megabytes (MB) of log data each day. All log data is stored in
a single folder in Microsoft Azure Data Lake Storage Gen 2.
Which two changes should you make? Each correct answer presents a complete solution.
A. Combine the daily log files for all servers into one file
B. Increase the value of the mapreduce.map.memory parameter
C. Move the log files into folders so that each day’s logs are in their own folder
D. Increase the number of worker nodes
E. Increase the value of the hive.tez.container.size parameter
Correct Answer: AC
Section: (none)
Explanation
Explanation/Reference:
Explanation:
A: Typically, analytics engines such as HDInsight and Azure Data Lake Analytics have a per-file overhead.
If you store your data as many small files, this can negatively affect performance. In general, organize your
data into larger sized files for better performance (256MB to 100GB in size). Some engines and
applications might have trouble efficiently processing files that are greater than 100GB in size.
C: For Hive workloads, partition pruning of time-series data can help some queries read only a subset of
the data which improves performance.
Those pipelines that ingest time-series data, often place their files with a very structured naming for files
and folders. Below is a very common example we see for data that is structured by date:
\DataSet\YYYY\MM\DD\datafile_YYYY_MM_DD.tsv
Notice that the datetime information appears both as folders and in the filename.
References:
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-performance-tuning-guidance
QUESTION 2
DRAG DROP
A company builds an application to allow developers to share and compare code. The conversations, code
snippets, and links shared by people in the application are stored in a Microsoft Azure SQL Database
instance. The application allows for searches of historical conversations and code snippets.
When users share code snippets, the code snippet is compared against previously shared code snippets by
using a combination of Transact-SQL functions including SUBSTRING, FIRST_VALUE, and SQRT. If a
match is found, a link to the match is added to the conversation.
Which technologies should you use? To answer, drag the appropriate technologies to the correct issues.
Each technology may be used once, more than once, or not at all. You may need to drag the split bar
between panes or scroll to view content.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Materialized views help when the source data is not in a format that suits the required queries, when
generating a suitable query is difficult, or when query performance is poor due to the nature of the
data or the data store.
These materialized views, which only contain data required by a query, allow applications to quickly obtain
the information they need. In addition to joining tables or combining data entities, materialized views can
include the current values of calculated columns or data items, the results of combining values or executing
transformations on the data items, and values specified as part of the query. A materialized view can even
be optimized for just a single query.
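As one possible sketch of this pattern in Azure SQL Database, an indexed (schema-bound) view can persist a precomputed aggregate; the table, view, and column names below are placeholders:
CREATE VIEW dbo.vConversationSnippetCounts
WITH SCHEMABINDING
AS
SELECT ConversationId, COUNT_BIG(*) AS SnippetCount   -- COUNT_BIG is required in an indexed view with GROUP BY
FROM dbo.CodeSnippets
GROUP BY ConversationId;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vConversationSnippetCounts
ON dbo.vConversationSnippetCounts (ConversationId);   -- the clustered index materializes the view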
References:
https://docs.microsoft.com/en-us/azure/architecture/patterns/materialized-view
QUESTION 3
You implement an Azure SQL Data Warehouse instance.
You plan to migrate the largest fact table to Azure Synapse Analytics. The table resides on Microsoft SQL
Server on-premises and is 10 terabytes (TB) in size.
Incoming queries use the primary key Sale Key column to retrieve data as displayed in the following table:
You need to distribute the large fact table across multiple nodes to optimize performance of the table.
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Hash-distributed tables improve query performance on large fact tables.
Columnstore indexes can achieve up to 100x better performance on analytics and data warehousing
workloads and up to 10x better data compression than traditional rowstore indexes.
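A minimal sketch of such a table definition in a dedicated SQL pool; the table and non-key column names are placeholders:
CREATE TABLE dbo.FactSales
(
    [Sale Key] BIGINT NOT NULL,
    [Quantity] INT,
    [Amount]   DECIMAL(18, 2)
)
WITH
(
    DISTRIBUTION = HASH([Sale Key]),     -- rows with the same Sale Key land on the same distribution
    CLUSTERED COLUMNSTORE INDEX          -- the default index type, stated explicitly here
);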
Incorrect Answers:
D, E: Round-robin tables are useful for improving loading speed.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute
https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-query-performance
QUESTION 4
You manage an enterprise data warehouse in Azure Synapse Analytics.
Users report slow performance when they run commonly used queries. Users do not report performance
changes for infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The Azure Synapse Analytics storage architecture automatically tiers your most frequently queried
columnstore segments in a cache residing on NVMe based SSDs designed for Gen2 data warehouses.
Greater performance is realized when your queries retrieve segments that are residing in the cache. You
can monitor and troubleshoot slow query performance by determining whether your workload is optimally
leveraging the Gen2 cache.
Note: As of November 2019, Azure SQL Data Warehouse is now Azure Synapse Analytics
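A rough sketch of inspecting the cache counters; it assumes they are surfaced through sys.dm_pdw_nodes_os_performance_counters, and the exact counter names may differ:
SELECT pdw_node_id, counter_name, cntr_value
FROM sys.dm_pdw_nodes_os_performance_counters
WHERE counter_name LIKE '%Cache%';        -- e.g. cache hit and cache used percentages per node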
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-how-to-monitor-cache
https://docs.microsoft.com/bs-latn-ba/azure/sql-data-warehouse/sql-data-warehouse-concept-resource-utilization-query-activity
QUESTION 5
You manage an enterprise data warehouse in Azure Synapse Analytics.
Users report slow performance when they run commonly used queries. Users do not report performance
changes for infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The Azure Synapse Analytics storage architecture automatically tiers your most frequently queried
columnstore segments in a cache residing on NVMe based SSDs designed for Gen2 data warehouses.
Greater performance is realized when your queries retrieve segments that are residing in the cache. You
can monitor and troubleshoot slow query performance by determining whether your workload is optimally
leveraging the Gen2 cache.
Note: As of November 2019, Azure SQL Data Warehouse is now Azure Synapse Analytics.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-how-to-monitor-cache
https://docs.microsoft.com/bs-latn-ba/azure/sql-data-warehouse/sql-data-warehouse-concept-resource-utilization-query-activity
QUESTION 6
A company has a real-time data analysis solution that is hosted on Microsoft Azure. The solution uses
Azure Event Hub to ingest data and an Azure Stream Analytics cloud job to analyze the data. The cloud job
is configured to use 120 Streaming Units (SU).
You need to optimize performance for the Azure Stream Analytics job.
Which two actions should you perform? Each correct answer presents part of the solution.
Correct Answer: BF
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scale out the query by allowing the system to process each input partition separately.
F: A Stream Analytics job definition includes inputs, a query, and output. Inputs are where the job reads the
data stream from.
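As a hedged sketch of that idea in the Stream Analytics query language (the input, output, and partition column names are placeholders, and newer compatibility levels can derive the partitioning automatically):
SELECT *
INTO PartitionedOutput
FROM EventHubInput
PARTITION BY PartitionId     -- process each Event Hub partition independently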
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization