DP-200.prepaway - Premium.14222.exam.201q
201q
Number: DP-200
Passing Score: 800
Time Limit: 120 min
File Version: 9.0
DP-200
Version 9.0
Implement data storage solutions
Question Set 1
QUESTION 1
You are a data engineer implementing a lambda architecture on Microsoft Azure. You use an open-source
big data solution to collect, process, and maintain data. The analytical data store performs poorly.
A. Interactive Query
B. Apache Hadoop
C. Apache HBase
D. Apache Spark
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Lambda Architecture with Azure:
Azure offers you a combination of the following technologies to accelerate real-time big data analytics:
1. Azure Cosmos DB, a globally distributed and multi-model database service.
2. Apache Spark for Azure HDInsight, a processing framework that runs large-scale data analytics
applications.
3. Azure Cosmos DB change feed, which streams new data to the batch layer for HDInsight to process.
4. The Spark to Azure Cosmos DB Connector
Note: Lambda architecture is a data-processing architecture designed to handle massive quantities of data
by taking advantage of both batch processing and stream processing methods, and minimizing the latency
involved in querying big data.
References:
https://sqlwithmanoj.com/2018/02/16/what-is-lambda-architecture-and-what-azure-offers-with-its-new-cosmos-db/
QUESTION 2
DRAG DROP
You develop data engineering solutions for a company. You must migrate data from Microsoft Azure Blob
storage to an Azure SQL Data Warehouse for further transformation. You need to implement the solution.
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 2: Connect to the Azure SQL Data warehouse by using SQL Server Management Studio
Connect to the data warehouse with SSMS (SQL Server Management Studio)
Step 3: Build external tables by using the SQL Server Management Studio
Create external tables for data in Azure blob storage.
You are ready to begin the process of loading data into your new data warehouse. You use external tables
to load data from the Azure storage blob.
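For illustration only (the object names, storage account, key, and column list below are placeholders and assumptions, not part of the original answer), the external table and load steps could look similar to the following T-SQL:
-- Create a credential that stores the storage account key (placeholder values).
CREATE MASTER KEY;
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';
-- External data source that points at the blob container.
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (TYPE = HADOOP,
      LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
      CREDENTIAL = AzureStorageCredential);
-- File format for the delimited text files.
CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));
-- External table over the files in blob storage.
CREATE EXTERNAL TABLE dbo.Stage_Sales
(SaleId INT, SaleDate DATE, Amount DECIMAL(18, 2))
WITH (LOCATION = '/sales/',
      DATA_SOURCE = AzureStorage,
      FILE_FORMAT = TextFileFormat);
-- Load the data warehouse table with CTAS.
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM dbo.Stage_Sales;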
References:
https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/sql-data-warehouse/load-data-from-azure-blob-storage-using-polybase.md
QUESTION 3
You develop data engineering solutions for a company. The company has on-premises Microsoft SQL
Server databases at multiple locations.
The company must integrate data with Microsoft Power BI and Microsoft Azure Logic Apps. The solution
must avoid single points of failure during connection and transfer to the cloud. The solution must also
minimize latency.
You need to secure the transfer of data between on-premises databases and Microsoft Azure.
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You can create high availability clusters of On-premises data gateway installations, to ensure your
organization can access on-premises data resources used in Power BI reports and dashboards. Such
clusters allow gateway administrators to group gateways to avoid single points of failure in accessing on-
premises data resources. The Power BI service always uses the primary gateway in the cluster, unless it’s
not available. In that case, the service switches to the next gateway in the cluster, and so on.
References:
https://docs.microsoft.com/en-us/power-bi/service-gateway-high-availability-clusters
QUESTION 4
You are a data architect. The data engineering team needs to configure a synchronization of data between
an on-premises Microsoft SQL Server database to Azure SQL Database.
Ad-hoc and reporting queries are overutilizing the on-premises production instance. The
synchronization process must:
Perform an initial data synchronization to Azure SQL Database with minimal downtime
Perform bi-directional data synchronization after initial synchronization
A. transactional replication
B. Data Migration Assistant (DMA)
C. backup and restore
D. SQL Server Agent job
E. Azure SQL Data Sync
Correct Answer: E
Section: (none)
Explanation
Explanation/Reference:
Explanation:
SQL Data Sync is a service built on Azure SQL Database that lets you synchronize the data you select bi-
directionally across multiple SQL databases and SQL Server instances.
With Data Sync, you can keep data synchronized between your on-premises databases and Azure SQL
databases to enable hybrid applications.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-sync-data
QUESTION 5
An application will use Microsoft Azure Cosmos DB as its data solution. The application will use the
Cassandra API to support a column-based database type that uses containers to store items.
You need to provision Azure Cosmos DB. Which container name and item name should you use? Each
correct answer presents part of the solution.
A. collection
B. rows
C. graph
D. entities
E. table
Correct Answer: BE
Section: (none)
Explanation
Explanation/Reference:
Explanation:
B: Depending on the choice of the API, an Azure Cosmos item can represent either a document in a
collection, a row in a table or a node/edge in a graph. The following table shows the mapping between API-
specific entities to an Azure Cosmos item:
E: An Azure Cosmos container is specialized into API-specific entities as follows:
References:
https://docs.microsoft.com/en-us/azure/cosmos-db/databases-containers-items
QUESTION 6
A company has a SaaS solution that uses Azure SQL Database with elastic pools. The solution contains a
dedicated database for each customer organization. Customer organizations have peak usage at different
periods during the year.
You need to implement the Azure SQL Database elastic pool to minimize cost.
Correct Answer: E
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The best size for a pool depends on the aggregate resources needed for all databases in the pool. This
involves determining the following:
Maximum resources utilized by all databases in the pool (either maximum DTUs or maximum vCores
depending on your choice of resourcing model).
Maximum storage bytes utilized by all databases in the pool.
Note: Elastic pools enable the developer to purchase resources for a pool shared by multiple databases to
accommodate unpredictable periods of usage by individual databases. You can configure resources for the
pool based either on the DTU-based purchasing model or the vCore-based purchasing model.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-elastic-pool
QUESTION 7
HOTSPOT
You are a data engineer. You are designing a Hadoop Distributed File System (HDFS) architecture. You
plan to use Microsoft Azure Data Lake as a data storage repository.
You must provision the repository with a resilient data schema. You need to ensure the resiliency of the
Azure Data Lake Storage. What should you use? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: NameNode
An HDFS cluster consists of a single NameNode, a master server that manages the file system
namespace and regulates access to files by clients.
Box 2: DataNode
The DataNodes are responsible for serving read and write requests from the file system’s clients.
Box 3: DataNode
The DataNodes perform block creation, deletion, and replication upon instruction from the NameNode.
Note: HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master
server that manages the file system namespace and regulates access to files by clients. In addition, there
are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the
nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files.
Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The
NameNode executes file system namespace operations like opening, closing, and renaming files and
directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for
serving read and write requests from the file system’s clients. The DataNodes also perform block creation,
deletion, and replication upon instruction from the NameNode.
References:
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#NameNode+and+DataNodes
QUESTION 8
DRAG DROP
You are developing the data platform for a global retail company. The company operates during normal
working hours in each region. The analytical database is used once a week for building sales projections.
Building the sales projections is very resource intensive and generates upwards of 20 terabytes (TB) of
data.
How should you provision the database instances? To answer, drag the appropriate Azure SQL products to
the correct databases. Each Azure SQL product may be used once, more than once, or not at all. You may
need to drag the split bar between panes or scroll to view content.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Incorrect Answers:
Azure SQL Database Managed Instance: The managed instance deployment model is designed for customers looking to migrate a large number of apps from an on-premises or IaaS, self-built, or ISV-provided environment to a fully managed PaaS cloud environment, with as low a migration effort as possible.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-elastic-pool
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tier-hyperscale-faq
QUESTION 9
A company manages several on-premises Microsoft SQL Server databases.
You need to migrate the databases to Microsoft Azure by using a backup process of Microsoft SQL Server.
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Managed instance is a new deployment option of Azure SQL Database, providing near 100% compatibility
with the latest SQL Server on-premises (Enterprise Edition) Database Engine, providing a native virtual
network (VNet) implementation that addresses common security concerns, and a business model
favorable for on-premises SQL Server customers. The managed instance deployment model allows
existing SQL Server customers to lift and shift their on-premises applications to the cloud with minimal
application and database changes.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-managed-instance
QUESTION 10
The data engineering team manages Azure HDInsight clusters. The team spends a large amount of time
creating and destroying clusters daily because most of the data pipeline process runs in minutes.
You need to implement a solution that deploys multiple HDInsight clusters with minimal effort.
A. Azure Databricks
B. Azure Traffic Manager
C. Azure Resource Manager templates
D. Ambari web user interface
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
A Resource Manager template makes it easy to create the following resources for your application in a
single, coordinated operation:
HDInsight clusters and their dependent resources (such as the default storage account).
Other resources (such as Azure SQL Database to use Apache Sqoop).
In the template, you define the resources that are needed for the application. You also specify deployment
parameters to input values for different environments. The template consists of JSON and expressions that
you use to construct values for your deployment.
References:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-arm-templates
QUESTION 11
You are the data engineer for your company. An application uses a NoSQL database to store data. The
database uses the key-value and wide-column NoSQL database type.
You need to determine which API to use for the database model and type.
Which two APIs should you use? Each correct answer presents a complete solution.
A. Table API
B. MongoDB API
C. Gremlin API
D. SQL API
E. Cassandra API
Correct Answer: BE
Section: (none)
Explanation
Explanation/Reference:
Explanation:
B: Azure Cosmos DB is the globally distributed, multimodel database service from Microsoft for mission-
critical applications. It is a multimodel database and supports document, key-value, graph, and columnar
data models.
E: Wide-column stores store data together as columns instead of rows and are optimized for queries over
large datasets. The most popular are Cassandra and HBase.
References:
https://docs.microsoft.com/en-us/azure/cosmos-db/graph-introduction
https://www.mongodb.com/scale/types-of-nosql-databases
QUESTION 12
A company is designing a hybrid solution to synchronize data from an on-premises Microsoft SQL Server database to Azure SQL Database.
You must perform an assessment of databases to determine whether data will move without compatibility
issues. You need to perform the assessment.
Correct Answer: E
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The Data Migration Assistant (DMA) helps you upgrade to a modern data platform by detecting
compatibility issues that can impact database functionality in your new version of SQL Server or Azure SQL
Database. DMA recommends performance and reliability improvements for your target environment and
allows you to move your schema, data, and uncontained objects from your source server to your target
server.
References:
https://docs.microsoft.com/en-us/sql/dma/dma-overview
QUESTION 13
DRAG DROP
You manage a financial computation data analysis process. Microsoft Azure virtual machines (VMs) run the
process in daily jobs, and store the results in virtual hard drives (VHDs).
The VMs produce results using data from the previous day and store the results in a snapshot of the VHD.
When a new month begins, a process creates a new VHD.
You need to enforce the data retention requirements while minimizing cost.
How should you configure the lifecycle policy? To answer, drag the appropriate JSON segments to the correct locations. Each JSON segment may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Example: Create or update the management policy of a Storage account with ManagementPolicy rule
objects.
PS C:\>$action1 = Add-AzStorageAccountManagementPolicyAction -BaseBlobAction Delete -daysAfterModificationGreaterThan 100
PS C:\>$action1 = Add-AzStorageAccountManagementPolicyAction -InputObject $action1 -BaseBlobAction
TierToArchive -daysAfterModificationGreaterThan 50
PS C:\>$action1 = Add-AzStorageAccountManagementPolicyAction -InputObject $action1 -BaseBlobAction
TierToCool -daysAfterModificationGreaterThan 30
PS C:\>$action1 = Add-AzStorageAccountManagementPolicyAction -InputObject $action1 -SnapshotAction
Delete -daysAfterCreationGreaterThan 100
PS C:\>$filter1 = New-AzStorageAccountManagementPolicyFilter -PrefixMatch ab,cd
PS C:\>$rule1 = New-AzStorageAccountManagementPolicyRule -Name Test -Action $action1 -Filter
$filter1
References:
https://docs.microsoft.com/en-us/powershell/module/az.storage/set-azstorageaccountmanagementpolicy
QUESTION 14
A company plans to use Azure SQL Database to support a mission-critical application.
The application must be highly available without performance degradation during maintenance windows.
Which three technologies should you implement? Each correct answer presents part of the solution.
Explanation/Reference:
Explanation:
A: Premium/business critical service tier model that is based on a cluster of database engine processes.
This architectural model relies on the fact that there is always a quorum of available database engine nodes
and has minimal performance impact on your workload even during maintenance activities.
E: In the premium model, Azure SQL database integrates compute and storage on the single node. High
availability in this architectural model is achieved by replication of compute (SQL Server Database Engine
process) and storage (locally attached SSD) deployed in 4-node cluster, using technology similar to SQL
Server Always On Availability Groups.
F: Zone redundant configuration
By default, the quorum-set replicas for the local storage configurations are created in the same datacenter.
With the introduction of Azure Availability Zones, you have the ability to place the different replicas in the
quorum-sets to different availability zones in the same region. To eliminate a single point of failure, the
control ring is also duplicated across multiple zones as three gateway rings (GW).
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-high-availability
QUESTION 15
A company plans to use Azure Storage for file storage purposes. Compliance rules require:
A single storage account to store all operations including reads, writes and deletes
Retention of an on-premises copy of historical operations
Which two actions should you perform? Each correct answer presents part of the solution.
A. Configure the storage account to log read, write and delete operations for service type Blob
B. Use the AzCopy tool to download log data from $logs/blob
C. Configure the storage account to log read, write and delete operations for service-type table
D. Use the storage client to download log data from $logs/table
E. Configure the storage account to log read, write and delete operations for service type queue
Correct Answer: AB
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Storage Logging logs request data in a set of blobs in a blob container named $logs in your storage
account. This container does not show up if you list all the blob containers in your account but you can see
its contents if you access it directly.
To view and analyze your log data, you should download the blobs that contain the log data you are
interested in to a local machine. Many storage-browsing tools enable you to download blobs from your
storage account; you can also use the Azure Storage team provided command-line Azure Copy Tool
(AzCopy) to download your log data.
References:
https://docs.microsoft.com/en-us/rest/api/storageservices/enabling-storage-logging-and-accessing-log-data
QUESTION 16
DRAG DROP
Which four actions should you perform in sequence? To answer, move the appropriate action from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Create a new Azure Data Lake Storage account with Azure Data Lake managed encryption keys
For Azure services, Azure Key Vault is the recommended key storage solution and provides a common
management experience across services. Keys are stored and managed in key vaults, and access to a key
vault can be given to users or services. Azure Key Vault supports customer creation of keys or import of
customer keys for use in customer-managed encryption key scenarios.
Note: Data Lake Storage Gen1 account Encryption Settings. There are three options:
Do not enable encryption.
Use keys managed by Data Lake Storage Gen1, if you want Data Lake Storage Gen1 to manage your
encryption keys.
Use keys from your own Key Vault. You can select an existing Azure Key Vault or create a new Key
Vault. To use the keys from a Key Vault, you must assign permissions for the Data Lake Storage Gen1
account to access the Azure Key Vault.
References:
https://docs.microsoft.com/en-us/azure/security/fundamentals/encryption-atrest
QUESTION 17
You are developing a data engineering solution for a company. The solution will store a large set of key-
value pair data by using Microsoft Azure Cosmos DB.
Which three actions should you perform? Each correct answer presents part of the solution.
B. Provision an Azure Cosmos DB account with the Azure Table API. Enable geo-redundancy.
C. Configure table-level throughput.
D. Replicate the data globally by manually adding regions to the Azure Cosmos DB account.
E. Provision an Azure Cosmos DB account with the Azure Table API. Enable multi-region writes.
Correct Answer: E
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scale read and write throughput globally. You can enable every region to be writable and elastically scale
reads and writes all around the world. The throughput that your application configures on an Azure Cosmos
database or a container is guaranteed to be delivered across all regions associated with your Azure
Cosmos account. The provisioned throughput is guaranteed by financially backed SLAs.
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/distribute-data-globally
QUESTION 18
A company has a SaaS solution that uses Azure SQL Database with elastic pools. The solution will have a
dedicated database for each customer organization. Customer organizations have peak usage at different
periods during the year.
Which two factors affect your costs when sizing the Azure SQL Database elastic pools? Each correct
answer presents a complete solution.
Correct Answer: AC
Section: (none)
Explanation
Explanation/Reference:
Explanation:
A: With the vCore purchase model, in the General Purpose tier, you are charged for Premium blob storage
that you provision for your database or elastic pool. Storage can be configured between 5 GB and 4 TB
with 1 GB increments. Storage is priced at GB/month.
C: In the DTU purchase model, elastic pools are available in basic, standard and premium service tiers.
Each tier is distinguished primarily by its overall performance, which is measured in elastic Database
Transaction Units (eDTUs).
References:
https://azure.microsoft.com/en-in/pricing/details/sql-database/elastic/
QUESTION 19
HOTSPOT
Data storage:
Implement optimized storage for big data analytics workloads.
Ensure that data can be organized using a hierarchical structure.
Batch processing:
You need to identify the correct technologies to build the Lambda architecture.
Which technologies should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch
processing, you can use Spark, Hive, Hive LLAP, MapReduce.
Azure Synapse Analytics is a cloud-based Enterprise Data Warehouse (EDW) that uses
Massively Parallel Processing (MPP).
Azure Synapse Analytics stores data into relational tables with columnar storage.
Note: As of November 2019, Azure SQL Data Warehouse is now Azure Synapse Analytics.
References:
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-overview-what-is
QUESTION 20
DRAG DROP
The data engineering team plans to implement a process that copies data from the SQL Server instance to
Azure Blob storage. The process must orchestrate and manage the data lifecycle.
You need to configure Azure Data Factory to connect to the SQL Server instance.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 2: From the on-premises network, install and configure a self-hosted runtime.
To copy data from a SQL Server database that isn't publicly accessible, you need to set up a self-
hosted integration runtime.
References:
https://docs.microsoft.com/en-us/azure/data-factory/connector-sql-server
QUESTION 21
A company runs Microsoft SQL Server in an on-premises virtual machine (VM).
You must migrate the database to Azure SQL Database. You synchronize users from Active Directory to
Azure Active Directory (Azure AD).
You need to configure Azure SQL Database to use an Azure AD user as administrator.
A. For each Azure SQL Database, set the Access Control to administrator.
B. For each Azure SQL Database server, set the Active Directory to administrator.
C. For each Azure SQL Database, set the Active Directory administrator role.
D. For each Azure SQL Database server, set the Access Control to administrator.
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
There are two administrative accounts (Server admin and Active Directory admin) that act as
administrators.
One Azure Active Directory account, either an individual or security group account, can also be configured
as an administrator. It is optional to configure an Azure AD administrator, but an Azure AD administrator
must be configured if you want to use Azure AD accounts to connect to SQL Database.
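As an illustrative sketch (the user principal name below is hypothetical), once the Azure AD administrator is configured, it can create contained database users for other Azure AD identities and grant them access:
-- Run in the user database while connected as the Azure AD administrator.
-- The user principal name below is a placeholder.
CREATE USER [user1@contoso.com] FROM EXTERNAL PROVIDER;
-- Grant only the access the user needs, for example:
ALTER ROLE db_datareader ADD MEMBER [user1@contoso.com];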
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-manage-logins
QUESTION 22
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure SQL database named DB1 that contains a table named Table1. Table1 has a field
named Customer_ID that is varchar(22).
You need to implement masking for the Customer_ID field to meet the following requirements:
All other characters must be masked.
Solution: You implement data masking and use a credit card function mask.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Must use Custom Text data masking, which exposes the first and last characters and adds a custom
padding string in the middle.
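For illustration only (the exact number of exposed leading and trailing characters is not shown here, so the values below are assumptions), a custom text mask on Customer_ID could be defined as follows:
-- partial(prefix, padding, suffix): expose the assumed 2 leading and 4 trailing
-- characters and mask everything in between with a custom padding string.
ALTER TABLE dbo.Table1
ALTER COLUMN Customer_ID ADD MASKED WITH (FUNCTION = 'partial(2, "XXXXXXXX", 4)');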
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started
QUESTION 23
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure SQL database named DB1 that contains a table named Table1. Table1 has a field
named Customer_ID that is varchar(22).
You need to implement masking for the Customer_ID field to meet the following requirements:
Solution: You implement data masking and use an email function mask.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Must use Custom Text data masking, which exposes the first and last characters and adds a custom
padding string in the middle.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started
QUESTION 24
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure SQL database named DB1 that contains a table named Table1. Table1 has a field
named Customer_ID that is varchar(22).
You need to implement masking for the Customer_ID field to meet the following requirements:
Solution: You implement data masking and use a random number function mask.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Must use Custom Text data masking, which exposes the first and last characters and adds a custom
padding string in the middle.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started
QUESTION 25
DRAG DROP
You are responsible for providing access to an Azure Data Lake Storage Gen2 account.
Your user account has contributor access to the storage account, and you have the application ID and
access key.
You plan to use PolyBase to load data into an enterprise data warehouse in Azure Synapse Analytics.
You need to configure PolyBase to connect the data warehouse to the storage account.
Which three components should you create in sequence? To answer, move the appropriate components
from the list of components to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: a database scoped credential
To access your Data Lake Storage account, you will need to create a Database Master Key to encrypt your
credential secret used in the next step. You then create a database scoped credential.
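A hedged T-SQL sketch of these components, using placeholder values for the application ID, key, tenant, and storage location:
-- 1. Master key that protects the credential secret.
CREATE MASTER KEY;
-- 2. Database scoped credential that holds the application (service principal) ID and key.
--    The IDENTITY combines the application ID with the OAuth 2.0 token endpoint of the tenant.
CREATE DATABASE SCOPED CREDENTIAL ADLSCredential
WITH IDENTITY = '<application-id>@https://login.microsoftonline.com/<tenant-id>/oauth2/token',
     SECRET = '<application-key>';
-- 3. External data source that points PolyBase at the Data Lake Storage Gen2 account.
CREATE EXTERNAL DATA SOURCE AzureDataLakeStorage
WITH (TYPE = HADOOP,
      LOCATION = 'abfss://<filesystem>@<account>.dfs.core.windows.net',
      CREDENTIAL = ADLSCredential);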
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-lake-store
QUESTION 26
You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.
A. hash distributed
B. heap
C. replicated
D. round-robin
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Usually, common dimension tables or tables that don't distribute evenly are good candidates for round-robin distributed tables.
Note: Dimension tables or other lookup tables in a schema can usually be stored as round-robin tables. Usually these tables connect to more than one fact table, and optimizing for one join may not be the best idea. Also, dimension tables are usually smaller, which can leave some distributions empty when hash distributed. Round-robin by definition guarantees a uniform data distribution.
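A minimal sketch of a round-robin distributed dimension table (the column list is hypothetical):
CREATE TABLE dbo.DimProduct
(
    ProductKey  INT NOT NULL,
    ProductName NVARCHAR(100) NOT NULL,
    Category    NVARCHAR(50) NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,      -- rows are spread evenly across all distributions
    CLUSTERED COLUMNSTORE INDEX
);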
Reference:
https://blogs.msdn.microsoft.com/sqlcat/2015/08/11/choosing-hash-distributed-table-vs-round-robin-distributed-table-in-azure-sql-dw-service/
QUESTION 27
You have an enterprise data warehouse in Azure Synapse Analytics.
Using PolyBase, you create a table named [Ext].[Items] to query Parquet files stored in Azure Data Lake
Storage Gen2 without importing the data to the data warehouse.
You discover that the Parquet files have a fourth column named ItemID.
Which command should you run to add the ItemID column to the external table?
A. Option A
B. Option B
C. Option C
D. Option D
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Incorrect Answers:
B, D: Only these Data Definition Language (DDL) statements are allowed on external tables: CREATE TABLE and DROP TABLE, CREATE STATISTICS and DROP STATISTICS, CREATE VIEW and DROP VIEW.
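Because columns cannot be added to an external table in place, the table is dropped and recreated with the new ItemID column. The sketch below assumes the column types and the names of the existing external data source and file format, which are not shown in the question:
DROP EXTERNAL TABLE [Ext].[Items];
CREATE EXTERNAL TABLE [Ext].[Items]
(
    [ItemID] INT,                        -- the newly discovered fourth column
    [ItemName] NVARCHAR(50),
    [ItemType] NVARCHAR(20),
    [ItemDescription] NVARCHAR(250)
)
WITH
(
    LOCATION = '/Items/',
    DATA_SOURCE = AzureDataLakeStore,    -- name of the existing external data source (assumed)
    FILE_FORMAT = ParquetFileFormat      -- name of the existing Parquet file format (assumed)
);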
Reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql
QUESTION 28
DRAG DROP
You have a table named SalesFact in Azure Synapse Analytics. SalesFact contains sales data from the
past 36 months and has the following characteristics:
Is partitioned by month
Contains one billion rows
Has clustered columnstore indexes
At the beginning of each month, you need to remove data from SalesFact that is older than 36 months as
quickly as possible.
Which three actions should you perform in sequence in a stored procedure? To answer, move the
appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: Create an empty table named SalesFact_work that has the same schema as SalesFact.
Step 2: Switch the partition containing the stale data from SalesFact to SalesFact_Work.
SQL Data Warehouse supports partition splitting, merging, and switching. To switch partitions between two
tables, you must ensure that the partitions align on their respective boundaries and that the table definitions
match.
Loading data into partitions with partition switching is a convenient way to stage new data in a table that is not visible to users before you switch in the new data.
Step 3: Drop the SalesFact_Work table.
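A sketch of the stored procedure logic; the distribution column, partition boundaries, and partition number are assumptions, and the work table must align with SalesFact on schema, distribution, and partition boundaries:
-- Step 1: empty work table with the same schema, distribution, and partition scheme as SalesFact.
CREATE TABLE dbo.SalesFact_Work
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20170101, 20170201 /* ...one boundary per month... */))
)
AS SELECT * FROM dbo.SalesFact WHERE 1 = 2;
-- Step 2: metadata-only switch of the partition that holds data older than 36 months.
ALTER TABLE dbo.SalesFact SWITCH PARTITION 1 TO dbo.SalesFact_Work PARTITION 1;
-- Step 3: drop the work table, removing the stale data with it.
DROP TABLE dbo.SalesFact_Work;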
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-partition
QUESTION 29
You plan to implement an Azure Cosmos DB database that will write 100,000,000 JSON documents every 24 hours.
The database will be replicated to three regions. Only one region will be writable.
You need to select a consistency level for the database to meet the following requirements:
A. Strong
B. Bounded Staleness
C. Eventual
D. Session
E. Consistent Prefix
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Session: Within a single client session reads are guaranteed to honor the consistent-prefix (assuming a
single “writer” session), monotonic reads, monotonic writes, read-your-writes, and write-follows-reads
guarantees. Clients outside of the session performing writes will see eventual consistency.
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels
QUESTION 30
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure SQL database named DB1 that contains a table named Table1. Table1 has a field
named Customer_ID that is varchar(22).
You need to implement masking for the Customer_ID field to meet the following requirements:
Solution: You implement data masking and use a custom text mask.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We must use Custom Text data masking, which exposes the first and last characters and adds a custom
padding string in the middle.
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started
QUESTION 31
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical
values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse
Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is less than 1 MB.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead convert the files to compressed delimited text files.
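To illustrate what a compressed delimited text source looks like to PolyBase, an external file format could declare Gzip compression as follows (a sketch, not part of the original answer):
CREATE EXTERNAL FILE FORMAT CompressedTextFormat
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', STRING_DELIMITER = '"'),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'   -- source files are Gzip compressed
);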
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
QUESTION 32
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical
values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an Azure SQL data warehouse.
You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is more than 1 MB.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead modify the files to ensure that each row is less than 1 MB.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
QUESTION 33
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical
values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse
Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You copy the files to a table that has a columnstore index.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead convert the files to compressed delimited text files.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
QUESTION 34
You plan to deploy an Azure Cosmos DB database that supports multi-master replication.
You need to select a consistency level for the database to meet the following requirements:
What are three possible consistency levels that you can select? Each correct answer presents a complete
solution.
A. Strong
B. Bounded Staleness
C. Eventual
D. Session
E. Consistent Prefix
Explanation/Reference:
Explanation:
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels-choosing
QUESTION 35
SIMULATION
Use the following login credentials as needed:
You need to ensure that you can recover any blob data from an Azure Storage account named storage10277521 up to 30 days after the data is deleted.
Explanation/Reference:
Explanation:
1. Open the Azure portal and open the Azure Blob storage account named storage10277521.
2. Under Blob service, select Soft delete.
3. Select Enabled and set the retention period to 30 days.
4. Click Save.
Note: Soft delete protects blob data from being accidentally or erroneously deleted or overwritten. When soft delete is enabled, deleted blobs and snapshots are retained for the specified retention period (between 1 and 365 days) and can be undeleted during that time.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-soft-delete
QUESTION 36
SIMULATION
Use the following login credentials as needed:
You need to replicate db1 to a new Azure SQL server named REPL10277521 in the Central Canada
region.
NOTE: This task might take several minutes to complete. You can perform other tasks while the task
completes or ends this section of the exam.
Explanation/Reference:
Explanation:
1. In the Azure portal, browse to the database that you want to set up for geo-replication.
2. On the SQL database page, select geo-replication, and then select the region to create the secondary
database.
3. Select or configure the server and pricing tier for the secondary database.
Region: Central Canada
Target server: REPL10277521
4. Click Create to add the secondary.
6. When the seeding process is complete, the secondary database displays its status.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-active-geo-replication-portal
QUESTION 37
SIMULATION
Use the following login credentials as needed:
You need to create an Azure SQL database named db3 on an Azure SQL server named SQL10277521.
Db3 must use the Sample (AdventureWorksLT) source.
Explanation/Reference:
Explanation:
1. Click Create a resource in the upper left-hand corner of the Azure portal.
2. On the New page, select Databases in the Azure Marketplace section, and then click SQL Database in
the Featured section.
3. Fill out the SQL Database form with the following information, as shown below:
Database name: Db3
Select source: Sample (AdventureWorksLT)
Server: SQL10277521
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-design-first-database
QUESTION 38
SIMULATION
Use the following login credentials as needed:
You plan to query db3 to retrieve a list of sales customers. The query will retrieve several columns that
include the email address of each sales customer.
You need to modify db3 to ensure that a portion of the email addresses is hidden in the query results.
Explanation/Reference:
Explanation:
1. Launch the Azure portal.
2. Navigate to the settings page of the database db3 that includes the sensitive data you want to mask.
3. Click the Dynamic Data Masking tile that launches the Dynamic Data Masking configuration page.
Note: Alternatively, you can scroll down to the Operations section and click Dynamic Data Masking.
4. In the Dynamic Data Masking configuration page, you may see some database columns that the
recommendations engine has flagged for masking.
5. Click ADD MASK for the EmailAddress column
6. Click Save in the data masking rule page to update the set of masking rules in the dynamic data
masking policy.
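Equivalently, the mask can be applied with T-SQL. A sketch that assumes the AdventureWorksLT sample table SalesLT.Customer:
-- email(): exposes the first letter of the address and the .com suffix, for example [email protected].
ALTER TABLE SalesLT.Customer
ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION = 'email()');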
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started-portal
QUESTION 39
SIMULATION
Use the following login credentials as needed:
Explanation/Reference:
Explanation:
1. In the Azure portal, navigate to the SQL databases page, select the db2 database, and choose Configure
performance
2. Click on Standard and Adjust the Storage size to 250 GB
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-single-databases-manage
QUESTION 40
HOTSPOT
You have an enterprise data warehouse in Azure Synapse Analytics that contains a table named
FactOnlineSales. The table contains data from the start of 2009 to the end of 2012.
You need to improve the performance of queries against FactOnlineSales by using table partitions. The
solution must meet the following requirements:
How should you complete the T-SQL command? To answer, select the appropriate options in the answer
area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: LEFT
RANGE LEFT: Specifies the boundary value belongs to the partition on the left (lower values). The default
is LEFT.
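A sketch of the partitioned table definition; the column list, distribution column, and exact boundary values are assumptions based on the 2009 to 2012 date range:
CREATE TABLE dbo.FactOnlineSales
(
    OnlineSalesKey INT NOT NULL,
    DateKey        INT NOT NULL,     -- e.g. 20091231 represents December 31, 2009
    SalesAmount    MONEY NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(OnlineSalesKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (DateKey RANGE LEFT FOR VALUES (20091231, 20101231, 20111231, 20121231))
    -- RANGE LEFT: each boundary value belongs to the partition on its left (the lower values)
);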
Reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-azure-sql-data-warehouse
QUESTION 41
SIMULATION
You need to create an elastic pool that contains an Azure SQL database named db2 and a new SQL
database named db3.
Explanation/Reference:
Explanation:
Step 1: Create a new SQL database named db3
1. Select SQL in the left-hand menu of the Azure portal. If SQL is not in the list, select All services, then
type SQL in the search box.
2. Select + Add to open the Select SQL deployment option page. Select Single Database. You can view
additional information about the different databases by selecting Show details on the Databases tile.
3. Select Create:
4. Enter the required fields if necessary.
5. Leave the rest of the values as default and select Review + Create at the bottom of the form.
6. Review the final settings and select Create. Use Db3 as database name.
On the SQL Database form, select Create to deploy and provision the resource group, server, and
database.
Step 2: Create your elastic pool using the Azure portal.
1. Select Azure SQL in the left-hand menu of the Azure portal. If Azure SQL is not in the list, select All
services, then type Azure SQL in the search box.
3. Select Elastic pool from the Resource type drop-down in the SQL Databases tile. Select Create to create
your elastic pool.
5. Select Configure elastic pool
6. On the Configure page, select the Databases tab, and then choose to Add database.
7. Add the Azure SQL database named db2, and the new SQL database named db3 that you created in
Step 1.
8. Select Review + create to review your elastic pool settings and then select Create to create your elastic
pool.
Reference:
https://docs.microsoft.com/bs-latn-ba/azure/sql-database/sql-database-elastic-pool-failover-group-tutorial
QUESTION 42
SIMULATION
You need to create an Azure Storage account named account10543936. The solution must meet the
following requirements:
Minimize storage costs.
Ensure that account10543936 can store many image files.
Ensure that account10543936 can quickly retrieve stored image files.
Explanation/Reference:
Explanation:
Create a general-purpose v2 storage account, which provides access to all of the Azure Storage services:
blobs, files, queues, tables, and disks.
1. On the Azure portal menu, select All services. In the list of resources, type Storage Accounts. As you
begin typing, the list filters based on your input. Select Storage Accounts.
4. Under the Resource group field, select Create new. Enter the name for your new resource group, as
shown in the following image.
6. Select a location for your storage account, or use the default location.
8. Select Review + Create to review your storage account settings and create the account.
9. Select Create.
Reference:
https://docs.microsoft.com/en-us/azure/storage/common/storage-account-create
QUESTION 43
SIMULATION
Azure Username: xxxxx
Azure Password: xxxxx
You need to ensure that users in the West US region can read data from a local copy of an Azure Cosmos
DB database named cosmos10543936.
NOTE: This task might take several minutes to complete. You can perform other tasks while the
task completes or end this section of the exam.
Explanation/Reference:
Explanation:
You can enable Availability Zones by using Azure portal when creating an Azure Cosmos account.
1. Locate the Cosmos DB database named cosmos10543936.
2. To add regions, select the hexagons on the map with the + label that correspond to your desired region(s). Alternatively, to add a region, select the + Add region option and choose a region from the drop-down menu.
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/high-availability
https://docs.microsoft.com/en-us/azure/cosmos-db/how-to-manage-database-account
QUESTION 44
SIMULATION
Use the following login credentials as needed:
You need to ensure that [email protected] can manage any databases hosted on an
Azure SQL server named SQL10543936 by signing in using his Azure Active Directory (Azure AD) user
account.
Explanation/Reference:
Explanation:
Provision an Azure Active Directory administrator for your Azure SQL Database server.
Each Azure SQL server (which hosts a SQL Database or SQL Data Warehouse) starts with a single server administrator account that is the administrator of the entire Azure SQL server. A second administrator, which is an Azure AD account, must then be created. This principal is created as a contained database user in the master database.
1. In the Azure portal, in the upper-right corner, select your connection to drop down a list of possible Active
Directories. Choose the correct Active Directory as the default Azure AD. This step links the subscription-
associated Active Directory with Azure SQL server making sure that the same subscription is used for both
Azure AD and SQL Server. (The Azure SQL server can be hosting either Azure SQL Database or Azure
SQL Data Warehouse.)
5. In the Add admin page, search for user [email protected], select it, and then select
Select. (The Active Directory admin page shows all members and groups of your Active Directory. Users or
groups that are grayed out cannot be selected because they are not supported as Azure AD administrators.)
6. At the top of the Active Directory admin page, select SAVE.
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-aad-authentication-configure?
QUESTION 45
HOTSPOT
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: Yes
You can now use a new extension of Azure Stream Analytics SQL to specify the number of partitions of a
stream when reshuffling the data.
The outcome is a stream that has the same partition scheme. Please see below for an example:
SELECT * INTO [output] FROM step1 PARTITION BY DeviceID UNION step2 PARTITION BY DeviceID
Note: The new extension of Azure Stream Analytics SQL includes a keyword INTO that allows you to
specify the number of partitions for a stream when performing reshuffling using a PARTITION BY
statement.
Box 2: Yes
When joining two streams of data explicitly repartitioned, these streams must have the same partition key
and partition count.
Box 3: Yes
Streaming Units (SUs) represents the computing resources that are allocated to execute a Stream
Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your
job.
In general, the best practice is to start with 6 SUs for queries that don't use PARTITION BY.
Here there are 10 partitions, so 6x10 = 60 SUs is good.
Note: Remember, Streaming Unit (SU) count, which is the unit of scale for Azure Stream Analytics, must
be adjusted so the number of physical resources available to the job can fit the partitioned flow. In general,
six SUs is a good number to assign to each partition. In case there are insufficient resources assigned to
the job, the system will only apply the repartition if it benefits the job.
Reference:
https://azure.microsoft.com/en-in/blog/maximize-throughput-with-repartitioning-in-azure-stream-analytics/
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-streaming-unit-consumption
QUESTION 46
DRAG DROP
You have an Azure SQL database named DB1 in the East US 2 region.
You need to build a secondary geo-replicated copy of DB1 in the West US region on a new server.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
1. In the Azure portal, browse to the database that you want to set up for geo-replication.
2. (Step 1) On the SQL database page, select geo-replication, and then select the region to create the
secondary database.
3. (Step 2-3) Select or configure the server and pricing tier for the secondary database.
Step 3: On the secondary server, create logins that match the SIDs on the primary server.
Incorrect Answers:
Not log shipping: Replication is used.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-active-geo-replication-portal
QUESTION 47
HOTSPOT
You have an Azure SQL database that contains a table named Employee. Employee contains sensitive
data in a decimal (10,2) column named Salary.
You need to ensure that nonprivileged users can view the table data, but Salary must display a number
from 0 to 100.
What should you configure? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: SELECT
Users with SELECT permission on a table can view the table data. Columns that are defined as masked,
will display the masked data.
Incorrect:
Grant the UNMASK permission to a user to enable them to retrieve unmasked data from the columns for
which masking is defined.
The CONTROL permission on the database includes both the ALTER ANY MASK and UNMASK
permission.
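A T-SQL sketch of the corresponding masking rule; the dbo schema and the NonPrivilegedUser principal are assumptions:
-- random(low, high): replaces the numeric value with a random number in the given range.
ALTER TABLE dbo.Employee
ALTER COLUMN Salary ADD MASKED WITH (FUNCTION = 'random(0, 100)');
-- Nonprivileged users only need SELECT; without the UNMASK permission they see the masked value.
GRANT SELECT ON dbo.Employee TO NonPrivilegedUser;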
QUESTION 48
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to implement changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100 days.
Solution: You apply an Azure policy that tags the storage account.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead apply an Azure Blob storage lifecycle policy.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-portal
QUESTION 49
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to implement changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100 days.
Solution: You apply an expired tag to the blobs in the storage account.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead apply an Azure Blob storage lifecycle policy.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-
portal
QUESTION 50
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to implement changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100 days.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Azure Blob storage lifecycle management offers a rich, rule-based policy for GPv2 and Blob storage accounts. Use the policy to transition your data to the appropriate access tiers, or to expire data at the end of its lifecycle.
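A minimal lifecycle policy sketch that deletes base blobs not modified for 100 days; the rule name and blob type filter are assumptions:
{
  "rules": [
    {
      "name": "delete-stale-blobs",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": [ "blockBlob" ] },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 100 }
          }
        }
      }
    }
  ]
}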
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-
portal
QUESTION 51
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure SQL database named DB1 that contains a table named Table1. Table1 has a field
named Customer_ID that is varchar(22).
You need to implement masking for the Customer_ID field to meet the following requirements:
Solution: You implement data masking and use a custom string function mask.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Must use Custom Text data masking, which exposes the first and last characters and adds a custom
padding string in the middle.
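For reference, a sketch of the Custom Text (partial) function applied to Customer_ID; the exposed prefix and suffix lengths and the padding string are illustrative assumptions, since the exact requirements are not reproduced here:
-- Expose the first 2 and last 4 characters; pad the middle (illustrative values).
ALTER TABLE dbo.Table1
ALTER COLUMN Customer_ID ADD MASKED WITH (FUNCTION = 'partial(2, "XXXXXXXXXXXXXXXX", 4)');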
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started
QUESTION 52
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to implement changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100 days.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead apply an Azure Blob storage lifecycle policy.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-
portal
QUESTION 53
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical
values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse
Analytics.
You need to prepare the files to ensure that the data copies quickly.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
All file formats have different performance characteristics. For the fastest load, use compressed delimited
text files.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
Implement data storage solutions
Testlet 2
Background
Proseware, Inc. develops and manages a product named Poll Taker. The product is used for delivering
public opinion polling and analysis.
Polling data comes from a variety of sources, including online surveys, house-to-house interviews, and
booths at public events.
Polling data
Polling data is stored in one of the two locations:
Poll metadata
Each poll has associated metadata with information about the poll including the date and number of
respondents. The data is stored as JSON.
Phone-based polling
Security
Phone-based poll data must only be uploaded by authorized users from authorized devices
Contractors must not have access to any polling data other than their own
Access to polling data must be set on a per-Active Directory user basis
Performance
After six months, raw polling data should be moved to a storage account. The storage must be available in
the event of a regional disaster. The solution must minimize costs.
Deployments
All deployments must be performed by using Azure DevOps. Deployments must use templates that can be reused across multiple environments.
No credentials or secrets should be used during deployments
Reliability
All services and processes must be resilient to a regional Azure outage.
Monitoring
All Azure services must be monitored by using Azure Monitor. On-premises SQL Server performance must
be monitored.
QUESTION 1
DRAG DROP
You need to ensure that phone-based polling data can be analyzed in the PollingData database.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Select and Place:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario:
All deployments must be performed by using Azure DevOps. Deployments must use templates that can be reused across multiple environments.
No credentials or secrets should be used during deployments
QUESTION 2
DRAG DROP
How should you configure the storage account? To answer, drag the appropriate Configuration Value to the
correct Setting. Each Configuration Value may be used once, more than once, or not at all. You may need
to drag the split bar between panes or scroll to view content.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Geo-redundant storage (GRS) is designed to provide at least 99.99999999999999% (16 9's) durability of
objects over a given year by replicating your data to a secondary region that is hundreds of miles away
from the primary region. If your storage account has GRS enabled, then your data is durable even in the
case of a complete regional outage or a disaster in which the primary region isn't recoverable.
If you opt for GRS, you have two related options to choose from:
GRS replicates your data to another data center in a secondary region, but that data is available to be
read only if Microsoft initiates a failover from the primary to secondary region.
Read-access geo-redundant storage (RA-GRS) is based on GRS. RA-GRS replicates your data to
another data center in a secondary region, and also provides you with the option to read from the
secondary region. With RA-GRS, you can read from the secondary region regardless of whether
Microsoft initiates a failover from the primary to secondary region.
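A minimal ARM template resource sketch showing where the replication option is set; the account name, API version, and the hierarchical namespace setting are assumptions:
{
  "type": "Microsoft.Storage/storageAccounts",
  "apiVersion": "2019-06-01",
  "name": "pollingdatastore",
  "location": "[resourceGroup().location]",
  "kind": "StorageV2",
  "sku": { "name": "Standard_GRS" },
  "properties": { "isHnsEnabled": true }
}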
References:
https://docs.microsoft.com/bs-cyrl-ba/azure/storage/blobs/data-lake-storage-quickstart-create-account
https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy-grs
Implement data storage solutions
Testlet 3
Case Study
This is a case study. Case studies are not timed separately. You can use as much exam time as you
would like to complete each case. However, there may be additional case studies and sections on this
exam. You must manage your time to ensure that you are able to complete all questions included on this
exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in
the case study. Case studies might contain exhibits and other resources that provide more information
about the scenario that is described in the case study. Each question is independent of the other questions
in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers
and to make changes before you move to the next section of the exam. After you begin a new section, you
cannot return to this section.
Overview
General Overview
Litware, Inc. is an international car racing and manufacturing company that has 1,000 employees. Most
employees are located in Europe. The company supports racing teams that compete in a worldwide racing
series.
Physical Locations
Litware has two main locations: a main office in London, England, and a manufacturing plant in Berlin,
Germany.
During each race weekend, 100 engineers set up a remote portable office by using a VPN to connect to the
datacenter in the London office. The portable office is set up and torn down in approximately 20 different
countries each year.
Existing environment
Race Central
During race weekends, Litware uses a primary application named Race Central. Each car has several
sensors that send real-time telemetry data to the London datacentre. The data is used for real-time tracking
of the cars.
Race Central also sends batch updates to an application named Mechanical Workflow by using Microsoft
SQL Server Integration Services (SSIS).
The telemetry data is sent to a MongoDB database. A custom application then moves the data to
databases in SQL Server 2017. The telemetry data in MongoDB has more than 500 attributes. The
application changes the attribute names when the data is moved to SQL Server 2017.
Mechanical Workflow
Mechanical Workflow is used to track changes and improvements made to the cars during their lifetime.
Currently, Mechanical Workflow runs on SQL Server 2017 as an OLAP system.
Mechanical Workflow has a table named Table1 that is 1 TB. Large aggregations are performed on a
single column of Table1.
Requirements
Planned Changes
Litware is in the process of rearchitecting its data estate to be hosted in Azure. The company plans to
decommission the London datacentre and move all its applications to an Azure datacenter.
Technical Requirements
Data collection for Race Central must be moved to Azure Cosmos DB and Azure SQL Database. The
data must be written to the Azure datacenter closest to each race and must converge in the least
amount of time.
The query performance of Race Central must be stable, and the administrative time it takes to perform
optimizations must be minimized.
The database for Mechanical Workflow must be moved to Azure SQL Data Warehouse.
Transparent data encryption (TDE) must be enabled on all data stores, whenever possible.
An Azure Data Factory pipeline must be used to move data from Cosmos DB to SQL Database for
Race Central. If the data load takes longer than 20 minutes, configuration changes must be made to
Data Factory.
The telemetry data must migrate toward a solution that is native to Azure.
The telemetry data must be monitored for performance issues. You must adjust the Cosmos DB
Request Units per second (RU/s) to maintain a performance SLA while minimizing the cost of the RU/s.
During race weekends, visitors will be able to enter the remote portable offices. Litware is concerned that
some proprietary information might be exposed. The company identifies the following data masking
requirements for the Race Central data that will be stored in SQL Database:
Only show the last four digits of the values in a column named SuspensionSprings.
Only show a zero value for the values in a column named ShockOilWeight.
QUESTION 1
HOTSPOT
You need to build a solution to collect the telemetry data for Race Central.
What should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
API: Table
Azure Cosmos DB provides native support for wire protocol-compatible APIs for popular databases. These
include MongoDB, Apache Cassandra, Gremlin, and Azure Table storage.
Scenario: The telemetry data must migrate toward a solution that is native to Azure.
Use the strongest consistency level, Strong, to minimize convergence time.
Scenario: The data must be written to the Azure datacenter closest to each race and must converge in the
least amount of time.
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels
QUESTION 2
On which data store should you configure TDE to meet the technical requirements?
A. Cosmos DB
B. Azure Synapse Analytics
C. SQL Database
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: Transparent data encryption (TDE) must be enabled on all data stores, whenever possible.
The database for Mechanical Workflow must be moved to Azure Synapse Analytics.
Incorrect Answers:
A: Cosmos DB does not support TDE.
QUESTION 3
HOTSPOT
You are building the data store solution for Mechanical Workflow.
How should you configure Table1? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Table Type: Hash distributed.
Hash-distributed tables improve query performance on large fact tables.
Scenario:
Mechanical Workflow has a table named Table1 that is 1 TB. Large aggregations are performed on a single column of Table1.
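A minimal dedicated SQL pool sketch of such a table; the column names and the hash key are assumptions:
-- Hash-distribute on a column used by the large aggregations; columnstore suits the 1 TB fact table.
CREATE TABLE dbo.Table1
(
    CarId        INT NOT NULL,
    MeasureValue DECIMAL(18, 4) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CarId),
    CLUSTERED COLUMNSTORE INDEX
);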
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute
QUESTION 4
HOTSPOT
Which masking functions should you implement for each column to meet the data masking requirements?
To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 2: Default
Default uses a zero value for numeric data types (bigint, bit, decimal, int, money, numeric, smallint,
smallmoney, tinyint, float, real).
Only show a zero value for the values in a column named ShockOilWeight.
Scenario:
The company identifies the following data masking requirements for the Race Central data that will be
stored in SQL Database:
Only show a zero value for the values in a column named ShockOilWeight.
Only show the last four digits of the values in a column named SuspensionSprings.
Reference: https://docs.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview
QUESTION 5
HOTSPOT
Which masking functions should you implement for each column to meet the data masking requirements?
To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: Custom text/string: A masking method, which exposes the first and/or last characters and adds a
custom padding string in the middle.
Only show the last four digits of the values in a column named SuspensionSprings.
Box 2: Default
Default uses a zero value for numeric data types (bigint, bit, decimal, int, money, numeric, smallint,
smallmoney, tinyint, float, real).
Scenario: Only show a zero value for the values in a column named ShockOilWeight.
Scenario:
The company identifies the following data masking requirements for the Race Central data that will be
stored in SQL Database:
Only show a zero value for the values in a column named ShockOilWeight.
Only show the last four digits of the values in a column named SuspensionSprings.
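A T-SQL sketch that applies the two masks; the schema and table name are assumptions, and SuspensionSprings is assumed to be a string column:
-- Expose only the last four characters of SuspensionSprings.
ALTER TABLE dbo.RaceCentral
ALTER COLUMN SuspensionSprings ADD MASKED WITH (FUNCTION = 'partial(0, "XXXX", 4)');
-- Default mask: numeric columns such as ShockOilWeight display 0.
ALTER TABLE dbo.RaceCentral
ALTER COLUMN ShockOilWeight ADD MASKED WITH (FUNCTION = 'default()');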
Reference:
https://docs.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview
Implement data storage solutions
Testlet 4
Case study
Overview
ADatum Corporation is a retailer that sells products through two sales channels: retail stores and a website.
Existing Environment
ADatum has one database server that has Microsoft SQL Server 2016 installed. The server hosts three
mission-critical databases named SALESDB, DOCDB, and REPORTINGDB.
DOCDB stores documents that connect to the sales data in SALESDB. The documents are stored in two
different JSON formats based on the sales channel.
REPORTINGDB stores reporting data and contains several columnstore indexes. A daily process creates
reporting data in REPORTINGDB from the data in SALESDB. The process is implemented as a SQL
Server Integration Services (SSIS) package that runs a stored procedure from SALESDB.
Requirements
Planned Changes
ADatum plans to move the current data infrastructure to Azure. The new infrastructure has the following
requirements:
Technical Requirements
The new Azure data infrastructure must meet the following technical requirements:
Data in SALESDB must be encrypted by using Transparent Data Encryption (TDE). The encryption must
use your own key.
SALESDB must be restorable to any given minute within the past three weeks.
Real-time processing must be monitored to ensure that workloads are sized properly based on actual
usage patterns.
Missing indexes must be created automatically for REPORTINGDB.
Disk IO, CPU, and memory usage must be monitored for SALESDB.
QUESTION 1
You need to configure a disaster recovery solution for SALESDB to meet the technical requirements.
A. weekly long-term retention backups that are retained for three weeks
B. failover groups
C. a point-in-time restore
D. geo-replication
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: SALESDB must be restorable to any given minute within the past three weeks.
The Azure SQL Database service protects all databases with an automated backup system. These
backups are retained for 7 days for Basic, 35 days for Standard and 35 days for Premium. Point-in-time
restore is a self-service capability, allowing customers to restore a Basic, Standard or Premium database
from these backups to any point within the retention period.
References:
https://azure.microsoft.com/en-us/blog/azure-sql-database-point-in-time-restore/
QUESTION 2
You need to implement event processing by using Stream Analytics to produce consistent JSON
documents.
Which three actions should you perform? Each correct answer presents part of the solution.
Explanation/Reference:
Explanation:
DOCDB stores documents that connect to the sales data in SALESDB. The documents are stored in
two different JSON formats based on the sales channel.
The sales data including the documents in JSON format, must be gathered as it arrives and analyzed
online by using Azure Stream Analytics. The analytic process will perform aggregations that must be
done continuously, without gaps, and without overlapping.
As they arrive, all the sales documents in JSON format must be transformed into one consistent format.
Manage and develop data processing
Question Set 1
QUESTION 1
You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to
an Azure Blob storage account.
You need to output the count of tweets during the last five minutes every five minutes.
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them. The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot belong to more than one tumbling window.
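A minimal Stream Analytics query sketch for this requirement; the input, output, and timestamp column names are assumptions:
SELECT COUNT(*) AS TweetCount, System.Timestamp() AS WindowEnd
INTO BlobOutput
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY TumblingWindow(minute, 5)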
Incorrect Answers:
D: Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as
Tumbling windows that can overlap, so events can belong to more than one Hopping window result set. To
make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the
window size.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
QUESTION 2
You are developing a solution that will stream to Azure Stream Analytics. The solution will have both
streaming data and reference data.
Which input type should you use for the reference data?
A. Azure Cosmos DB
B. Azure Event Hubs
C. Azure Blob storage
D. Azure IoT Hub
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Stream Analytics supports Azure Blob storage and Azure SQL Database as the storage layer for Reference
Data.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-use-reference-data
QUESTION 3
HOTSPOT
Which windowing function should you use for each requirement? To answer, select the appropriate options
in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Box 1: Tumbling
Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them. The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot belong to more than one tumbling window.
Box 2: Hopping
Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as
Tumbling windows that can overlap, so events can belong to more than one Hopping window result set. To
make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the
window size.
Box 3: Sliding
Sliding window functions, unlike Tumbling or Hopping windows, produce an output only when an event
occurs. Every window will have at least one event and the window continuously moves forward by an ε
(epsilon). Like hopping windows, events can belong to more than one sliding window.
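Minimal query sketches of the three window types; the input name, timestamp column, and Topic column are assumptions:
-- Tumbling: one non-overlapping result every 10 seconds.
SELECT Topic, COUNT(*) AS MentionCount
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, TumblingWindow(second, 10)
-- Hopping: a 10-second window that hops forward every 5 seconds, so windows overlap.
SELECT Topic, COUNT(*) AS MentionCount
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, HoppingWindow(second, 10, 5)
-- Sliding: output is produced only when an event enters or exits the 10-second window.
SELECT Topic, COUNT(*) AS MentionCount
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, SlidingWindow(second, 10)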
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
QUESTION 4
DRAG DROP
You have an Azure Data Lake Storage Gen2 account that contains JSON files for customers. The files
contain two attributes named FirstName and LastName.
You need to copy the data from the JSON files to an Azure Synapse Analytics table by using Azure
Databricks. A new column must be created that concatenates the FirstName and LastName values.
Which five actions should you perform in sequence next in a Databricks notebook? To answer, move the
appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 4: Write the results to a table in Azure Synapse.
You upload the transformed data frame into Azure Synapse. You use the Azure Synapse connector for Azure Databricks to directly upload a dataframe as a table in Azure Synapse.
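A minimal PySpark sketch of this final step, assuming a transformed DataFrame named transformed_df already exists in the notebook; the JDBC URL, staging folder, and table name are placeholders:
# Write the DataFrame to an Azure Synapse table through the Databricks Synapse connector.
(transformed_df.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<dw>")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.Customers")
    .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tempDirs")
    .save())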
Reference:
https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-extract-load-sql-data-warehouse
QUESTION 5
You have an Azure Storage account and a data warehouse in Azure Synapse Analytics in the UK South
region.
You need to copy blob data from the storage account to the data warehouse by using Azure Data Factory.
The solution must ensure that the data remains in the UK South region at all times and must minimize administrative effort.
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Incorrect Answers:
B: A self-hosted integration runtime is intended for on-premises data sources.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime
QUESTION 6
You plan to perform batch processing in Azure Databricks once daily.
A. automated
B. interactive
C. High Concurrency
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Azure Databricks has two types of clusters: interactive and automated. You use interactive clusters to
analyze data collaboratively with interactive notebooks. You use automated clusters to run fast and robust
automated jobs.
The suggested best practice is to launch a new cluster for each run of critical jobs. This helps avoid any
issues (failures, missing SLA, and so on) due to an existing workload (noisy neighbor) on a shared cluster.
Reference:
https://docs.databricks.com/administration-guide/cloud-configurations/aws/cmbp.html#scenario-3-
scheduled-batch-workloads-data-engineers-running-etl-jobs
QUESTION 7
HOTSPOT
You need to implement an Azure Databricks cluster that automatically connects to Azure Data Lake
Storage Gen2 by using Azure Active Directory (Azure AD) integration.
How should you configure the new cluster? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Box 1: High Concurrency
Enable Azure Data Lake Storage credential passthrough for a high-concurrency cluster.
Incorrect:
Support for Azure Data Lake Storage credential passthrough on standard clusters is in Public Preview.
Standard clusters with credential passthrough are supported on Databricks Runtime 5.5 and above and are
limited to a single user.
References:
https://docs.azuredatabricks.net/spark/latest/data-sources/azure/adls-passthrough.html
QUESTION 8
Note: This question is a part of a series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data Warehouse. The
data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2 storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL Data
Warehouse.
Solution:
1. Use Azure Data Factory to convert the parquet files to CSV files
2. Create an external data source pointing to the Azure storage account
3. Create an external file format and external table using the external data source
4. Load the data using the INSERT…SELECT statement
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
There is no need to convert the parquet files to CSV files.
You load the data using the CREATE TABLE AS SELECT statement.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 9
Note: This question is a part of a series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data Warehouse. The
data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2 storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL Data
Warehouse.
Solution:
1. Create an external data source pointing to the Azure storage account
2. Create an external file format and external table using the external data source
3. Load the data using the INSERT…SELECT statement
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You load the data using the CREATE TABLE AS SELECT statement.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 10
Note: This question is a part of a series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data Warehouse. The
data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2 storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL Data
Warehouse.
Solution:
1. Create an external data source pointing to the Azure storage account
2. Create a workload group using the Azure storage account name as the pool name
3. Load the data using the INSERT…SELECT statement
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You need to create an external file format and external table using the external data source.
You then load the data using the CREATE TABLE AS SELECT statement.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 11
You develop data engineering solutions for a company.
You must integrate the company’s on-premises Microsoft SQL Server data with Microsoft Azure SQL
Database. Data must be transformed incrementally.
A. Use the Copy Data tool with Blob storage linked service as the source
B. Use Azure PowerShell with SQL Server linked service as a source
C. Use Azure Data Factory UI with Blob storage linked service as a source
D. Use the .NET Data Factory API with Blob storage linked service as the source
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The Integration Runtime is a customer managed data integration infrastructure used by Azure Data Factory
to provide data integration capabilities across different network environments.
A linked service defines the information needed for Azure Data Factory to connect to a data resource. We
have three resources in this scenario for which linked services are needed:
On-premises SQL Server
Azure Blob Storage
Azure SQL database
Note: Azure Data Factory is a fully managed cloud-based data integration service that orchestrates and
automates the movement and transformation of data. The key concept in the ADF model is pipeline. A
pipeline is a logical grouping of Activities, each of which defines the actions to perform on the data
contained in Datasets. Linked services are used to define the information needed for Data Factory to
connect to the data resources.
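For illustration, a minimal linked service definition sketch for the on-premises SQL Server; the names, connection string, and integration runtime reference are assumptions:
{
  "name": "OnPremSqlServerLinkedService",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Data Source=onpremsql01;Initial Catalog=SourceDb;User ID=etl_user;Password=<placeholder>"
    },
    "connectVia": {
      "referenceName": "SelfHostedIR1",
      "type": "IntegrationRuntimeReference"
    }
  }
}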
References:
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/move-sql-azure-adf
QUESTION 12
HOTSPOT
A company runs Microsoft Dynamics CRM with Microsoft SQL Server on-premises. SQL Server Integration
Services (SSIS) packages extract data from Dynamics CRM APIs, and load the data into a SQL Server
data warehouse.
The datacenter is running out of capacity. Because of the network configuration, you must extract on-premises data to the cloud over HTTPS. You cannot open any additional ports. The solution must require the least amount of effort.
Which component should you use? To answer, select the appropriate technology in the dialog box in the
answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: Source
The Copy activity requires source and sink linked services to define the direction of data flow.
Copying between a cloud data source and a data source in private network: if either source or sink linked
service points to a self-hosted IR, the copy activity is executed on that self-hosted Integration Runtime.
References:
https://docs.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime
QUESTION 13
DRAG DROP
A project requires analysis of real-time Twitter feeds. Posts that contain specific keywords must be stored
and processed on Microsoft Azure and then displayed by using Microsoft Power BI. You need to implement
the solution.
Which five actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 2: Create a Jupyter Notebook
Step 4: Run a job that uses the Spark Streaming API to ingest data from Twitter
References:
https://acadgild.com/blog/streaming-twitter-data-using-spark
https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-use-with-data-lake-store
QUESTION 14
DRAG DROP
Your company manages on-premises Microsoft SQL Server pipelines by using a custom solution.
The data engineering team must implement a process to pull data from SQL Server and migrate it to Azure
Blob storage. The process must orchestrate and manage the data lifecycle.
You need to configure Azure Data Factory to connect to the on-premises SQL Server database.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: Create a virtual private network (VPN) connection from on-premises to Microsoft Azure.
You can also use IPSec VPN or Azure ExpressRoute to further secure the communication channel
between your on-premises network and Azure.
Azure Virtual Network is a logical representation of your network in the cloud. You can connect an on-
premises network to your virtual network by setting up IPSec VPN (site-to-site) or ExpressRoute (private
peering).
Note: A self-hosted integration runtime can run copy activities between a cloud data store and a data store
in a private network, and it can dispatch transform activities against compute resources in an on-premises
network or an Azure virtual network. A self-hosted integration runtime needs to be installed on an on-premises machine or a virtual machine (VM) inside a private network.
References:
https://docs.microsoft.com/en-us/azure/data-factory/tutorial-hybrid-copy-powershell
QUESTION 15
HOTSPOT
Ingestion:
Stream processing:
You need to identify the correct technologies to build the Lambda architecture using minimal effort. Which
technologies should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Reference:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/
QUESTION 16
You develop data engineering solutions for a company.
You need to ingest and visualize real-time Twitter data by using Microsoft Azure.
Which three technologies should you use? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
Explanation/Reference:
Explanation:
You can use Azure Logic Apps to send tweets to an event hub and then use a Stream Analytics job to read from the event hub and send them to Power BI.
References:
https://community.powerbi.com/t5/Integrations-with-Files-and/Twitter-streaming-analytics-step-by-step/td-
p/9594
QUESTION 17
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain
the following three workloads:
A workload for data engineers who will use Python and SQL
A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team at your company identifies the following standards for Databricks
environments:
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data
engineers, and a Standard cluster for the jobs.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We would need a High Concurrency cluster for the jobs.
Note:
Standard clusters are recommended for a single user. Standard can run workloads developed in any
language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are
that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum
query latencies.
References:
https://docs.azuredatabricks.net/clusters/configure.html
QUESTION 18
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain
the following three workloads:
A workload for data engineers who will use Python and SQL
A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team at your company identifies the following standards for Databricks
environments:
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data
engineers, and a High Concurrency cluster for the jobs.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We need a High Concurrency cluster for the data engineers and the jobs.
Note:
Standard clusters are recommended for a single user. Standard can run workloads developed in any
language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are
that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum
query latencies.
References:
https://docs.azuredatabricks.net/clusters/configure.html
QUESTION 19
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain
the following three workloads:
A workload for data engineers who will use Python and SQL
A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team at your company identifies the following standards for Databricks
environments:
Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster for the
data engineers, and a Standard cluster for the jobs.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
No need for a High Concurrency cluster for each data scientist.
Standard clusters are recommended for a single user. Standard can run workloads developed in any
language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are
that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum
query latencies.
References:
https://docs.azuredatabricks.net/clusters/configure.html
QUESTION 20
You have an Azure Stream Analytics query. The query returns a result set that contains 10,000 distinct
values for a column named clusterID.
You monitor the Stream Analytics job and discover high latency.
Which two actions should you perform? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
Correct Answer: CE
Section: (none)
Explanation
Explanation/Reference:
Explanation:
C: Scaling a Stream Analytics job takes advantage of partitions in the input or output. Partitioning lets you
divide data into subsets based on a partition key. A process that consumes the data (such as a Streaming
Analytics job) can consume and write different partitions in parallel, which increases throughput.
E: Streaming Units (SUs) represents the computing resources that are allocated to execute a Stream
Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your
job. This capacity lets you focus on the query logic and abstracts the need to manage the hardware to run
your Stream Analytics job in a timely manner.
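A minimal sketch of a partitioned query for C; the input name and the use of PartitionId as the partition key are assumptions:
SELECT ClusterID, COUNT(*) AS EventCount
FROM Input PARTITION BY PartitionId
GROUP BY PartitionId, ClusterID, TumblingWindow(minute, 1)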
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-streaming-unit-consumption
QUESTION 21
SIMULATION
Azure Username: xxxxx
Azure Password: xxxxx
You plan to generate large amounts of real-time data that will be copied to Azure Blob storage.
You plan to create reports that will read the data from an Azure Cosmos DB database.
You need to create an Azure Stream Analytics job that will input the data from a blob storage named
storage10277521 to the Cosmos DB database.
Explanation/Reference:
Explanation:
Step 1: Create a Stream Analytics job
1. Sign in to the Azure portal.
2. Select Create a resource in the upper left-hand corner of the Azure portal.
3. Select Analytics > Stream Analytics job from the results list.
5. Check the Pin to dashboard box to place your job on your dashboard and then select Create.
6. You should see a Deployment in progress... notification displayed in the top right of your browser
window.
2. Select Inputs > Add Stream input > Azure Blob storage
3. In the Azure Blob storage setting choose: storage10277521. Leave other options to default values and
select Save to save the settings.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-quick-create-portal
QUESTION 22
SIMULATION
Use the following login credentials as needed:
You plan to deploy an integration runtime named Runtime1 to an Azure virtual machine.
You need to create an Azure Data Factory V2, and then prepare the required Data Factory resources for
App1.
Explanation/Reference:
Explanation:
2. Select Create a resource on the left menu, select Analytics, and then select Data Factory.
4. On the New data factory page, enter a name.
5. For Subscription, select your Azure subscription in which you want to create the data factory.
6. For Resource Group, use one of the following steps:
Select Use existing, and select an existing resource group from the list.
Select Create new, and enter the name of a resource group.
7. For Version, select V2.
8. For Location, select the location for the data factory.
9. Select Create.
10. After the creation is complete, you see the Data Factory page.
1. In the self-hosted IR to be shared, click Connections, and then click Grant permission to another Data Factory.
2. Select the data factory you just created.
3. In the data factory to which the permissions were granted, create a new self-hosted IR (linked) and enter
the resource ID.
References:
https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal
https://docs.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime#sharing-the-
self-hosted-integration-runtime-ir-with-multiple-data-factories
QUESTION 23
SIMULATION
Use the following login credentials as needed:
You plan to create multiple pipelines in a new Azure Data Factory V2.
You need to create the data factory, and then create a scheduled trigger for the planned pipelines. The
trigger must execute every two hours starting at 24:00:00.
Explanation/Reference:
Explanation:
2. Select Create a resource on the left menu, select Analytics, and then select Data Factory.
4. On the New data factory page, enter a name.
5. For Subscription, select your Azure subscription in which you want to create the data factory.
6. For Resource Group, use one of the following steps:
Select Use existing, and select an existing resource group from the list.
Select Create new, and enter the name of a resource group.
7. For Version, select V2.
8. For Location, select the location for the data factory.
9. Select Create.
10. After the creation is complete, you see the Data Factory page.
1. Select the Data Factory you created, and switch to the Edit tab.
2. Click Trigger on the menu, and click New/Edit.
3. In the Add Triggers page, click Choose trigger..., and click New.
4. In the New Trigger page, do the following steps:
a. Confirm that Schedule is selected for Type.
b. Specify the start datetime of the trigger for Start Date (UTC) to: 24:00:00
c. Specify Recurrence for the trigger. Select Every Hour, and enter 2 in the text box.
5. In the New Trigger window, check the Activated option, and click Next.
6. In the New Trigger page, review the warning message, and click Finish.
7. Click Publish to publish changes to Data Factory. Until you publish changes to Data Factory, the trigger
does not start triggering the pipeline runs.
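The equivalent trigger definition as JSON, shown here as a minimal sketch; the trigger name, start time value, and pipeline reference are placeholders:
{
  "name": "Trigger1",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Hour",
        "interval": 2,
        "startTime": "2019-06-01T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "Pipeline1",
          "type": "PipelineReference"
        }
      }
    ]
  }
}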
References:
https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal
https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-schedule-trigger
QUESTION 24
Each day, a company plans to store hundreds of files in Azure Blob Storage and Azure Data Lake Storage.
The company uses the parquet format.
You need to select the appropriate data technology to implement the pipeline.
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Storm runs topologies instead of the Apache Hadoop MapReduce jobs that you might be familiar with.
Storm topologies are composed of multiple components that are arranged in a directed acyclic graph
(DAG). Data flows between the components in the graph. Each component consumes one or more data
streams, and can optionally emit one or more streams.
References:
https://docs.microsoft.com/en-us/azure/hdinsight/storm/apache-storm-overview
QUESTION 25
HOTSPOT
A company is deploying a service-based data environment. You are developing a solution to process this
data.
Use an Azure HDInsight cluster for data ingestion from a relational database in a different cloud service
Use an Azure Data Lake Storage account to store processed data
Allow users to download processed data
Which technologies should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Azure HDInsight is a cloud distribution of the Hadoop components from the Hortonworks Data Platform
(HDP).
Incorrect Answers:
DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses MapReduce to effect its
distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to
map tasks, each of which will copy a partition of the files specified in the source list. Its MapReduce
pedigree has endowed it with some quirks in both its semantics and execution.
RevoScaleR is a collection of proprietary functions in Machine Learning Server used for practicing data
science at scale. For data scientists, RevoScaleR gives you data-related functions for import,
transformation and manipulation, summarization, visualization, and analysis.
Apache Kafka is used for building real-time streaming applications that transform or react to streams of data.
References:
https://sqoop.apache.org/
https://kafka.apache.org/intro
https://docs.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-ambari-view
QUESTION 26
A company uses Azure SQL Database to store sales transaction data. Field sales employees need an
offline copy of the database that includes last year’s sales on their laptops when there is no internet
connection available.
Which three options can you use? Each correct answer presents a complete solution.
A. Export to a BACPAC file by using Azure Cloud Shell, and save the file to an Azure storage account
B. Export to a BACPAC file by using SQL Server Management Studio. Save the file to an Azure storage
account
C. Export to a BACPAC file by using the Azure portal
D. Export to a BACPAC file by using Azure PowerShell and save the file locally
E. Export to a BACPAC file by using the SqlPackage utility
Explanation/Reference:
Explanation:
You can export to a BACPAC file using the Azure portal.
You can export to a BACPAC file using SQL Server Management Studio (SSMS). The newest versions of
SQL Server Management Studio provide a wizard to export an Azure SQL database to a BACPAC file.
You can export to a BACPAC file using the SQLPackage utility.
Incorrect Answers:
D: You can export to a BACPAC file using PowerShell. Use the New-AzSqlDatabaseExport cmdlet to
submit an export database request to the Azure SQL Database service. Depending on the size of your
database, the export operation may take some time to complete. However, the file is not stored locally.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-export
QUESTION 27
Note: This question is a part of a series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data Warehouse. The
data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2 storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL Data
Warehouse.
Solution:
1. Create an external data source pointing to the Azure Data Lake Gen 2 storage account
2. Create an external file format and external table using the external data source
3. Load the data using the CREATE TABLE AS SELECT statement
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You need to create an external file format and external table using the external data source.
You load the data using the CREATE TABLE AS SELECT statement.
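A minimal T-SQL sketch of these steps for a dedicated SQL pool; all object names, the storage location, and the credential are placeholders:
-- 1. External data source over the Data Lake Storage Gen2 account.
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://<filesystem>@<account>.dfs.core.windows.net',
    CREDENTIAL = ADLSCredential
);
-- 2. External file format and external table for the parquet files.
CREATE EXTERNAL FILE FORMAT ParquetFileFormat
WITH (FORMAT_TYPE = PARQUET);
CREATE EXTERNAL TABLE ext.SalesData
(
    SaleId INT,
    Amount DECIMAL(18, 2)
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = AzureDataLakeStore,
    FILE_FORMAT = ParquetFileFormat
);
-- 3. Load with CREATE TABLE AS SELECT.
CREATE TABLE dbo.SalesData
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM ext.SalesData;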
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 28
Note: This question is a part of a series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to an enterprise data warehouse in Azure
Synapse Analytics. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2
storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Data Warehouse.
Solution:
1. Create a remote service binding pointing to the Azure Data Lake Gen 2 storage account
2. Create an external file format and external table using the external data source
3. Load the data using the CREATE TABLE AS SELECT statement
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You need to create an external file format and an external table from an external data source, instead of from a remote service binding.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 29
Note: This question is a part of a series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to an enterprise data warehouse in Azure
Synapse Analytics. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2
storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Data Warehouse.
Solution:
1. Create an external data source pointing to the Azure storage account
2. Create a workload group using the Azure storage account name as the pool name
3. Load the data using the CREATE TABLE AS SELECT statement
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Use the Azure Data Lake Gen 2 storage account.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 30
You need to develop a pipeline for processing data. The pipeline must meet the following requirements:
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Apache Spark is an open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analysis applications.
HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch processing, you can use Spark, Hive, Hive LLAP, or MapReduce.
You can create an HDInsight Spark cluster using an Azure Resource Manager template. The template can
be found in GitHub.
References:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
QUESTION 31
DRAG DROP
You implement an event processing solution using Microsoft Azure Stream Analytics.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: Configure Blob storage as input; select items with the TIMESTAMP BY clause
The default timestamp of Blob storage events in Stream Analytics is the timestamp that the blob was last
modified, which is BlobLastModifiedUtcTime. To process the data as a stream using a timestamp in the
event payload, you must use the TIMESTAMP BY keyword.
Example:
The following is a TIMESTAMP BY example which uses the EntryTime column as the application time for
events:
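The example query itself is not reproduced here; the following is a minimal sketch, assuming an input alias of TollTagEntry, an output alias of Output, and illustrative column names:

SELECT
    TollId,
    EntryTime AS VehicleEntryTime,
    LicensePlate
INTO
    Output
FROM
    TollTagEntry TIMESTAMP BY EntryTime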
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-inputs
QUESTION 32
HOTSPOT
A company plans to use Platform-as-a-Service (PaaS) to create the new data pipeline process. The
process must meet the following requirements:
Ingest:
Store:
Which technologies should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Note: Data at rest includes information that resides in persistent storage on physical media, in any digital
format. Microsoft Azure offers a variety of data storage solutions to meet different needs, including file,
disk, blob, and table storage. Microsoft also provides encryption to protect Azure SQL Database, Azure
Cosmos DB, and Azure Data Lake.
Prepare and Train: Azure Databricks
Azure Databricks provides enterprise-grade Azure security, including Azure Active Directory integration.
With Azure Databricks, you can set up your Apache Spark environment in minutes, autoscale and
collaborate on shared projects in an interactive workspace. Azure Databricks supports Python, Scala, R,
Java and SQL, as well as data science frameworks and libraries including TensorFlow, PyTorch and scikit-
learn.
Note: As of November 2019, Azure SQL Data Warehouse is now Azure Synapse Analytics.
References:
https://docs.microsoft.com/bs-latn-ba/azure/architecture/data-guide/technology-choices/pipeline-
orchestration-data-movement
https://docs.microsoft.com/en-us/azure/azure-databricks/what-is-azure-databricks
QUESTION 33
HOTSPOT
A company plans to analyze a continuous flow of data from a social media platform by using Microsoft
Azure Stream Analytics. The incoming data is formatted as one record per row.
How should you complete the REST API segment? To answer, select the appropriate configuration in the
answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: CSV
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. A CSV
file stores tabular data (numbers and text) in plain text. Each line of the file is a data record.
JSON and AVRO are not formatted as one record per row.
Box 2: "type":"Microsoft.ServiceBus/EventHub",
Properties include "EventHubName"
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-inputs
https://en.wikipedia.org/wiki/Comma-separated_values
QUESTION 34
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob
storage file named Customers. The file will contain both in-store and online customer details. The online
customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location.
The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to
Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has one streaming input, one query, and two outputs.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We need one reference data input for LocationIncomes, which rarely changes.
Note: Stream Analytics also supports input known as reference data. Reference data is either completely
static or changes slowly.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-add-inputs#stream-and-
reference-inputs
QUESTION 35
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob
storage file named Customers. The file will contain both in-store and online customer details. The online
customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location.
The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to
Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has one streaming input, one reference input, one
query, and two outputs.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We need one reference data input for LocationIncomes, which rarely changes.
We need two queries, one for in-store customers and one for online customers.
For each query, two outputs are needed.
Note: Stream Analytics also supports input known as reference data. Reference data is either completely
static or changes slowly.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-add-inputs#stream-and-
reference-inputs
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs
QUESTION 36
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob
storage file named Customers. The file will contain both in-store and online customer details. The online
customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location.
The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to
Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has one streaming input, one reference input, two
queries, and four outputs.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We need one reference data input for LocationIncomes, which rarely changes.
We need two queries, one for in-store customers and one for online customers.
For each query, two outputs are needed.
Note: Stream Analytics also supports input known as reference data. Reference data is either completely
static or changes slowly.
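As an illustrative sketch only (the input aliases Customers and LocationIncomes, the output aliases, and the column names are assumptions, not values from the scenario), the online-customer branch of such a job can be written as two SELECT ... INTO statements over the same reference-data join, one per sink; the in-store branch is analogous without the join:

SELECT c.CustomerId, c.MailingAddress, li.MedianIncome
INTO SqlDatabaseOnline
FROM Customers c
JOIN LocationIncomes li ON c.MailingAddress = li.Location

SELECT c.CustomerId, c.MailingAddress, li.MedianIncome
INTO DataLakeOnline
FROM Customers c
JOIN LocationIncomes li ON c.MailingAddress = li.Location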
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-add-inputs#stream-and-
reference-inputs
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs
QUESTION 37
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain
the following three workloads:
A workload for data engineers who will use Python and SQL
A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team at your company identifies the following standards for Databricks
environments:
Solution: You create a Standard cluster for each data scientist, a Standard cluster for the data engineers,
and a High Concurrency cluster for the jobs.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We need a High Concurrency cluster for the data engineers and the jobs.
Note:
Standard clusters are recommended for a single user. Standard can run workloads developed in any
language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are
that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum
query latencies.
References:
https://docs.azuredatabricks.net/clusters/configure.html
QUESTION 38
Note: This question is a part of series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to an enterprise data warehouse in Azure
Synapse Analytics. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2
storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Data Warehouse.
Solution:
1. Use Azure Data Factory to convert the parquet files to CSV files
2. Create an external data source pointing to the Azure Data Lake Gen 2 storage account
3. Create an external file format and external table using the external data source
4. Load the data using the CREATE TABLE AS SELECT statement
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
It is not necessary to convert the parquet files to CSV files.
You need to create an external file format and external table using the external data source.
You load the data using the CREATE TABLE AS SELECT statement.
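A hedged T-SQL sketch of this load pattern follows; every object name, the storage location, and the credential secret are placeholders for illustration, not values from the scenario:

-- Assumes a database master key already exists in the data warehouse
CREATE DATABASE SCOPED CREDENTIAL AdlsCredential
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

CREATE EXTERNAL DATA SOURCE AdlsGen2Source
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://<container>@<account>.dfs.core.windows.net',
    CREDENTIAL = AdlsCredential
);

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE dbo.StagingSales (
    SaleId INT,
    Amount DECIMAL(18, 2)
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = AdlsGen2Source,
    FILE_FORMAT = ParquetFormat
);

-- Load into the data warehouse with CREATE TABLE AS SELECT
CREATE TABLE dbo.FactSales
WITH (DISTRIBUTION = HASH(SaleId), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT * FROM dbo.StagingSales;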
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-
lake-store
QUESTION 39
You need to implement complex stateful business logic within an Azure Stream Analytics service.
Which type of function should you create in the Stream Analytics topology?
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Azure Stream Analytics supports user-defined aggregates (UDA) written in JavaScript, which enable you to
implement complex stateful business logic. Within a UDA you have full control of the state data structure,
state accumulation, state decumulation, and aggregate result computation.
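For illustration only, a sketch of how a registered JavaScript UDA is invoked from the job query (the function alias SquaredSum, input alias Input, output alias Output, and column Reading are assumptions):

SELECT
    uda.SquaredSum(Reading) AS SumOfSquares
INTO
    Output
FROM
    Input
GROUP BY
    TumblingWindow(minute, 1)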
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-javascript-user-defined-
aggregates
QUESTION 40
You have an Azure virtual machine that has Microsoft SQL Server installed. The server contains a table
named Table1.
You need to copy the data from Table1 to an Azure Data Lake Storage Gen2 account by using an Azure
Data Factory V2 copy activity.
Which type of integration runtime should you use?
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Copying between a cloud data source and a data source in a private network: if either the source or sink linked
service points to a self-hosted IR, the copy activity is executed on that self-hosted Integration Runtime.
References:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime#determining-which-ir-to-
use
QUESTION 41
DRAG DROP
Your company plans to create an event processing engine to handle streaming data from Twitter.
The data engineering team uses Azure Event Hubs to ingest the streaming data.
You need to implement a solution that uses Azure Databricks to receive the streaming data from the Azure
Event Hubs.
Which three actions should you recommend be performed in sequence? To answer, move the appropriate
actions from the list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 2: Deploy a Spark cluster and then attach the required libraries to the cluster.
To create a Spark cluster in Databricks, in the Azure portal, go to the Databricks workspace that you
created, and then select Launch Workspace.
Attach libraries to Spark cluster: you use the Twitter APIs to send tweets to Event Hubs. You also use the
Apache Spark Event Hubs connector to read and write data into Azure Event Hubs. To use these APIs as
part of your cluster, add them as libraries to Azure Databricks and associate them with your Spark cluster.
Step 3: Create and configure a Notebook that consumes the streaming data.
You create a notebook named ReadTweetsFromEventhub in Databricks workspace.
ReadTweetsFromEventHub is a consumer notebook you use to read the tweets from Event Hubs.
References:
https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-stream-from-eventhubs
QUESTION 42
HOTSPOT
You need to provision an HDInsight cluster for batch processing of data on Microsoft Azure.
How should you complete the PowerShell segment? To answer, select the appropriate options in the
answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: New-AzStorageContainer
# Example: Create a blob container. This holds the default data store for the cluster.
New-AzStorageContainer `
-Name $clusterName `
-Context $defaultStorageContext
Box 2: Spark
Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data into
memory and query it repeatedly. In-memory computing is much faster than disk-based applications, such as
Hadoop, which shares data through the Hadoop Distributed File System (HDFS).
Box 3: New-AzureRMHDInsightCluster
# Create the HDInsight cluster. Example:
New-AzHDInsightCluster `
-ResourceGroupName $resourceGroupName `
-ClusterName $clusterName `
-Location $location `
-ClusterSizeInNodes $clusterSizeInNodes `
-ClusterType "Spark" `
-OSType "Linux"
Box 4: Spark
HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch
processing, you can use Spark, Hive, Hive LLAP, MapReduce.
References:
https://docs.microsoft.com/bs-latn-ba/azure/hdinsight/spark/apache-spark-jupyter-spark-sql-use-powershell
https://docs.microsoft.com/bs-latn-ba/azure/hdinsight/spark/apache-spark-overview
QUESTION 43
HOTSPOT
A company plans to develop solutions to perform batch processing of multiple sets of geospatial data.
Which Azure services should you use? To answer, select the appropriate configuration in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
References:
https://visualstudiomagazine.com/articles/2019/01/25/vscode-hdinsight.aspx
https://docs.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-ambari-view
https://docs.microsoft.com/en-us/rest/api/hdinsight/
QUESTION 44
DRAG DROP
You must use PolyBase to retrieve data from Azure Blob storage that resides in parquet format and load
the data into a large table called FactSalesOrderDetails.
You need to configure Azure Synapse Analytics to receive the data.
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Note: PolyBase is a technology that accesses and combines both non-relational and relational data, all
from within SQL Server. It allows you to run queries on external data in Hadoop or Azure blob storage.
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-configure-azure-blob-storage
QUESTION 45
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob
storage file named Customers. The file will contain both in-store and online customer details. The online
customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location.
The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to
Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has two streaming inputs, one query, and two outputs.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
We need one reference data input for LocationIncomes, which rarely changes
Note: Stream Analytics also supports input known as reference data. Reference data is either completely
static or changes slowly.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-add-inputs#stream-and-
reference-inputs
QUESTION 46
DRAG DROP
You need to deploy a Microsoft Azure Stream Analytics job for an IoT solution. The solution must:
Minimize latency.
Minimize bandwidth usage between the job and IoT device.
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: Create an IoT hub and add the Azure Stream Analytics module to the IoT Hub namespace
An IoT Hub in Azure is required.
Step 2: Create an Azure Blob Storage container
To prepare your Stream Analytics job to be deployed on an IoT Edge device, you need to associate the job
with a container in a storage account. When you go to deploy your job, the job definition is exported to the
storage container.
Stream Analytics accepts data incoming from several kinds of event sources including Event Hubs, IoT
Hub, and Blob storage.
Step 3: Create an Azure Stream Analytics edge job and configure job definition save location
When you create an Azure Stream Analytics job to run on an IoT Edge device, it needs to be stored in a
way that can be called from the device.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-add-inputs
https://docs.microsoft.com/en-us/azure/iot-edge/tutorial-deploy-stream-analytics
QUESTION 47
DRAG DROP
You have data stored in thousands of CSV files in Azure Data Lake Storage Gen2. Each file has a header
row followed by a properly formatted carriage return (\r) and line feed (\n).
You are implementing a pattern that batch loads the files daily into an enterprise data warehouse in Azure
Synapse Analytics by using PolyBase.
You need to skip the header row when you import the files into the data warehouse.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: Create an external file format and set the First_Row option.
Creates an External File Format object defining external data stored in Hadoop, Azure Blob Storage, or
Azure Data Lake Store. Creating an external file format is a prerequisite for creating an External Table.
FIRST_ROW = First_row_int
Specifies the row number that is read first in all files during a PolyBase load. This parameter can take
values 1-15. If the value is set to two, the first row in every file (header row) is skipped when the data is
loaded. Rows are skipped based on the existence of row terminators (\r\n, \r, \n).
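A hedged T-SQL sketch of such a file format (the format name and the delimiter options are assumptions):

CREATE EXTERNAL FILE FORMAT SkipHeaderCsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2,
        USE_TYPE_DEFAULT = TRUE
    )
);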
Step 2: Create an external data source that uses the abfs location
The hadoop-azure module provides support for the Azure Data Lake Storage Gen2 storage layer through
the “abfs” connector
Step 3: Use CREATE EXTERNAL TABLE AS SELECT (CETAS) and create a view that removes the empty
row.
Reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql
https://hadoop.apache.org/docs/r3.2.0/hadoop-azure/abfs.html
QUESTION 48
You are creating a new notebook in Azure Databricks that will support R as the primary language but will
also support Scala and SQL.
A. %<language>
B. \\[<language>]
C. \\(<language>)
D. @<Language>
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You can override the primary language by specifying the language magic command %<language> at the
beginning of a cell. The supported magic commands are: %python, %r, %scala, and %sql.
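For example, in a notebook whose default language is R, a single cell can be switched to SQL as sketched below (the table name is hypothetical):

%sql
SELECT COUNT(*) AS row_count FROM trips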
References:
https://docs.databricks.com/user-guide/notebooks/notebook-use.html#mix-languages
QUESTION 49
HOTSPOT
You are implementing mapping data flows in Azure Data Factory to convert daily logs of taxi records into
aggregated datasets.
You configure a data flow and receive the error shown in the following exhibit.
Which setting should you configure? To answer, select the appropriate setting in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The Inspect tab provides a view into the metadata of the data stream that you're transforming. You can see
column counts, the columns changed, the columns added, data types, the column order, and column
references. Inspect is a read-only view of your metadata. You don't need to have debug mode enabled to
see metadata in the Inspect pane.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview
QUESTION 50
HOTSPOT
You have an Azure SQL database named Database1 and two Azure event hubs named HubA and HubB.
The data consumed from each source is shown in the following table.
You need to implement Azure Stream Analytics to calculate the average fare per mile by driver.
How should you configure the Stream Analytics input for each source? To answer, select the appropriate
options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
HubA: Stream
HubB: Stream
Database1: Reference
Reference data (also known as a lookup table) is a finite data set that is static or slowly changing in nature,
used to perform a lookup or to augment your data streams. For example, in an IoT scenario, you could
store metadata about sensors (which don’t change often) in reference data and join it with real time IoT
data streams. Azure Stream Analytics loads reference data in memory to achieve low latency stream
processing
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-use-reference-data
Manage and develop data processing
Testlet 2
Background
Proseware, Inc., develops and manages a product named Poll Taker. The product is used for delivering
public opinion polling and analysis.
Polling data comes from a variety of sources, including online surveys, house-to-house interviews, and
booths at public events.
Polling data
Polling data is stored in one of the two locations:
Poll metadata
Each poll has associated metadata with information about the poll including the date and number of
respondents. The data is stored as JSON.
Phone-based polling
Security
Phone-based poll data must only be uploaded by authorized users from authorized devices
Contractors must not have access to any polling data other than their own
Access to polling data must be set on a per-Active Directory user basis
Performance
After six months, raw polling data should be moved to a storage account. The storage must be available in
the event of a regional disaster. The solution must minimize costs.
Deployments
All deployments must be performed by using Azure DevOps. Deployments must use templates used in
multiple environments
No credentials or secrets should be used during deployments
Reliability
All services and processes must be resilient to a regional Azure outage.
Monitoring
All Azure services must be monitored by using Azure Monitor. On-premises SQL Server performance must
be monitored.
QUESTION 1
You need to ensure that phone-based polling data can be analyzed in the PollingData database.
C. Use a schedule trigger
D. Use manual execution
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
When creating a schedule trigger, you specify a schedule (start date, recurrence, end date etc.) for the
trigger, and associate with a Data Factory pipeline.
Scenario:
All data migration processes must use Azure Data Factory
All data migrations must run automatically during non-business hours
References:
https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-schedule-trigger
QUESTION 2
HOTSPOT
You need to ensure that Azure Data Factory pipelines can be deployed. How should you configure
authentication and authorization for deployments? To answer, select the appropriate options in the answer
choices.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The way you control access to resources using RBAC is to create role assignments. This is a key concept
to understand – it’s how permissions are enforced. A role assignment consists of three elements: security
principal, role definition, and scope.
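For illustration only, a role assignment can be created with PowerShell; the object ID, role name, and scope below are placeholders rather than values from the scenario:

New-AzRoleAssignment -ObjectId "<service-principal-object-id>" `
    -RoleDefinitionName "Data Factory Contributor" `
    -Scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>"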
Scenario:
No credentials or secrets should be used during deployments
Phone-based poll data must only be uploaded by authorized users from authorized devices
Contractors must not have access to any polling data other than their own
Access to polling data must be set on a per-Active Directory user basis
References:
https://docs.microsoft.com/en-us/azure/role-based-access-control/overview
Manage and develop data processing
Testlet 3
Overview
Current environment
Contoso relies on an extensive partner network for marketing, sales, and distribution. Contoso uses
external companies that manufacture everything from the actual pharmaceutical to the packaging.
The majority of the company’s data reside in Microsoft SQL Server database. Application databases fall
into one of the following tiers:
The company has a reporting infrastructure that ingests data from local databases and partner services.
Partner services consist of distributors, wholesalers, and retailers across the world. The company
performs daily, weekly, and monthly reporting.
Requirements
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Applications must still have access to data from both internal and external applications keeping the data
encrypted and secure at rest and in transit.
A disaster recovery strategy must be implemented for Tier 3 and Tier 6 through 8 allowing for failover in the
case of a server going offline.
Selected internal applications must have the data hosted in single Microsoft Azure SQL Databases.
The solution must support migrating databases that support external and internal application to Azure SQL
Database. The migrated databases will be supported by Azure Data Factory pipelines for the continued
movement, migration and updating of data both in the cloud and from local core business systems and
repositories.
Tier 7 and Tier 8 partner access must be restricted to the database only.
In addition to default Azure backup behavior, Tier 4 and 5 databases must be on a backup strategy that
performs a transaction log backup every hour, a differential backup of databases every day, and a full
backup every week.
Backup strategies must be put in place for all other standalone Azure SQL Databases using Azure SQL-
provided backup storage and capabilities.
Databases
Contoso requires their data estate to be designed and implemented in the Azure Cloud. Moving to the
cloud must not inhibit access to or availability of data.
Databases:
Tier 1 Database must implement data masking using the following masking logic:
Tier 2 databases must sync between branches and cloud databases and in the event of conflicts must be
set up for conflicts to be won by on-premises databases.
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Applications must still have access to data from both internal and external applications keeping the data
encrypted and secure at rest and in transit.
A disaster recovery strategy must be implemented for Tier 3 and Tier 6 through 8 allowing for failover in the
case of a server going offline.
Selected internal applications must have the data hosted in single Microsoft Azure SQL Databases.
Reporting
Security
A method of managing multiple databases in the cloud at the same time must be implemented to
streamline data management and limit management access to only those requiring access.
Monitoring
Monitoring must be set up on every database. Contoso and partners must receive performance reports as
part of contractual agreements.
Tiers 6 through 8 must have unexpected resource storage usage immediately reported to data engineers.
The Azure SQL Data Warehouse cache must be monitored when the database is being used. A dashboard
monitoring key performance indicators (KPIs) indicated by traffic lights must be created and displayed
based on the following metrics:
Existing Data Protection and Security compliances require that all certificates and keys are internally
managed in an on-premises storage.
Azure Data Warehouse must be used to gather and query data from multiple internal and external
databases
Azure Data Warehouse must be optimized to use data from a cache
Reporting data aggregated for external partners must be stored in Azure Storage and be made
available during regular business hours in the connecting regions
Reporting strategies must be improved to real time or near real time reporting cadence to improve
competitiveness and the general supply chain
Tier 9 reporting must be moved to Event Hubs, queried, and persisted in the same Azure region as the
company’s main office
Tier 10 reporting data must be stored in Azure Blobs
Issues
Team members identify the following issues:
Both internal and external client applications run complex joins, equality searches, and group-by clauses.
Because some systems are managed externally, the queries will not be changed or optimized by
Contoso
External partner organization data formats, types and schemas are controlled by the partner companies
Internal and external database development staff resources are primarily SQL developers familiar with
the Transact-SQL language.
The size and amount of data has led to applications and reporting solutions not performing at required
speeds
Tier 7 and 8 data access is constrained to single endpoints managed by partners for access
The company maintains several legacy client applications. Data for these applications remains isolated
from other applications. This has led to hundreds of databases being provisioned on a per-application
basis
QUESTION 1
You need to process and query ingested Tier 9 data.
Which two options should you use? Each correct answer presents part of the solution.
Correct Answer: EF
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Event Hubs provides a Kafka endpoint that can be used by your existing Kafka based applications as an
alternative to running your own Kafka cluster.
You can stream data into Kafka-enabled Event Hubs and process it with Azure Stream Analytics, in the
following steps:
Create a Kafka enabled Event Hubs namespace.
Create a Kafka client that sends messages to the event hub.
Create a Stream Analytics job that copies data from the event hub into an Azure blob storage.
Scenario:
Tier 9 reporting must be moved to Event Hubs, queried, and persisted in the same Azure region as the
company’s main office
References:
https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-kafka-stream-analytics
QUESTION 2
HOTSPOT
You need set up the Azure Data Factory JSON definition for Tier 10 data.
What should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
To use storage account key authentication, you use the connectionString property, which specifies the
information needed to connect to Blob Storage.
Mark this field as a SecureString to store it securely in Data Factory. You can also put the account key in
Azure Key Vault and pull the accountKey configuration out of the connection string.
References:
https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage
QUESTION 3
You need to set up Azure Data Factory pipelines to meet data movement requirements.
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The following table describes the capabilities and network support for each of the integration runtime types:
Scenario: The solution must support migrating databases that support external and internal application to
Azure SQL Database. The migrated databases will be supported by Azure Data Factory pipelines for the
continued movement, migration and updating of data both in the cloud and from local core business
systems and repositories.
References:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime
Manage and develop data processing
Testlet 4
Case Study
This is a case study. Case studies are not timed separately. You can use as much exam time as you
would like to complete each case. However, there may be additional case studies and sections on this
exam. You must manage your time to ensure that you are able to complete all questions included on this
exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in
the case study. Case studies might contain exhibits and other resources that provide more information
about the scenario that is described in the case study. Each question is independent of the other question
in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers
and to make changes before you move to the next section of the exam. After you begin a new section, you
cannot return to this section.
Overview
General Overview
Litware, Inc. is an international car racing and manufacturing company that has 1,000 employees. Most
employees are located in Europe. The company supports racing teams that compete in a worldwide racing
series.
Physical Locations
Litware has two main locations: a main office in London, England, and a manufacturing plant in Berlin,
Germany.
During each race weekend, 100 engineers set up a remote portable office by using a VPN to connect to the
datacenter in the London office. The portable office is set up and torn down in approximately 20 different
countries each year.
Existing environment
Race Central
During race weekends, Litware uses a primary application named Race Central. Each car has several
sensors that send real-time telemetry data to the London datacentre. The data is used for real-time tracking
of the cars.
Race Central also sends batch updates to an application named Mechanical Workflow by using Microsoft
SQL Server Integration Services (SSIS).
The telemetry data is sent to a MongoDB database. A custom application then moves the data to
databases in SQL Server 2017. The telemetry data in MongoDB has more than 500 attributes. The
application changes the attribute names when the data is moved to SQL Server 2017.
Mechanical Workflow
Mechanical Workflow is used to track changes and improvements made to the cars during their lifetime.
Currently, Mechanical Workflow runs on SQL Server 2017 as an OLAP system.
Mechanical Workflow has a table named Table1 that is 1 TB. Large aggregations are performed on a
single column of Table1.
Requirements
Planned Changes
Litware is in the process of rearchitecting its data estate to be hosted in Azure. The company plans to
decommission the London datacentre and move all its applications to an Azure datacenter.
Technical Requirements
Data collection for Race Central must be moved to Azure Cosmos DB and Azure SQL Database. The
data must be written to the Azure datacenter closest to each race and must converge in the least
amount of time.
The query performance of Race Central must be stable, and the administrative time it takes to perform
optimizations must be minimized.
The database for Mechanical Workflow must be moved to Azure SQL Data Warehouse.
Transparent data encryption (TDE) must be enabled on all data stores, whenever possible.
An Azure Data Factory pipeline must be used to move data from Cosmos DB to SQL Database for
Race Central. If the data load takes longer than 20 minutes, configuration changes must be made to
Data Factory.
The telemetry data must migrate toward a solution that is native to Azure.
The telemetry data must be monitored for performance issues. You must adjust the Cosmos DB
Request Units per second (RU/s) to maintain a performance SLA while minimizing the cost of the RU/s.
During race weekends, visitors will be able to enter the remote portable offices. Litware is concerned that
some proprietary information might be exposed. The company identifies the following data masking
requirements for the Race Central data that will be stored in SQL Database:
Only show the last four digits of the values in a column named SuspensionSprings.
Only show a zero value for the values in a column named ShockOilWeight.
QUESTION 1
What should you include in the Data Factory pipeline for Race Central?
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario:
An Azure Data Factory pipeline must be used to move data from Cosmos DB to SQL Database for Race
Central. If the data load takes longer than 20 minutes, configuration changes must be made to Data
Factory.
The telemetry data is sent to a MongoDB database. A custom application then moves the data to
databases in SQL Server 2017. The telemetry data in MongoDB has more than 500 attributes. The
application changes the attribute names when the data is moved to SQL Server 2017.
You can copy data to or from Azure Cosmos DB (SQL API) by using Azure Data Factory pipeline.
Column mapping applies when copying data from source to sink. By default, the copy activity maps source
data to sink by column names. You can specify an explicit mapping to customize the column mapping based
on your needs. More specifically, the copy activity:
1. Reads the data from the source and determines the source schema.
2. Uses default column mapping to map columns by name, or applies the explicit column mapping if specified.
3. Writes the data to the sink.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping
Manage and develop data processing
Testlet 5
Case study
Overview
ADatum Corporation is a retailer that sells products through two sales channels: retail stores and a website.
Existing Environment
ADatum has one database server that has Microsoft SQL Server 2016 installed. The server hosts three
mission-critical databases named SALESDB, DOCDB, and REPORTINGDB.
DOCDB stores documents that connect to the sales data in SALESDB. The documents are stored in two
different JSON formats based on the sales channel.
REPORTINGDB stores reporting data and contains several columnstore indexes. A daily process creates
reporting data in REPORTINGDB from the data in SALESDB. The process is implemented as a SQL
Server Integration Services (SSIS) package that runs a stored procedure from SALESDB.
Requirements
Planned Changes
ADatum plans to move the current data infrastructure to Azure. The new infrastructure has the following
requirements:
Technical Requirements
The new Azure data infrastructure must meet the following technical requirements:
Data in SALESDB must be encrypted by using Transparent Data Encryption (TDE). The encryption must
use your own key.
SALESDB must be restorable to any given minute within the past three weeks.
Real-time processing must be monitored to ensure that workloads are sized properly based on actual
usage patterns.
Missing indexes must be created automatically for REPORTINGDB.
Disk IO, CPU, and memory usage must be monitored for SALESDB.
QUESTION 1
Which windowing function should you use to perform the streaming aggregation of the sales data?
A. Tumbling
B. Hopping
C. Sliding
D. Session
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: The analytic process will perform aggregations that must be done continuously, without gaps,
and without overlapping.
The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot
belong to more than one tumbling window.
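For illustration, a hedged sketch of a tumbling-window aggregation in the Stream Analytics query language (the input, output, and column names are assumptions, not values from the case study):

SELECT
    ProductId,
    SUM(Quantity) AS TotalQuantity
INTO
    Output
FROM
    SalesStream TIMESTAMP BY SaleTime
GROUP BY
    ProductId,
    TumblingWindow(minute, 5)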
Incorrect Answers:
B, C: Like hopping windows, events can belong to more than one sliding window.
D: Session windows can have gaps.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
QUESTION 2
DRAG DROP
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: A daily process creates reporting data in REPORTINGDB from the data in SALESDB. The
process is implemented as a SQL Server Integration Services (SSIS) package that runs a stored procedure
from SALESDB.
References:
https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal
Manage data security
Question Set 1
QUESTION 1
You plan to use Microsoft Azure SQL Database instances with strict user access control. A user object
must:
Which two Transact-SQL commands should you run? Each correct answer presents part of the solution.
Correct Answer: CD
Section: (none)
Explanation
Explanation/Reference:
Explanation:
C: ALTER ROLE adds or removes members to or from a database role, or changes the name of a user-
defined database role.
Members of the db_owner fixed database role can perform all configuration and maintenance activities on
the database, and can also drop the database in SQL Server.
Note: Logins are created at the server level, while users are created at the database level. In other words, a
login allows you to connect to the SQL Server service (also called an instance), and permissions inside the
database are granted to the database users, not the logins. The logins will be assigned to server roles (for
example, serveradmin) and the database users will be assigned to roles within that database (e.g.,
db_datareader, db_backupoperator).
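As a hedged illustration of that pattern (the user name, password, and role below are placeholders, not the values required by the question):

-- Contained database user, created inside the user database rather than from a server login
CREATE USER AppUser WITH PASSWORD = 'S0me$trongP@ssw0rd!';
-- Add the user to a database role
ALTER ROLE db_datareader ADD MEMBER AppUser;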
References:
https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-role-transact-sql
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-user-transact-sql
QUESTION 2
DRAG DROP
You manage security for a database that supports a line of business application.
Private and personal data stored in the database must be protected and encrypted.
You need to configure the database to use Transparent Data Encryption (TDE).
Which five actions should you perform in sequence? To answer, select the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 4: Create a database encryption key and protect it by the certificate
Example code:
USE master;
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<UseStrongPasswordHere>';
go
CREATE CERTIFICATE MyServerCert WITH SUBJECT = 'My DEK Certificate';
go
USE AdventureWorks2012;
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128
ENCRYPTION BY SERVER CERTIFICATE MyServerCert;
GO
ALTER DATABASE AdventureWorks2012
SET ENCRYPTION ON;
GO
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/transparent-data-encryption
QUESTION 3
SIMULATION
You need to ensure that only the resources on a virtual network named VNET1 can access an Azure
Storage account named storage10543936.
Explanation/Reference:
Explanation:
You can use Private Endpoints for your Azure Storage accounts to allow clients on a virtual network (VNet)
to securely access data over a Private Link.
2. Select Networking.
5. Select OK.
6. Select Review + create. You're taken to the Review + create page where Azure validates your
configuration.
Reference:
https://docs.microsoft.com/en-us/azure/private-link/create-private-endpoint-storage-portal
QUESTION 4
SIMULATION
You need to replicate db1 to a new Azure SQL server named db1-copy10543936 in the US West region.
Explanation/Reference:
Explanation:
1. In the Azure portal, browse to the database db1 that you want to set up for geo-replication.
2. On the SQL database page, select geo-replication, and then select the region in which to create the
secondary database: the US West region.
3. Select or configure the target server (db1-copy10543936) and the pricing tier for the secondary database.
4. Click Create to add the secondary.
6. When the seeding process is complete, the secondary database displays its status.
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-active-geo-replication-portal
QUESTION 5
SIMULATION
Azure Username: xxxxx
Azure Password: xxxxx
You need to ensure that you can recover any blob data from an Azure Storage account named
storage10543936 up to 10 days after the data is deleted.
Explanation/Reference:
Explanation:
Enable soft delete for blobs on your storage account by using Azure portal:
4. Under Retention policies, enter the number of days to retain soft-deleted blobs. Here, enter 10.
Note: Azure Storage now offers soft delete for blob objects so that you can more easily recover your data
when it is erroneously modified or deleted by an application or other storage account user. Currently you
can retain soft deleted data for between 1 and 365 days.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-soft-delete
QUESTION 6
DRAG DROP
You plan to create a new single database instance of Microsoft Azure SQL Database.
The database must only allow communication from the data engineer’s workstation. You must connect
directly to the instance by using Microsoft SQL Server Management Studio.
You need to create and configure the Database. Which three Azure PowerShell cmdlets should you use to
develop the solution? To answer, move the appropriate cmdlets from the list of cmdlets to the answer area
and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: New-AzureRmSqlServer
Create a server.
Step 2: New-AzureRmSqlServerFirewallRule
New-AzureRmSqlServerFirewallRule creates a firewall rule for a SQL Database server.
Can be used to create a server firewall rule that allows access from the specified IP range.
Step 3: New-AzureRmSqlDatabase
Example: Create a database on a specified server
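The example itself is not reproduced here; a hedged sketch of the three cmdlets in sequence (the resource names, location, and workstation IP address are placeholders):

$cred = Get-Credential   # SQL administrator credentials

New-AzureRmSqlServer -ResourceGroupName "rg1" -ServerName "sqlserverdev1" `
    -Location "West Europe" -SqlAdministratorCredentials $cred

New-AzureRmSqlServerFirewallRule -ResourceGroupName "rg1" -ServerName "sqlserverdev1" `
    -FirewallRuleName "DataEngineerWorkstation" `
    -StartIpAddress "203.0.113.10" -EndIpAddress "203.0.113.10"

New-AzureRmSqlDatabase -ResourceGroupName "rg1" -ServerName "sqlserverdev1" `
    -DatabaseName "db1" -RequestedServiceObjectiveName "S0"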
References:
https://docs.microsoft.com/en-us/azure/sql-database/scripts/sql-database-create-and-configure-database-
powershell?toc=%2fpowershell%2fmodule%2ftoc.json
QUESTION 7
HOTSPOT
Your company uses Azure SQL Database and Azure Blob storage.
All data at rest must be encrypted by using the company’s own key. The solution must minimize
administrative effort and the impact on applications that use the database.
What should you implement? To answer, select the appropriate option in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Note: Transparent data encryption encrypts the storage of an entire database by using a symmetric key
called the database encryption key. This database encryption key is protected by the transparent data
encryption protector.
Transparent data encryption (TDE) helps protect Azure SQL Database, Azure SQL Managed Instance, and
Azure Data Warehouse against the threat of malicious offline activity by encrypting data at rest. It performs
real-time encryption and decryption of the database, associated backups, and transaction log files at rest
without requiring changes to the application.
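A hedged PowerShell sketch of switching the TDE protector to a customer-managed key in Azure Key Vault (the resource names and key URI are placeholders):

# Register the Key Vault key with the logical SQL server
Add-AzSqlServerKeyVaultKey -ResourceGroupName "rg1" -ServerName "sqlserver1" `
    -KeyId "https://contoso-vault.vault.azure.net/keys/tde-key/<key-version>"

# Use the customer-managed key as the TDE protector
Set-AzSqlServerTransparentDataEncryptionProtector -ResourceGroupName "rg1" -ServerName "sqlserver1" `
    -Type AzureKeyVault -KeyId "https://contoso-vault.vault.azure.net/keys/tde-key/<key-version>"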
References:
https://docs.microsoft.com/en-us/azure/sql-database/transparent-data-encryption-azure-sql
https://docs.microsoft.com/en-us/azure/storage/common/storage-service-encryption
QUESTION 8
You develop data engineering solutions for a company.
You need to implement role-based access control (RBAC) so that project members can manage the Azure
Data Lake Storage resources.
Which three actions should you perform? Each correct answer presents part of the solution.
Explanation/Reference:
Explanation:
AD: Create security groups in Azure Active Directory. Assign users or security groups to Data Lake Storage
Gen1 accounts.
E: Assign users or security groups as ACLs to the Data Lake Storage Gen1 file system
References:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data
QUESTION 9
DRAG DROP
You deploy an Azure SQL database named DB1 to an Azure SQL server named SQL1.
An Azure Active Directory (Azure AD) group named Analysts contains all the users who must have access
to DB1.
The Analysts group must have read-only access to all the views and tables in the Sales schema of DB1.
A manager will decide who can access DB1. The manager will not interact directly with DB1.
Users must not have to manage a separate password solely to access DB1.
Which four actions should you perform in sequence to meet the data security requirements? To answer,
move the appropriate actions from the list of actions to the answer area and arrange them in the correct
order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: From the Azure Portal, set the Active Directory admin for SQL1.
Provision an Azure Active Directory administrator for your Azure SQL Database server.
You can provision an Azure Active Directory administrator for your Azure SQL server in the Azure portal
and by using PowerShell.
Step 2: On DB1, create a contained user for the Analysts group by using Transact-SQL
Create contained database users in your database mapped to Azure AD identities.
To create an Azure AD-based contained database user (other than the server administrator that owns the
database), connect to the database with an Azure AD identity, as a user with at least the ALTER ANY
USER permission. Then use the following Transact-SQL syntax:
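The syntax is not reproduced here; a minimal sketch based on the referenced documentation, with a GRANT added to reflect the read-only requirement on the Sales schema:

CREATE USER [Analysts] FROM EXTERNAL PROVIDER;
GRANT SELECT ON SCHEMA::Sales TO [Analysts];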
Step 3: From Microsoft SQL Server Management Studio (SSMS), sign in to SQL1 by using the account set
as the Active Directory admin.
To confirm the Azure AD administrator is properly set up, connect to the master database using the Azure
AD administrator account. To provision an Azure AD-based contained database user (other than the server
administrator that owns the database), connect to the database with an Azure AD identity that has access
to the database.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-aad-authentication-configure
QUESTION 10
DRAG DROP
You have an Azure subscription that contains an Azure Databricks environment and an Azure Storage
account.
You need to implement secure communication between Databricks and the storage account.
Which four actions should you perform in sequence? To answer, move the actions from the list of actions
to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Managing secrets begins with creating a secret scope.
To reference secrets stored in an Azure Key Vault, you can create a secret scope backed by Azure Key
Vault.
References:
https://docs.microsoft.com/en-us/azure/azure-databricks/store-secrets-azure-key-vault
QUESTION 11
You have an Azure SQL server named Server1 that hosts two development databases named DB1 and
DB2.
You have an administrative workstation that has an IP address of 192.168.8.8. The development team at
your company has an IP addresses in the range of 192.168.8.1 to 192.168.8.5.
Which three actions should you perform? Each correct answer presents part of the solution.
A. Create a firewall rule on DB1 that has a start IP address of 192.168.8.1 and an end IP address of
192.168.8.5.
B. Create a firewall rule on DB1 that has a start and end IP address of 0.0.0.0.
C. Create a firewall rule on Server1 that has a start IP address of 192.168.8.1 and an end IP address of
192.168.8.5.
D. Create a firewall rule on DB1 that has a start and end IP address of 192.168.8.8.
E. Create a firewall rule on Server1 that has a start and end IP address of 192.168.8.8.
Explanation/Reference:
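For context only, server-level and database-level firewall rules can be created in Transact-SQL as sketched below; this is shown to illustrate the difference between the two rule types, with the IP ranges taken from the question:

-- Server-level rule: run in the master database; applies to every database on the server
EXECUTE sp_set_firewall_rule
    @name = N'DevTeamRange',
    @start_ip_address = '192.168.8.1',
    @end_ip_address = '192.168.8.5';

-- Database-level rule: run in the target database; applies only to that database
EXECUTE sp_set_database_firewall_rule
    @name = N'AdminWorkstation',
    @start_ip_address = '192.168.8.8',
    @end_ip_address = '192.168.8.8';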
QUESTION 12
DRAG DROP
You have an ASP.NET web app that uses an Azure SQL database. The database contains a table named
Employee. The table contains sensitive employee information, including a column named DateOfBirth.
You need to ensure that the data in the DateOfBirth column is encrypted both in the database and when
transmitted between a client and Azure. Only authorized clients must be able to view the data in the
column.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-always-encrypted
QUESTION 13
Your company manages a payroll application for its customers worldwide. The application uses an Azure
SQL database named DB1. The database contains a table named Employee and an identity column
named EmployeeId.
Whenever a user queries EmployeeId, you need to return a random value between 1 and 10 instead of the
EmployeeId value.
A. string
B. number
C. default
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started-portal
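Returning a random number in a fixed range is handled by the random() masking function. A hedged Transact-SQL sketch, assuming the Employee table and EmployeeId column from the question:
ALTER TABLE dbo.Employee
ALTER COLUMN EmployeeId ADD MASKED WITH (FUNCTION = 'random(1, 10)');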
QUESTION 14
SIMULATION
Use the following login credentials as needed:
You need to ensure that an email notification is sent to [email protected] if a suspicious login to an
Azure SQL database named db2 is detected.
Explanation/Reference:
Explanation:
Set up Advanced Threat Protection in the Azure portal.
1. From the Azure portal navigate to the configuration page of the Azure SQL Database db2, which you
want to protect. In the security settings, select Advanced Data Security.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-threat-detection
QUESTION 15
SIMULATION
Use the following login credentials as needed:
Explanation/Reference:
Explanation:
2. Select Security > Advanced Data Security, and then click Enable Advanced Data Security.
3. Click the Data Discovery & Classification card.
5. In the context window that opens, select the schema > table > column that you want to classify, and the
information type and sensitivity label. Then click on the blue Add classification button at the bottom of the
context window.
6. To complete your classification and persistently label (tag) the database columns with the new
classification metadata, click on Save in the top menu of the window.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-data-discovery-and-classification
QUESTION 16
DRAG DROP
You manage the Microsoft Azure Databricks environment for a company. You must be able to access a
private Azure Blob Storage account. Data must be available to all Azure Databricks workspaces. You need
to provide the data access.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Note: To mount a Blob Storage container or a folder inside a container, use the following command:
Python
dbutils.fs.mount(
source = "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net",
mount_point = "/mnt/<mount-name>",
extra_configs = {"<conf-key>":dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
where:
dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>") gets the key that has been stored as a
secret in a secret scope.
References:
https://docs.databricks.com/spark/latest/data-sources/azure/azure-storage.html
QUESTION 17
DRAG DROP
A company uses Microsoft Azure SQL Database to store sensitive company data. You encrypt the data and
only allow access to specified users from specified locations.
You must monitor data usage, and data copied from the system to prevent data leakage.
You need to configure Azure SQL Database to email a specific user when data leakage occurs.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
2. Navigate to the configuration page of the Azure SQL Database server you want to protect. In the security
settings, select Advanced Data Security.
Advanced Threat Protection provides alerts upon detection of anomalous database activities.
Security alerts are triggered when anomalies in activity occur: access from an unusual location, anonymous
access, access by an unusual application, data exfiltration, unexpected delete operations, access
permission change, and so on.
Admins can view these alerts via Azure Security Center and can also choose to be notified of each of them
via email.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-threat-detection
https://www.helpnetsecurity.com/2019/04/04/microsoft-azure-security/
QUESTION 18
HOTSPOT
You develop data engineering solutions for a company. An application creates a database on Microsoft
Azure. You have the following code:
Which database and authorization types are used? To answer, select the appropriate option in the answer
area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Master keys provide access to all of the administrative resources for the database account. Master keys:
Provide access to accounts, databases, users, and permissions.
Cannot be used to provide granular access to containers and documents.
Are created during the creation of an account.
Can be regenerated at any time.
Incorrect Answers:
Resource Token: Resource tokens provide access to the application resources within a database.
References:
https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.documents.client.documentclient.createdatabaseasync
https://docs.microsoft.com/en-us/azure/cosmos-db/secure-access-to-data
QUESTION 19
You have an Azure SQL database that contains a table named Customer. Customer contains the columns
shown in the following table.
You apply a masking rule as shown in the following table.
A. Server administrators and all users who are granted the UNMASK permission to the Customer_Email
column only.
B. All users who are granted the UNMASK permission to the Customer_Email column only.
C. Server administrators only.
D. Server administrators and all users who are granted the SELECT permission to the Customer_Email
column only.
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Grant the UNMASK permission to a user to enable them to retrieve unmasked data from the columns for
which masking is defined.
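A hedged Transact-SQL sketch of granting and removing that permission (the principal name is illustrative):
GRANT UNMASK TO DataAnalyst;
-- To return masked results to that user again:
REVOKE UNMASK FROM DataAnalyst;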
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking
Manage data security
Testlet 2
Background
Proseware, Inc, develops and manages a product named Poll Taker. The product is used for delivering
public opinion polling and analysis.
Polling data comes from a variety of sources, including online surveys, house-to-house interviews, and
booths at public events.
Polling data
Polling data is stored in one of the two locations:
Poll metadata
Each poll has associated metadata with information about the poll including the date and number of
respondents. The data is stored as JSON.
Phone-based polling
Security
Phone-based poll data must only be uploaded by authorized users from authorized devices
Contractors must not have access to any polling data other than their own
Access to polling data must be set on a per-Active Directory user basis
Performance
After six months, raw polling data should be moved to a storage account. The storage must be available in
the event of a regional disaster. The solution must minimize costs.
Deployments
All deployments must be performed by using Azure DevOps. Deployments must use templates that can be
reused across multiple environments.
No credentials or secrets should be used during deployments
Reliability
All services and processes must be resilient to a regional Azure outage.
Monitoring
All Azure services must be monitored by using Azure Monitor. On-premises SQL Server performance must
be monitored.
QUESTION 1
HOTSPOT
Which security technologies should you use? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
References:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-database-scoped-credential-transact-sql
Manage data security
Testlet 3
Overview
Current environment
Contoso relies on an extensive partner network for marketing, sales, and distribution. Contoso uses
external companies that manufacture everything from the actual pharmaceutical to the packaging.
The majority of the company’s data resides in Microsoft SQL Server databases. Application databases fall
into one of the following tiers:
The company has a reporting infrastructure that ingests data from local databases and partner services.
Partner services consist of distributors, wholesalers, and retailers across the world. The company
performs daily, weekly, and monthly reporting.
Requirements
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Applications must still have access to data from both internal and external applications keeping the data
encrypted and secure at rest and in transit.
A disaster recovery strategy must be implemented for Tier 3 and Tier 6 through 8 allowing for failover in the
case of a server going offline.
Selected internal applications must have the data hosted in single Microsoft Azure SQL Databases.
The solution must support migrating databases that support external and internal application to Azure SQL
Database. The migrated databases will be supported by Azure Data Factory pipelines for the continued
movement, migration and updating of data both in the cloud and from local core business systems and
repositories.
Tier 7 and Tier 8 partner access must be restricted to the database only.
In addition to default Azure backup behavior, Tier 4 and 5 databases must be on a backup strategy that
performs a transaction log backup every hour, a differential backup of databases every day, and a full backup
up every week.
Backup strategies must be put in place for all other standalone Azure SQL Databases using Azure SQL-
provided backup storage and capabilities.
Databases
Contoso requires their data estate to be designed and implemented in the Azure Cloud. Moving to the
cloud must not inhibit access to or availability of data.
Databases:
Tier 1 Database must implement data masking using the following masking logic:
Tier 2 databases must sync between branches and cloud databases and in the event of conflicts must be
set up for conflicts to be won by on-premises databases.
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Applications must still have access to data from both internal and external applications keeping the data
encrypted and secure at rest and in transit.
A disaster recovery strategy must be implemented for Tier 3 and Tier 6 through 8 allowing for failover in the
case of a server going offline.
Selected internal applications must have the data hosted in single Microsoft Azure SQL Databases.
Reporting
Security
A method of managing multiple databases in the cloud at the same time must be implemented to
streamline data management and limit management access to only those requiring access.
Monitoring
Monitoring must be set up on every database. Contoso and partners must receive performance reports as
part of contractual agreements.
Tiers 6 through 8 must have unexpected resource storage usage immediately reported to data engineers.
The Azure SQL Data Warehouse cache must be monitored when the database is being used. A dashboard
monitoring key performance indicators (KPIs) indicated by traffic lights must be created and displayed
based on the following metrics:
Existing Data Protection and Security compliances require that all certificates and keys are internally
managed in an on-premises storage.
Azure Data Warehouse must be used to gather and query data from multiple internal and external
databases
Azure Data Warehouse must be optimized to use data from a cache
Reporting data aggregated for external partners must be stored in Azure Storage and be made
available during regular business hours in the connecting regions
Reporting strategies must be improved to a real-time or near real-time reporting cadence to improve
competitiveness and the general supply chain
Tier 9 reporting must be moved to Event Hubs, queried, and persisted in the same Azure region as the
company’s main office
Tier 10 reporting data must be stored in Azure Blobs
Issues
Team members identify the following issues:
Both internal and external client applications run complex joins, equality searches and group-by clauses.
Because some systems are managed externally, the queries will not be changed or optimized by
Contoso
External partner organization data formats, types and schemas are controlled by the partner companies
Internal and external database development staff resources are primarily SQL developers familiar with
the Transact-SQL language.
Size and amount of data has led to applications and reporting solutions not performing at required
speeds
Tier 7 and 8 data access is constrained to single endpoints managed by partners for access
The company maintains several legacy client applications. Data for these applications remains isolated
from other applications. This has led to hundreds of databases being provisioned on a per application
basis
QUESTION 1
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
Solution:
1. Access the Always Encrypted Wizard in SQL Server Management Studio
2. Select the column to be encrypted
3. Set the encryption type to Randomized
4. Configure the master key to use the Windows Certificate Store
5. Validate configuration results and deploy the solution
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Use the Azure Key Vault, not the Windows Certificate Store, to store the master key.
Note: The Master Key Configuration page is where you set up your CMK (Column Master Key) and select
the key store provider where the CMK will be stored. Currently, you can store a CMK in the Windows
certificate store, Azure Key Vault, or a hardware security module (HSM).
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-always-encrypted-azure-key-vault
QUESTION 2
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
Solution:
1. Access the Always Encrypted Wizard in SQL Server Management Studio
2. Select the column to be encrypted
3. Set the encryption type to Deterministic
4. Configure the master key to use the Windows Certificate Store
5. Validate configuration results and deploy the solution
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Use the Azure Key Vault, not the Windows Certificate Store, to store the master key.
Note: The Master Key Configuration page is where you set up your CMK (Column Master Key) and select
the key store provider where the CMK will be stored. Currently, you can store a CMK in the Windows
certificate store, Azure Key Vault, or a hardware security module (HSM).
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-always-encrypted-azure-key-vault
QUESTION 3
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
Solution:
1. Access the Always Encrypted Wizard in SQL Server Management Studio
2. Select the column to be encrypted
3. Set the encryption type to Deterministic
4. Configure the master key to use the Azure Key Vault
5. Validate configuration results and deploy the solution
Does the solution meet the goal?
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The solution uses Azure Key Vault, rather than the Windows Certificate Store, to store the master key.
Note: The Master Key Configuration page is where you set up your CMK (Column Master Key) and select
the key store provider where the CMK will be stored. Currently, you can store a CMK in the Windows
certificate store, Azure Key Vault, or a hardware security module (HSM).
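Outside the wizard, the same configuration can be expressed in Transact-SQL. A hedged sketch, assuming the key already exists in Azure Key Vault and that a column encryption key named CEK_AKV has been generated under the column master key (the key path, table, and column names are illustrative):
-- Column master key stored in Azure Key Vault.
CREATE COLUMN MASTER KEY CMK_AKV
WITH (
    KEY_STORE_PROVIDER_NAME = N'AZURE_KEY_VAULT',
    KEY_PATH = N'https://contoso-vault.vault.azure.net/keys/AlwaysEncryptedCMK/0123456789abcdef0123456789abcdef'
);
-- Deterministic encryption keeps equality lookups working; character columns require a BIN2 collation.
CREATE TABLE dbo.EmployeeContact
(
    EmployeeId INT IDENTITY PRIMARY KEY,
    NationalIdNumber NVARCHAR(11) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (
            COLUMN_ENCRYPTION_KEY = CEK_AKV,
            ENCRYPTION_TYPE = DETERMINISTIC,
            ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
        ) NOT NULL
);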
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-always-encrypted-azure-key-vault
QUESTION 4
HOTSPOT
You need to mask tier 1 data. Which functions should you use? To answer, select the appropriate option in
the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
A: Default
Full masking according to the data types of the designated fields.
For string data types, use XXXX or fewer Xs if the size of the field is less than 4 characters (char, nchar,
varchar, nvarchar, text, ntext).
B: email
C: Custom text
Custom string masking method which exposes the first and last letters and adds a custom padding string in
the middle: prefix, [padding], suffix
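A hedged Transact-SQL sketch of applying these three mask types (the table and column names are illustrative):
ALTER TABLE dbo.Respondent
ALTER COLUMN LastName ADD MASKED WITH (FUNCTION = 'default()');
ALTER TABLE dbo.Respondent
ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION = 'email()');
-- Custom string: expose the first character, pad the middle, expose nothing at the end.
ALTER TABLE dbo.Respondent
ALTER COLUMN PhoneNumber ADD MASKED WITH (FUNCTION = 'partial(1,"XXXXXXX",0)');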
Tier 1 Database must implement data masking using the following masking logic:
References:
https://docs.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking
QUESTION 5
DRAG DROP
You need to set up access to Azure SQL Database for Tier 7 and Tier 8 partners.
Which three actions should you perform in sequence? To answer, move the appropriate three actions from
the list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Tier 7 and 8 data access is constrained to single endpoints managed by partners for access
Step 1: Set the Allow Azure Services to Access Server setting to Disabled
Set Allow access to Azure services to OFF for the most secure configuration.
By default, access through the SQL Database firewall is enabled for all Azure services, under Allow access
to Azure services. Choose OFF to disable access for all Azure services.
Note: The firewall pane has an ON/OFF button that is labeled Allow access to Azure services. The ON
setting allows communications from all Azure IP addresses and all Azure subnets. These Azure IPs or
subnets might not be owned by you. This ON setting is probably more open than you want your SQL
Database to be. The virtual network rule feature offers much finer granular control.
Step 3: Connect to the database and use Transact-SQL to create a database firewall rule
Database-level firewall rules can only be configured using Transact-SQL (T-SQL) statements, and only
after you've configured a server-level firewall rule.
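A hedged sketch of such a database-level rule (the rule name and the partner endpoint's IP address are illustrative):
-- Run while connected to the target database itself, not to master.
EXECUTE sp_set_database_firewall_rule
    @name = N'Tier7PartnerEndpoint',
    @start_ip_address = '203.0.113.10',
    @end_ip_address = '203.0.113.10';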
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-security-tutorial
Manage data security
Testlet 4
Case study
Overview
ADatum Corporation is a retailer that sells products through two sales channels: retail stores and a website.
Existing Environment
ADatum has one database server that has Microsoft SQL Server 2016 installed. The server hosts three
mission-critical databases named SALESDB, DOCDB, and REPORTINGDB.
DOCDB stores documents that connect to the sales data in SALESDB. The documents are stored in two
different JSON formats based on the sales channel.
REPORTINGDB stores reporting data and contains several columnstore indexes. A daily process creates
reporting data in REPORTINGDB from the data in SALESDB. The process is implemented as a SQL
Server Integration Services (SSIS) package that runs a stored procedure from SALESDB.
Requirements
Planned Changes
ADatum plans to move the current data infrastructure to Azure. The new infrastructure has the following
requirements:
Technical Requirements
The new Azure data infrastructure must meet the following technical requirements:
Data in SALESDB must be encrypted by using Transparent Data Encryption (TDE). The encryption must
use your own key.
SALESDB must be restorable to any given minute within the past three weeks.
Real-time processing must be monitored to ensure that workloads are sized properly based on actual
usage patterns.
Missing indexes must be created automatically for REPORTINGDB.
Disk IO, CPU, and memory usage must be monitored for SALESDB.
QUESTION 1
DRAG DROP
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Data in SALESDB must be encrypted by using Transparent Data Encryption (TDE). The encryption must use
your own key.
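Assigning the customer-managed key as the server's TDE protector is done in the portal or with PowerShell/CLI; after that, encryption on the database itself can be enabled and verified with Transact-SQL. A minimal sketch:
ALTER DATABASE SALESDB SET ENCRYPTION ON;
-- Check encryption state (3 = encrypted).
SELECT DB_NAME(database_id) AS database_name, encryption_state
FROM sys.dm_database_encryption_keys;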
References:
https://docs.microsoft.com/en-us/azure/sql-database/transparent-data-encryption-byok-azure-sql-configure
Monitor and optimize data solutions
Question Set 1
QUESTION 1
SIMULATION
You need to double the processing resources available to an Azure SQL data warehouse named
datawarehouse.
NOTE: This task might take several minutes to complete. You can perform other tasks while the
task completes or end this section of the exam.
Explanation/Reference:
Explanation:
SQL Data Warehouse compute resources can be scaled by increasing or decreasing data warehouse
units.
1. Click SQL data warehouses in the left page of the Azure portal.
2. Select datawarehouse from the SQL data warehouses page. The data warehouse opens.
3. Click Scale.
4. In the Scale panel, move the slider left or right to change the DWU setting. Double the DWU setting.
6. Click Save. A confirmation message appears. Click yes to confirm or no to cancel.
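The same scale operation can also be issued with Transact-SQL from the master database of the logical server. A hedged sketch that assumes, purely for illustration, doubling from DW200c to DW400c:
ALTER DATABASE datawarehouse
MODIFY (SERVICE_OBJECTIVE = 'DW400c');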
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/quickstart-scale-compute-portal
QUESTION 2
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
A company uses Azure Data Lake Gen 1 Storage to store big data related to consumer behavior.
Solution: Configure Azure Data Lake Storage diagnostics to store logs and metrics in a storage account.
Does the solution meet the goal?
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
From the Azure Storage account that contains log data, open the Azure Storage account blade associated
with Data Lake Storage Gen1 for logging, and then click Blobs. The Blob service blade lists two containers.
References:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-diagnostic-logs
QUESTION 3
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
A company uses Azure Data Lake Gen 1 Storage to store big data related to consumer behavior.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead configure Azure Data Lake Storage diagnostics to store logs and metrics in a storage account.
References:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-diagnostic-logs
QUESTION 4
Your company uses several Azure HDInsight clusters.
The data engineering team reports several errors with some applications using these clusters.
A. Azure Automation
B. Log Analytics
C. Application Insights
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Azure Monitor logs integration. Azure Monitor logs enables data generated by multiple resources such as
HDInsight clusters, to be collected and aggregated in one place to achieve a unified monitoring experience.
As a prerequisite, you will need a Log Analytics Workspace to store the collected data. If you have not
already created one, you can follow the instructions for creating a Log Analytics Workspace.
You can then easily configure an HDInsight cluster to send many workload-specific metrics to Log
Analytics.
References:
https://azure.microsoft.com/sv-se/blog/monitoring-on-azure-hdinsight-part-2-cluster-health-and-availability/
QUESTION 5
DRAG DROP
Your company uses Microsoft Azure SQL Database configured with Elastic pools. You use Elastic
Database jobs to run queries across all databases in the pool.
You need to analyze, troubleshoot, and report on components responsible for running Elastic Database
jobs.
You need to determine the component responsible for running job service tasks.
Which components should you use for each Elastic pool job service task? To answer, drag the
appropriate component to the correct task. Each component may be used once, more than once, or not at
all. You may need to drag the split bar between panes or scroll to view content.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-job-automation-overview
QUESTION 6
Contoso, Ltd. plans to configure existing applications to use Azure SQL Database.
Which three actions should you perform? Each correct answer presents part of the solution.
Explanation/Reference:
References:
https://docs.microsoft.com/en-us/azure/azure-monitor/platform/alerts-action-rules
QUESTION 7
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a container named Sales in an Azure Cosmos DB database. Sales has 120 GB of data. Each
entry in Sales has the following structure.
Users report that when they perform queries that retrieve data by ProductName, the queries take longer
than expected to complete.
You need to reduce the amount of time it takes to execute the problematic queries.
Solution: You create a lookup collection that uses ProductName as a partition key.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
One option is to have a lookup collection “ProductName” for the mapping of “ProductName” to “OrderId”.
References:
https://azure.microsoft.com/sv-se/blog/azure-cosmos-db-partitioning-design-patterns-part-1/
QUESTION 8
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a container named Sales in an Azure Cosmos DB database. Sales has 120 GB of data. Each
entry in Sales has the following structure.
Users report that when they perform queries that retrieve data by ProductName, the queries take longer
than expected to complete.
You need to reduce the amount of time it takes to execute the problematic queries.
Solution: You create a lookup collection that uses ProductName as a partition key and OrderId as a
value.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
One option is to have a lookup collection “ProductName” for the mapping of “ProductName” to “OrderId”.
References:
https://azure.microsoft.com/sv-se/blog/azure-cosmos-db-partitioning-design-patterns-part-1/
QUESTION 9
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a container named Sales in an Azure Cosmos DB database. Sales has 120 GB of data. Each
entry in Sales has the following structure.
Users report that when they perform queries that retrieve data by ProductName, the queries take longer
than expected to complete.
You need to reduce the amount of time it takes to execute the problematic queries.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
One option is to have a lookup collection “ProductName” for the mapping of “ProductName” to “OrderId”.
References:
https://azure.microsoft.com/sv-se/blog/azure-cosmos-db-partitioning-design-patterns-part-1/
QUESTION 10
HOTSPOT
You need to periodically analyze pipeline executions from the last 60 days to identify trends in execution
durations. The solution must use Azure Log Analytics to query the data and create charts.
Which diagnostic settings should you configure in Data Factory? To answer, select the appropriate options
in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Log type: PipelineRuns
A pipeline run in Azure Data Factory defines an instance of a pipeline execution.
Save your diagnostic logs to a storage account for auditing or manual inspection. You can use the
diagnostic settings to specify the retention time in days.
References:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor
QUESTION 11
HOTSPOT
You are implementing automatic tuning mode for Azure SQL databases.
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Automatic tuning options can be independently enabled or disabled per database, or they can be
configured on SQL Database servers and applied on every database that inherits settings from the server.
SQL Database servers can inherit Azure defaults for Automatic tuning settings. Azure defaults at this time
are set to FORCE_LAST_GOOD_PLAN is enabled, CREATE_INDEX is enabled, and DROP_INDEX is
disabled.
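The same options map directly to Transact-SQL. A hedged sketch of inheriting the server settings or overriding them for a single database:
-- Inherit the server (or Azure default) configuration:
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING = INHERIT;
-- Or configure individual options explicitly for this database:
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON, CREATE_INDEX = ON, DROP_INDEX = OFF);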
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning
QUESTION 12
HOTSPOT
You need to receive an alert when Azure Synapse Analytics consumes the maximum allotted resources.
Which resource type and signal should you use to create the alert in Azure Monitor? To answer, select the
appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-insights-alerts-portal
QUESTION 13
You have an Azure SQL database that has masked columns.
You need to identify when a user attempts to infer data from the masked columns.
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Dynamic Data Masking is designed to simplify application development by limiting data exposure in a set of
pre-defined queries used by the application. While Dynamic Data Masking can also be useful to prevent
accidental exposure of sensitive data when accessing a production database directly, it is important to note
that unprivileged users with ad-hoc query permissions can apply techniques to gain access to the actual
data. If there is a need to grant such ad-hoc access, Auditing should be used to monitor all database
activity and mitigate this scenario.
References:
https://docs.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking
QUESTION 14
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
A company uses Azure Data Lake Gen 1 Storage to store big data related to consumer behavior.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead configure Azure Data Lake Storage diagnostics to store logs and metrics in a storage account.
References:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-diagnostic-logs
QUESTION 15
You have an Azure data solution that contains an enterprise data warehouse in Azure Synapse Analytics
named DW1.
You need to ensure that the automated data loads have enough memory available to complete quickly and
successfully when the ad hoc queries run.
A. Hash distribute the large fact tables in DW1 before performing the automated data loads.
B. Assign a larger resource class to the automated data load queries.
C. Create sampled statistics for every column in each table of DW1.
D. Assign a smaller resource class to the automated data load queries.
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
To ensure the loading user has enough memory to achieve maximum compression rates, use loading
users that are a member of a medium or large resource class.
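A hedged sketch of assigning a larger static resource class to the loading user (the user name is illustrative):
-- Run in the data warehouse database.
EXEC sp_addrolemember 'largerc', 'LoadUser';
-- To revert later: EXEC sp_droprolemember 'largerc', 'LoadUser';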
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
QUESTION 16
DRAG DROP
You plan to monitor an Azure data factory by using the Monitor & Manage app.
You need to identify the status and duration of activities that reference a table in a source database.
Which three actions should you perform in sequence? To answer, move the actions from the list of actions
to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 1: From the Data Factory authoring UI, generate a user property for Source on all activities.
Step 2: From the Data Factory monitoring app, add the Source user property to Activity Runs table.
You can promote any pipeline activity property as a user property so that it becomes an entity that you can
monitor. For example, you can promote the Source and Destination properties of the copy activity in your
pipeline as user properties. You can also select Auto Generate to generate the Source and Destination
user properties for a copy activity.
Step 3: From the Data Factory authoring UI, publish the pipelines
Publish output data to data stores such as Azure SQL Data Warehouse for business intelligence (BI)
applications to consume.
References:
https://docs.microsoft.com/en-us/azure/data-factory/monitor-visually
QUESTION 17
SIMULATION
Use the following login credentials as needed:
You need to generate an email notification to [email protected] if the available storage in an Azure
Cosmos DB database named cosmos10277521 is less than 100,000,000 bytes.
Explanation/Reference:
Explanation:
1. In the Azure portal, click All services, click Azure Cosmos DB, and then click the cosmos10277521
Azure Cosmos DB account.
2. In the resource menu, click Alert Rules to open the Alert rules page.
3. In the Alert rules page, click Add alert.
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/monitor-accounts
QUESTION 18
You have an enterprise data warehouse in Azure Synapse Analytics named DW1 on a server named
Server1.
You need to verify whether the size of the transaction log file for each distribution of DW1 is smaller than
160 GB.
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The following query returns the transaction log size on each distribution. If one of the log files is reaching
160 GB, you should consider scaling up your instance or limiting your transaction size.
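The query itself is not reproduced in this dump; the documented approach reads a PDW performance-counter DMV, roughly as follows:
-- Transaction log space used per distribution, in GB.
SELECT
    instance_name AS distribution_db,
    cntr_value * 1.0 / 1048576 AS log_file_used_size_GB,
    pdw_node_id
FROM sys.dm_pdw_nodes_os_performance_counters
WHERE instance_name LIKE 'Distribution_%'
  AND counter_name = 'Log File(s) Used Size (KB)';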
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-manage-monitor
QUESTION 19
HOTSPOT
You need to collect application metrics, streaming query events, and application log messages for an Azure
Databricks cluster.
Which type of library and workspace should you implement? To answer, select the appropriate options in
the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You can send application logs and metrics from Azure Databricks to a Log Analytics workspace. It uses the
Azure Databricks Monitoring Library, which is available on GitHub.
References:
https://docs.microsoft.com/en-us/azure/architecture/databricks-monitoring/application-logs
QUESTION 20
DRAG DROP
You are implementing an Azure Blob storage account for an application that has the following
requirements:
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Step 2: Use an Azure Resource Manager template that has a lifecycle management policy
Azure Blob storage lifecycle management offers a rich, rule-based policy for GPv2 and Blob storage
accounts.
Step 3: Create a rule that has the rule actions of TierCool, TierToArchive, and Delete
Each rule definition includes a filter set and an action set. The filter set limits rule actions to a certain set of
objects within a container or objects names. The action set applies the tier or delete actions to the filtered
set of objects.
Incorrect Answers:
Create a rule filter
No need for a rule filter. Rule filters limit rule actions to a subset of blobs within the storage account.
References:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts
QUESTION 21
You have an Azure Cosmos DB database that uses the SQL API.
A. soft delete
B. Low Latency Analytical Processing (LLAP)
C. schema on read
D. Time to Live (TTL)
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
With Time to Live or TTL, Azure Cosmos DB provides the ability to delete items automatically from a
container after a certain time period. By default, you can set time to live at the container level and override
the value on a per-item basis. After you set the TTL at a container or at an item level, Azure Cosmos DB
will automatically remove these items after the time period, since the time they were last modified.
References:
https://docs.microsoft.com/en-us/azure/cosmos-db/time-to-live
QUESTION 22
SIMULATION
Use the following login credentials as needed:
You need to ensure that missing indexes are created automatically by Azure in db2. The solution must
apply ONLY to db2.
Explanation/Reference:
Explanation:
1. To enable automatic tuning on Azure SQL Database logical server, navigate to the server in Azure portal
and then select Automatic tuning in the menu.
2. Select database db2
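A database-scoped alternative to the portal steps is to run Transact-SQL while connected to db2, which keeps the change limited to that database. A minimal sketch:
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (CREATE_INDEX = ON);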
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning-enable
QUESTION 23
Note: This question is a part of series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
A project requires the deployment of resources to Microsoft Azure for batch data processing on Azure
HDInsight. Batch processing will run daily and must:
You need to recommend a tool that will monitor clusters and provide information to suggest how to scale.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Ambari Web UI does not provide information to suggest how to scale.
Instead monitor clusters by using Azure Log Analytics and HDInsight cluster management solutions.
References:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-oms-log-analytics-tutorial
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-manage-ambari
QUESTION 24
Note: This question is a part of series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
A project requires the deployment of resources to Microsoft Azure for batch data processing on Azure
HDInsight. Batch processing will run daily and must:
You need to recommend a tool that will monitor clusters and provide information to suggest how to scale.
Solution: Monitor clusters by using Azure Log Analytics and HDInsight cluster management solutions.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
HDInsight provides cluster-specific management solutions that you can add for Azure Monitor logs.
Management solutions add functionality to Azure Monitor logs, providing additional data and analysis tools.
These solutions collect important performance metrics from your HDInsight clusters and provide the tools
to search the metrics. These solutions also provide visualizations and dashboards for most cluster types
supported in HDInsight. By using the metrics that you collect with the solution, you can create custom
monitoring rules and alerts.
References:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-oms-log-analytics-tutorial
QUESTION 25
Note: This question is a part of series of questions that present the same scenario. Each question
in the series contains a unique solution. Determine whether the solution meets the stated goals.
A project requires the deployment of resources to Microsoft Azure for batch data processing on Azure
HDInsight. Batch processing will run daily and must:
You need to recommend a tool that will monitor clusters and provide information to suggest how to scale.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead monitor clusters by using Azure Log Analytics and HDInsight cluster management solutions.
References:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-oms-log-analytics-tutorial
QUESTION 26
HOTSPOT
A company is planning to use Microsoft Azure Cosmos DB as the data store for an application. You have
the following Azure CLI command:
az cosmosdb create --name "cosmosdbdev1" --resource-group "rgdev"
You need to minimize latency and expose the SQL API. How should you complete the command? To
answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: Eventual
With Azure Cosmos DB, developers can choose from five well-defined consistency models on the
consistency spectrum. From strongest to more relaxed, the models include strong, bounded staleness,
session, consistent prefix, and eventual consistency.
The following image shows the different consistency levels as a spectrum.
Box 2: GlobalDocumentDB
Select Core(SQL) to create a document database and query by using SQL syntax.
Note: The API determines the type of account to create. Azure Cosmos DB provides five APIs: Core(SQL)
and MongoDB for document databases, Gremlin for graph databases, Azure Table, and Cassandra.
References:
https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels
https://docs.microsoft.com/en-us/azure/cosmos-db/create-sql-api-dotnet
QUESTION 27
A company has a Microsoft Azure HDInsight solution that uses different cluster types to process and
analyze data. Operations are continuous.
You need to determine a monitoring solution to track down the issue in the least amount of time.
What should you use?
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Ambari is the recommended tool for monitoring the health for any given HDInsight cluster.
Note: Azure HDInsight is a high-availability service that has redundant gateway nodes, head nodes, and
ZooKeeper nodes to keep your HDInsight clusters running smoothly. While this ensures that a single failure
will not affect the functionality of a cluster, you may still want to monitor cluster health so you are alerted
when an issue does arise. Monitoring cluster health refers to monitoring whether all nodes in your cluster
and the components that run on them are available and functioning correctly.
Ambari is the recommended tool for monitoring utilization across the whole cluster. The Ambari dashboard
shows easily glanceable widgets that display metrics such as CPU, network, YARN memory, and HDFS
disk usage. The specific metrics shown depend on cluster type. The “Hosts” tab shows metrics for
individual nodes so you can ensure the load on your cluster is evenly distributed.
References:
https://azure.microsoft.com/en-us/blog/monitoring-on-hdinsight-part-1-an-overview/
QUESTION 28
You have the Diagnostics settings of an Azure Storage account as shown in the following exhibit.
How long will the logging data be retained?
A. 7 days
B. 365 days
C. indefinitely
D. 90 days
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Reference:
https://docs.microsoft.com/en-us/azure/storage/common/storage-analytics-metrics
QUESTION 29
Your company uses Azure Stream Analytics to monitor devices.
The company plans to double the number of devices that are monitored.
You need to monitor a Stream Analytics job to ensure that there are enough processing resources to
handle the additional load.
Which metric should you monitor?
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
There are a number of other resource constraints that can cause the streaming pipeline to slow down. The
watermark delay metric can rise due to:
Not enough processing resources in Stream Analytics to handle the volume of input events.
Not enough throughput within the input event brokers, so they are throttled.
Output sinks are not provisioned with enough capacity, so they are throttled. The possible solutions vary
widely based on the flavor of output service being used.
Incorrect Answers:
A: Deserialization issues are caused when the input stream of your Stream Analytics job contains
malformed messages.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-time-handling
QUESTION 30
You have an enterprise data warehouse in Azure Synapse Analytics.
You need to monitor the data warehouse to identify whether you must scale up to a higher service level to
accommodate the current workloads.
More than one answer choice may achieve the goal. Select the BEST answer.
A. CPU percentage
B. DWU used
C. DWU percentage
D. Data IO percentage
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
DWU used, defined as DWU limit * DWU percentage, represents only a high-level representation of usage
across the SQL pool and is not meant to be a comprehensive indicator of utilization. To determine whether
to scale up or down, consider all factors which can be impacted by DWU such as concurrency, memory,
tempdb, and adaptive cache capacity. We recommend running your workload at different DWU settings to
determine what works best to meet your business objectives.
Reference:
https://docs.microsoft.com/bs-latn-ba/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-concept-resource-utilization-query-activity
QUESTION 31
DRAG DROP
Your company analyzes images from security cameras and sends them to security teams that respond to
unusual activity. The solution uses Azure Databricks.
You need to send Apache Spark level events, Spark Structured Streaming metrics, and application metrics
to Azure Monitor.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
You can send application logs and metrics from Azure Databricks to a Log Analytics workspace.
Spark uses a configurable metrics system based on the Dropwizard Metrics Library.
Prerequisites: Configure your Azure Databricks cluster to use the monitoring library.
Note: The monitoring library streams Apache Spark level events and Spark Structured Streaming metrics
from your jobs to Azure Monitor.
To send application metrics from Azure Databricks application code to Azure Monitor, follow these steps:
Step 1. Build the spark-listeners-loganalytics-1.0-SNAPSHOT.jar JAR file
Reference:
https://docs.microsoft.com/bs-latn-ba/azure/architecture/databricks-monitoring/application-logs
QUESTION 32
You manage a solution that uses Azure HDInsight clusters.
You need to implement a solution to monitor cluster performance and status.
Correct Answer: E
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Ambari is the recommended tool for monitoring utilization across the whole cluster. The Ambari dashboard
shows easily glanceable widgets that display metrics such as CPU, network, YARN memory, and HDFS
disk usage. The specific metrics shown depend on cluster type. The “Hosts” tab shows metrics for
individual nodes so you can ensure the load on your cluster is evenly distributed.
The Apache Ambari project is aimed at making Hadoop management simpler by developing software for
provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use
Hadoop management web UI backed by its RESTful APIs.
References:
https://azure.microsoft.com/en-us/blog/monitoring-on-hdinsight-part-1-an-overview/
https://ambari.apache.org/
QUESTION 33
You configure monitoring for an Azure Synapse Analytics implementation. The implementation uses
PolyBase to load data from comma-separated value (CSV) files stored in Azure Data Lake Gen 2 using an
external table.
A. EXTERNAL TABLE access failed due to internal error: 'Java exception raised
on call to HdfsBridge_Connect: Error
[com.microsoft.polybase.client.KerberosSecureLogin] occurred while accessing
external file.'
B. EXTERNAL TABLE access failed due to internal error: 'Java exception raised
on call to HdfsBridge_Connect: Error [No FileSystem for scheme: wasbs]
occurred while accessing external file.'
C. Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11"
for linked server "(null)", Query aborted- the maximum reject threshold (0
rows) was reached while reading from an external source: 1 rows rejected out
of total 1 rows processed.
D. EXTERNAL TABLE access failed due to internal error: 'Java exception raised
on call to HdfsBridge_Connect: Error [Unable to instantiate LoginClass]
occurred while accessing external file.'
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Customer Scenario:
SQL Server 2016 or SQL DW connected to Azure blob storage. The CREATE EXTERNAL TABLE DDL
points to a directory (and not a specific file) and the directory contains files with different schemas.
SSMS Error:
Select query on the external table gives the following error:
Msg 7320, Level 16, State 110, Line 14
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)".
Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external
source: 1 rows rejected out of total 1 rows processed.
Possible Reason:
The reason this error happens is because each file has different schema. The PolyBase external table DDL
when pointed to a directory recursively reads all the files in that directory. When a column or data type
mismatch happens, this error could be seen in SSMS.
Possible Solution:
If the data for each table consists of one file, then use the filename in the LOCATION section prepended by
the directory of the external files. If there are multiple files per table, put each set of files into different
directories in Azure Blob Storage and then you can point LOCATION to the directory instead of a particular
file. The latter suggestion is the best practices recommended by SQLCAT even if you have one file per
table.
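A hedged sketch of an external table definition that follows that recommendation (the object names, external data source, and file format are illustrative and assumed to already exist):
CREATE EXTERNAL TABLE dbo.SalesExternal
(
    SaleId INT,
    ProductName NVARCHAR(100),
    Amount DECIMAL(18, 2)
)
WITH
(
    LOCATION = '/sales/2019/sales_2019.csv',  -- point at a specific file, not a mixed-schema directory
    DATA_SOURCE = AzureDataLakeStorage,       -- existing external data source (assumed)
    FILE_FORMAT = CsvFileFormat,              -- existing external file format (assumed)
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0                          -- the reject threshold referenced in the error message
);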
Incorrect Answers:
A: Possible Reason: Kerberos is not enabled in Hadoop Cluster.
References:
https://techcommunity.microsoft.com/t5/DataCAT/PolyBase-Setup-Errors-and-Possible-Solutions/ba-p/305297
QUESTION 34
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some questions sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
A company uses Azure Data Lake Gen 1 Storage to store big data related to consumer behavior.
A. Yes
B. No
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Instead, configure Azure Data Lake Storage diagnostics to store logs and metrics in a storage account.
References:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-diagnostic-logs
QUESTION 35
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a container named Sales in an Azure Cosmos DB database. Sales has 120 GB of data. Each
entry in Sales has the following structure.
Users report that when they perform queries that retrieve data by ProductName, the queries take longer
than expected to complete.
You need to reduce the amount of time it takes to execute the problematic queries.
Solution: You increase the Request Units (RUs) for the database.
A. Yes
B. No
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
To scale the provisioned throughput for your application, you can increase or decrease the number of RUs
at any time.
Note: The cost of all database operations is normalized by Azure Cosmos DB and is expressed by Request
Units (or RUs, for short). You can think of RUs per second as the currency for throughput. RUs per second
is a rate-based currency. It abstracts the system resources such as CPU, IOPS, and memory that are
required to perform the database operations supported by Azure Cosmos DB.
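As a rough illustration, assuming the documented baseline that a point read of a 1-KB item costs about 1 RU: a workload that must sustain 500 such reads per second needs roughly 500 RU/s provisioned, and larger items or more complex queries consume proportionally more.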
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/request-units
QUESTION 36
You are monitoring an Azure Stream Analytics job.
You discover that the Backlogged Input Events metric is increasing slowly and is consistently non-zero.
You need to ensure that the job can handle all the events.
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Backlogged Input Events: Number of input events that are backlogged. A non-zero value for this metric
implies that your job isn't able to keep up with the number of incoming events. If this value is slowly
increasing or consistently non-zero, you should scale out your job. You should increase the Streaming
Units.
Note: Streaming Units (SUs) represents the computing resources that are allocated to execute a Stream
Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your
job.
Reference:
https://docs.microsoft.com/bs-cyrl-ba/azure/stream-analytics/stream-analytics-monitoring
QUESTION 37
SIMULATION
Your company's compliance policy states that administrators must be able to review a list of the database
object changes that occurred in an Azure SQL database named db2 during the last 100 days.
You need to modify your Azure environment to meet the compliance policy requirements.
Explanation/Reference:
Explanation:
Set up auditing for your database
The following section describes the configuration of auditing using the Azure portal.
2. Navigate to Auditing under the Security heading in your SQL database db2/server pane
3. If you prefer to enable auditing on the database level, switch Auditing to ON.
Note: By default, the audit data retention period is set to 100 days.
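As a rough sketch (assuming auditing writes to Blob storage), the collected audit records can then be reviewed with the sys.fn_get_audit_file function; the storage URL below is a placeholder:
SELECT event_time, action_id, statement, database_principal_name
FROM sys.fn_get_audit_file(
    'https://<storageaccount>.blob.core.windows.net/sqldbauditlogs/<servername>/db2/',
    DEFAULT, DEFAULT)                          -- read all audit files under the db2 folder
ORDER BY event_time DESC;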
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-auditing
QUESTION 38
SIMULATION
Your company's security policy states that administrators must be able to review a list of the failed logins to
an Azure SQL database named db1 during the previous 30 days.
You need to modify your Azure environment to meet the security policy requirements.
Explanation/Reference:
Explanation:
Set up auditing for your database
The following section describes the configuration of auditing using the Azure portal.
2. Navigate to Auditing under the Security heading in your SQL database db1/server pane
3. If you prefer to enable auditing on the database level, switch Auditing to ON.
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-auditing
QUESTION 39
SIMULATION
You need to ensure that all REST API calls to an Azure Storage account named storage10543936 use
HTTPS only.
Explanation/Reference:
Explanation:
You can configure your storage account to accept requests from secure connections only by setting the
Secure transfer required property for the storage account.
Reference:
https://docs.microsoft.com/en-us/azure/storage/common/storage-require-secure-transfer
QUESTION 40
You have an Azure Stream Analytics job.
You need to ensure that the job has enough streaming units provisioned.
Which two additional metrics should you monitor? Each correct answer presents part of the solution.
A. Watermark Delay
B. Late Input Events
C. Out of order Events
D. Backlogged Input Events
E. Function Events
Correct Answer: BD
Section: (none)
Explanation
Explanation/Reference:
Explanation:
B: Late Input Events: events that arrived later than the configured late arrival tolerance window.
Note: While comparing utilization over a period of time, use event rate metrics. InputEvents and
OutputEvents metrics show how many events were read and processed.
D: In the job diagram, there is a per-partition backlog event metric for each input. If the backlog event metric
keeps increasing, it’s also an indicator that the system resource is constrained (either because of output
sink throttling, or high CPU).
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-scale-jobs
Monitor and optimize data solutions
Testlet 2
Background
Proseware, Inc, develops and manages a product named Poll Taker. The product is used for delivering
public opinion polling and analysis.
Polling data comes from a variety of sources, including online surveys, house-to-house interviews, and
booths at public events.
Polling data
Polling data is stored in one of two locations:
Poll metadata
Each poll has associated metadata with information about the poll including the date and number of
respondents. The data is stored as JSON.
Phone-based polling
Security
Phone-based poll data must only be uploaded by authorized users from authorized devices
Contractors must not have access to any polling data other than their own
Access to polling data must be set on a per-Active Directory user basis
Performance
After six months, raw polling data should be moved to a storage account. The storage must be available in
the event of a regional disaster. The solution must minimize costs.
Deployments
All deployments must be performed by using Azure DevOps. Deployments must use templates that can be
reused across multiple environments
No credentials or secrets should be used during deployments
Reliability
All services and processes must be resilient to a regional Azure outage.
Monitoring
All Azure services must be monitored by using Azure Monitor. On-premises SQL Server performance must
be monitored.
QUESTION 1
HOTSPOT
You need to ensure phone-based polling data upload reliability requirements are met. How should you
configure monitoring? To answer, select the appropriate options in the answer area.
Hot Area:
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Box 1: FileCapacity
FileCapacity is the amount of storage used by the storage account’s File service in bytes.
Box 2: Avg
The aggregation type of the FileCapacity metric is Avg.
Scenario:
All services and processes must be resilient to a regional Azure outage.
All Azure services must be monitored by using Azure Monitor. On-premises SQL Server performance must
be monitored.
References:
https://docs.microsoft.com/en-us/azure/azure-monitor/platform/metrics-supported
Monitor and optimize data solutions
Testlet 3
Overview
Current environment
Contoso relies on an extensive partner network for marketing, sales, and distribution. Contoso uses
external companies that manufacture everything from the actual pharmaceutical to the packaging.
The majority of the company's data resides in Microsoft SQL Server databases. Application databases fall
into one of the following tiers:
The company has a reporting infrastructure that ingests data from local databases and partner services.
Partner services consist of distributors, wholesalers, and retailers across the world. The company
performs daily, weekly, and monthly reporting.
Requirements
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Applications must still have access to data from both internal and external applications keeping the data
encrypted and secure at rest and in transit.
A disaster recovery strategy must be implemented for Tier 3 and Tier 6 through 8 allowing for failover in the
case of a server going offline.
Selected internal applications must have the data hosted in single Microsoft Azure SQL Databases.
The solution must support migrating databases that support external and internal application to Azure SQL
Database. The migrated databases will be supported by Azure Data Factory pipelines for the continued
movement, migration and updating of data both in the cloud and from local core business systems and
repositories.
Tier 7 and Tier 8 partner access must be restricted to the database only.
In addition to default Azure backup behavior, Tier 4 and 5 databases must be on a backup strategy that
performs a transaction log backup every hour, a differential backup every day, and a full backup every
week.
Backup strategies must be put in place for all other standalone Azure SQL databases using Azure SQL-provided
backup storage and capabilities.
Databases
Contoso requires their data estate to be designed and implemented in the Azure Cloud. Moving to the
cloud must not inhibit access to or availability of data.
Databases:
Tier 1 Database must implement data masking using the following masking logic:
Tier 2 databases must sync between branches and cloud databases and in the event of conflicts must be
set up for conflicts to be won by on-premises databases.
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Applications must still have access to data from both internal and external applications keeping the data
encrypted and secure at rest and in transit.
A disaster recovery strategy must be implemented for Tier 3 and Tier 6 through 8 allowing for failover in the
case of a server going offline.
Selected internal applications must have the data hosted in single Microsoft Azure SQL Databases.
Reporting
Security
A method of managing multiple databases in the cloud at the same time must be implemented to
streamline data management and limit management access to only those requiring access.
Monitoring
Monitoring must be set up on every database. Contoso and partners must receive performance reports as
part of contractual agreements.
Tiers 6 through 8 must have unexpected resource storage usage immediately reported to data engineers.
The Azure SQL Data Warehouse cache must be monitored when the database is being used. A dashboard
monitoring key performance indicators (KPIs) indicated by traffic lights must be created and displayed
based on the following metrics:
Existing Data Protection and Security compliances require that all certificates and keys are internally
managed in an on-premises storage.
Azure Data Warehouse must be used to gather and query data from multiple internal and external
databases
Azure Data Warehouse must be optimized to use data from a cache
Reporting data aggregated for external partners must be stored in Azure Storage and be made
available during regular business hours in the connecting regions
Reporting strategies must be improved to a real-time or near real-time cadence to improve
competitiveness and the general supply chain
Tier 9 reporting must be moved to Event Hubs, queried, and persisted in the same Azure region as the
company’s main office
Tier 10 reporting data must be stored in Azure Blobs
Issues
Team members identify the following issues:
Both internal and external client applications run complex joins, equality searches, and group-by clauses.
Because some systems are managed externally, the queries will not be changed or optimized by
Contoso
External partner organization data formats, types and schemas are controlled by the partner companies
Internal and external database development staff resources are primarily SQL developers familiar with
the Transact-SQL language.
The size and amount of data has led to applications and reporting solutions not performing at the
required speeds
Tier 7 and 8 data access is constrained to single endpoints managed by partners
The company maintains several legacy client applications. Data for these applications remains isolated
from other applications. This has led to hundreds of databases being provisioned on a per-application
basis
QUESTION 1
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
A. RequestSteps
B. DmsWorkers
C. SqlRequests
D. ExecRequests
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario:
The Azure SQL Data Warehouse cache must be monitored when the database is being used.
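A minimal sketch of querying this DMV, with the column list shortened for readability:
SELECT request_id, step_index, pdw_node_id, distribution_id, status, total_elapsed_time
FROM sys.dm_pdw_sql_requests
ORDER BY total_elapsed_time DESC;   -- longest-running SQL steps first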
References:
https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-pdw-sql-requests-transact-sql
QUESTION 2
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
A. extended events for average storage percentage that emails data engineers
B. an alert rule to monitor CPU percentage in databases that emails data engineers
C. an alert rule to monitor CPU percentage in elastic pools that emails data engineers
D. an alert rule to monitor storage percentage in databases that emails data engineers
E. an alert rule to monitor storage percentage in elastic pools that emails data engineers
Correct Answer: E
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario:
Tiers 6 through 8 must have unexpected resource storage usage immediately reported to data engineers.
Tier 3 and Tier 6 through Tier 8 applications must use database density on the same server and Elastic
pools in a cost-effective manner.
Monitor and optimize data solutions
Testlet 4
Case Study
This is a case study. Case studies are not timed separately. You can use as much exam time as you
would like to complete each case. However, there may be additional case studies and sections on this
exam. You must manage your time to ensure that you are able to complete all questions included on this
exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in
the case study. Case studies might contain exhibits and other resources that provide more information
about the scenario that is described in the case study. Each question is independent of the other questions
in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers
and to make changes before you move to the next section of the exam. After you begin a new section, you
cannot return to this section.
Overview
General Overview
Litware, Inc. is an international car racing and manufacturing company that has 1,000 employees. Most
employees are located in Europe. The company supports racing teams that compete in a worldwide racing
series.
Physical Locations
Litware has two main locations: a main office in London, England, and a manufacturing plant in Berlin,
Germany.
During each race weekend, 100 engineers set up a remote portable office by using a VPN to connect to the
datacenter in the London office. The portable office is set up and torn down in approximately 20 different
countries each year.
Existing environment
Race Central
During race weekends, Litware uses a primary application named Race Central. Each car has several
sensors that send real-time telemetry data to the London datacenter. The data is used for real-time tracking
of the cars.
Race Central also sends batch updates to an application named Mechanical Workflow by using Microsoft
SQL Server Integration Services (SSIS).
The telemetry data is sent to a MongoDB database. A custom application then moves the data to
databases in SQL Server 2017. The telemetry data in MongoDB has more than 500 attributes. The
application changes the attribute names when the data is moved to SQL Server 2017.
Mechanical Workflow
Mechanical Workflow is used to track changes and improvements made to the cars during their lifetime.
Currently, Mechanical Workflow runs on SQL Server 2017 as an OLAP system.
Mechanical Workflow has a table named Table1 that is 1 TB. Large aggregations are performed on a
single column of Table1.
Requirements
Planned Changes
Litware is in the process of rearchitecting its data estate to be hosted in Azure. The company plans to
decommission the London datacenter and move all its applications to an Azure datacenter.
Technical Requirements
Data collection for Race Central must be moved to Azure Cosmos DB and Azure SQL Database. The
data must be written to the Azure datacenter closest to each race and must converge in the least
amount of time.
The query performance of Race Central must be stable, and the administrative time it takes to perform
optimizations must be minimized.
The database for Mechanical Workflow must be moved to Azure SQL Data Warehouse.
Transparent data encryption (TDE) must be enabled on all data stores, whenever possible.
An Azure Data Factory pipeline must be used to move data from Cosmos DB to SQL Database for
Race Central. If the data load takes longer than 20 minutes, configuration changes must be made to
Data Factory.
The telemetry data must migrate toward a solution that is native to Azure.
The telemetry data must be monitored for performance issues. You must adjust the Cosmos DB
Request Units per second (RU/s) to maintain a performance SLA while minimizing the cost of the RU/s.
During race weekends, visitors will be able to enter the remote portable offices. Litware is concerned that
some proprietary information might be exposed. The company identifies the following data masking
requirements for the Race Central data that will be stored in SQL Database:
Only show the last four digits of the values in a column named SuspensionSprings.
Only show a zero value for the values in a column named ShockOilWeight.
QUESTION 1
You are monitoring the Data Factory pipeline that runs from Cosmos DB to SQL Database for Race
Central.
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Performance tuning tips and optimization features. In some cases, when you run a copy activity in Azure
Data Factory, you see a "Performance tuning tips" message on top of the copy activity monitoring, as
shown in the following example. The message tells you the bottleneck that was identified for the given copy
run. It also guides you on what to change to boost copy throughput. The performance tuning tips currently
provide suggestions like:
Use PolyBase when you copy data into Azure SQL Data Warehouse.
Increase Azure Cosmos DB Request Units or Azure SQL Database DTUs (Database Throughput Units)
when the resource on the data store side is the bottleneck.
Remove the unnecessary staged copy.
References:
https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance
QUESTION 2
What should you implement to optimize SQL Database for Race Central to meet the technical
requirements?
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: The query performance of Race Central must be stable, and the administrative time it takes to
perform optimizations must be minimized.
sp_updatestats updates query optimization statistics on a table or indexed view. By default, the query
optimizer already updates statistics as necessary to improve the query plan; in some cases you can
improve query performance by using UPDATE STATISTICS or the stored procedure sp_updatestats to
update statistics more frequently than the default updates.
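A minimal sketch; the table name in the second statement is a placeholder:
EXEC sp_updatestats;                                   -- refresh out-of-date statistics database-wide
UPDATE STATISTICS dbo.RaceTelemetry WITH FULLSCAN;     -- or refresh a single table more thoroughly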
Incorrect Answers:
D: DBCC CHECKDB checks the logical and physical integrity of all the objects in the specified database.
References:
https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-updatestats-transact-sql?view=sql-server-ver15
QUESTION 3
Which two metrics should you use to identify the appropriate RU/s for the telemetry data? Each correct
answer presents part of the solution.
A. Number of requests
B. Number of requests exceeded capacity
C. End to end observed read latency at the 99th percentile
D. Session consistency
E. Data + Index storage consumed
F. Avg Throughput/s
Correct Answer: AE
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: The telemetry data must be monitored for performance issues. You must adjust the Cosmos DB
Request Units per second (RU/s) to maintain a performance SLA while minimizing the cost of the RU/s.
With Azure Cosmos DB, you pay for the throughput you provision and the storage you consume on an
hourly basis.
While you estimate the number of RUs per second to provision, consider the following factors:
Item size: As the size of an item increases, the number of RUs consumed to read or write the item also
increases.
Monitor and optimize data solutions
Testlet 5
Case study
Overview
ADatum Corporation is a retailer that sells products through two sales channels: retail stores and a website.
Existing Environment
ADatum has one database server that has Microsoft SQL Server 2016 installed. The server hosts three
mission-critical databases named SALESDB, DOCDB, and REPORTINGDB.
DOCDB stores documents that connect to the sales data in SALESDB. The documents are stored in two
different JSON formats based on the sales channel.
REPORTINGDB stores reporting data and contains several columnstore indexes. A daily process creates
reporting data in REPORTINGDB from the data in SALESDB. The process is implemented as a SQL
Server Integration Services (SSIS) package that runs a stored procedure from SALESDB.
Requirements
Planned Changes
ADatum plans to move the current data infrastructure to Azure. The new infrastructure has the following
requirements:
Technical Requirements
The new Azure data infrastructure must meet the following technical requirements:
Data in SALESDB must be encrypted by using Transparent Data Encryption (TDE). The encryption must
use your own key.
SALESDB must be restorable to any given minute within the past three weeks.
Real-time processing must be monitored to ensure that workloads are sized properly based on actual
usage patterns.
Missing indexes must be created automatically for REPORTINGDB.
Disk IO, CPU, and memory usage must be monitored for SALESDB.
QUESTION 1
How should you monitor SALESDB to meet the technical requirements?
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario: Disk IO, CPU, and memory usage must be monitored for SALESDB
The sys.resource_stats view in the master database returns historical data for CPU, IO, and DTU
consumption. There is one row per 5-minute interval for a database on an Azure logical SQL server if the metrics changed in that interval.
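A minimal sketch, run against the master database of the logical server, with the column list trimmed:
SELECT start_time, end_time, avg_cpu_percent, avg_data_io_percent, avg_log_write_percent, storage_in_megabytes
FROM sys.resource_stats
WHERE database_name = 'SALESDB'      -- filter to the database of interest
ORDER BY start_time DESC;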
Incorrect Answers:
B: Query Performance Insight helps you to quickly identify what your longest running queries are, how they
change over time, and what waits are affecting them.
C: sys.dm_os_wait_stats: specific types of wait times during query execution can indicate bottlenecks or
stall points within the query. Similarly, high wait times, or wait counts server wide can indicate bottlenecks
or hot spots in interaction query interactions within the server instance. For example, lock waits indicate
data contention by queries; page IO latch waits indicate slow IO response times; page latch update waits
indicate incorrect file layout.
References:
https://dataplatformlabs.com/monitoring-azure-sql-database-with-sys-resource_stats/
QUESTION 2
You need to ensure that the missing indexes for REPORTINGDB are added.
Correct Answer: D
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Automatic tuning options include create index, which identifies indexes that may improve performance of
your workload, creates indexes, and automatically verifies that performance of queries has improved.
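A minimal sketch, assuming the option is set at the database level rather than inherited from the server:
ALTER DATABASE CURRENT
SET AUTOMATIC_TUNING (CREATE_INDEX = ON);   -- let automatic tuning create and verify missing indexes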
Scenario:
REPORTINGDB stores reporting data and contains several columnstore indexes.
Migrate SALESDB and REPORTINGDB to an Azure SQL database.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning
QUESTION 3
Which counter should you monitor for real-time processing to meet the technical requirements?
A. Concurrent users
B. SU% Utilization
C. Data Conversion Errors
D. CPU % utilization
Correct Answer: B
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scenario:
Real-time processing must be monitored to ensure that workloads are sized properly based on actual
usage patterns.
The sales data, including the documents in JSON format, must be gathered as it arrives and analyzed
online by using Azure Stream Analytics.
Streaming Units (SUs) represents the computing resources that are allocated to execute a Stream
Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your
job. This capacity lets you focus on the query logic and abstracts the need to manage the hardware to run
your Stream Analytics job in a timely manner.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-streaming-unit-consumption
Manage and troubleshoot Azure data solutions
Question Set 1
QUESTION 1
You manage a process that performs analysis of daily web traffic logs on an HDInsight cluster. Each of the
250 web servers generates approximately 10 megabytes (MB) of log data each day. All log data is stored in
a single folder in Microsoft Azure Data Lake Storage Gen 2.
Which two changes should you make? Each correct answer presents a complete solution.
A. Combine the daily log files for all servers into one file
B. Increase the value of the mapreduce.map.memory parameter
C. Move the log files into folders so that each day’s logs are in their own folder
D. Increase the number of worker nodes
E. Increase the value of the hive.tez.container.size parameter
Correct Answer: AC
Section: (none)
Explanation
Explanation/Reference:
Explanation:
A: Typically, analytics engines such as HDInsight and Azure Data Lake Analytics have a per-file overhead.
If you store your data as many small files, this can negatively affect performance. In general, organize your
data into larger sized files for better performance (256MB to 100GB in size). Some engines and
applications might have trouble efficiently processing files that are greater than 100GB in size.
C: For Hive workloads, partition pruning of time-series data can help some queries read only a subset of
the data which improves performance.
Those pipelines that ingest time-series data, often place their files with a very structured naming for files
and folders. Below is a very common example we see for data that is structured by date:
\DataSet\YYYY\MM\DD\datafile_YYYY_MM_DD.tsv
Notice that the datetime information appears both as folders and in the filename.
References:
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-performance-tuning-guidance
QUESTION 2
DRAG DROP
A company builds an application to allow developers to share and compare code. The conversations, code
snippets, and links shared by people in the application are stored in a Microsoft Azure SQL Database
instance. The application allows for searches of historical conversations and code snippets.
When users share code snippets, the code snippet is compared against previously shared code snippets by
using a combination of Transact-SQL functions including SUBSTRING, FIRST_VALUE, and SQRT. If a
match is found, a link to the match is added to the conversation.
Which technologies should you use? To answer, drag the appropriate technologies to the correct issues.
Each technology may be used once, more than once, or not at all. You may need to drag the split bar
between panes or scroll to view content.
Correct Answer:
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Materialized views help when the source data is not in a format that suits the required queries, when
generating a suitable query is difficult, or when query performance is poor due to the nature of the
data or the data store.
These materialized views, which only contain data required by a query, allow applications to quickly obtain
the information they need. In addition to joining tables or combining data entities, materialized views can
include the current values of calculated columns or data items, the results of combining values or executing
transformations on the data items, and values specified as part of the query. A materialized view can even
be optimized for just a single query.
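As one possible sketch of this pattern in Azure SQL Database, an indexed (schema-bound) view can persist a precomputed aggregate; the table, view, and column names below are placeholders:
CREATE VIEW dbo.vConversationSnippetCounts
WITH SCHEMABINDING
AS
SELECT ConversationId, COUNT_BIG(*) AS SnippetCount   -- COUNT_BIG is required in an indexed view with GROUP BY
FROM dbo.CodeSnippets
GROUP BY ConversationId;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vConversationSnippetCounts
ON dbo.vConversationSnippetCounts (ConversationId);   -- the clustered index materializes the view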
References:
https://docs.microsoft.com/en-us/azure/architecture/patterns/materialized-view
QUESTION 3
You implement an Azure SQL Data Warehouse instance.
You plan to migrate the largest fact table to Azure Synapse Analytics. The table resides on Microsoft SQL
Server on-premises and is 10 terabytes (TB) in size.
Incoming queries use the primary key Sale Key column to retrieve data as displayed in the following table:
You need to distribute the large fact table across multiple nodes to optimize performance of the table.
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Hash-distributed tables improve query performance on large fact tables.
Columnstore indexes can achieve up to 100x better performance on analytics and data warehousing
workloads and up to 10x better data compression than traditional rowstore indexes.
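A minimal sketch of such a table definition in a dedicated SQL pool; the table and non-key column names are placeholders:
CREATE TABLE dbo.FactSales
(
    [Sale Key] BIGINT NOT NULL,
    [Quantity] INT,
    [Amount]   DECIMAL(18, 2)
)
WITH
(
    DISTRIBUTION = HASH([Sale Key]),     -- rows with the same Sale Key land on the same distribution
    CLUSTERED COLUMNSTORE INDEX          -- the default index type, stated explicitly here
);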
Incorrect Answers:
D, E: Round-robin tables are useful for improving loading speed.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute
https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-query-performance
QUESTION 4
You manage an enterprise data warehouse in Azure Synapse Analytics.
Users report slow performance when they run commonly used queries. Users do not report performance
changes for infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Correct Answer: A
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The Azure Synapse Analytics storage architecture automatically tiers your most frequently queried
columnstore segments in a cache residing on NVMe based SSDs designed for Gen2 data warehouses.
Greater performance is realized when your queries retrieve segments that are residing in the cache. You
can monitor and troubleshoot slow query performance by determining whether your workload is optimally
leveraging the Gen2 cache.
Note: As of November 2019, Azure SQL Data Warehouse is now Azure Synapse Analytics
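A rough sketch of inspecting the cache counters; it assumes they are surfaced through sys.dm_pdw_nodes_os_performance_counters, and the exact counter names may differ:
SELECT pdw_node_id, counter_name, cntr_value
FROM sys.dm_pdw_nodes_os_performance_counters
WHERE counter_name LIKE '%Cache%';        -- e.g. cache hit and cache used percentages per node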
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-how-to-monitor-cache
https://docs.microsoft.com/bs-latn-ba/azure/sql-data-warehouse/sql-data-warehouse-concept-resource-utilization-query-activity
QUESTION 5
You manage an enterprise data warehouse in Azure Synapse Analytics.
Users report slow performance when they run commonly used queries. Users do not report performance
changes for infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Correct Answer: C
Section: (none)
Explanation
Explanation/Reference:
Explanation:
The Azure Synapse Analytics storage architecture automatically tiers your most frequently queried
columnstore segments in a cache residing on NVMe based SSDs designed for Gen2 data warehouses.
Greater performance is realized when your queries retrieve segments that are residing in the cache. You
can monitor and troubleshoot slow query performance by determining whether your workload is optimally
leveraging the Gen2 cache.
Note: As of November 2019, Azure SQL Data Warehouse is now Azure Synapse Analytics.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-how-to-monitor-cache
https://docs.microsoft.com/bs-latn-ba/azure/sql-data-warehouse/sql-data-warehouse-concept-resource-utilization-query-activity
QUESTION 6
A company has a real-time data analysis solution that is hosted on Microsoft Azure. The solution uses
Azure Event Hub to ingest data and an Azure Stream Analytics cloud job to analyze the data. The cloud job
is configured to use 120 Streaming Units (SU).
You need to optimize performance for the Azure Stream Analytics job.
Which two actions should you perform? Each correct answer presents part of the solution.
Correct Answer: BF
Section: (none)
Explanation
Explanation/Reference:
Explanation:
Scale out the query by allowing the system to process each input partition separately.
F: A Stream Analytics job definition includes inputs, a query, and output. Inputs are where the job reads the
data stream from.
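As a hedged sketch of that idea in the Stream Analytics query language (the input, output, and partition column names are placeholders, and newer compatibility levels can derive the partitioning automatically):
SELECT *
INTO PartitionedOutput
FROM EventHubInput
PARTITION BY PartitionId     -- process each Event Hub partition independently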
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization