Designing and Implementing a Data Science Solution on Azure (beta) v1.0 (DP-100)
Question 1 ( Question Set 1 )
Answer : CE
Explanation:
C: Make sure your Windows system supports Hardware Virtualization Technology and that
virtualization is enabled.
Ensure that hardware virtualization support is turned on in the BIOS settings.
E: To run Docker, your machine must have a 64-bit operating system running Windows 7 or
higher.
Reference:
https://docs.docker.com/toolbox/toolbox_install_windows/
https://blogs.technet.microsoft.com/canitpro/2015/09/08/step-by-step-enabling-hyper-v-for-
use-on-windows-10/
Question 2 ( Question Set 1 )
Your team is building a data engineering and data science development environment.
The environment must support the following requirements:
✑ support Python and Scala
✑ compose data storage, movement, and processing services into automated data pipelines
✑ the same tool should be used for the orchestration of both data engineering and data
science
✑ support workload isolation and interactive workloads
✑ enable scaling across a cluster of machines
You need to create the environment.
What should you do?
• A. Build the environment in Apache Hive for HDInsight and use Azure Data Factory for
orchestration.
• B. Build the environment in Azure Databricks and use Azure Data Factory for orchestration.
• C. Build the environment in Apache Spark for HDInsight and use Azure Container Instances
for orchestration.
• D. Build the environment in Azure Databricks and use Azure Container Instances for
orchestration.
Answer : B
Explanation:
In Azure Databricks, you can create two different types of clusters:
✑ Standard: the default cluster type, which can be used with Python, R, Scala, and SQL
✑ High-concurrency
Azure Databricks is fully integrated with Azure Data Factory.
Incorrect Answers:
D: Azure Container Instances is good for development or testing. Not suitable for production
workloads.
Reference:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/data-
science-and-machine-learning
Question 3 ( Question Set 1 )
DRAG DROP -
You are building an intelligent solution using machine learning models.
The environment must support the following requirements:
✑ Data scientists must build notebooks in a cloud environment
✑ Data scientists must use automatic feature engineering and model building in machine
learning pipelines.
✑ Notebooks must be deployed to retrain using Spark instances with dynamic worker
allocation.
✑ Notebooks must be exportable to be version controlled locally.
You need to create the environment.
Which four actions should you perform in sequence? To answer, move the appropriate actions
from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Answer :
Explanation:
Step 1: Create an Azure HDInsight cluster to include the Apache Spark MLlib library
Step 2: Install Microsoft Machine Learning for Apache Spark
You install it on your Azure HDInsight cluster.
Microsoft Machine Learning for Apache Spark (MMLSpark) provides a number of deep learning
and data science tools for Apache Spark, including seamless integration of Spark Machine
Learning pipelines with Microsoft Cognitive Toolkit (CNTK) and OpenCV, enabling you to quickly
create powerful, highly-scalable predictive and analytical models for large image and text
datasets.
Step 3: Create and execute the Zeppelin notebooks on the cluster
Step 4: When the cluster is ready, export Zeppelin notebooks to a local environment.
Notebooks must be exportable to be version controlled locally.
Reference:
https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-zeppelin-notebook
https://azuremlbuild.blob.core.windows.net/pysparkapi/intro.html
Question 4 ( Question Set 1 )
You plan to build a team data science environment. Data for training models in machine
learning pipelines will be over 20 GB in size.
You have the following requirements:
✑ Models must be built using Caffe2 or Chainer frameworks.
✑ Data scientists must be able to use a data science environment to build the machine learning
pipelines and train models on their personal devices in both connected and disconnected
network environments.
Personal devices must support updating machine learning pipelines when connected to a
network.
You need to select a data science environment.
Which environment should you use?
Answer : A
Explanation:
The Data Science Virtual Machine (DSVM) is a customized VM image on Microsoft's Azure cloud
built specifically for doing data science. Caffe2 and Chainer are supported by DSVM.
DSVM integrates with Azure Machine Learning.
Incorrect Answers:
B: Use Machine Learning Studio when you want to experiment with machine learning models
quickly and easily, and the built-in machine learning algorithms are sufficient for your solutions.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-
machine/overview
Question 5 ( Question Set 1 )
You are implementing a machine learning model to predict stock prices.
The model uses a PostgreSQL database and requires GPU processing.
You need to create a virtual machine that is pre-configured with the required tools.
What should you do?
Answer : A
Explanation:
In the DSVM, your training models can use deep learning algorithms on hardware that's based
on graphics processing units (GPUs).
PostgreSQL is available for the following operating systems: Linux (all recent distributions), macOS (OS X) version 10.6 and newer (64-bit installers available), and Windows (64-bit installers available; tested on the latest versions and back to Windows 2012 R2).
Incorrect Answers:
B: The Azure Geo AI Data Science VM (Geo-DSVM) delivers geospatial analytics capabilities from
Microsoft's Data Science VM. Specifically, this VM extends the
AI and data science toolkits in the Data Science VM by adding ESRI's market-leading ArcGIS Pro
Geographic Information System.
C, D: The DLVM is a template on top of the DSVM image. The packages, GPU drivers, and so on are all included in the DSVM image; the DLVM mainly exists for convenience during creation, because DLVMs can only be created on GPU VM instances in Azure.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-
machine/overview
Question 6 ( Question Set 1 )
Answer : A
Explanation:
Azure Cognitive Services expand on Microsoft's evolving portfolio of machine learning APIs and enable developers to easily add cognitive features (such as emotion and video detection; facial, speech, and vision recognition; and speech and language understanding) into their applications. The goal of Azure Cognitive Services is to help developers create applications that can see, hear, speak, understand, and even begin to reason. The catalog of services within Azure Cognitive Services can be categorized into five main pillars: Vision, Speech, Language, Search, and Knowledge.
Reference:
https://docs.microsoft.com/en-us/azure/cognitive-services/welcome
Question 7 ( Question Set 1 )
You must store data in Azure Blob Storage to support Azure Machine Learning.
You need to transfer the data into Azure Blob Storage.
What are three possible ways to achieve the goal? Each correct answer presents a complete
solution.
NOTE: Each correct selection is worth one point.
Answer : BCD
Explanation:
You can move data to and from Azure Blob storage using different technologies:
✑ Azure Storage Explorer
✑ AzCopy
✑ Python
✑ SSIS
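For the Python option, a minimal sketch using the azure-storage-blob SDK is shown below; the connection string, container name, and file name are placeholders rather than values from the question:
from azure.storage.blob import BlobServiceClient

# Placeholder values: substitute your own connection string, container, and file
connection_string = '<storage-account-connection-string>'
service_client = BlobServiceClient.from_connection_string(connection_string)
blob_client = service_client.get_blob_client(container='training-data', blob='data.csv')

# Upload a local file into the blob container
with open('data.csv', 'rb') as data:
    blob_client.upload_blob(data, overwrite=True)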
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/move-
azure-blob
Question 8 ( Question Set 1 )
You are moving a large dataset from Azure Machine Learning Studio to a Weka environment.
You need to format the data for the Weka environment.
Which module should you use?
• A. Convert to CSV
• B. Convert to Dataset
• C. Convert to ARFF
• D. Convert to SVMLight
Answer : C
Explanation:
Use the Convert to ARFF module in Azure Machine Learning Studio to convert datasets and results in Azure Machine Learning to the attribute-relation file format used by the Weka toolset. This format is known as ARFF.
The ARFF data specification for Weka supports multiple machine learning tasks, including data preprocessing, classification, and feature selection. In this format, data is organized by entities and their attributes, and is contained in a single text file.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-
arff
Question 9 ( Question Set 1 )
You plan to create a speech recognition deep learning model.
The model must support the latest version of Python.
You need to recommend a deep learning framework for speech recognition to include in the
Data Science Virtual Machine (DSVM).
What should you recommend?
• A. Rattle
• B. TensorFlow
• C. Weka
• D. Scikit-learn
Answer : B
Explanation:
TensorFlow is an open-source library for numerical computation and large-scale machine learning. It uses Python to provide a convenient front-end API for building applications with the framework.
TensorFlow can train and run deep neural networks for handwritten digit classification, image recognition, word embeddings, recurrent neural networks, sequence-to-sequence models for machine translation, natural language processing, and PDE (partial differential equation) based simulations.
Incorrect Answers:
A: Rattle is the R analytical tool that gets you started with data analytics and machine learning.
C: Weka is used for visual data mining and machine learning software in Java.
D: Scikit-learn is one of the most useful libraries for machine learning in Python. Built on NumPy, SciPy, and matplotlib, it contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.
Reference:
https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-
explained.html
Question 10 ( Question Set 1 )
You plan to use a Deep Learning Virtual Machine (DLVM) to train deep learning models using
Compute Unified Device Architecture (CUDA) computations.
You need to configure the DLVM to support CUDA.
What should you implement?
Answer : C
Explanation:
A Deep Learning Virtual Machine is a pre-configured environment for deep learning using GPU
instances.
Reference:
https://azuremarketplace.microsoft.com/en-au/marketplace/apps/microsoft-ads.dsvm-deep-
learning
Question 11 ( Question Set 1 )
Answer : E
Explanation:
Caffe2 and PyTorch are supported by the Data Science Virtual Machine for Linux.
Microsoft offers Linux editions of the DSVM on Ubuntu 16.04 LTS and CentOS 7.4.
Only the DSVM on Ubuntu is preconfigured for Caffe2 and PyTorch.
Incorrect Answers:
D: Caffe2 and PyTorch are only supported in the Data Science Virtual Machine for Linux.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-
machine/overview
Question 12 ( Question Set 1 )
HOTSPOT -
You are performing sentiment analysis using a CSV file that includes 12,000 customer reviews
written in a short sentence format. You add the CSV file to Azure
Machine Learning Studio and configure it as the starting point dataset of an experiment. You
add the Extract N-Gram Features from Text module to the experiment to extract key phrases
from the customer review column in the dataset.
You must create a new n-gram dictionary from the customer review text and set the maximum
n-gram size to trigrams.
What should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Question 13 ( Question Set 1 )
You are developing a data science workspace that uses an Azure Machine Learning service.
You need to select a compute target to deploy the workspace.
What should you use?
Answer : C
Explanation:
Azure Container Instances can be used as compute target for testing or development. Use for
low-scale CPU-based workloads that require less than 48 GB of
RAM.
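As a rough sketch only (the model name, environment, and scoring script below are assumptions, not part of the question), deploying to Azure Container Instances with the SDK looks like this:
from azureml.core import Workspace, Environment
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name='my-model')                   # assumed registered model name
env = Environment.get(ws, name='AzureML-Minimal')    # an example curated environment

inference_config = InferenceConfig(entry_script='score.py', environment=env)
# ACI suits dev/test deployments with modest CPU and memory requirements
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=4)

service = Model.deploy(ws, 'dev-test-service', [model], inference_config, aci_config)
service.wait_for_deployment(show_output=True)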
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where
Question 14 ( Question Set 1 )
You are solving a classification task.
The dataset is imbalanced.
You need to select an Azure Machine Learning Studio module to improve the classification
accuracy.
Which module should you use?
Answer : D
Explanation:
Use the SMOTE module in Azure Machine Learning Studio (classic) to increase the number of
underrepresented cases in a dataset used for machine learning.
SMOTE is a better way of increasing the number of rare cases than simply duplicating existing
cases.
You connect the SMOTE module to a dataset that is imbalanced. There are many reasons why a
dataset might be imbalanced: the category you are targeting might be very rare in the
population, or the data might simply be difficult to collect. Typically, you use SMOTE when the
class you want to analyze is under-represented.
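For illustration only, the SMOTE technique itself (not the Studio module) can be demonstrated with the imbalanced-learn library on synthetic data:
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Synthetic two-class dataset with roughly a 9:1 class imbalance
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print('Before:', Counter(y))

# SMOTE synthesizes new minority-class examples instead of duplicating rows
X_resampled, y_resampled = SMOTE(random_state=0).fit_resample(X, y)
print('After:', Counter(y_resampled))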
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
Question 15 ( Question Set 1 )
DRAG DROP -
You configure a Deep Learning Virtual Machine for Windows.
You need to recommend tools and frameworks to perform the following:
✑ Build deep neural network (DNN) models
✑ Perform interactive data exploration and visualization
Which tools and frameworks should you recommend? To answer, drag the appropriate tools to
the correct tasks. Each tool may be used once, more than once, or not at all. You may need to
drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
Answer :
Explanation:
Question 16 ( Question Set 1 )
Answer : C
Explanation:
Partition and Sample with the Stratified split option outputs multiple datasets, partitioned using
the rules you specified.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-
and-sample
Question 17 ( Question Set 1 )
DRAG DROP -
You are creating an experiment by using Azure Machine Learning Studio.
You must divide the data into four subsets for evaluation. There is a high degree of missing
values in the data. You must prepare the data for analysis.
You need to select appropriate methods for producing the experiment.
Which three modules should you run in sequence? To answer, move the appropriate actions
from the list of actions to the answer area and arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the
correct orders you select.
Select and Place:
Answer :
Explanation:
Use the Clean Missing Data module in Azure Machine Learning Studio to remove, replace, or infer missing values.
Incorrect Answers:
✑ Latent Dirichlet Transformation: The Latent Dirichlet Allocation module in Azure Machine Learning Studio is used to group otherwise unclassified text into a number of categories. Latent Dirichlet Allocation (LDA) is often used in natural language processing (NLP) to find texts that are similar. Another common term is topic modeling.
✑ Build Counting Transform: The Build Counting Transform module in Azure Machine Learning Studio analyzes training data. From this data, the module builds a count table as well as a set of count-based features that can be used in a predictive model.
✑ Missing Value Scrubber: The Missing Values Scrubber module is deprecated.
✑ Feature hashing: Feature hashing is used for linguistics, and works by converting unique
tokens into integers.
✑ Replace discrete values: the Replace Discrete Values module in Azure Machine Learning
Studio is used to generate a probability score that can be used to represent a discrete value.
This score can be useful for understanding the information value of the discrete values.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-
missing-data
Question 18 ( Question Set 1 )
HOTSPOT -
You are retrieving data from a large datastore by using Azure Machine Learning Studio.
You must create a subset of the data for testing purposes using a random sampling seed based
on the system clock.
You add the Partition and Sample module to your experiment.
You need to select the properties for the module.
Which values should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: Sampling -
Box 2: 0 -
For Rate of sampling, the Random seed for sampling option lets you optionally type an integer to use as a seed value.
This option is important if you want the rows to be divided the same way every time. The default
value is 0, meaning that a starting seed is generated based on the system clock. This can lead to
slightly different results each time you run the experiment.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-
and-sample
Question 19 ( Question Set 1 )
You are creating a machine learning model. You have a dataset that contains null rows.
You need to use the Clean Missing Data module in Azure Machine Learning Studio to identify
and resolve the null and missing data in the dataset.
Which parameter should you use?
Answer : C
Explanation:
Remove entire row: Completely removes any row in the dataset that has one or more missing
values. This is useful if the missing value can be considered randomly missing.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-
missing-data
Question 20 ( Question Set 1 )
HOTSPOT -
The finance team asks you to train a model using data in an Azure Storage blob container
named finance-data.
You need to register the container as a datastore in an Azure Machine Learning workspace and
ensure that an error will be raised if the container does not exist.
How should you complete the code? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: register_azure_blob_container
Register an Azure Blob Container to the datastore.
Box 2: create_if_not_exists = False
Create the blob container if it does not exist; defaults to False.
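A minimal sketch of how the completed code might look, assuming a workspace object named ws and placeholder storage credentials:
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Register the existing container; with create_if_not_exists=False an error is
# raised if the finance-data container does not exist
finance_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name='finance_datastore',
    container_name='finance-data',
    account_name='<storage-account-name>',
    account_key='<storage-account-key>',
    create_if_not_exists=False)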
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore
Question 21 ( Question Set 1 )
Answer : ABD
Incorrect Answers:
C, E: The UI is included in the Enterprise edition only.
Reference:
https://azure.microsoft.com/en-us/pricing/details/machine-learning/
Question 22 ( Question Set 1 )
HOTSPOT -
A coworker registers a datastore in a Machine Learning services workspace by using the
following code:
Explanation:
Box 1: Datastore -
To get a specific datastore registered in the current workspace, use the get() static method on
the Datastore class:
# Get a named datastore from the current workspace
datastore = Datastore.get(ws, datastore_name='your datastore name')
Box 2: ws -
Box 3: demo_datastore -
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data
Question 23 ( Question Set 1 )
A set of CSV files contains sales records. All the CSV files have the same data schema.
Each CSV file contains the sales record for a particular month and has the filename sales.csv.
Each file is stored in a folder that indicates the month and year when the data was recorded. The
folders are in an Azure blob container for which a datastore has been defined in an Azure
Machine Learning workspace. The folders are organized in a parent folder named sales to create
the following hierarchical structure:
At the end of each month, a new folder with that month's sales file is added to the sales folder.
You plan to use the sales data to train a machine learning model based on the following
requirements:
✑ You must define a dataset that loads all of the sales data to date into a structure that can be
easily converted to a dataframe.
✑ You must be able to create experiments that use only data that was created before a specific
previous month, ignoring any data that was added after that month.
✑ You must register the minimum number of datasets possible.
You need to register the sales data as a dataset in Azure Machine Learning service workspace.
What should you do?
• A. Create a tabular dataset that references the datastore and explicitly specifies each
'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name
sales_dataset each month, replacing the existing dataset and specifying a tag named
month indicating the month and year it was registered. Use this dataset for all experiments.
• B. Create a tabular dataset that references the datastore and specifies the path
'sales/*/sales.csv', register the dataset with the name sales_dataset and a tag named month
indicating the month and year it was registered, and use this dataset for all experiments.
• C. Create a new tabular dataset that references the datastore and explicitly specifies each
'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name
sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month
and year. Use the appropriate month-specific dataset for experiments.
• D. Create a tabular dataset that references the datastore and explicitly specifies each
'sales/mm-yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month
as a new version and with a tag named month indicating the month and year it was
registered. Use this dataset for all experiments, identifying the version to be used based on
the month tag as necessary.
Answer : B
Explanation:
Specify the path.
Example:
The following code gets the existing workspace and the desired datastore by name, and then passes the datastore and file locations to the path parameter to create a new TabularDataset, weather_ds.
from azureml.core import Workspace, Datastore, Dataset
datastore_name = 'your datastore name'
# get existing workspace
workspace = Workspace.from_config()
# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(workspace, datastore_name)
# create a TabularDataset from 3 file paths in datastore
datastore_paths = [(datastore, 'weather/2018/11.csv'),
(datastore, 'weather/2018/12.csv'),
(datastore, 'weather/2019/*.csv')]
weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
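Applied to this question, a sketch of option B might look like the following; the datastore name and tag value are assumptions:
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, 'sales_datastore')   # assumed datastore name

# A single wildcard path picks up every monthly folder, including future ones
sales_ds = Dataset.Tabular.from_delimited_files(path=[(datastore, 'sales/*/sales.csv')])
sales_ds = sales_ds.register(workspace=ws, name='sales_dataset',
                             tags={'month': '12-2021'})   # example tag value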
Question 24 ( Question Set 1 )
DRAG DROP -
An organization uses Azure Machine Learning service and wants to expand their use of machine
learning.
You have the following compute environments. The organization does not want to create
another compute environment.
You need to determine which compute environment to use for the following scenarios.
Which compute types should you use? To answer, drag the appropriate compute environments
to the correct scenarios. Each compute environment may be used once, more than once, or not
at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
Answer :
Explanation:
Box 1: nb_server -
Box 2: mlc_cluster -
With Azure Machine Learning, you can train your model on a variety of resources or
environments, collectively referred to as compute targets. A compute target can be a local
machine or a cloud resource, such as an Azure Machine Learning Compute, Azure HDInsight or a
remote virtual machine.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets
Question 25 ( Question Set 1 )
HOTSPOT -
You create an Azure Machine Learning compute target named ComputeOne by using the
STANDARD_D1 virtual machine image.
ComputeOne is currently idle and has zero active nodes.
You define a Python variable named ws that references the Azure Machine Learning workspace.
You run the following Python code:
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: Yes -
ComputeTargetException class: An exception related to failures when creating, interacting with,
or configuring a compute target. This exception is commonly raised for failures attaching a
compute target, missing headers, and unsupported configuration values.
Create(workspace, name, provisioning_configuration)
Provision a Compute object by specifying a compute type and related configuration.
This method creates a new compute target rather than attaching an existing one.
Box 2: Yes -
Box 3: No -
The line before print('Step1') will fail.
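A sketch of the pattern the question is testing, assuming the variable ws references the workspace as stated in the question:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

try:
    # Succeeds only if a compute target with this name already exists
    compute = ComputeTarget(workspace=ws, name='ComputeOne')
except ComputeTargetException:
    # Otherwise provision a new cluster with the same name
    config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D1', max_nodes=4)
    compute = ComputeTarget.create(ws, 'ComputeOne', config)
    compute.wait_for_completion(show_output=True)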
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-
core/azureml.core.compute.computetarget
Question 26 ( Question Set 1 )
Explanation:
CUDA is a parallel computing platform and programming model developed by Nvidia for
general computing on its own GPUs (graphics processing units). CUDA enables developers to
speed up compute-intensive applications by harnessing the power of GPUs for the parallelizable
part of the computation.
Reference:
https://www.infoworld.com/article/3299703/what-is-cuda-parallel-programming-for-gpus.html
Question 27 ( Question Set 1 )
DRAG DROP -
You are analyzing a raw dataset that requires cleaning.
You must perform transformations and manipulations by using Azure Machine Learning Studio.
You need to identify the correct modules to perform the transformations.
Which modules should you choose? To answer, drag the appropriate modules to the correct
scenarios. Each module may be used once, more than once, or not at all.
You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
Answer :
Explanation:
Question 28 ( Question Set 1 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are using Azure Machine Learning Studio to perform feature engineering on a dataset.
You need to normalize values to produce a feature column grouped into bins.
Solution: Apply an Entropy Minimum Description Length (MDL) binning mode.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : A
Explanation:
Entropy MDL binning mode: This method requires that you select the column you want to
predict and the column or columns that you want to group into bins. It then makes a pass over
the data and attempts to determine the number of bins that minimizes the entropy. In other
words, it chooses a number of bins that allows the data column to best predict the target
column. It then returns the bin number associated with each row of your data in a column
named <colname>quantized.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-
data-into-bins
Question 29 ( Question Set 1 )
HOTSPOT -
You are preparing to use the Azure ML SDK to run an experiment and need to create compute.
You run the following code:
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: No -
If a compute cluster already exists it will be used.
Box 2: Yes -
The wait_for_completion method waits for the current provisioning operation to finish on the
cluster.
Box 3: Yes -
Low Priority VMs use Azure's excess capacity and are thus cheaper, but they risk your run being pre-empted.
Box 4: No -
Need to use training_compute.delete() to deprovision and delete the AmlCompute target.
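A sketch of the lifecycle described above, assuming ws references the workspace; the VM size and cluster name are illustrative:
from azureml.core.compute import AmlCompute, ComputeTarget

# Low-priority nodes are cheaper but the run may be pre-empted
config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                               vm_priority='lowpriority',
                                               min_nodes=0, max_nodes=4)
training_compute = ComputeTarget.create(ws, 'aml-cluster', config)

# Blocks until provisioning of the cluster has finished
training_compute.wait_for_completion(show_output=True)

# Deprovision and delete the AmlCompute target when it is no longer needed
training_compute.delete()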
Reference:
https://notebooks.azure.com/azureml/projects/azureml-getting-started/html/how-to-use-
azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb
https://docs.microsoft.com/en-us/python/api/azureml-
core/azureml.core.compute.computetarget
Question 30 ( Question Set 1 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are a data scientist using Azure Machine Learning Studio.
You need to normalize values to produce an output column into bins to predict a target column.
Solution: Apply a Quantiles normalization with a QuantileIndex normalization.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Use the Entropy MDL binning mode which has a target column.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-
data-into-bins
Question 36 ( Question Set 1 )
• A. Yes
• B. No
Answer : A
Explanation:
Replace using MICE: For each missing value, this option assigns a new value, which is calculated
by using a method described in the statistical literature as
"Multivariate Imputation using Chained Equations" or "Multiple Imputation by Chained
Equations". With a multiple imputation method, each variable with missing data is modeled
conditionally using the other variables in the data before filling in the missing values.
Note: Multivariate imputation by chained equations (MICE), sometimes called "fully conditional specification" or "sequential regression multiple imputation", has emerged in the statistical literature as one principled method of addressing missing data. Creating multiple imputations, as opposed to single imputations, accounts for the statistical uncertainty in the imputations. In addition, the chained equations approach is very flexible and can handle variables of varying types (e.g., continuous or binary) as well as complexities such as bounds or survey skip patterns.
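For illustration only (this is not the Studio module), scikit-learn's IterativeImputer implements a comparable chained-equations approach:
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, enables the estimator
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [np.nan, 8.0]])

# Each column with missing values is modeled from the other columns
imputer = IterativeImputer(max_iter=10, random_state=0)
print(imputer.fit_transform(X))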
Reference:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-
missing-data
Question 37 ( Question Set 1 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the
dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution: Remove the entire column that contains the missing data point.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Use the Multiple Imputation by Chained Equations (MICE) method.
Reference:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-
missing-data
Question 38 ( Question Set 1 )
You are creating a new experiment in Azure Machine Learning Studio. You have a small dataset
that has missing values in many columns. The data does not require the application of
predictors for each column. You plan to use the Clean Missing Data module.
You need to select a data cleaning method.
Which method should you use?
Answer : A
Explanation:
Replace using Probabilistic PCA: Compared to other options, such as Multiple Imputation using
Chained Equations (MICE), this option has the advantage of not requiring the application of
predictors for each column. Instead, it approximates the covariance for the full dataset.
Therefore, it might offer better performance for datasets that have missing values in many
columns.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-
missing-data
Question 39 ( Question Set 1 )
You use Azure Machine Learning Studio to build a machine learning experiment.
You need to divide data into two distinct datasets.
Which module should you use?
• A. Split Data
• B. Load Trained Model
• C. Assign Data to Clusters
• D. Group Data into Bins
Answer : D
Explanation:
The Group Data into Bins module supports multiple options for binning data. You can customize
how the bin edges are set and how values are apportioned into the bins.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-
data-into-bins
Question 40 ( Question Set 1 )
You are a lead data scientist for a project that tracks the health and migration of birds. You
create a multi-class image classification deep learning model that uses a set of labeled bird
photographs collected by experts.
You have 100,000 photographs of birds. All photographs use the JPG format and are stored in
an Azure blob container in an Azure subscription.
You need to access the bird photograph files in the Azure blob container from the Azure
Machine Learning service workspace that will be used for deep learning model training. You
must minimize data movement.
What should you do?
• A. Create an Azure Data Lake store and move the bird photographs to the store.
• B. Create an Azure Cosmos DB database and attach the Azure Blob containing bird
photographs storage to the database.
• C. Create and register a dataset by using TabularDataset class that references the Azure
blob storage containing bird photographs.
• D. Register the Azure blob storage containing the bird photographs as a datastore in Azure
Machine Learning service.
• E. Copy the bird photographs to the blob datastore that was created with your Azure
Machine Learning service workspace.
Answer : D
Explanation:
We recommend creating a datastore for an Azure Blob container. When you create a workspace,
an Azure blob container and an Azure file share are automatically registered to the workspace.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data
Question 41 ( Question Set 1 )
• A. Yes
• B. No
Answer : B
Explanation:
Use the Multiple Imputation by Chained Equations (MICE) method.
Reference:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-
missing-data
Question 42 ( Question Set 1 )
You create an Azure Machine Learning workspace.
You must create a custom role named DataScientist that meets the following requirements:
✑ Role members must not be able to delete the workspace.
✑ Role members must not be able to create, update, or delete compute resources in the
workspace.
✑ Role members must not be able to add new users to the workspace.
You need to create a JSON file for the DataScientist role in the Azure Machine Learning
workspace.
The custom role must enforce the restrictions specified by the IT Operations team.
Which JSON code segment should you use?
A.
B.
C.
D.
Answer : A
Explanation:
The following custom role can do everything in the workspace except for the following actions:
✑ It can't create or update a compute resource.
✑ It can't delete a compute resource.
✑ It can't add, delete, or alter role assignments.
✑ It can't delete the workspace.
To create a custom role, first construct a role definition JSON file that specifies the permission
and scope for the role. The following example defines a custom role named "Data Scientist
Custom" scoped at a specific workspace level: data_scientist_custom_role.json :
{
"Name": "Data Scientist Custom",
"IsCustom": true,
"Description": "Can run experiment but can't create or delete compute.",
"Actions": ["*"],
"NotActions": [
"Microsoft.MachineLearningServices/workspaces/*/delete",
"Microsoft.MachineLearningServices/workspaces/write",
"Microsoft.MachineLearningServices/workspaces/computes/*/write",
"Microsoft.MachineLearningServices/workspaces/computes/*/delete",
"Microsoft.Authorization/*/write"
],
"AssignableScopes": [
"/subscriptions/<subscription_id>/resourceGroups/<resource_group_name>/providers/Microso
ft.MachineLearningServices/workspaces/
<workspace_name>"
]
}
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-assign-roles
Question 43 ( Question Set 1 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are a data scientist using Azure Machine Learning Studio.
You need to normalize values to produce an output column into bins to predict a target column.
Solution: Apply an Equal Width with Custom Start and Stop binning mode.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Use the Entropy MDL binning mode which has a target column.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-
data-into-bins
Question 44 ( Question Set 1 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are a data scientist using Azure Machine Learning Studio.
You need to normalize values to produce an output column into bins to predict a target column.
Solution: Apply a Quantiles binning mode with a PQuantile normalization.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Use the Entropy MDL binning mode which has a target column.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-
data-into-bins
Question 45 ( Question Set 1 )
HOTSPOT -
You are evaluating a Python NumPy array that contains six data points defined as follows:
data = [10, 20, 30, 40, 50, 60]
You must generate the following output by using the k-fold algorithm implementation in the Python Scikit-learn machine learning library:
train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
You need to implement a cross-validation to generate the output.
How should you complete the code segment? To answer, select the appropriate code segment
in the dialog box in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: k-fold -
Box 2: 3 -
K-Folds cross-validator provides train/test indices to split data in train/test sets. Split dataset
into k consecutive folds (without shuffling by default).
The parameter n_splits ( int, default=3) is the number of folds. Must be at least 2.
Box 3: data -
Example:
>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
...
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
Question 46 ( Question Set 1 )
• A. Recommender Split
• B. Regular Expression Split
• C. Relative Expression Split
• D. Split Rows with the Randomized split parameter set to true
Answer : D
Explanation:
Split Rows: Use this option if you just want to divide the data into two parts. You can specify the
percentage of data to put in each split, but by default, the data is divided 50-50.
Incorrect Answers:
B: Regular Expression Split: Choose this option when you want to divide your dataset by testing
a single column for a value.
C: Relative Expression Split: Use this option whenever you want to apply a condition to a number
column.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data
Question 47 ( Question Set 1 )
HOTSPOT -
You are preparing to build a deep learning convolutional neural network model for image
classification. You create a script to train the model using CUDA devices.
You must submit an experiment that runs this script in the Azure Machine Learning workspace.
The following compute resources are available:
✑ a Microsoft Surface device on which Microsoft Office has been installed. Corporate IT policies
prevent the installation of additional software
✑ a Compute Instance named ds-workstation in the workspace with 2 CPUs and 8 GB of
memory
✑ an Azure Machine Learning compute target named cpu-cluster with eight CPU-based nodes
✑ an Azure Machine Learning compute target named gpu-cluster with four CPU and GPU-
based nodes
You need to specify the compute resources to be used for running the code to submit the
experiment, and for running the script in order to minimize model training time.
Which resources should the data scientist use? To answer, select the appropriate options in the
answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: the ds-workstation compute instance
A workstation notebook instance is good enough to run experiments.
Box 2: the gpu-cluster compute target
Just as GPUs revolutionized deep learning through unprecedented training and inferencing
performance, RAPIDS enables traditional machine learning practitioners to unlock game-
changing performance with GPUs. With RAPIDS on Azure Machine Learning service, users can
accelerate the entire machine learning pipeline, including data processing, training and
inferencing, with GPUs from the NC_v3, NC_v2, ND or ND_v2 families. Users can unlock
performance gains of more than 20X (with 4 GPUs), slashing training times from hours to
minutes and dramatically reducing time-to-insight.
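A sketch of the submission pattern, assuming the training script is train.py and the experiment name is illustrative; the submission code runs on the ds-workstation compute instance while the script runs on the GPU cluster:
from azureml.core import Workspace, Experiment, ScriptRunConfig

ws = Workspace.from_config()

# The script is executed on the gpu-cluster compute target to minimize training time
src = ScriptRunConfig(source_directory='.', script='train.py',
                      compute_target='gpu-cluster')
run = Experiment(ws, 'cnn-image-classification').submit(src)
run.wait_for_completion(show_output=True)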
Reference:
https://azure.microsoft.com/sv-se/blog/azure-machine-learning-service-now-supports-nvidia-s-
rapids/
Question 48 ( Question Set 1 )
You create an Azure Machine Learning workspace. You are preparing a local Python
environment on a laptop computer. You want to use the laptop to connect to the workspace
and run experiments.
You create the following config.json file.
{
"workspace_name" : "ml-workspace"
}
You must use the Azure Machine Learning SDK to interact with data and experiments in the
workspace.
You need to configure the config.json file to connect to the workspace from the Python
environment.
Which two additional parameters must you add to the config.json file in order to connect to the
workspace? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
• A. login
• B. resource_group
• C. subscription_id
• D. key
• E. region
Answer : BC
Explanation:
To use the same workspace in multiple environments, create a JSON configuration file. The configuration file saves your subscription (subscription_id), resource group (resource_group), and workspace name so that it can be easily loaded.
The following sample shows how to create a workspace.
from azureml.core import Workspace
ws = Workspace.create(name='myworkspace',
subscription_id='<azure-subscription-id>',
resource_group='myresourcegroup',
create_resource_group=True,
location='eastus2'
)
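With the two missing parameters added (placeholder values shown), config.json would look like the following, and Workspace.from_config() can then load the workspace from it:
{
"workspace_name" : "ml-workspace",
"subscription_id" : "<azure-subscription-id>",
"resource_group" : "<resource-group-name>"
}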
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace
Question 49 ( Question Set 1 )
HOTSPOT -
You are performing a classification task in Azure Machine Learning Studio.
You must prepare balanced testing and training samples based on a provided data set.
You need to split the data with a 0.75:0.25 ratio.
Which value should you use for each parameter? To answer, select the appropriate options in
the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 2: 0.75 -
If you specify a number as a percentage, or if you use a string that contains the "%" character,
the value is interpreted as a percentage. All percentage values must be within the range (0, 100),
not including the values 0 and 100.
Box 3: Yes -
To ensure splits are balanced.
Box 4: No -
If you use the option for a stratified split, the output datasets can be further divided by
subgroups, by selecting a strata column.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data
Question 50 ( Question Set 1 )
You create an Azure Machine Learning compute resource to train models. The compute
resource is configured as follows:
✑ Minimum nodes: 2
✑ Maximum nodes: 4
You must decrease the minimum number of nodes and increase the maximum number of nodes
to the following values:
✑ Minimum nodes: 0
✑ Maximum nodes: 8
You need to reconfigure the compute resource.
What are three possible ways to achieve this goal? Each correct answer presents a complete
solution.
NOTE: Each correct selection is worth one point.
Answer : ABC
Explanation:
A: You can manage assets and resources in the Azure Machine Learning studio.
B: The update(min_nodes=None, max_nodes=None, idle_seconds_before_scaledown=None) of
the AmlCompute class updates the ScaleSettings for this
AmlCompute target.
C: To change the nodes in the cluster, use the UI for your cluster in the Azure portal.
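A sketch of option B with the SDK, assuming ws references the workspace; the cluster name is illustrative:
from azureml.core.compute import ComputeTarget

# Retrieve the existing AmlCompute target and change its scale settings
training_compute = ComputeTarget(workspace=ws, name='train-cluster')
training_compute.update(min_nodes=0, max_nodes=8)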
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-
core/azureml.core.compute.amlcompute(class)
Question 51 ( Question Set 1 )
Answer :
Explanation:
Use the Split data into partitions option when you want to divide the dataset into subsets of the
data. This option is also useful when you want to create a custom number of folds for cross-
validation, or to split rows into several groups.
1. Add the Partition and Sample module to your experiment in Studio (classic), and connect the
dataset.
2. For Partition or sample mode, select Assign to Folds.
3. Use replacement in the partitioning: Select this option if you want the sampled row to be put
back into the pool of rows for potential reuse. As a result, the same row might be assigned to
several folds.
4. If you do not use replacement (the default option), the sampled row is not put back into the
pool of rows for potential reuse. As a result, each row can be assigned to only one fold.
5. Randomized split: Select this option if you want rows to be randomly assigned to folds.
If you do not select this option, rows are assigned to folds using the round-robin method.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-
and-sample
Question 52 ( Question Set 1 )
You create a new Azure subscription. No resources are provisioned in the subscription.
You need to create an Azure Machine Learning workspace.
What are three possible ways to achieve this goal? Each correct answer presents a complete
solution.
NOTE: Each correct selection is worth one point.
• A. Run Python code that uses the Azure ML SDK library and calls the Workspace.create
method with name, subscription_id, and resource_group parameters.
• B. Navigate to Azure Machine Learning studio and create a workspace.
• C. Use the Azure Command Line Interface (CLI) with the Azure Machine Learning extension to call the az group create function with --name and --location parameters, and then the az ml workspace create function, specifying -w and -g parameters for the workspace name and resource group.
• D. Navigate to Azure Machine Learning studio and create a workspace.
• E. Run Python code that uses the Azure ML SDK library and calls the Workspace.get
method with name, subscription_id, and resource_group parameters.
Answer : BCD
Explanation:
B: You can create a workspace in the Azure Machine Learning studio
C: You can create a workspace for Azure Machine Learning with Azure CLI
Install the machine learning extension.
Create a resource group: az group create --name <resource-group-name> --location <location>
To create a new workspace where the services are automatically created, use the following command: az ml workspace create -w <workspace-name> -g <resource-group-name>
D: You can create and manage Azure Machine Learning workspaces in the Azure portal.
1. Sign in to the Azure portal by using the credentials for your Azure subscription.
2. In the upper-left corner of Azure portal, select + Create a resource.
3. Use the search bar to find Machine Learning.
4. Select Machine Learning.
5. In the Machine Learning pane, select Create to begin.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-workspace-template
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace-cli
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace
Question 53 ( Question Set 1 )
HOTSPOT -
You create an Azure Machine Learning workspace and set up a development environment. You
plan to train a deep neural network (DNN) by using the
Tensorflow framework and by using estimators to submit training scripts.
You must optimize computation speed for training runs.
You need to choose the appropriate estimator to use as well as the appropriate training
compute target configuration.
Which values should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: Tensorflow -
TensorFlow represents an estimator for training in TensorFlow experiments.
Box 2: 12 vCPU, 112 GB memory, 2 GPU
Use GPUs for the deep neural network.
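A sketch using the TensorFlow estimator from azureml.train.dnn (SDK v1); the script name and compute target name are assumptions:
from azureml.train.dnn import TensorFlow

# use_gpu=True selects a GPU base image so training runs on GPU-enabled compute
estimator = TensorFlow(source_directory='.',
                       entry_script='train_dnn.py',
                       compute_target='gpu-cluster',
                       use_gpu=True)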
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn
Question 54 ( Question Set 1 )
HOTSPOT -
You have an Azure Machine Learning workspace named workspace1 that is accessible from a
public endpoint. The workspace contains an Azure Blob storage datastore named store1 that
represents a blob container in an Azure storage account named account1. You configure
workspace1 and account1 to be accessible by using private endpoints in the same virtual
network.
You must be able to access the contents of store1 by using the Azure Machine Learning SDK for
Python. You must be able to preview the contents of store1 by using Azure Machine Learning
studio.
You need to configure store1.
What should you do? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: Regenerate the keys of account1.
Azure Blob Storage supports authentication through an account key or a SAS token.
To authenticate your access to the underlying storage service, you can provide either your account key, shared access signature (SAS) tokens, or a service principal.
Box 2: Update the authentication for store1.
For Azure Machine Learning studio users, several features rely on the ability to read data from a
dataset; such as dataset previews, profiles and automated machine learning. For these features
to work with storage behind virtual networks, use a workspace managed identity in the studio to
allow Azure Machine
Learning to access the storage account from outside the virtual network.
Note: Some of the studio's features are disabled by default in a virtual network. To re-enable
these features, you must enable managed identity for storage accounts you intend to use in the
studio.
The following operations are disabled by default in a virtual network:
✑ Preview data in the studio.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data
Question 55 ( Question Set 2 )
You are analyzing a dataset containing historical data from a local taxi company. You are
developing a regression model.
You must predict the fare of a taxi trip.
You need to select performance metrics to correctly evaluate the regression model.
Which two metrics can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
Answer : AD
Explanation:
RMSE and R2 are both metrics for regression models.
A: Root mean squared error (RMSE) creates a single value that summarizes the error in the
model. By squaring the difference, the metric disregards the difference between over-prediction
and under-prediction.
D: Coefficient of determination, often referred to as R2, represents the predictive power of the
model as a value between 0 and 1. Zero means the model is random (explains nothing); 1 means
there is a perfect fit. However, caution should be used in interpreting R2 values, as low values
can be entirely normal and high values can be suspect.
Incorrect Answers:
C, E: F-score is used for classification models, not for regression models.
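As an illustration (the fare values below are made up), both metrics can be computed with scikit-learn:
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([12.5, 8.0, 15.2, 9.9])    # actual fares (hypothetical)
y_pred = np.array([11.8, 8.4, 14.7, 10.5])   # predicted fares (hypothetical)

rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # root mean squared error
r2 = r2_score(y_true, y_pred)                        # coefficient of determination
print(f"RMSE: {rmse:.3f}, R2: {r2:.3f}")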
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-
model
Question 56 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are using Azure Machine Learning to run an experiment that trains a classification model.
You want to use Hyperdrive to find parameters that optimize the AUC metric for the model. You configure a HyperDriveConfig for the experiment by running the following code:
Question 56 ( Question Set 2 )
You plan to use this configuration to run a script that trains a random forest model and then
tests it with validation data. The label values for the validation data are stored in a variable
named y_test, and the predicted probabilities from the model are stored in a variable named
y_predicted.
You need to add logging to the script to allow Hyperdrive to optimize hyperparameters for the
AUC metric.
Solution: Run the following code:
Does the solution meet the goal?
• A. Yes
• B. No
Answer : A
Explanation:
Python printing/logging example:
logging.info(message)
Destination: Driver logs, Azure Machine Learning designer
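Whatever the exhibited code looks like, Hyperdrive can only optimize a metric that the training script logs to the run under the name given as primary_metric_name. A minimal sketch of that pattern, assuming y_test and y_predicted as described in the scenario and a primary metric named 'AUC':
from azureml.core import Run
from sklearn.metrics import roc_auc_score

run = Run.get_context()
auc = roc_auc_score(y_test, y_predicted)   # y_test / y_predicted come from the training script
run.log('AUC', float(auc))                 # name must match primary_metric_name in HyperDriveConfig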
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-debug-pipelines
Next Question
Question 57 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are using Azure Machine Learning to run an experiment that trains a classification model.
You want to use Hyperdrive to find parameters that optimize the AUC metric for the model. You
configure a HyperDriveConfig for the experiment by running the following code:
You plan to use this configuration to run a script that trains a random forest model and then
tests it with validation data. The label values for the validation data are stored in a variable
named y_test, and the predicted probabilities from the model are stored in a variable named
y_predicted.
You need to add logging to the script to allow Hyperdrive to optimize hyperparameters for the
AUC metric.
Solution: Run the following code:
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation -
Use a solution with logging.info(message) instead.
Note: Python printing/logging example:
logging.info(message)
Destination: Driver logs, Azure Machine Learning designer
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-debug-pipelines
Next Question
Question 58 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are using Azure Machine Learning to run an experiment that trains a classification model.
You want to use Hyperdrive to find parameters that optimize the AUC metric for the model. You
configure a HyperDriveConfig for the experiment by running the following code:
You plan to use this configuration to run a script that trains a random forest model and then
tests it with validation data. The label values for the validation data are stored in a variable
named y_test, and the predicted probabilities from the model are stored in a variable named
y_predicted.
You need to add logging to the script to allow Hyperdrive to optimize hyperparameters for the
AUC metric.
Solution: Run the following code:
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation -
Use a solution with logging.info(message) instead.
Note: Python printing/logging example:
logging.info(message)
Destination: Driver logs, Azure Machine Learning designer
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-debug-pipelines
Next Question
Question 59 ( Question Set 2 )
You use the following code to run a script as an experiment in Azure Machine Learning:
You must identify the output files that are generated by the experiment run.
You need to add code to retrieve the output file names.
Which code segment should you add to the script?
• A. files = run.get_properties()
• B. files= run.get_file_names()
• C. files = run.get_details_with_logs()
• D. files = run.get_metrics()
• E. files = run.get_details()
Answer : B
Explanation:
You can list all of the files that are associated with this run record by calling run.get_file_names().
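A short sketch of retrieving the file names for a run; the experiment name is hypothetical:
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()                          # assumes a config.json is available
run = next(Experiment(ws, 'train-model').get_runs())  # an existing run; experiment name assumed

files = run.get_file_names()   # names of the output and log files associated with the run
for file_name in files:
    print(file_name)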
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-track-experiments
Next Question
Question 60 ( Question Set 2 )
You write five Python scripts that must be processed in the order specified in Exhibit A, which
allows independent modules to run in parallel but waits for modules with dependencies.
You must create an Azure Machine Learning pipeline using the Python SDK, because you want
the script that creates the pipeline to be tracked in your version control system. You have
created five PythonScriptSteps and have named the variables to match the module names.
You need to create the pipeline shown. Assume all relevant imports have been done.
Which Python code segment should you use?
A.
B.
C.
D.
Answer : A
Explanation:
The steps parameter is an array of steps. To build pipelines that have multiple steps, place the
steps in order in this array.
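A hedged sketch of how step ordering can be expressed; the step names, script names, folder, and compute target below are assumptions rather than the exhibit's values, and explicit dependencies can also be declared with run_after:
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()   # assumes a config.json is available

# Two illustrative steps; the 'scripts' folder and 'cpu-cluster' compute are assumptions.
prepare_step = PythonScriptStep(name='prepare', script_name='prepare.py',
                                source_directory='scripts', compute_target='cpu-cluster')
train_step = PythonScriptStep(name='train', script_name='train.py',
                              source_directory='scripts', compute_target='cpu-cluster')

train_step.run_after(prepare_step)   # train waits for prepare; independent steps run in parallel

pipeline = Pipeline(workspace=ws, steps=[prepare_step, train_step])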
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-parallel-run-step
Next Question
Question 61 ( Question Set 2 )
You need to configure the estimator for the experiment so that the script can read the data from
a data reference named data_ref that references the csv_files folder in the training_data
datastore.
Which code should you use to configure the estimator?
A.
B.
C.
D.
E.
Answer : B
Explanation:
Besides passing the dataset through the input parameters in the estimator, you can also pass
the dataset through script_params and get the data path (mounting point) in your training script
via arguments. This way, you can keep your training script independent of azureml-sdk. In other
words, you will be able to use the same training script for local debugging and remote training on
any cloud platform.
Example:
from azureml.train.sklearn import SKLearn

script_params = {
    # mount the dataset on the remote compute and pass the mounted path
    # as an argument to the training script
    '--data-folder': mnist_ds.as_named_input('mnist').as_mount(),
    '--regularization': 0.5
}

est = SKLearn(source_directory=script_folder,
              script_params=script_params,
              compute_target=compute_target,
              environment_definition=env,
              entry_script='train_mnist.py')

# Run the experiment
run = experiment.submit(est)
run.wait_for_completion(show_output=True)
Incorrect Answers:
A: Pandas DataFrame not used.
Reference:
https://docs.microsoft.com/es-es/azure/machine-learning/how-to-train-with-datasets
Next Question
Question 62 ( Question Set 2 )
DRAG DROP -
You create a multi-class image classification deep learning experiment by using the PyTorch
framework. You plan to run the experiment on an Azure Compute cluster that has nodes with
GPU's.
You need to define an Azure Machine Learning service pipeline to perform the monthly
retraining of the image classification model. The pipeline must run with minimal cost and
minimize the time required to train the model.
Which three pipeline steps should you run in sequence? To answer, move the appropriate
actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Answer :
Explanation:
Step 1: Configure a DataTransferStep() to fetch the new image data.
Step 2: Configure a PythonScriptStep() to run image_resize.py on the cpu-compute compute
target.
Step 3: Configure the EstimatorStep() to run the training script on the gpu_compute compute
target.
The PyTorch estimator provides a simple way of launching a PyTorch training job on a compute
target.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-pytorch
Next Question
Question 63 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
An IT department creates the following Azure resource groups and resources:
The IT department creates an Azure Kubernetes Service (AKS)-based inference compute target
named aks-cluster in the Azure Machine Learning workspace.
You have a Microsoft Surface Book computer with a GPU. Python 3.6 and Visual Studio Code are
installed.
You need to run a script that trains a deep neural network (DNN) model and logs the loss and
accuracy metrics.
Solution: Attach the mlvm virtual machine as a compute target in the Azure Machine Learning
workspace. Install the Azure ML SDK on the Surface Book and run
Python code to connect to the workspace. Run the training script as an experiment on the mlvm
remote compute resource.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : A
Explanation:
Use the VM as a compute target.
Note: A compute target is a designated compute resource/environment where you run your
training script or host your service deployment. This location may be your local machine or a
cloud-based compute resource.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target
Next Question
Question 64 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
An IT department creates the following Azure resource groups and resources:
The IT department creates an Azure Kubernetes Service (AKS)-based inference compute target
named aks-cluster in the Azure Machine Learning workspace.
You have a Microsoft Surface Book computer with a GPU. Python 3.6 and Visual Studio Code are
installed.
You need to run a script that trains a deep neural network (DNN) model and logs the loss and
accuracy metrics.
Solution: Install the Azure ML SDK on the Surface Book. Run Python code to connect to the
workspace and then run the training script as an experiment on local compute.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Need to attach the mlvm virtual machine as a compute target in the Azure Machine Learning
workspace.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target
Next Question
Question 65 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
An IT department creates the following Azure resource groups and resources:
The IT department creates an Azure Kubernetes Service (AKS)-based inference compute target
named aks-cluster in the Azure Machine Learning workspace.
You have a Microsoft Surface Book computer with a GPU. Python 3.6 and Visual Studio Code are
installed.
You need to run a script that trains a deep neural network (DNN) model and logs the loss and
accuracy metrics.
Solution: Install the Azure ML SDK on the Surface Book. Run Python code to connect to the
workspace. Run the training script as an experiment on the aks- cluster compute target.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Need to attach the mlvm virtual machine as a compute target in the Azure Machine Learning
workspace.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target
Next Question
Question 66 ( Question Set 2 )
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: No -
max_total_runs (50 here)
The maximum total number of runs to create. This is the upper bound; there may be fewer runs
when the sample space is smaller than this value.
Box 2: Yes -
Policy EarlyTerminationPolicy -
The early termination policy to use. If None - the default, no early termination policy will be
used.
Box 3: No -
Discrete hyperparameters are specified as a choice among discrete values. choice can be:
✑ one or more comma-separated values
✑ a range object
✑ any arbitrary list object
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-train-
core/azureml.train.hyperdrive.hyperdriveconfig https://docs.microsoft.com/en-
us/azure/machine-learning/how-to-tune-hyperparameters
Next Question
Question 67 ( Question Set 2 )
HOTSPOT -
You are using Azure Machine Learning to train machine learning models. You need a compute
target on which to remotely run the training script.
You run the following Python code:
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: Yes -
The compute is created within your workspace region as a resource that can be shared with
other users.
Box 2: Yes -
It is displayed as a compute cluster.
Box 3: Yes -
min_nodes is not specified, so it defaults to 0.
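A sketch of the kind of provisioning code the exhibit refers to; the VM size, cluster name, and node count are illustrative assumptions:
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()   # assumes a config.json is available

# min_nodes is omitted, so it defaults to 0 and the cluster scales to zero nodes when idle.
config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2', max_nodes=4)
compute_target = ComputeTarget.create(ws, 'cpu-cluster', config)
compute_target.wait_for_completion(show_output=True)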
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-
core/azureml.core.compute.amlcompute.amlcomputeprovisioningconfiguration
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-
studio
Next Question
Question 68 ( Question Set 2 )
HOTSPOT -
You have an Azure blob container that contains a set of TSV files. The Azure blob container is
registered as a datastore for an Azure Machine Learning service workspace. Each TSV file uses
the same data schema.
You plan to aggregate data for all of the TSV files together and then register the aggregated
data as a dataset in an Azure Machine Learning workspace by using the Azure Machine Learning
SDK for Python.
You run the following code.
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: No -
FileDataset references single or multiple files in datastores or from public URLs. The TSV files
need to be parsed.
Box 2: Yes -
to_path() gets a list of file paths for each file stream defined by the dataset.
Box 3: Yes -
TabularDataset.to_pandas_dataframe loads all records from the dataset into a pandas
DataFrame.
TabularDataset represents data in a tabular format created by parsing the provided file or list of
files.
Note: TSV is a file extension for a tab-delimited file used with spreadsheet software. TSV stands
for Tab Separated Values. TSV files are used for raw data and can be imported into and exported
from spreadsheet software. TSV files are essentially text files, and the raw data can be viewed by
text editors, though they are often used when moving raw data between spreadsheets.
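A hedged sketch of aggregating the TSV files into a single tabular dataset; the datastore name and folder path are assumptions:
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, 'tsv_datastore')    # assumed datastore name

# Parse every TSV file in the folder into one TabularDataset (tab separator).
tsv_dataset = Dataset.Tabular.from_delimited_files(
    path=(datastore, 'data/*.tsv'), separator='\t')

df = tsv_dataset.to_pandas_dataframe()   # all records loaded into a single DataFrame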
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset
Next Question
Question 69 ( Question Set 2 )
You create a batch inference pipeline by using the Azure ML SDK. You configure the pipeline
parameters by executing the following code:
You need to obtain the output from the pipeline execution.
Where will you find the output?
Answer : E
Explanation:
output_action (str): How the output is to be organized. Currently supported values are
'append_row' and 'summary_only'.
'append_row' - All values output by run() method invocations will be aggregated into one
unique file named parallel_run_step.txt that is created in the output location.
'summary_only' - The user script is expected to store the output itself; the returned values are
used only for error-threshold calculation.
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-contrib-pipeline-
steps/azureml.contrib.pipeline.steps.parallelrunconfig
Next Question
Question 70 ( Question Set 2 )
DRAG DROP -
You create a multi-class image classification deep learning model.
The model must be retrained monthly with the new image data fetched from a public web
portal. You create an Azure Machine Learning pipeline to fetch new data, standardize the size of
images, and retrain the model.
You need to use the Azure Machine Learning SDK to configure the schedule for the pipeline.
Which four actions should you perform in sequence? To answer, move the appropriate actions
from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Answer :
Explanation:
Step 1: Publish the pipeline.
To schedule a pipeline, you'll need a reference to your workspace, the identifier of your
published pipeline, and the name of the experiment in which you wish to create the schedule.
Step 2: Retrieve the pipeline ID.
Needed for the schedule.
Step 3: Create a ScheduleRecurrence..
To run a pipeline on a recurring basis, you'll create a schedule. A Schedule associates a pipeline,
an experiment, and a trigger.
First create a schedule. Example: Create a Schedule that begins a run every 15 minutes:
recurrence = ScheduleRecurrence(frequency="Minute", interval=15)
Step 4: Define an Azure Machine Learning pipeline schedule..
Example, continued:
recurring_schedule = Schedule.create(ws, name="MyRecurringSchedule", description="Based on
time", pipeline_id=pipeline_id, experiment_name=experiment_name, recurrence=recurrence)
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-schedule-pipelines
Next Question
Question 71 ( Question Set 2 )
Answer :
Explanation:
Box 1: Yes -
Parameter source_directory is a local directory containing experiment configuration and code
files needed for a training job.
Box 2: Yes -
script_params is a dictionary of command-line arguments to pass to the training script specified
in entry_script.
Box 3: No -
Box 4: Yes -
The conda_packages parameter is a list of strings representing conda packages to be added to
the Python environment for the experiment.
Next Question
Question 72 ( Question Set 2 )
HOTSPOT -
You have a Python data frame named salesData in the following format:
Answer :
Explanation:
Box 1: dataFrame -
Syntax: pandas.melt(frame, id_vars=None, value_vars=None, var_name=None,
value_name='value', col_level=None)
Box 2: shop -
Parameter id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
Box 3: ['2017','2018']
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
Example:
import pandas as pd

df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
                   'B': {0: 1, 1: 3, 2: 5},
                   'C': {0: 2, 1: 4, 2: 6}})
pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])

   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6
Reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html
Next Question
Question 73 ( Question Set 2 )
HOTSPOT -
You are working on a classification task. You have a dataset indicating whether a student would
like to play soccer and associated attributes. The dataset includes the following columns:
Answer :
Reference:
https://www.edureka.co/blog/classification-algorithms/
Next Question
Question 74 ( Question Set 2 )
HOTSPOT -
You plan to preprocess text from CSV files. You load the Azure Machine Learning Studio default
stop words list.
You need to configure the Preprocess Text module to meet the following requirements:
✑ Ensure that multiple related words map to a single canonical form.
✑ Remove pipe characters from text.
✑ Remove words to optimize information retrieval.
Which three options should you select? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Next Question
Question 75 ( Question Set 2 )
You plan to run a script as an experiment using a Script Run Configuration. The script uses
modules from the scipy library as well as several Python packages that are not typically installed
in a default conda environment.
You plan to run the experiment on your local workstation for small datasets and scale out the
experiment by running it on more powerful remote compute clusters for larger datasets.
You need to ensure that the experiment runs successfully on local and remote compute with the
least administrative effort.
What should you do?
• A. Do not specify an environment in the run configuration for the experiment. Run the
experiment by using the default environment.
• B. Create a virtual machine (VM) with the required Python configuration and attach the
VM as a compute target. Use this compute target for all experiment runs.
• C. Create and register an Environment that includes the required packages. Use this
Environment for all experiment runs.
• D. Create a config.yaml file defining the conda packages that are required and save the
file in the experiment folder.
• E. Always run the experiment with an Estimator by using the default packages.
Answer : C
Explanation:
If you have an existing Conda environment on your local computer, then you can use the service
to create an environment object. By using this strategy, you can reuse your local interactive
environment on remote runs.
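A hedged sketch of option C; the environment name and package lists are illustrative, not prescribed by the question:
from azureml.core import Environment, Workspace
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()

env = Environment(name='experiment-env')                             # assumed environment name
deps = CondaDependencies.create(conda_packages=['scipy', 'pandas'],  # illustrative packages
                                pip_packages=['azureml-defaults'])
env.python.conda_dependencies = deps
env.register(workspace=ws)   # registered once, then reused for local and remote runs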
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments
Next Question
Question 76 ( Question Set 2 )
You need to record the row count as a metric named row_count that can be returned using the
get_metrics method of the Run object after the experiment run completes.
Which code should you use?
Answer : B
Explanation:
Log a numerical or string value to the run with the given name using log(name, value,
description=''). Logging a metric to a run causes that metric to be stored in the run record in the
experiment. You can log the same metric multiple times within a run, the result being
considered a vector of that metric.
Example: run.log("accuracy", 0.95)
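A minimal sketch of the same pattern for this question, assuming the data has been loaded into a pandas DataFrame named df:
from azureml.core import Run

run = Run.get_context()
row_count = len(df)                # df is assumed to be the loaded DataFrame
run.log('row_count', row_count)    # retrievable later with run.get_metrics()['row_count']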
Incorrect Answers:
E: Using log_row(name, description=None, **kwargs) creates a metric with multiple columns as
described in kwargs. Each named parameter generates a column with the value specified.
log_row can be called once to log an arbitrary tuple, or multiple times in a loop to generate a
complete table.
Example: run.log_row("Y over X", x=1, y=0.4)
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run
Next Question
Question 77 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class
imbalance.
Solution: You use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : A
Explanation:
SMOTE is used to increase the number of underrepresented cases in a dataset used for machine
learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating
existing cases.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
Next Question
Question 78 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class
imbalance.
Solution: You use the Stratified split for the sampling mode.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Instead use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for
machine learning. SMOTE is a better way of increasing the number of rare cases than simply
duplicating existing cases.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
Next Question
Question 79 ( Question Set 2 )
You are creating a machine learning model.
You need to identify outliers in the data.
Which two visualizations can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
• A. Venn diagram
• B. Box plot
• C. ROC curve
• D. Random forest diagram
• E. Scatter plot
Answer : BE
Explanation:
The box-plot algorithm can be used to display outliers.
One other way to quickly identify Outliers visually is to create scatter plots.
Reference:
https://blogs.msdn.microsoft.com/azuredev/2017/05/27/data-cleansing-tools-in-azure-
machine-learning/
Next Question
Question 80 ( Question Set 2 )
You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?
• A. Violin plot
• B. Gradient descent
• C. Box plot
• D. Binary classification confusion matrix
Answer : D
Explanation:
Incorrect Answers:
A: A violin plot is a visual that traditionally combines a box plot and a kernel density plot.
B: Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a
function. To find a local minimum of a function using gradient descent, one takes steps
proportional to the negative of the gradient (or approximate gradient) of the function at the
current point.
C: A box plot lets you see basic distribution information about your data, such as median, mean,
range and quartiles but doesn't show you how your data looks throughout its range.
Reference:
https://machinelearningknowledge.ai/confusion-matrix-and-performance-metrics-machine-
learning/
Next Question
Question 81 ( Question Set 2 )
Answer : ADF
Explanation:
AD:
primary_metric_name="accuracy",
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE
Optimize the runs to maximize "accuracy". Make sure to log this value in your training script.
Note:
primary_metric_name: The name of the primary metric to optimize. The name of the primary
metric needs to exactly match the name of the metric logged by the training script.
primary_metric_goal: It can be either PrimaryMetricGoal.MAXIMIZE or
PrimaryMetricGoal.MINIMIZE and determines whether the primary metric will be maximized or
minimized when evaluating the runs.
F: The training script calculates the val_accuracy and logs it as "accuracy", which is used as the
primary metric.
Next Question
Question 82 ( Question Set 2 )
DRAG DROP -
You have a dataset that contains over 150 features. You use the dataset to train a Support
Vector Machine (SVM) binary classifier.
You need to use the Permutation Feature Importance module in Azure Machine Learning Studio
to compute a set of feature importance scores for the dataset.
In which order should you perform the actions? To answer, move all actions from the list of
actions to the answer area and arrange them in the correct order.
Select and Place:
Answer :
Explanation:
Step 1: Add a Two-Class Support Vector Machine module to initialize the SVM classifier.
Step 2: Add a dataset to the experiment
Step 3: Add a Split Data module to create training and test dataset.
To generate a set of feature scores requires that you have an already trained model, as well as a
test dataset.
Step 4: Add a Permutation Feature Importance module and connect to the trained model and
test dataset.
Step 5: Set the Metric for measuring performance property to Classification - Accuracy and then
run the experiment.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/two-class-
support-vector-machine https://docs.microsoft.com/en-us/azure/machine-learning/studio-
module-reference/permutation-feature-importance
Next Question
Question 83 ( Question Set 2 )
HOTSPOT -
You are using the Hyperdrive feature in Azure Machine Learning to train a model.
You configure the Hyperdrive experiment by running the following code:
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: Yes -
In random sampling, hyperparameter values are randomly selected from the defined search
space. Random sampling allows the search space to include both discrete and continuous
hyperparameters.
Box 2: Yes -
learning_rate has a normal distribution with mean value 10 and a standard deviation of 3.
Box 3: No -
keep_probability has a uniform distribution with a minimum value of 0.05 and a maximum value
of 0.1.
Box 4: No -
number_of_hidden_layers takes on one of the values [3, 4, 5].
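A sketch of a search space matching those three statements; the argument names are illustrative:
from azureml.train.hyperdrive import RandomParameterSampling, choice, normal, uniform

param_sampling = RandomParameterSampling({
    '--learning_rate': normal(10, 3),             # mean 10, standard deviation 3
    '--keep_probability': uniform(0.05, 0.1),     # continuous values between 0.05 and 0.1
    '--number_of_hidden_layers': choice(3, 4, 5)  # one of the discrete values 3, 4 or 5
})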
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
Next Question
Question 84 ( Question Set 2 )
You are performing a filter-based feature selection for a dataset to build a multi-class classifier
by using Azure Machine Learning Studio.
The dataset contains categorical features that are highly correlated to the output label column.
You need to select the appropriate feature scoring statistical method to identify the key
predictors.
Which method should you use?
• A. Kendall correlation
• B. Spearman correlation
• C. Chi-squared
• D. Pearson correlation
Answer : D
Explanation:
Pearson's correlation statistic, or Pearson's correlation coefficient, is also known in statistical
models as the r value. For any two variables, it returns a value that indicates the strength of the
correlation
Pearson's correlation coefficient is the test statistics that measures the statistical relationship, or
association, between two continuous variables. It is known as the best method of measuring the
association between variables of interest because it is based on the method of covariance. It
gives information about the magnitude of the association, or correlation, as well as the direction
of the relationship.
Incorrect Answers:
C: The two-way chi-squared test is a statistical method that measures how close expected values
are to actual results.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/filter-
based-feature-selection https://www.statisticssolutions.com/pearsons-correlation-coefficient/
Next Question
Question 85 ( Question Set 2 )
HOTSPOT -
You create a binary classification model to predict whether a person has a disease.
You need to detect possible classification errors.
Which error type should you choose for each description? To answer, select the appropriate
options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Next Question
Question 87 ( Question Set 2 )
You plan to use automated machine learning to train a regression model. You have data with
features that have missing values, and categorical features with few distinct values.
You need to configure automated machine learning to automatically impute missing values and
encode categorical features as part of the training task.
Which parameter and value pair should you use in the AutoMLConfig class?
• A. featurization = 'auto'
• B. enable_voting_ensemble = True
• C. task = 'classification'
• D. exclude_nan_labels = True
• E. enable_tf = True
Answer : A
Explanation:
featurization: str or FeaturizationConfig
Values: 'auto' / 'off' / FeaturizationConfig
Indicator for whether the featurization step should be done automatically or not, or whether
customized featurization should be used.
Column type is automatically detected. Based on the detected column type
preprocessing/featurization is done as follows:
Categorical: Target encoding, one hot encoding, drop high cardinality categories, impute
missing values.
Numeric: Impute missing values, cluster distance, weight of evidence.
DateTime: Several features such as day, seconds, minutes, hours etc.
Text: Bag of words, pre-trained Word embedding, text target encoding.
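A hedged sketch of the relevant setting; every argument except featurization='auto' is an illustrative assumption:
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(task='regression',
                             training_data=train_ds,       # assumed TabularDataset
                             label_column_name='target',   # assumed label column
                             primary_metric='normalized_root_mean_squared_error',
                             featurization='auto')         # impute missing values, encode categoricals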
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-train-automl-
client/azureml.train.automl.automlconfig.automlconfig
Next Question
Question 88 ( Question Set 2 )
DRAG DROP -
You create a training pipeline using the Azure Machine Learning designer. You upload a CSV file
that contains the data from which you want to train your model.
You need to use the designer to create a pipeline that includes steps to perform the following
tasks:
✑ Select the training features using the pandas filter method.
✑ Train a model based on the naive_bayes.GaussianNB algorithm.
✑ Return only the Scored Labels column by using the query SELECT [Scored Labels] FROM t1;
Which modules should you use? To answer, drag the appropriate modules to the appropriate
locations. Each module name may be used once, more than once, or not at all. You may need to
drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
Answer :
Explanation:
Next Question
Question 89 ( Question Set 2 )
You are building a regression model for estimating the number of calls during an event.
You need to determine whether the feature values achieve the conditions to build a Poisson
regression model.
Which two conditions must the feature set contain? Each correct answer presents part of the
solution.
NOTE: Each correct selection is worth one point.
Answer : BD
Explanation:
Poisson regression is intended for use in regression models that are used to predict numeric
values, typically counts. Therefore, you should use this module to create your regression model
only if the values you are trying to predict fit the following conditions:
✑ The response variable has a Poisson distribution.
✑ Counts cannot be negative. The method will fail outright if you attempt to use it with
negative labels.
✑ A Poisson distribution is a discrete distribution; therefore, it is not meaningful to use this
method with non-whole numbers.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/poisson-
regression
Next Question
Question 90 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class
imbalance.
Solution: You use the Principal Components Analysis (PCA) sampling mode.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Instead use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for
machine learning. SMOTE is a better way of increasing the number of rare cases than simply
duplicating existing cases.
Incorrect Answers:
The Principal Component Analysis module in Azure Machine Learning Studio (classic) is used to
reduce the dimensionality of your training data. The module analyzes your data and creates a
reduced feature set that captures all the information contained in the dataset, but in a smaller
number of features.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/principal-
component-analysis
Next Question
Question 91 ( Question Set 2 )
• A. Edit Metadata
• B. Filter Based Feature Selection
• C. Execute Python Script
• D. Latent Dirichlet Allocation
Answer : A
Explanation:
Typical metadata changes might include marking columns as features.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/edit-
metadata
Next Question
Question 92 ( Question Set 2 )
You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?
• A. violin plot
• B. Gradient descent
• C. Scatter plot
• D. Receiver Operating Characteristic (ROC) curve
Answer : D
Explanation:
Receiver operating characteristic (or ROC) is a plot of the correctly classified labels vs. the
incorrectly classified labels (that is, the true positive rate against the false positive rate) for a
particular model.
Incorrect Answers:
A: A violin plot is a visual that traditionally combines a box plot and a kernel density plot.
B: Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a
function. To find a local minimum of a function using gradient descent, one takes steps
proportional to the negative of the gradient (or approximate gradient) of the function at the
current point.
C: A scatter plot graphs the actual values in your data against the values predicted by the model.
The scatter plot displays the actual values along the X-axis, and displays the predicted values
along the Y-axis. It also displays a line that illustrates the perfect prediction, where the predicted
value exactly matches the actual value.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-
ml#confusion-matrix
Next Question
Question 93 ( Question Set 2 )
You are solving a classification task.
You must evaluate your model on a limited data sample by using k-fold cross-validation. You
start by configuring a k parameter as the number of splits.
You need to configure the k parameter for the cross-validation.
Which value should you use?
• A. k=1
• B. k=10
• C. k=0.5
• D. k=0.9
Answer : B
Explanation:
Leave One Out (LOO) cross-validation
Setting K = n (the number of observations) yields n-fold and is called leave-one out cross-
validation (LOO), a special case of the K-fold approach.
LOO CV is sometimes useful but typically doesn't shake up the data enough. The estimates from
each fold are highly correlated and hence their average can have high variance.
This is why the usual choice is K=5 or 10. It provides a good compromise for the bias-variance
tradeoff.
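A small, self-contained illustration of 10-fold cross-validation with scikit-learn; the dataset and model are placeholders:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)  # k = 10 splits
print(scores.mean(), scores.std())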
Next Question
Question 94 ( Question Set 2 )
HOTSPOT -
You have a dataset created for multiclass classification tasks that contains a normalized
numerical feature set with 10,000 data points and 150 features.
You use 75 percent of the data points for training and 25 percent for testing. You are using the
scikit-learn machine learning library in Python. You use X to denote the feature set and Y to
denote class labels.
You create the following Python data frames:
You need to apply the Principal Component Analysis (PCA) method to reduce the dimensionality
of the feature set to 10 features in both training and testing sets.
How should you complete the code segment? To answer, select the appropriate options in the
answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: PCA(n_components = 10)
Need to reduce the dimensionality of the feature set to 10 features in both training and testing
sets.
Example:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)   # 2 dimensions
principalComponents = pca.fit_transform(x)
Box 2: pca -
fit_transform(X[, y]) fits the model with X and applies the dimensionality reduction on X.
Box 3: transform(x_test)
transform(X) applies dimensionality reduction to X.
Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
Next Question
Question 95 ( Question Set 2 )
HOTSPOT -
You have a feature set containing the following numerical features: X, Y, and Z.
The Pearson correlation coefficient (r-value) of X, Y, and Z features is shown in the following
image:
Use the drop-down menus to select the answer choice that answers each question based on the
information presented in the graphic.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: 0.859122 -
Box 2: a positively linear relationship
+1 indicates a strong positive linear relationship
-1 indicates a strong negative linear correlation
0 denotes no linear relationship between the two variables.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/compute-
linear-correlation
Next Question
Question 96 ( Question Set 2 )
Answer :
Explanation:
Box 1: log -
The number of observations in the dataset.
run.log(name, value, description='')
Scalar values: Log a numerical or string value to the run with the given name. Logging a metric
to a run causes that metric to be stored in the run record in the experiment. You can log the
same metric multiple times within a run, the result being considered a vector of that metric.
Example: run.log("accuracy", 0.95)
Box 2: log_image -
A box plot of income by home_owner.
log_image Log an image to the run record. Use log_image to log a .PNG image file or a
matplotlib plot to the run. These images will be visible and comparable in the run record.
Example: run.log_image("ROC", plot=plt)
Box 3: log_table -
A dictionary containing the city names and the average income for each city. log_table: Log a
dictionary object to the run with the given name.
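A combined sketch of the three calls, assuming df is a pandas DataFrame with income, home_owner and city columns and that matplotlib is available:
import matplotlib.pyplot as plt
from azureml.core import Run

run = Run.get_context()

run.log('observations', len(df))                      # scalar: number of observations

plt.figure()
df.boxplot(column='income', by='home_owner')          # box plot of income by home_owner
run.log_image('income_by_home_owner', plot=plt)       # log the plot to the run record

avg_income = df.groupby('city')['income'].mean().to_dict()
run.log_table('average_income_by_city', avg_income)   # dictionary logged as a table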
Next Question
Question 97 ( Question Set 2 )
You use the Azure Machine Learning service to create a tabular dataset named training_data.
You plan to use this dataset in a training script.
You create a variable that references the dataset using the following code: training_ds =
workspace.datasets.get("training_data")
You define an estimator to run the script.
You need to set the correct property of the estimator to ensure that your script can access the
training_data dataset.
Which property should you set?
• A. environment_definition = {"training_data":training_ds}
• B. inputs = [training_ds.as_named_input('training_ds')]
• C. script_params = {"--training_ds":training_ds}
• D. source_directory = training_ds
Answer : B
Explanation:
Example:
# Get the training dataset
diabetes_ds = ws.datasets.get("Diabetes Dataset")

# Create an estimator that uses the remote compute and passes the dataset as an input
hyper_estimator = SKLearn(source_directory=experiment_folder,
                          inputs=[diabetes_ds.as_named_input('diabetes')],
                          compute_target=cpu_cluster,
                          conda_packages=['pandas', 'ipykernel', 'matplotlib'],
                          pip_packages=['azureml-sdk', 'argparse', 'pyarrow'],
                          entry_script='diabetes_training.py')
Reference:
https://notebooks.azure.com/GraemeMalcolm/projects/azureml-primers/html/04%20-
%20Optimizing%20Model%20Training.ipynb
Next Question
Question 98 ( Question Set 2 )
You register a file dataset named csv_folder that references a folder. The folder includes multiple
comma-separated values (CSV) files in an Azure storage blob container.
You plan to use the following code to run a script that loads data from the file dataset. You
create and instantiate the following variables:
• A. inputs=[file_dataset.as_named_input('training_files')],
• B. inputs=[file_dataset.as_named_input('training_files').as_mount()],
• C. inputs=[file_dataset.as_named_input('training_files').to_pandas_dataframe()],
• D. script_params={'--training_files': file_dataset},
Answer : B
Explanation:
Example:
from azureml.train.estimator import Estimator
script_params = {
# to mount files referenced by mnist dataset
'--data-folder': mnist_file_dataset.as_named_input('mnist_opendataset').as_mount(),
'--regularization': 0.5
}
est = Estimator(source_directory=script_folder,
script_params=script_params,
compute_target=compute_target,
environment_definition=env,
entry_script='train.py')
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-train-models-with-aml
Next Question
Question 99 ( Question Set 2 )
You are creating a new Azure Machine Learning pipeline using the designer.
The pipeline must train a model using data in a comma-separated values (CSV) file that is
published on a website. You have not created a dataset for this file.
You need to ingest the data from the CSV file into the designer pipeline using the minimal
administrative effort.
Which module should you add to the pipeline in Designer?
• A. Convert to CSV
• B. Enter Data Manually
• C. Import Data
• D. Dataset
Answer : D
Explanation:
The preferred way to provide data to a pipeline is a Dataset object. The Dataset object points to
data that lives in or is accessible from a datastore or at a Web
URL. The Dataset class is abstract, so you will create an instance of either a FileDataset (referring
to one or more files) or a TabularDataset that's created from one or more files with delimited
columns of data.
Example:
from azureml.core import Dataset
iris_tabular_dataset = Dataset.Tabular.from_delimited_files([(def_blob_store, 'train-
dataset/iris.csv')])
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline
Next Question
Question 100 ( Question Set 2 )
You define a datastore named ml-data for an Azure Storage blob container. In the container,
you have a folder named train that contains a file named data.csv.
You plan to use the file to train a model by using the Azure Machine Learning SDK.
You plan to train the model by using the Azure Machine Learning SDK to run an experiment on
local compute.
You define a DataReference object by running the following code:
You need to load the training data.
Which code segment should you use?
A.
B.
C.
D.
E.
Answer : E
Explanation:
Example:
data_folder = args.data_folder
# Load Train and Test data
train_data = pd.read_csv(os.path.join(data_folder, 'data.csv'))
Reference:
https://www.element61.be/en/resource/azure-machine-learning-services-complete-toolbox-ai
Next Question
Question 101 ( Question Set 2 )
You need to create a dataset named training_data and load the data from all files into a single
data frame by using the following code:
• A. Yes
• B. No
Answer : B
Explanation:
Define paths with two file paths instead.
Use Dataset.Tabular.from_delimited_files, as the data isn't cleansed.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets
Next Question
Question 102 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You create an Azure Machine Learning service datastore in a workspace. The datastore contains
the following files:
✑ /data/2018/Q1.csv
✑ /data/2018/Q2.csv
✑ /data/2018/Q3.csv
✑ /data/2018/Q4.csv
✑ /data/2019/Q1.csv
All files store data in the following format:
id,f1,f2,I
1,1,2,0
2,1,1,1
3,2,1,0
4,2,2,1
You run the following code:
You need to create a dataset named training_data and load the data from all files into a single
data frame by using the following code:
• A. Yes
• B. No
Answer : B
Explanation:
Use two file paths.
Use Dataset.Tabular.from_delimited_files instead of Dataset.File.from_files, as the data isn't
cleansed.
Note:
A FileDataset references single or multiple files in your datastores or public URLs. If your data is
already cleansed, and ready to use in training experiments, you can download or mount the files
to your compute as a FileDataset object.
A TabularDataset represents data in a tabular format by parsing the provided file or list of files.
This provides you with the ability to materialize the data into a pandas or Spark DataFrame so
you can work with familiar data preparation and training libraries without having to leave your
notebook. You can create a
TabularDataset object from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.
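A hedged sketch of what the intended approach looks like; the datastore variable is assumed to be the registered datastore from the scenario:
from azureml.core import Dataset

paths = [(datastore, 'data/2018/*.csv'),
         (datastore, 'data/2019/*.csv')]                    # two file paths covering all files
training_data = Dataset.Tabular.from_delimited_files(path=paths)
data_frame = training_data.to_pandas_dataframe()            # single DataFrame with all records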
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets
Next Question
Question 103 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You create an Azure Machine Learning service datastore in a workspace. The datastore contains
the following files:
✑ /data/2018/Q1.csv
✑ /data/2018/Q2.csv
✑ /data/2018/Q3.csv
✑ /data/2018/Q4.csv
✑ /data/2019/Q1.csv
All files store data in the following format:
id,f1,f2,I
1,1,2,0
2,1,1,1
3,2,1,0
4,2,2,1
You run the following code:
You need to create a dataset named training_data and load the data from all files into a single
data frame by using the following code:
• A. Yes
• B. No
Answer : A
Explanation:
Use two file paths.
Use Dataset.Tabular.from_delimited_files, as the data isn't cleansed.
Note:
A TabularDataset represents data in a tabular format by parsing the provided file or list of files.
This provides you with the ability to materialize the data into a pandas or Spark DataFrame so
you can work with familiar data preparation and training libraries without having to leave your
notebook. You can create a
TabularDataset object from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets
Next Question
Question 104 ( Question Set 2 )
You plan to use the Hyperdrive feature of Azure Machine Learning to determine the optimal
hyperparameter values when training a model.
You must use Hyperdrive to try combinations of the following hyperparameter values:
✑ learning_rate: any value between 0.001 and 0.1
✑ batch_size: 16, 32, or 64
You need to configure the search space for the Hyperdrive experiment.
Which two parameter expressions should you use? Each correct answer presents part of the
solution.
NOTE: Each correct selection is worth one point.
Answer : BD
Explanation:
B: Continuous hyperparameters are specified as a distribution over a continuous range of values.
Supported distributions include:
✑ uniform(low, high) - Returns a value uniformly distributed between low and high
D: Discrete hyperparameters are specified as a choice among discrete values. choice can be: one
or more comma-separated values
✑ a range object
✑ any arbitrary list object
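Putting the two expressions together for the values in this question; the argument names are illustrative:
from azureml.train.hyperdrive import choice, uniform

search_space = {
    '--learning_rate': uniform(0.001, 0.1),   # any value between 0.001 and 0.1
    '--batch_size': choice(16, 32, 64)        # one of the discrete values 16, 32 or 64
}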
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
Next Question
Question 105 ( Question Set 2 )
HOTSPOT -
Your Azure Machine Learning workspace has a dataset named real_estate_data. A sample of the
data in the dataset follows.
You want to use automated machine learning to find the best regression model for predicting
the price column.
You need to configure an automated machine learning experiment using the Azure Machine
Learning SDK.
How should you complete the code? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: training_data -
The training data to be used within the experiment. It should contain both training features and
a label column (optionally a sample weights column). If training_data is specified, then the
label_column_name parameter must also be specified.
Box 2: validation_data -
Provide validation data: In this case, you can either start with a single data file and split it into
training and validation sets or you can provide a separate data file for the validation set. Either
way, the validation_data parameter in your AutoMLConfig object assigns which data to use as
your validation set.
Example: the following code explicitly defines which portion of the provided data in dataset to
use for training and validation.
dataset = Dataset.Tabular.from_delimited_files(data)
training_data, validation_data = dataset.random_split(percentage=0.8, seed=1)
automl_config = AutoMLConfig(compute_target=aml_remote_compute,
                             task='classification',
                             primary_metric='AUC_weighted',
                             training_data=training_data,
                             validation_data=validation_data,
                             label_column_name='Class')
Box 3: label_column_name -
label_column_name:
The name of the label column. If the input data is from a pandas.DataFrame which doesn't have
column names, column indices can be used instead, expressed as integers.
This parameter is applicable to training_data and validation_data parameters.
Incorrect Answers:
X: The training features to use when fitting pipelines during an experiment. This setting is being
deprecated. Please use training_data and label_column_name instead.
Y: The training labels to use when fitting pipelines during an experiment. This is the value your
model will predict. This setting is being deprecated. Please use training_data and
label_column_name instead.
X_valid: Validation features to use when fitting pipelines during an experiment.
If specified, then y_valid or sample_weight_valid must also be specified.
Y_valid: Validation labels to use when fitting pipelines during an experiment.
Both X_valid and y_valid must be specified together.
exclude_nan_labels: Whether to exclude rows with NaN values in the label. The default is True.
y_max: y_max (float)
Maximum value of y for a regression experiment. The combination of y_min and y_max are used
to normalize test set metrics based on the input data range. If not specified, the maximum value
is inferred from the data.
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-train-automl-
client/azureml.train.automl.automlconfig.automlconfig?view=azure-ml-py
Next Question
Question 106 ( Question Set 2 )
Answer :
Explanation:
Box 1: Yes -
Hyperparameters are adjustable parameters you choose to train a model that govern the
training process itself. Azure Machine Learning allows you to automate hyperparameter
exploration in an efficient manner, saving you significant time and resources. You specify the
range of hyperparameter values and a maximum number of training runs. The system then
automatically launches multiple simultaneous runs with different parameter configurations and
finds the configuration that results in the best performance, measured by the metric you choose.
Poorly performing training runs are automatically early terminated, reducing wastage of
compute resources. These resources are instead used to explore other hyperparameter
configurations.
Box 2: Yes -
uniform(low, high) - Returns a value uniformly distributed between low and high
Box 3: No -
Bayesian sampling does not currently support any early termination policy.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
Next Question
Question 107 ( Question Set 2 )
You run an automated machine learning experiment in an Azure Machine Learning workspace.
Information about the run is listed in the table below:
You need to write a script that uses the Azure Machine Learning SDK to retrieve the best
iteration of the experiment run.
Which Python code segment should you use?
A.
B.
C.
D.
E.
Answer : D
Explanation:
The get_output method on automl_classifier returns the best run and the fitted model for the
last invocation. Overloads on get_output allow you to retrieve the best run and fitted model for
any logged metric or for a particular iteration.
Example:
best_run, fitted_model = local_run.get_output()
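The overloads can be sketched as follows (a sketch, assuming automl_run references a completed AutoML run; the metric name and iteration number are illustrative):

# best run overall
best_run, fitted_model = automl_run.get_output()
# best run for a specific logged metric
best_run, fitted_model = automl_run.get_output(metric='accuracy')
# run and model for a particular iteration
third_run, third_model = automl_run.get_output(iteration=3)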
Reference:
https://notebooks.azure.com/azureml/projects/azureml-getting-started/html/how-to-use-
azureml/automated-machine-learning/classification-with-deployment/auto- ml-classification-
with-deployment.ipynb
Next Question
Question 108 ( Question Set 2 )
You have a comma-separated values (CSV) file containing data from which you want to train a
classification model.
You are using the Automated Machine Learning interface in Azure Machine Learning studio to
train the classification model. You set the task type to Classification.
You need to ensure that the Automated Machine Learning process evaluates only linear models.
What should you do?
• A. Add all algorithms other than linear ones to the blocked algorithms list.
• B. Set the Exit criterion option to a metric score threshold.
• C. Clear the option to perform automatic featurization.
• D. Clear the option to enable deep learning.
• E. Set the task type to Regression.
Answer : C
Explanation:
Automatic featurization can fit non-linear models.
Reference:
https://econml.azurewebsites.net/spec/estimation/dml.html
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-automated-ml-for-ml-
models
Next Question
Question 109 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to use a Python script to run an Azure Machine Learning experiment. The script creates
a reference to the experiment run context, loads data from a file, identifies the set of unique
values for the label column, and completes the experiment run:
The experiment must record the unique labels in the data as metrics for the run that can be
reviewed later.
You must add code to the script to record the unique label values as run metrics at the point
indicated by the comment.
Solution: Replace the comment with the following code:
run.upload_file('outputs/labels.csv', './data.csv')
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
label_vals has the unique labels (from the statement label_vals = data['label'].unique()), and it
has to be logged.
Note:
Instead, use the run.log method to log each of the values in label_vals:
for label_val in label_vals:
    run.log('Label Values', label_val)
Reference:
https://www.element61.be/en/resource/azure-machine-learning-services-complete-toolbox-ai
Next Question
Question 110 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You plan to use a Python script to run an Azure Machine Learning experiment. The script creates
a reference to the experiment run context, loads data from a file, identifies the set of unique
values for the label column, and completes the experiment run:
The experiment must record the unique labels in the data as metrics for the run that can be
reviewed later.
You must add code to the script to record the unique label values as run metrics at the point
indicated by the comment.
Solution: Replace the comment with the following code:
run.log_table('Label Values', label_vals)
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Instead, use the run.log method to log each of the values in label_vals:
for label_val in label_vals:
    run.log('Label Values', label_val)
Reference:
https://www.element61.be/en/resource/azure-machine-learning-services-complete-toolbox-ai
Next Question
Question 111 ( Question Set 2 )
The experiment must record the unique labels in the data as metrics for the run that can be
reviewed later.
You must add code to the script to record the unique label values as run metrics at the point
indicated by the comment.
Solution: Replace the comment with the following code:
for label_val in label_vals:
run.log('Label Values', label_val)
Does the solution meet the goal?
• A. Yes
• B. No
Answer : A
Explanation:
The run.log method is used to log each of the values in label_vals:
for label_val in label_vals:
    run.log('Label Values', label_val)
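A minimal sketch of the complete script described in the scenario (the file name data.csv and the column name label are assumptions for illustration):

from azureml.core import Run
import pandas as pd

# get a reference to the experiment run context
run = Run.get_context()

# load the data and identify the unique label values
data = pd.read_csv('data.csv')          # assumed file name
label_vals = data['label'].unique()     # assumed column name

# log each unique label as a run metric
for label_val in label_vals:
    run.log('Label Values', label_val)

# complete the experiment run
run.complete()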
Reference:
https://www.element61.be/en/resource/azure-machine-learning-services-complete-toolbox-ai
Next Question
Question 112 ( Question Set 2 )
HOTSPOT -
You publish a batch inferencing pipeline that will be used by a business application.
The application developers need to know which information should be submitted to and
returned by the REST interface for the published pipeline.
You need to identify the information required in the REST request and returned as a response
from the published pipeline.
Which values should you use in the REST request and to expect in the response? To answer,
select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: JSON containing an OAuth bearer token
Specify your authentication header in the request.
To run the pipeline from the REST endpoint, you need an OAuth2 Bearer-type authentication
header.
Box 2: JSON containing the experiment name
Add a JSON payload object that has the experiment name.
Example:
rest_endpoint = published_pipeline.endpoint
response = requests.post(rest_endpoint,
headers=auth_header,
json={"ExperimentName": "batch_scoring",
"ParameterAssignments": {"process_count_per_node": 6}})
run_id = response.json()["Id"]
Box 3: JSON containing the run ID
Make the request to trigger the run. Include code to access the Id key from the response
dictionary to get the value of the run ID.
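A sketch of how such a request might be assembled (assuming interactive authentication via the azureml.core.authentication module, and reusing the rest_endpoint variable from the example above; the experiment name is illustrative):

import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# build the OAuth bearer token header for the request
interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()

# POST the experiment name in the JSON payload and read the run ID from the response
response = requests.post(rest_endpoint,
                         headers=auth_header,
                         json={"ExperimentName": "batch_scoring"})
run_id = response.json()["Id"]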
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-batch-scoring-
classification
Next Question
Question 113 ( Question Set 2 )
HOTSPOT -
You create an experiment in Azure Machine Learning Studio. You add a training dataset that
contains 10,000 rows. The first 9,000 rows represent class 0 (90 percent).
The remaining 1,000 rows represent class 1 (10 percent).
The training set is imbalanced between the two classes. You must increase the number of training
examples for class 1 to 4,000 by using 5 data rows. You add the
Synthetic Minority Oversampling Technique (SMOTE) module to the experiment.
You need to configure the module.
Which values should you use? To answer, select the appropriate options in the dialog box in the
answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: 300 -
If you type 300 (%), the module adds 300 percent of the original number of minority cases: 3 × 1,000 = 3,000 synthetic cases, which brings the total for class 1 to 4,000.
Box 2: 5 -
We should use 5 data rows.
Use the Number of nearest neighbors option to determine the size of the feature space that the
SMOTE algorithm uses when building new cases. A nearest neighbor is a row of data (a case)
that is very similar to some target case. The distance between any two cases is measured by
combining the weighted vectors of all features.
By increasing the number of nearest neighbors, you get features from more cases.
By keeping the number of nearest neighbors low, you use features that are more like those in
the original sample.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
Next Question
Question 114 ( Question Set 2 )
You are solving a classification task.
You must evaluate your model on a limited data sample by using k-fold cross-validation. You
start by configuring a k parameter as the number of splits.
You need to configure the k parameter for the cross-validation.
Which value should you use?
• A. k=0.5
• B. k=0.01
• C. k=5
• D. k=1
Answer : C
Explanation:
Leave One Out (LOO) cross-validation
Setting K = n (the number of observations) yields n-fold and is called leave-one out cross-
validation (LOO), a special case of the K-fold approach.
LOO CV is sometimes useful but typically doesn't shake up the data enough. The estimates from
each fold are highly correlated and hence their average can have high variance.
This is why the usual choice is K=5 or 10. It provides a good compromise for the bias-variance
tradeoff.
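As a rough illustration (a scikit-learn sketch, not part of the question itself), k=5 splits the sample into five folds, each serving once as the validation set:

from sklearn.model_selection import KFold
import numpy as np

X = np.arange(20).reshape(10, 2)   # small illustrative sample of 10 rows

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in kf.split(X):
    # each of the 5 iterations holds out 2 of the 10 rows for validation
    print(len(train_idx), len(val_idx))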
Next Question
Question 115 ( Question Set 2 )
HOTSPOT -
You are running Python code interactively in a Conda environment. The environment includes all
required Azure Machine Learning SDK and MLflow packages.
You must use MLflow to log metrics in an Azure Machine Learning experiment named mlflow-
experiment.
How should you complete the code? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
In the following code, the get_mlflow_tracking_uri() method assigns a unique tracking URI
address to the workspace, ws, and set_tracking_uri() points the MLflow tracking URI to that
address. mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
Box 2: mlflow.set_experiment(experiment_name)
Set the MLflow experiment name with set_experiment() and start your training run with
start_run().
Box 3: mlflow.start_run()
Box 4: mlflow.log_metric -
Then use log_metric() to activate the MLflow logging API and begin logging your training run
metrics.
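Putting the four boxes together, a minimal sketch might look like this (the metric name and value are illustrative; the experiment name comes from the question):

import mlflow
from azureml.core import Workspace

ws = Workspace.from_config()

# Box 1: point MLflow tracking at the workspace
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

# Box 2: set the experiment name
mlflow.set_experiment('mlflow-experiment')

# Box 3: start a run, Box 4: log a metric
with mlflow.start_run():
    mlflow.log_metric('accuracy', 0.9)   # illustrative metric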
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow
Next Question
Question 116 ( Question Set 2 )
Answer :
Explanation:
Step 1: Create and select a new dataset by uploading the comma-delimited file of penguin data.
Step 2: Select the Classification task type
Step 3: Set the Primary metric configuration setting to Accuracy.
The available metrics you can select is determined by the task type you choose.
Primary metrics for classification scenarios:
Post thresholded metrics, like accuracy, average_precision_score_weighted, norm_macro_recall,
and precision_score_weighted may not optimize as well for datasets which are very small, have
very large class skew (class imbalance), or when the expected metric value is very close to 0.0 or
1.0. In those cases,
AUC_weighted can be a better choice for the primary metric.
Step 4: Configure the automated machine learning run by selecting the experiment name, target
column, and compute target
Step 5: Run the automated machine learning experiment and review the results.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train
Next Question
Question 117 ( Question Set 2 )
HOTSPOT -
You are tuning a hyperparameter for an algorithm. The following table shows a data set with different hyperparameter values, training errors, and validation errors.
Use the drop-down menus to select the answer choice that answers each question based on the
information presented in the graphic.
Hot Area:
Answer :
Explanation:
Box 1: 4 -
Choose the value that yields both low training and validation error and the smallest gap between them.
Minimize variance (the difference between validation error and training error).
Box 2: 5 -
Minimize variance (difference between validation error and train error).
Reference:
https://medium.com/comet-ml/organizing-machine-learning-projects-project-management-
guidelines-2d2b85651bbd
Next Question
Question 118 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You create a model to forecast weather conditions based on historical data.
You need to create a pipeline that runs a processing script to load data from a datastore and
pass the processed data to a machine learning model training script.
Solution: Run the following code:
• A. Yes
• B. No
Answer : B
Explanation:
The two steps are present: process_step and train_step
The training data input is not setup correctly.
Note:
Data used in pipeline can be produced by one step and consumed in another step by providing
a PipelineData object as an output of one step and an input of one or more subsequent steps.
PipelineData objects are also used when constructing Pipelines to describe step dependencies.
To specify that a step requires the output of another step as input, use a PipelineData object in
the constructor of both steps.
For example, the pipeline train step depends on the process_step_output output of the pipeline process step:

from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

datastore = ws.get_default_datastore()
process_step_output = PipelineData("processed_data", datastore=datastore)

process_step = PythonScriptStep(script_name="process.py",
                                arguments=["--data_for_train", process_step_output],
                                outputs=[process_step_output],
                                compute_target=aml_compute,
                                source_directory=process_directory)

train_step = PythonScriptStep(script_name="train.py",
                              arguments=["--data_for_train", process_step_output],
                              inputs=[process_step_output],
                              compute_target=aml_compute,
                              source_directory=train_directory)

pipeline = Pipeline(workspace=ws, steps=[process_step, train_step])
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-pipeline-
core/azureml.pipeline.core.pipelinedata?view=azure-ml-py
Next Question
Question 119 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You create a model to forecast weather conditions based on historical data.
You need to create a pipeline that runs a processing script to load data from a datastore and
pass the processed data to a machine learning model training script.
Solution: Run the following code:
• A. Yes
• B. No
Answer : B
Explanation:
train_step is missing.
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-pipeline-
core/azureml.pipeline.core.pipelinedata?view=azure-ml-py
Next Question
Question 120 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You create a model to forecast weather conditions based on historical data.
You need to create a pipeline that runs a processing script to load data from a datastore and
pass the processed data to a machine learning model training script.
Solution: Run the following code:
• A. Yes
• B. No
Answer : B
Explanation:
Note: Data used in pipeline can be produced by one step and consumed in another step by
providing a PipelineData object as an output of one step and an input of one or more
subsequent steps.
Compare with this example, the pipeline train step depends on the process_step_output output of the pipeline process step:

from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

datastore = ws.get_default_datastore()
process_step_output = PipelineData("processed_data", datastore=datastore)

process_step = PythonScriptStep(script_name="process.py",
                                arguments=["--data_for_train", process_step_output],
                                outputs=[process_step_output],
                                compute_target=aml_compute,
                                source_directory=process_directory)

train_step = PythonScriptStep(script_name="train.py",
                              arguments=["--data_for_train", process_step_output],
                              inputs=[process_step_output],
                              compute_target=aml_compute,
                              source_directory=train_directory)

pipeline = Pipeline(workspace=ws, steps=[process_step, train_step])
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-pipeline-
core/azureml.pipeline.core.pipelinedata?view=azure-ml-py
Next Question
Question 121 ( Question Set 2 )
Does the solution meet the goal?
• A. Yes
• B. No
Answer : A
Explanation:
The scikit-learn estimator provides a simple way of launching a scikit-learn training job on a
compute target. It is implemented through the SKLearn class, which can be used to support
single-node CPU training.
Example:
from azureml.train.sklearn import SKLearn

estimator = SKLearn(source_directory=project_folder,
                    compute_target=compute_target,
                    entry_script='train_iris.py'
                   )
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
Next Question
Question 122 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a Python script named train.py in a local folder named scripts. The script trains a
regression model by using scikit-learn. The script includes code to load a training data file which
is also located in the scripts folder.
You must run the script as an Azure ML experiment on a compute cluster named aml-compute.
You need to configure the run to ensure that the environment includes the required packages
for model training. You have instantiated a variable named aml- compute that references the
target compute cluster.
Solution: Run the following code:
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
The scikit-learn estimator provides a simple way of launching a scikit-learn training job on a
compute target. It is implemented through the SKLearn class, which can be used to support
single-node CPU training.
Example:
from azureml.train.sklearn import SKLearn

estimator = SKLearn(source_directory=project_folder,
                    compute_target=compute_target,
                    entry_script='train_iris.py'
                   )
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
Next Question
Question 123 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a Python script named train.py in a local folder named scripts. The script trains a
regression model by using scikit-learn. The script includes code to load a training data file which
is also located in the scripts folder.
You must run the script as an Azure ML experiment on a compute cluster named aml-compute.
You need to configure the run to ensure that the environment includes the required packages
for model training. You have instantiated a variable named aml- compute that references the
target compute cluster.
Solution: Run the following code:
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
The scikit-learn estimator provides a simple way of launching a scikit-learn training job on a
compute target. It is implemented through the SKLearn class, which can be used to support
single-node CPU training.
Example:
from azureml.train.sklearn import SKLearn

estimator = SKLearn(source_directory=project_folder,
                    compute_target=compute_target,
                    entry_script='train_iris.py'
                   )
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
Next Question
Question 124 ( Question Set 2 )
DRAG DROP -
You create machine learning models by using Azure Machine Learning.
You plan to train and score models by using a variety of compute contexts. You also plan to
create a new compute resource in Azure Machine Learning studio.
You need to select the appropriate compute types.
Which compute types should you select? To answer, drag the appropriate compute types to the
correct requirements. Each compute type may be used once, more than once, or not at all. You
may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
Answer :
Explanation:
A remote VM -
Azure Databricks (for use in machine learning pipelines)
Azure Data Lake Analytics (for use in machine learning pipelines)
Azure HDInsight -
Compute instance -
Compute clusters -
Inference clusters -
Attached compute -
Note:
Compute clusters -
Create a single or multi node compute cluster for your training, batch inferencing or
reinforcement learning workloads.
Attached compute -
To use compute targets created outside the Azure Machine Learning workspace, you must
attach them. Attaching a compute target makes it available to your workspace. Use Attached
compute to attach a compute target for training. Use Inference clusters to attach an AKS cluster
for inferencing.
Inference clusters -
Create or attach an Azure Kubernetes Service (AKS) cluster for large scale inferencing.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-
studio
Next Question
Question 125 ( Question Set 2 )
DRAG DROP -
You are building an experiment using the Azure Machine Learning designer.
You split a dataset into training and testing sets. You select the Two-Class Boosted Decision Tree
as the algorithm.
You need to determine the Area Under the Curve (AUC) of the model.
Which three modules should you use in sequence? To answer, move the appropriate modules
from the list of modules to the answer area and arrange them in the correct order.
Select and Place:
Answer :
Explanation:
Next Question
Question 126 ( Question Set 2 )
• A. TensorFlow
• B. PyTorch
• C. SKLearn
• D. Estimator
Answer : B
Explanation:
For PyTorch, TensorFlow and Chainer tasks, Azure Machine Learning provides respective
PyTorch, TensorFlow, and Chainer estimators to simplify using these frameworks.
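A sketch of the framework-specific estimator (assuming the azureml.train.dnn package; the folder, compute target, and script names are illustrative):

from azureml.train.dnn import PyTorch

estimator = PyTorch(source_directory='./pytorch-project',   # illustrative folder
                    compute_target=compute_target,
                    entry_script='train.py',
                    use_gpu=True)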
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-ml-models
Next Question
Question 127 ( Question Set 2 )
You create a pipeline in designer to train a model that predicts automobile prices.
Because of non-linear relationships in the data, the pipeline calculates the natural log (Ln) of the
prices in the training data, trains a model to predict this natural log of price value, and then
calculates the exponential of the scored label to get the predicted price.
The training pipeline is shown in the exhibit. (Click the Training pipeline tab.)
Training pipeline -
You create a real-time inference pipeline from the training pipeline, as shown in the exhibit.
(Click the Real-time pipeline tab.)
Real-time pipeline -
You need to modify the inference pipeline to ensure that the web service returns the
exponential of the scored label as the predicted automobile price and that client applications
are not required to include a price value in the input values.
Which three modifications must you make to the inference pipeline? Each correct answer
presents part of the solution.
NOTE: Each correct selection is worth one point.
• A. Connect the output of the Apply SQL Transformation to the Web Service Output
module.
• B. Replace the Web Service Input module with a data input that does not include the
price column.
• C. Add a Select Columns module before the Score Model module to select all columns
other than price.
• D. Replace the training dataset module with a data input that does not include the price
column.
• E. Remove the Apply Math Operation module that replaces price with its natural log
from the data flow.
• F. Remove the Apply SQL Transformation module from the data flow.
Answer : ACE
Next Question
Question 128 ( Question Set 2 )
HOTSPOT -
You register the following versions of a model.
You use the Azure ML Python SDK to run a training experiment. You use a variable named run to
reference the experiment run.
After the run has been submitted and completed, you run the following code:
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where
Next Question
Question 129 ( Question Set 2 )
You are creating a classification model for a banking company to identify possible instances of
credit card fraud. You plan to create the model in Azure Machine
Learning by using automated machine learning.
The training dataset that you are using is highly unbalanced.
You need to evaluate the classification model.
Which primary metric should you use?
• A. normalized_mean_absolute_error
• B. AUC_weighted
• C. accuracy
• D. normalized_root_mean_squared_error
• E. spearman_correlation
Answer : B
Explanation:
AUC_weighted is a Classification metric.
Note: AUC is the Area under the Receiver Operating Characteristic Curve. Weighted is the
arithmetic mean of the score for each class, weighted by the number of true instances in each
class.
Incorrect Answers:
A: normalized_mean_absolute_error is a regression metric, not a classification metric.
C: When comparing approaches to imbalanced classification problems, consider using metrics
beyond accuracy such as recall, precision, and AUROC. It may be that switching the metric you
optimize for during parameter selection or model selection is enough to provide desirable
performance detecting the minority class.
D: normalized_root_mean_squared_error is a regression metric, not a classification metric.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml
Next Question
Question 130 ( Question Set 2 )
You create a machine learning model by using the Azure Machine Learning designer. You
publish the model as a real-time service on an Azure Kubernetes
Service (AKS) inference compute cluster. You make no changes to the deployed endpoint
configuration.
You need to provide application developers with the information they need to consume the
endpoint.
Which two values should you provide to application developers? Each correct answer presents
part of the solution.
NOTE: Each correct selection is worth one point.
Answer : CE
Explanation:
Deploying an Azure Machine Learning model as a web service creates a REST API endpoint. You
can send data to this endpoint and receive the prediction returned by the model.
You create a web service when you deploy a model to your local environment, Azure Container
Instances, Azure Kubernetes Service, or field-programmable gate arrays (FPGA). You retrieve the
URI used to access the web service by using the Azure Machine Learning SDK. If authentication
is enabled, you can also use the
SDK to get the authentication keys or tokens.
Example:
# URL for the web service
scoring_uri = '<your web service URI>'
# If the service is authenticated, set the key or token
key = '<your key or token>'
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-consume-web-service
Next Question
Question 131 ( Question Set 2 )
Box 1: forecasting -
Task: The type of task to run. Values can be 'classification', 'regression', or 'forecasting'
depending on the type of automated ML problem to solve.
Box 2: temperature -
The training data to be used within the experiment. It should contain both training features and
a label column (optionally a sample weights column).
Box 3: observation_time -
time_column_name: The name of the time column. This parameter is required when forecasting
to specify the datetime column in the input data used for building the time series and inferring
its frequency. This setting is being deprecated. Please use forecasting_parameters instead.
Box 4: 7 -
"predicts temperature over the next seven days"
max_horizon: The desired maximum forecast horizon in units of time-series frequency. The
default value is 1.
Units are based on the time interval of your training data, e.g., monthly, weekly that the
forecaster should predict out. When task type is forecasting, this parameter is required.
Box 5: 50 -
"For the initial round of training, you want to train a maximum of 50 different models."
Iterations: The total number of different algorithm and parameter combinations to test during
an automated ML experiment.
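A sketch combining the five boxes (the column names temperature and observation_time come from the question; training_data and compute_target are assumed to be defined elsewhere), using the older time_column_name / max_horizon style described above:

from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(task='forecasting',
                             training_data=training_data,         # dataset containing the temperature column
                             label_column_name='temperature',
                             time_column_name='observation_time',
                             max_horizon=7,                        # forecast seven days ahead
                             iterations=50,                        # train a maximum of 50 models
                             compute_target=compute_target)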
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-train-automl-
client/azureml.train.automl.automlconfig.automlconfig
Next Question
Question 132 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You create a model to forecast weather conditions based on historical data.
You need to create a pipeline that runs a processing script to load data from a datastore and
pass the processed data to a machine learning model training script.
Solution: Run the following code:
Does the solution meet the goal?
• A. Yes
• B. No
Answer : A
Explanation:
The two steps are present: process_step and train_step
Data_input correctly references the data in the data store.
Note:
Data used in pipeline can be produced by one step and consumed in another step by providing
a PipelineData object as an output of one step and an input of one or more subsequent steps.
PipelineData objects are also used when constructing Pipelines to describe step dependencies.
To specify that a step requires the output of another step as input, use a PipelineData object in
the constructor of both steps.
For example, the pipeline train step depends on the process_step_output output of the pipeline process step:

from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

datastore = ws.get_default_datastore()
process_step_output = PipelineData("processed_data", datastore=datastore)

process_step = PythonScriptStep(script_name="process.py",
                                arguments=["--data_for_train", process_step_output],
                                outputs=[process_step_output],
                                compute_target=aml_compute,
                                source_directory=process_directory)

train_step = PythonScriptStep(script_name="train.py",
                              arguments=["--data_for_train", process_step_output],
                              inputs=[process_step_output],
                              compute_target=aml_compute,
                              source_directory=train_directory)

pipeline = Pipeline(workspace=ws, steps=[process_step, train_step])
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-pipeline-
core/azureml.pipeline.core.pipelinedata?view=azure-ml-py
Next Question
Question 133 ( Question Set 2 )
You run an experiment that uses an AutoMLConfig class to define an automated machine
learning task with a maximum of ten model training iterations. The task will attempt to find the
best performing model based on a metric named accuracy.
You submit the experiment with the following code:
You need to create Python code that returns the best model that is generated by the automated
machine learning task.
Which code segment should you use?
• A. best_model = automl_run.get_details()
• B. best_model = automl_run.get_metrics()
• C. best_model = automl_run.get_file_names()[1]
• D. best_model = automl_run.get_output()[1]
Answer : D
Explanation:
The get_output method returns the best run and the fitted model.
Reference:
https://notebooks.azure.com/azureml/projects/azureml-getting-started/html/how-to-use-
azureml/automated-machine-learning/classification/auto-ml- classification.ipynb
Next Question
Question 134 ( Question Set 2 )
You plan to use the Hyperdrive feature of Azure Machine Learning to determine the optimal
hyperparameter values when training a model.
You must use Hyperdrive to try combinations of the following hyperparameter values. You must
not apply an early termination policy.
✑ learning_rate: any value between 0.001 and 0.1
✑ batch_size: 16, 32, or 64
You need to configure the sampling method for the Hyperdrive experiment.
Which two sampling methods can you use? Each correct answer is a complete solution.
NOTE: Each correct selection is worth one point.
• A. No sampling
• B. Grid sampling
• C. Bayesian sampling
• D. Random sampling
Answer : CD
C: Bayesian sampling is based on the Bayesian optimization algorithm and makes intelligent
choices on the hyperparameter values to sample next. It picks the sample based on how the
previous samples performed, such that the new sample improves the reported primary metric.
Bayesian sampling does not support any early termination policy
Example:
from azureml.train.hyperdrive import BayesianParameterSampling
from azureml.train.hyperdrive import uniform, choice

param_sampling = BayesianParameterSampling( {
    "learning_rate": uniform(0.05, 0.1),
    "batch_size": choice(16, 32, 64, 128)
  }
)
D: In random sampling, hyperparameter values are randomly selected from the defined search
space. Random sampling allows the search space to include both discrete and continuous
hyperparameters.
Incorrect Answers:
B: Grid sampling can be used if your hyperparameter space can be defined as a choice among
discrete values and if you have sufficient budget to exhaustively search over all values in the
defined search space. Additionally, one can use automated early termination of poorly
performing runs, which reduces wastage of resources.
Example, the following space has a total of six samples:
from azureml.train.hyperdrive import GridParameterSampling
from azureml.train.hyperdrive import choice
param_sampling = GridParameterSampling( {
"num_hidden_layers": choice(1, 2, 3),
"batch_size": choice(16, 32)
}
)
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
Next Question
Question 135 ( Question Set 2 )
You are training machine learning models in Azure Machine Learning. You use Hyperdrive to
tune the hyperparameters.
In previous model training and tuning runs, many models showed similar performance.
You need to select an early termination policy that meets the following requirements:
✑ accounts for the performance of all previous runs when evaluating the current run avoids
comparing the current run with only the best performing run to date
Which two early termination policies should you use? Each correct answer presents part of the
solution.
NOTE: Each correct selection is worth one point.
• A. Median stopping
• B. Bandit
• C. Default
• D. Truncation selection
Answer : AC
Explanation:
The Median Stopping policy computes running averages across all runs and cancels runs whose
best performance is worse than the median of the running averages.
If no policy is specified, the hyperparameter tuning service will let all training runs execute to
completion.
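A sketch of how the Median stopping policy might be attached to a Hyperdrive configuration (the evaluation settings are illustrative):

from azureml.train.hyperdrive import MedianStoppingPolicy

# cancel runs whose best reported metric is worse than the median of the running
# averages, checking at every interval after an initial delay of 5 intervals
early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,
                                                delay_evaluation=5)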
Incorrect Answers:
B: BanditPolicy defines an early termination policy based on slack criteria, and a frequency and
delay interval for evaluation.
The Bandit policy takes the following configuration parameters: slack_factor: The amount of
slack allowed with respect to the best performing training run. This factor specifies the slack as a
ratio.
D: The Truncation selection policy periodically cancels the given percentage of runs that rank the
lowest for their performance on the primary metric. The policy strives for fairness in ranking the
runs by accounting for improving model performance with training time. When ranking a
relatively young run, the policy uses the corresponding (and earlier) performance of older runs
for comparison. Therefore, runs aren't terminated for having a lower performance because they
have run for less time than other runs.
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-train-
core/azureml.train.hyperdrive.medianstoppingpolicy https://docs.microsoft.com/en-
us/python/api/azureml-train-core/azureml.train.hyperdrive.truncationselectionpolicy
https://docs.microsoft.com/en-us/python/api/azureml-train-
core/azureml.train.hyperdrive.banditpolicy
Next Question
Question 136 ( Question Set 2 )
Box 1: Tabular -
Box 2: Text -
Box 3: Image -
Incorrect Answers:
Hierarchical Attention Network (HAN)
HAN was proposed by Yang et al. in 2016. The key features of HAN that differentiate it from existing approaches to document classification are (1) it exploits the hierarchical nature of text data and (2) an attention mechanism is adapted for document classification.
Reference:
https://medium.com/microsoftazure/automated-and-interpretable-machine-learning-
d07975741298
Next Question
Question 137 ( Question Set 2 )
HOTSPOT -
You have a dataset that includes home sales data for a city. The dataset includes the following
columns.
Explanation:
Box 1: Regression -
Regression is a supervised machine learning technique used to predict numeric values.
Box 2: Price -
Reference:
https://docs.microsoft.com/en-us/learn/modules/create-regression-model-azure-machine-
learning-designer
Next Question
Question 138 ( Question Set 2 )
You use the Azure Machine Learning SDK in a notebook to run an experiment using a script file
in an experiment folder.
The experiment fails.
You need to troubleshoot the failed experiment.
What are two possible ways to achieve this goal? Each correct answer presents a complete
solution.
• A. Use the get_metrics() method of the run object to retrieve the experiment run logs.
• B. Use the get_details_with_logs() method of the run object to display the experiment
run logs.
• C. View the log files for the experiment run in the experiment folder.
• D. View the logs for the experiment run in Azure Machine Learning studio.
• E. Use the get_output() method of the run object to retrieve the experiment run logs.
Answer : BD
Explanation:
Use get_details_with_logs() to fetch the run details and logs created by the run.
You can monitor Azure Machine Learning runs and view their logs with the Azure Machine
Learning studio.
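A short sketch of option B (assuming run references the failed experiment run; the exact keys of the returned dictionary may vary by SDK version):

# return the run details together with the content of the generated log files
details = run.get_details_with_logs()
print(details)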
Incorrect Answers:
A: You can view the metrics of a trained model using run.get_metrics().
E: get_output() gets the output of the step as PipelineData.
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-pipeline-
core/azureml.pipeline.core.steprun https://docs.microsoft.com/en-us/azure/machine-
learning/how-to-monitor-view-training-logs
Next Question
Question 139 ( Question Set 2 )
DRAG DROP -
You have an Azure Machine Learning workspace that contains a CPU-based compute cluster and
an Azure Kubernetes Service (AKS) inference cluster. You create a tabular dataset containing
data that you plan to use to create a classification model.
You need to use the Azure Machine Learning designer to create a web service through which
client applications can consume the classification model by submitting new data and getting an
immediate prediction as a response.
Which three actions should you perform in sequence? To answer, move the appropriate actions
from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Answer :
Explanation:
Step 1: Create and start a Compute Instance
To train and deploy models using Azure Machine Learning designer, you need compute on
which to run the training process, test the model, and host the model in a deployed service.
There are four kinds of compute resource you can create:
Compute Instances: Development workstations that data scientists can use to work with data
and models.
Compute Clusters: Scalable clusters of virtual machines for on-demand processing of
experiment code.
Inference Clusters: Deployment targets for predictive services that use your trained models.
Attached Compute: Links to existing Azure compute resources, such as Virtual Machines or
Azure Databricks clusters.
Step 2: Create and run a training pipeline.
After you've used data transformations to prepare the data, you can use it to train a machine learning model.
Step 3: Create and run a real-time inference pipeline
After creating and running a pipeline to train the model, you need a second pipeline that
performs the same data transformations for new data, and then uses the trained model to
inference (in other words, predict) label values based on its features. This pipeline will form the
basis for a predictive service that you can publish for applications to use.
Reference:
https://docs.microsoft.com/en-us/learn/modules/create-classification-model-azure-machine-
learning-designer/
Next Question
Question 140 ( Question Set 2 )
You use the Two-Class Neural Network module in Azure Machine Learning Studio to build a
binary classification model. You use the Tune Model
Hyperparameters module to tune accuracy for the model.
You need to configure the Tune Model Hyperparameters module.
Which two values should you use? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
Answer : DE
Explanation:
D: For Number of learning iterations, specify the maximum number of times the algorithm
should process the training cases.
E: For Hidden layer specification, select the type of network architecture to create.
Between the input and output layers you can insert multiple hidden layers. Most predictive tasks
can be accomplished easily with only one or a few hidden layers.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/two-class-
neural-network
Next Question
Question 142 ( Question Set 2 )
You create a binary classification model by using Azure Machine Learning Studio.
You must tune hyperparameters by performing a parameter sweep of the model. The parameter
sweep must meet the following requirements:
✑ iterate all possible combinations of hyperparameters
✑ minimize computing resources required to perform the sweep
You need to perform a parameter sweep of the model.
Which parameter sweep mode should you use?
• A. Random sweep
• B. Sweep clustering
• C. Entire grid
• D. Random grid
Answer : D
Explanation:
Maximum number of runs on random grid: This option also controls the number of iterations
over a random sampling of parameter values, but the values are not generated randomly from
the specified range; instead, a matrix is created of all possible combinations of parameter values
and a random sampling is taken over the matrix. This method is more efficient and less prone to
regional oversampling or undersampling.
If you are training a model that supports an integrated parameter sweep, you can also set a
range of seed values to use and iterate over the random seeds as well. This is optional, but can
be useful for avoiding bias introduced by seed selection.
Incorrect Answers:
B: If you are building a clustering model, use Sweep Clustering to automatically determine the
optimum number of clusters and other parameters.
C: Entire grid: When you select this option, the module loops over a grid predefined by the
system, to try different combinations and identify the best learner. This option is useful for cases
where you don't know what the best parameter settings might be and want to try all possible
combination of values.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/tune-
model-hyperparameters
Next Question
Question 143 ( Question Set 2 )
You are building a recurrent neural network to perform a binary classification.
You review the training loss, validation loss, training accuracy, and validation accuracy for each
training epoch.
You need to analyze model performance.
You need to identify whether the classification model is overfitted.
Which of the following is correct?
• A. The training loss stays constant and the validation loss stays on a constant value and
close to the training loss value when training the model.
• B. The training loss decreases while the validation loss increases when training the
model.
• C. The training loss stays constant and the validation loss decreases when training the
model.
• D. The training loss increases while the validation loss decreases when training the
model.
Answer : B
Explanation:
An overfit model is one where performance on the train set is good and continues to improve,
whereas performance on the validation set improves to a point and then begins to degrade.
Reference:
https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
Next Question
Question 144 ( Question Set 2 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have a Python script named train.py in a local folder named scripts. The script trains a
regression model by using scikit-learn. The script includes code to load a training data file which
is also located in the scripts folder.
You must run the script as an Azure ML experiment on a compute cluster named aml-compute.
You need to configure the run to ensure that the environment includes the required packages
for model training. You have instantiated a variable named aml- compute that references the
target compute cluster.
Solution: Run the following code:
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
There is a missing line: conda_packages=['scikit-learn'], which is needed.
Correct example:
sk_est = Estimator(source_directory='./my-sklearn-proj',
script_params=script_params,
compute_target=compute_target,
entry_script='train.py',
conda_packages=['scikit-learn'])
Note:
The Estimator class represents a generic estimator to train data using any supplied framework.
This class is designed for use with machine learning frameworks that do not already have an
Azure Machine Learning pre-configured estimator. Pre-configured estimators exist for Chainer,
PyTorch, TensorFlow, and SKLearn.
Example:
from azureml.train.estimator import Estimator
script_params = {
# to mount files referenced by mnist dataset
'--data-folder': ds.as_named_input('mnist').as_mount(),
'--regularization': 0.8
}
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-train-
core/azureml.train.estimator.estimator
Next Question
Question 145 ( Question Set 2 )
You are performing clustering by using the K-means algorithm.
You need to define the possible termination conditions.
Which three conditions can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
Answer : ACD
Explanation:
AD: The algorithm terminates when the centroids stabilize or when a specified number of
iterations are completed.
C: A measure of how well the centroids represent the members of their clusters is the residual
sum of squares or RSS, the squared distance of each vector from its centroid summed over all
vectors. RSS is the objective function and our goal is to minimize it.
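As an illustration outside the Studio module (a scikit-learn sketch under illustrative settings), the same termination conditions map to the max_iter and tol arguments, which stop the algorithm after a fixed number of iterations or once the centroids have effectively stabilized:

from sklearn.cluster import KMeans
import numpy as np

X = np.random.rand(100, 2)            # illustrative data

# stop after at most 300 iterations, or earlier once the change in cluster
# centers falls below the tolerance
model = KMeans(n_clusters=3, max_iter=300, tol=1e-4, n_init=10, random_state=0)
model.fit(X)
print(model.n_iter_)                  # iterations actually performed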
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/k-means-
clustering https://nlp.stanford.edu/IR-book/html/htmledition/k-means-1.html
Next Question
Question 146 ( Question Set 2 )
Answer :
Explanation:
Box 1: Automatically adjust weights inversely proportional to class frequencies in the input data
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
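A tiny sketch of the quoted formula (the label array is illustrative):

import numpy as np

y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])     # 80% class 0, 20% class 1
n_samples = len(y)
n_classes = len(np.unique(y))

# 'balanced' weights: n_samples / (n_classes * np.bincount(y))
weights = n_samples / (n_classes * np.bincount(y))
print(weights)                                   # [0.625 2.5] - minority class weighted higher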
Next Question
Question 147 ( Question Set 2 )
You are building a machine learning model for translating English language textual content into
French language textual content.
You need to build and train the machine learning model to learn the sequence of the textual
content.
Which type of neural network should you use?
Answer : C
Explanation:
To translate a corpus of English text to French, we need to build a recurrent neural network
(RNN).
Note: RNNs are designed to take sequences of text as inputs or return sequences of text as
outputs, or both. They're called recurrent because the network's hidden layers have a loop in
which the output and cell state from each time step become inputs at the next time step. This
recurrence serves as a form of memory.
It allows contextual information to flow through the network so that relevant outputs from
previous time steps can be applied to network operations at the current time step.
Reference:
https://towardsdatascience.com/language-translation-with-rnns-d84d43b40571
Next Question
Question 148 ( Question Set 2 )
You create a binary classification model.
You need to evaluate the model performance.
Which two metrics can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
Answer : BC
Explanation:
The evaluation metrics available for binary classification models are: Accuracy, Precision, Recall,
F1 Score, and AUC.
Note: A very natural question is: 'Out of the individuals that the model predicted to be positive, how many were classified correctly (TP)?'
This question can be answered by looking at the Precision of the model, which is the proportion of positives that are classified correctly.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio/evaluate-model-performance
Next Question
Question 149 ( Question Set 2 )
You create a script that trains a convolutional neural network model over multiple epochs and
logs the validation loss after each epoch. The script includes arguments for batch size and
learning rate.
You identify a set of batch size and learning rate values that you want to try.
You need to use Azure Machine Learning to find the combination of batch size and learning rate
that results in the model with the lowest validation loss.
What should you do?
Answer : E
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
Next Question
Question 150 ( Question Set 2 )
You use the Azure Machine Learning Python SDK to define a pipeline to train a model.
The data used to train the model is read from a folder in a datastore.
You need to ensure the pipeline runs automatically whenever the data in the folder changes.
What should you do?
Answer : D
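One way to achieve this (a hedged sketch, assuming the pipeline has already been published and using the azureml.pipeline.core Schedule class; the schedule, experiment, datastore, and folder names are illustrative) is a datastore-reactive schedule that polls a folder and triggers the pipeline when its contents change:

from azureml.core import Workspace, Datastore
from azureml.pipeline.core import Schedule

ws = Workspace.from_config()
datastore = Datastore.get(ws, 'training_datastore')        # illustrative datastore name

schedule = Schedule.create(ws,
                           name='data-change-schedule',
                           pipeline_id=published_pipeline.id,   # assumed published pipeline
                           experiment_name='training-pipeline',
                           datastore=datastore,
                           path_on_datastore='training-data/',  # illustrative folder to watch
                           polling_interval=5)                  # minutes between checks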
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-trigger-published-pipeline
Next Question
Question 151 ( Question Set 2 )
• A. to_pandas_dataframe()
• B. as_download()
• C. as_upload()
• D. as_mount()
Answer : B
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-
core/azureml.data.filedataset?view=azure-ml-py
Next Question
Question 152 ( Question Set 2 )
You create a Python script that runs a training experiment in Azure Machine Learning. The script
uses the Azure Machine Learning SDK for Python.
You must add a statement that retrieves the names of the logs and outputs generated by the
script.
You need to reference a Python class object from the SDK for the statement.
Which class object should you use?
• A. Run
• B. ScriptRunConfig
• C. Workspace
• D. Experiment
Answer : A
Explanation:
A run represents a single trial of an experiment. Runs are used to monitor the asynchronous
execution of a trial, log metrics and store output of the trial, and to analyze results and access
artifacts generated by the trial.
The run Class get_all_logs method downloads all logs for the run to a directory.
Incorrect Answers:
B: A ScriptRunConfig packages together the configuration information needed to submit a run
in Azure ML, including the script, compute target, environment, and any distributed job-specific
configs.
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run(class)
Next Question
Question 153 ( Question Set 2 )
You run a script as an experiment in Azure Machine Learning.
You have a Run object named run that references the experiment run. You must review the log
files that were generated during the experiment run.
You need to download the log files to a local folder for review.
Which two code segments can you run to achieve this goal? Each correct answer presents a
complete solution.
NOTE: Each correct selection is worth one point.
• A. run.get_details()
• B. run.get_file_names()
• C. run.get_metrics()
• D. run.download_files(output_directory='./runfiles')
• E. run.get_all_logs(destination='./runlogs')
Answer : DE
Explanation:
The Run class download_files method downloads all files that were stored in association with the run to the specified local output directory.
The Run class get_all_logs method downloads all logs for the run to a local directory.
Incorrect Answers:
A: get_details returns the definition, status information, and other details of the run; it does not download any files.
B: get_file_names only lists the names of the files that are stored in association with the run.
C: get_metrics retrieves the metrics that were logged to the run; it does not download log files.
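A minimal sketch using the Run object named run from the question; either call downloads content to a local folder:

run.download_files(output_directory='./runfiles')   # all files stored with the run
run.get_all_logs(destination='./runlogs')           # the log files only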
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run(class)
Next Question
Question 154 ( Question Set 2 )
You have the following code. The code prepares an experiment to run a script:
The experiment must be run on local computer using the default environment.
You need to add code to start the experiment and run the script.
Which code segment should you use?
• A. run = script_experiment.start_logging()
• B. run = Run(experiment=script_experiment)
• C. ws.get_run(run_id=experiment.id)
• D. run = script_experiment.submit(config=script_config)
Answer : D
Explanation:
The Experiment class submit method submits an experiment and returns the active created run.
Syntax: submit(config, tags=None, **kwargs)
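A minimal sketch, assuming script_experiment and script_config are defined as in the question:

run = script_experiment.submit(config=script_config)
run.wait_for_completion(show_output=True)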
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment.experiment
Next Question
Question 155 ( Question Set 2 )
You use the following code to define the steps for a pipeline:
from azureml.core import Workspace, Experiment, Run
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep
ws = Workspace.from_config()
...
step1 = PythonScriptStep(name="step1", ...)
step2 = PythonScriptStep(name="step2", ...)
pipeline_steps = [step1, step2]
You need to add code to run the steps.
Which two code segments can you use to achieve this goal? Each correct answer presents a
complete solution.
NOTE: Each correct selection is worth one point.
Answer : CD
Explanation:
After you define your steps, you build the pipeline by using some or all of those steps.
# Build the pipeline. Example:
pipeline1 = Pipeline(workspace=ws, steps=[compare_models])
# Submit the pipeline to be run
pipeline_run1 = Experiment(ws, 'Compare_Models_Exp').submit(pipeline1)
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-machine-learning-
pipelines
Next Question
Question 156 ( Question Set 2 )
Answer :
Explanation:
Box 1: No -
The Workspace.get method loads an existing workspace without using configuration files. For example:
ws = Workspace.get(name="myworkspace", subscription_id='<azure-subscription-id>', resource_group='myresourcegroup')
Box 2: Yes -
MLflow Tracking with Azure Machine Learning lets you store the logged metrics and artifacts
from your local runs into your Azure Machine Learning workspace.
The get_mlflow_tracking_uri() method assigns a unique tracking URI address to the workspace,
ws, and set_tracking_uri() points the MLflow tracking URI to that address.
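A minimal sketch of this pattern (the workspace names and the logged metric are placeholders):

import mlflow
from azureml.core import Workspace

ws = Workspace.get(name="myworkspace",
                   subscription_id='<azure-subscription-id>',
                   resource_group='myresourcegroup')

mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
mlflow.set_experiment('local-mlflow-experiment')

with mlflow.start_run():
    mlflow.log_metric('val_loss', 0.25)   # illustrative value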
Box 3: Yes -
Note: In Deep Learning, epoch means the total dataset is passed forward and backward in a
neural network once.
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow
Next Question
Question 157 ( Question Set 2 )
You create and register a model in an Azure Machine Learning workspace.
You must use the Azure Machine Learning SDK to implement a batch inference pipeline that
uses a ParallelRunStep to score input data using the model. You must specify a value for the
ParallelRunConfig compute_target setting of the pipeline step.
You need to create the compute target.
Which class should you use?
• A. BatchCompute
• B. AdlaCompute
• C. AmlCompute
• D. AksCompute
Answer : C
Explanation:
Compute target to use for ParallelRunStep. This parameter may be specified as a compute target object or the string name of a compute target in the workspace.
The compute_target parameter accepts an AmlCompute object or a string.
Note: An Azure Machine Learning Compute (AmlCompute) is a managed-compute infrastructure that allows you to easily create a single or multi-node compute. The compute is created within your workspace region as a resource that can be shared with other users.
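A minimal sketch of creating an AmlCompute cluster to use as the ParallelRunConfig compute target (the cluster name and VM size are assumptions):

from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()
provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
                                                            min_nodes=0, max_nodes=4)
compute_target = ComputeTarget.create(ws, 'batch-cluster', provisioning_config)
compute_target.wait_for_completion(show_output=True)

# The cluster (or its name) is then passed to ParallelRunConfig(compute_target=compute_target, ...)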
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-contrib-pipeline-
steps/azureml.contrib.pipeline.steps.parallelrunconfig https://docs.microsoft.com/en-
us/python/api/azureml-core/azureml.core.compute.amlcompute(class)
Next Question
Question 158 ( Question Set 2 )
DRAG DROP -
You previously deployed a model that was trained using a tabular dataset named training-
dataset, which is based on a folder of CSV files.
Over time, you have collected the features and predicted labels generated by the model in a
folder containing a CSV file for each month. You have created two tabular datasets based on the
folder containing the inference data: one named predictions-dataset with a schema that
matches the training data exactly, including the predicted label; and another named features-
dataset with a schema containing all of the feature columns and a timestamp column based on
the filename, which includes the day, month, and year.
You need to create a data drift monitor to identify any changing trends in the feature data since
the model was trained. To accomplish this, you must define the required datasets for the data
drift monitor.
Which datasets should you use to configure the data drift monitor? To answer, drag the
appropriate datasets to the correct data drift monitor options. Each source may be used once,
more than once, or not at all. You may need to drag the split bar between panes or scroll to
view content.
NOTE: Each correct selection is worth one point.
Select and Place:
Answer :
Explanation:
Box 1: training-dataset -
Baseline dataset - usually the training dataset for a model.
Box 2: features-dataset -
Target dataset - usually model input data - is compared over time to your baseline dataset. This comparison means that your target dataset must have a timestamp column specified. Of the two inference datasets, only features-dataset includes a timestamp column, so it should be used as the target dataset.
The monitor will compare the baseline and target datasets.
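A hedged sketch using the azureml-datadrift package (parameter names may differ slightly between SDK versions; the monitor name and compute target are assumptions):

from azureml.core import Workspace, Dataset
from azureml.datadrift import DataDriftDetector

ws = Workspace.from_config()
training_dataset = Dataset.get_by_name(ws, 'training-dataset')   # baseline
features_dataset = Dataset.get_by_name(ws, 'features-dataset')   # target (has a timestamp column)

monitor = DataDriftDetector.create_from_datasets(ws, name='feature-drift-monitor',
                                                 baseline_data_set=training_dataset,
                                                 target_data_set=features_dataset,
                                                 compute_target='cpu-cluster',
                                                 frequency='Month')
# monitor.backfill(start_date, end_date) can then analyze drift over past months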
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-monitor-datasets
Next Question
Question 159 ( Question Set 2 )
You plan to run a Python script as an Azure Machine Learning experiment.
The script contains the following code:
You must specify a file dataset as an input to the script. The dataset consists of multiple large
image files and must be streamed directly from its source.
You need to write code to define a ScriptRunConfig object for the experiment and pass the ds
dataset as an argument.
Which code segment should you use?
Answer : A
Explanation:
The dataset consists of multiple large image files that must be streamed directly from their source, so it should be defined as a FileDataset and passed to the script as a mounted input rather than downloaded or loaded into a pandas DataFrame. In the ScriptRunConfig arguments, reference the dataset with ds.as_named_input(...).as_mount() so that the files are streamed from the datastore at run time.
Note: FileDataset references single or multiple files in datastores or from public URLs; mounting avoids copying the large files onto the compute target.
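The code options themselves are not reproduced in this capture. A minimal sketch of the mounted-input pattern (the folder, script, and compute names are assumptions):

from azureml.core import Workspace, Experiment, Dataset, ScriptRunConfig

ws = Workspace.from_config()
ds = Dataset.get_by_name(ws, name='image_files')   # a registered FileDataset

script_config = ScriptRunConfig(source_directory='./scripts',
                                script='train.py',
                                arguments=['--data-folder',
                                           ds.as_named_input('training_files').as_mount()],
                                compute_target='gpu-cluster')

run = Experiment(ws, 'image-training').submit(script_config)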
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets
Next Question
Question 160 ( Testlet 1 )
Case study -
Overview -
You are a data scientist in a company that provides data science for professional sporting
events. Models will use global and local market data to meet the following business goals:
Understand sentiment of mobile device users at sporting events based on audio from crowd
reactions.
Assess a user's tendency to respond to an advertisement.
Customize styles of ads served on mobile devices.
Use video to detect penalty events
Current environment -
Media used for penalty event detection will be provided by consumer devices. Media may
include images and videos captured during the sporting event and shared using social media.
The images and videos will have varying sizes and formats.
The data available for model building comprises of seven years of sporting event media. The
sporting event media includes; recorded video transcripts or radio commentary, and logs from
related social media feeds captured during the sporting events.
Crowd sentiment will include audio recordings submitted by event attendees in both mono and
stereo formats.
Advertisements -
During the initial weeks in production, the following was observed:
Ad response rates declined.
Drops were not consistent across ad styles.
The distribution of features across training and production data is not consistent.
Analysis shows that, of the 100 numeric features on user location and behavior, the 47 features
that come from location sources are being used as raw features. A suggested experiment to
remedy the bias and variance issue is to engineer 10 linearly uncorrelated features.
Initial data discovery shows a wide range of densities of target states in training data used for
crowd sentiment models.
All penalty detection models show inference phases using a Stochastic Gradient Descent (SGD)
are running too slow.
Audio samples show that the length of a catch phrase varies between 25%-47% depending on
region
The performance of the global penalty detection models shows lower variance but higher bias
when comparing training and validation sets. Before implementing any feature changes, you
must confirm the bias and variance using all training and validation cases.
Ad response models must be trained at the beginning of each event and applied during the
sporting event.
Market segmentation models must optimize for similar ad response history.
Sampling must guarantee mutual and collective exclusivity between local and global segmentation
models that share the same features.
Local market segmentation models will be applied before determining a user's propensity to
respond to an advertisement.
Ad response models must support non-linear boundaries of features.
The ad propensity model uses a cut threshold of 0.45, and retraining occurs if the weighted Kappa
deviates from 0.1 by +/- 5%.
The ad propensity model uses cost factors shown in the following diagram:
The ad propensity model uses proposed cost factors shown in the following diagram:
Performance curves of current and proposed cost factor scenarios are shown in the following
diagram:
You need to implement a scaling strategy for the local penalty detection data.
Which normalization type should you use?
• A. Streaming
• B. Weight
• C. Batch
• D. Cosine
Answer : C
Explanation:
Post batch normalization statistics (PBN) is the Microsoft Cognitive Toolkit (CNTK) version of
how to evaluate the population mean and variance of Batch
Normalization which could be used in inference Original Paper.
In CNTK, custom networks are defined using the BrainScriptNetworkBuilder and described in the
CNTK network description language "BrainScript."
Scenario:
Local penalty detection models must be written by using BrainScript.
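For illustration only (the scenario requires BrainScript/CNTK; this PyTorch sketch simply shows what batch normalization computes over a batch of features):

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=16)   # normalizes each feature across the batch
x = torch.randn(32, 16)                # a batch of 32 samples with 16 features
y = bn(x)                              # zero-mean, unit-variance per feature, plus a learned scale and shift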
Reference:
https://docs.microsoft.com/en-us/cognitive-toolkit/post-batch-normalization-statistics
Next Question
Question 161 ( Testlet 1 )
Case study -
Overview -
You are a data scientist in a company that provides data science for professional sporting
events. Models will use global and local market data to meet the following business goals:
Understand sentiment of mobile device users at sporting events based on audio from crowd
reactions.
Assess a user's tendency to respond to an advertisement.
Customize styles of ads served on mobile devices.
Use video to detect penalty events
Current environment -
Media used for penalty event detection will be provided by consumer devices. Media may
include images and videos captured during the sporting event and shared using social media.
The images and videos will have varying sizes and formats.
The data available for model building comprises of seven years of sporting event media. The
sporting event media includes; recorded video transcripts or radio commentary, and logs from
related social media feeds captured during the sporting events.
Crowd sentiment will include audio recordings submitted by event attendees in both mono and
stereo formats.
Advertisements -
During the initial weeks in production, the following was observed:
Ad response rates declined.
Drops were not consistent across ad styles.
The distribution of features across training and production data is not consistent.
Analysis shows that, of the 100 numeric features on user location and behavior, the 47 features
that come from location sources are being used as raw features. A suggested experiment to
remedy the bias and variance issue is to engineer 10 linearly uncorrelated features.
Initial data discovery shows a wide range of densities of target states in training data used for
crowd sentiment models.
All penalty detection models show inference phases using a Stochastic Gradient Descent (SGD)
are running too slow.
Audio samples show that the length of a catch phrase varies between 25%-47% depending on
region
The performance of the global penalty detection models shows lower variance but higher bias
when comparing training and validation sets. Before implementing any feature changes, you
must confirm the bias and variance using all training and validation cases.
Ad response models must be trained at the beginning of each event and applied during the
sporting event.
Market segmentation models must optimize for similar ad response history.
Sampling must guarantee mutual and collective exclusivity between local and global segmentation
models that share the same features.
Local market segmentation models will be applied before determining a user's propensity to
respond to an advertisement.
Ad response models must support non-linear boundaries of features.
The ad propensity model uses a cut threshold of 0.45, and retraining occurs if the weighted Kappa
deviates from 0.1 by +/- 5%.
The ad propensity model uses cost factors shown in the following diagram:
The ad propensity model uses proposed cost factors shown in the following diagram:
Performance curves of current and proposed cost factor scenarios are shown in the following
diagram:
HOTSPOT -
You need to use the Python language to build a sampling strategy for the global penalty
detection models.
How should you complete the code segment? To answer, select the appropriate options in the
answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: import torch as deeplearninglib
Box 2: ..DistributedSampler(Sampler)..
DistributedSampler(Sampler):
Sampler that restricts data loading to a subset of the dataset.
It is especially useful in conjunction with class:`torch.nn.parallel.DistributedDataParallel`. In such
case, each process can pass a DistributedSampler instance as a
DataLoader sampler, and load a subset of the original dataset that is exclusive to it.
Scenario: Sampling must guarantee mutual and collective exclusively between local and global
segmentation models that share the same features.
Box 3: optimizer = deeplearninglib.train.GradientDescentOptimizer(learning_rate=0.10)
Incorrect Answers: ..SGD..
Scenario: All penalty detection models show inference phases using a Stochastic Gradient
Descent (SGD) are running too slow.
Box 4: .. nn.parallel.DistributedDataParallel..
DistributedSampler(Sampler): The sampler that restricts data loading to a subset of the dataset.
It is especially useful in conjunction with :class:`torch.nn.parallel.DistributedDataParallel`.
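A minimal PyTorch sketch of the pattern described above; the dataset and model are placeholders, and torch.distributed.init_process_group(...) is assumed to have been called already:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.randn(1024, 20), torch.randint(0, 2, (1024,)))
model = nn.Linear(20, 2)

sampler = DistributedSampler(dataset)                 # each process loads an exclusive shard
loader = DataLoader(dataset, batch_size=64, sampler=sampler)
model = nn.parallel.DistributedDataParallel(model)    # gradients synchronized across processes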
Reference:
https://github.com/pytorch/pytorch/blob/master/torch/utils/data/distributed.py
Next Question
Question 162 ( Testlet 1 )
Case study -
Overview -
You are a data scientist in a company that provides data science for professional sporting
events. Models will use global and local market data to meet the following business goals:
Understand sentiment of mobile device users at sporting events based on audio from crowd
reactions.
Assess a user's tendency to respond to an advertisement.
Customize styles of ads served on mobile devices.
Use video to detect penalty events
Current environment -
Media used for penalty event detection will be provided by consumer devices. Media may
include images and videos captured during the sporting event and shared using social media.
The images and videos will have varying sizes and formats.
The data available for model building comprises of seven years of sporting event media. The
sporting event media includes; recorded video transcripts or radio commentary, and logs from
related social media feeds captured during the sporting events.
Crowd sentiment will include audio recordings submitted by event attendees in both mono and
stereo formats.
Advertisements -
During the initial weeks in production, the following was observed:
Ad response rates declined.
Drops were not consistent across ad styles.
The distribution of features across training and production data is not consistent.
Analysis shows that, of the 100 numeric features on user location and behavior, the 47 features
that come from location sources are being used as raw features. A suggested experiment to
remedy the bias and variance issue is to engineer 10 linearly uncorrelated features.
Initial data discovery shows a wide range of densities of target states in training data used for
crowd sentiment models.
All penalty detection models show inference phases using a Stochastic Gradient Descent (SGD)
are running too slow.
Audio samples show that the length of a catch phrase varies between 25%-47% depending on
region
The performance of the global penalty detection models shows lower variance but higher bias
when comparing training and validation sets. Before implementing any feature changes, you
must confirm the bias and variance using all training and validation cases.
Ad response models must be trained at the beginning of each event and applied during the
sporting event.
Market segmentation models must optimize for similar ad response history.
Sampling must guarantee mutual and collective exclusivity between local and global segmentation
models that share the same features.
Local market segmentation models will be applied before determining a user's propensity to
respond to an advertisement.
Ad response models must support non-linear boundaries of features.
The ad propensity model uses a cut threshold of 0.45, and retraining occurs if the weighted Kappa
deviates from 0.1 by +/- 5%.
The ad propensity model uses cost factors shown in the following diagram:
The ad propensity model uses proposed cost factors shown in the following diagram:
Performance curves of current and proposed cost factor scenarios are shown in the following
diagram:
DRAG DROP -
You need to define an evaluation strategy for the crowd sentiment models.
Which three actions should you perform in sequence? To answer, move the appropriate actions
from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Answer :
Explanation:
Scenario:
Experiments for local crowd sentiment models must combine local penalty detection data.
Crowd sentiment models must identify known sounds such as cheers and known catch phrases.
Individual crowd sentiment models will detect similar sounds.
Note: Evaluate the change in correlation between model error rate and centroid distance.
In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification
model that assigns to observations the label of the class of training samples whose mean
(centroid) is closest to the observation.
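An illustrative scikit-learn sketch (not an Azure ML Studio module) of a nearest centroid classifier, using made-up data:

import numpy as np
from sklearn.neighbors import NearestCentroid

X = np.array([[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [4.8, 5.1]])
y = np.array([0, 0, 1, 1])

clf = NearestCentroid().fit(X, y)
print(clf.centroids_)                           # per-class centroids
print(clf.predict([[1.1, 1.0], [5.1, 5.0]]))    # -> [0 1]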
Reference:
https://en.wikipedia.org/wiki/Nearest_centroid_classifier
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/sweep-
clustering
Next Question
Question 163 ( Testlet 1 )
Case study -
Overview -
You are a data scientist in a company that provides data science for professional sporting
events. Models will use global and local market data to meet the following business goals:
Understand sentiment of mobile device users at sporting events based on audio from crowd
reactions.
Assess a user's tendency to respond to an advertisement.
Customize styles of ads served on mobile devices.
Use video to detect penalty events
Current environment -
Media used for penalty event detection will be provided by consumer devices. Media may
include images and videos captured during the sporting event and shared using social media.
The images and videos will have varying sizes and formats.
The data available for model building comprises of seven years of sporting event media. The
sporting event media includes; recorded video transcripts or radio commentary, and logs from
related social media feeds captured during the sporting events.
Crowd sentiment will include audio recordings submitted by event attendees in both mono and
stereo formats.
Advertisements -
During the initial weeks in production, the following was observed:
Ad response rates declined.
Drops were not consistent across ad styles.
The distribution of features across training and production data is not consistent.
Analysis shows that, of the 100 numeric features on user location and behavior, the 47 features
that come from location sources are being used as raw features. A suggested experiment to
remedy the bias and variance issue is to engineer 10 linearly uncorrelated features.
Initial data discovery shows a wide range of densities of target states in training data used for
crowd sentiment models.
All penalty detection models show inference phases using a Stochastic Gradient Descent (SGD)
are running too slow.
Audio samples show that the length of a catch phrase varies between 25%-47% depending on
region
The performance of the global penalty detection models shows lower variance but higher bias
when comparing training and validation sets. Before implementing any feature changes, you
must confirm the bias and variance using all training and validation cases.
Ad response models must be trained at the beginning of each event and applied during the
sporting event.
Market segmentation models must optimize for similar ad response history.
Sampling must guarantee mutual and collective exclusivity between local and global segmentation
models that share the same features.
Local market segmentation models will be applied before determining a user's propensity to
respond to an advertisement.
Ad response models must support non-linear boundaries of features.
The ad propensity model uses a cut threshold of 0.45, and retraining occurs if the weighted Kappa
deviates from 0.1 by +/- 5%.
The ad propensity model uses cost factors shown in the following diagram:
The ad propensity model uses proposed cost factors shown in the following diagram:
Performance curves of current and proposed cost factor scenarios are shown in the following
diagram:
You need to implement a feature engineering strategy for the crowd sentiment local models.
What should you do?
Answer : D
Explanation:
The linear discriminant analysis method works only on continuous variables, not categorical or
ordinal variables.
Linear discriminant analysis is similar to analysis of variance (ANOVA) in that it works by
comparing the means of the variables.
Scenario:
Data scientists must build notebooks in a local environment using automatic feature
engineering and model building in machine learning pipelines.
Experiments for local crowd sentiment models must combine local penalty detection data.
All shared features for local models are continuous variables.
Incorrect Answers:
B: The Pearson correlation coefficient, sometimes called Pearson's R test, is a statistical value
that measures the linear relationship between two variables. By examining the coefficient values,
you can infer something about the strength of the relationship between the two variables, and
whether they are positively correlated or negatively correlated.
C: Spearman's correlation coefficient is designed for use with non-parametric and non-normally
distributed data. Spearman's coefficient is a nonparametric measure of statistical dependence
between two variables, and is sometimes denoted by the Greek letter rho. The Spearman's
coefficient expresses the degree to which two variables are monotonically related. It is also
called Spearman rank correlation, because it can be used with ordinal variables.
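An illustrative sketch only: the Studio module is Fisher Linear Discriminant Analysis; scikit-learn's LinearDiscriminantAnalysis shows the same idea of projecting continuous features onto discriminant directions (the data here is simulated):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.random.rand(100, 5)               # continuous shared features
y = np.random.randint(0, 2, size=100)    # sentiment class label

lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
X_reduced = lda.transform(X)             # engineered discriminant feature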
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/fisher-
linear-discriminant-analysis https://docs.microsoft.com/en-us/azure/machine-learning/studio-
module-reference/compute-linear-correlation
Next Question
Question 164 ( Testlet 1 )
Case study -
Overview -
You are a data scientist in a company that provides data science for professional sporting
events. Models will use global and local market data to meet the following business goals:
Understand sentiment of mobile device users at sporting events based on audio from crowd
reactions.
Assess a user's tendency to respond to an advertisement.
Customize styles of ads served on mobile devices.
Use video to detect penalty events
Current environment -
Media used for penalty event detection will be provided by consumer devices. Media may
include images and videos captured during the sporting event and shared using social media.
The images and videos will have varying sizes and formats.
The data available for model building comprises of seven years of sporting event media. The
sporting event media includes; recorded video transcripts or radio commentary, and logs from
related social media feeds captured during the sporting events.
Crowd sentiment will include audio recordings submitted by event attendees in both mono and
stereo formats.
Advertisements -
During the initial weeks in production, the following was observed:
Ad response rates declined.
Drops were not consistent across ad styles.
The distribution of features across training and production data is not consistent.
Analysis shows that, of the 100 numeric features on user location and behavior, the 47 features
that come from location sources are being used as raw features. A suggested experiment to
remedy the bias and variance issue is to engineer 10 linearly uncorrelated features.
Initial data discovery shows a wide range of densities of target states in training data used for
crowd sentiment models.
All penalty detection models show inference phases using a Stochastic Gradient Descent (SGD)
are running too slow.
Audio samples show that the length of a catch phrase varies between 25%-47% depending on
region
The performance of the global penalty detection models shows lower variance but higher bias
when comparing training and validation sets. Before implementing any feature changes, you
must confirm the bias and variance using all training and validation cases.
Ad response models must be trained at the beginning of each event and applied during the
sporting event.
Market segmentation models must optimize for similar ad response history.
Sampling must guarantee mutual and collective exclusivity between local and global segmentation
models that share the same features.
Local market segmentation models will be applied before determining a user's propensity to
respond to an advertisement.
Ad response models must support non-linear boundaries of features.
The ad propensity model uses a cut threshold of 0.45, and retraining occurs if the weighted Kappa
deviates from 0.1 by +/- 5%.
The ad propensity model uses cost factors shown in the following diagram:
The ad propensity model uses proposed cost factors shown in the following diagram:
Performance curves of current and proposed cost factor scenarios are shown in the following
diagram:
DRAG DROP -
You need to define a modeling strategy for ad response.
Which three actions should you perform in sequence? To answer, move the appropriate actions
from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Answer :
Explanation:
Step 1: Implement a K-Means Clustering model
Step 2: Use the cluster as a feature in a Decision jungle model.
Decision jungles are non-parametric models, which can represent non-linear decision
boundaries.
Step 3: Use the raw score as a feature in a Score Matchbox Recommender model
The goal of creating a recommendation system is to recommend one or more "items" to "users"
of the system. Examples of an item could be a movie, restaurant, book, or song. A user could be
a person, group of persons, or other entity with item preferences.
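An illustrative scikit-learn sketch of steps 1 and 2 (the exam answer uses Studio modules; a random forest stands in here for a decision jungle, and the data is simulated):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(200, 10)              # ad-response history features
y = np.random.randint(0, 2, size=200)    # responded to the ad?

cluster_id = KMeans(n_clusters=5, n_init=10).fit_predict(X)   # step 1: cluster the users
X_with_cluster = np.column_stack([X, cluster_id])             # step 2: cluster id as a feature
model = RandomForestClassifier().fit(X_with_cluster, y)       # supports non-linear boundaries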
Scenario:
Ad response rates declined.
Ad response models must be trained at the beginning of each event and applied during the
sporting event.
Market segmentation models must optimize for similar ad response history.
Ad response models must support non-linear boundaries of features.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/multiclass-
decision-jungle https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-
reference/score-matchbox-recommender
Next Question
Question 165 ( Testlet 1 )
Case study -
Overview -
You are a data scientist in a company that provides data science for professional sporting
events. Models will use global and local market data to meet the following business goals:
Understand sentiment of mobile device users at sporting events based on audio from crowd
reactions.
Assess a user's tendency to respond to an advertisement.
Customize styles of ads served on mobile devices.
Use video to detect penalty events
Current environment -
Media used for penalty event detection will be provided by consumer devices. Media may
include images and videos captured during the sporting event and shared using social media.
The images and videos will have varying sizes and formats.
The data available for model building comprises of seven years of sporting event media. The
sporting event media includes; recorded video transcripts or radio commentary, and logs from
related social media feeds captured during the sporting events.
Crowd sentiment will include audio recordings submitted by event attendees in both mono and
stereo formats.
Advertisements -
During the initial weeks in production, the following was observed:
Ad response rates declined.
Drops were not consistent across ad styles.
The distribution of features across training and production data is not consistent.
Analysis shows that, of the 100 numeric features on user location and behavior, the 47 features
that come from location sources are being used as raw features. A suggested experiment to
remedy the bias and variance issue is to engineer 10 linearly uncorrelated features.
Initial data discovery shows a wide range of densities of target states in training data used for
crowd sentiment models.
All penalty detection models show inference phases using a Stochastic Gradient Descent (SGD)
are running too slow.
Audio samples show that the length of a catch phrase varies between 25%-47% depending on
region
The performance of the global penalty detection models shows lower variance but higher bias
when comparing training and validation sets. Before implementing any feature changes, you
must confirm the bias and variance using all training and validation cases.
Ad response models must be trained at the beginning of each event and applied during the
sporting event.
Market segmentation models must optimize for similar ad response history.
Sampling must guarantee mutual and collective exclusivity between local and global segmentation
models that share the same features.
Local market segmentation models will be applied before determining a user's propensity to
respond to an advertisement.
Ad response models must support non-linear boundaries of features.
The ad propensity model uses a cut threshold of 0.45, and retraining occurs if the weighted Kappa
deviates from 0.1 by +/- 5%.
The ad propensity model uses cost factors shown in the following diagram:
The ad propensity model uses proposed cost factors shown in the following diagram:
Performance curves of current and proposed cost factor scenarios are shown in the following
diagram:
DRAG DROP -
You need to define an evaluation strategy for the crowd sentiment models.
Which three actions should you perform in sequence? To answer, move the appropriate actions
from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Answer :
Explanation:
Step 1: Define a cross-entropy function activation
When using a neural network to perform classification and prediction, it is usually better to use
cross-entropy error than classification error, and somewhat better to use cross-entropy error
than mean squared error to evaluate the quality of the neural network.
Step 2: Add cost functions for each target state.
Step 3: Evaluate the distance error metric.
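A minimal numpy sketch of the cross-entropy error referred to in step 1 (the arrays are made up):

import numpy as np

def cross_entropy(y_true_onehot, y_pred_probs, eps=1e-12):
    # average negative log-likelihood of the true class
    return -np.mean(np.sum(y_true_onehot * np.log(y_pred_probs + eps), axis=1))

y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(cross_entropy(y_true, y_pred))   # approximately 0.29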
Reference:
https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-
techniques/
Next Question
Question 166 ( Testlet 1 )
Case study -
Overview -
You are a data scientist in a company that provides data science for professional sporting
events. Models will use global and local market data to meet the following business goals:
Understand sentiment of mobile device users at sporting events based on audio from crowd
reactions.
Assess a user's tendency to respond to an advertisement.
Customize styles of ads served on mobile devices.
Use video to detect penalty events
Current environment -
Media used for penalty event detection will be provided by consumer devices. Media may
include images and videos captured during the sporting event and shared using social media.
The images and videos will have varying sizes and formats.
The data available for model building comprises of seven years of sporting event media. The
sporting event media includes; recorded video transcripts or radio commentary, and logs from
related social media feeds captured during the sporting events.
Crowd sentiment will include audio recordings submitted by event attendees in both mono and
stereo formats.
Analysis shows that, of the 100 numeric features on user location and behavior, the 47 features
that come from location sources are being used as raw features. A suggested experiment to
remedy the bias and variance issue is to engineer 10 linearly uncorrelated features.
Initial data discovery shows a wide range of densities of target states in training data used for
crowd sentiment models.
All penalty detection models show inference phases using a Stochastic Gradient Descent (SGD)
are running too slow.
Audio samples show that the length of a catch phrase varies between 25%-47% depending on
region
The performance of the global penalty detection models shows lower variance but higher bias
when comparing training and validation sets. Before implementing any feature changes, you
must confirm the bias and variance using all training and validation cases.
Ad response models must be trained at the beginning of each event and applied during the
sporting event.
Market segmentation models must optimize for similar ad response history.
Sampling must guarantee mutual and collective exclusivity between local and global segmentation
models that share the same features.
Local market segmentation models will be applied before determining a user's propensity to
respond to an advertisement.
Ad response models must support non-linear boundaries of features.
The ad propensity model uses a cut threshold of 0.45, and retraining occurs if the weighted Kappa
deviates from 0.1 by +/- 5%.
The ad propensity model uses cost factors shown in the following diagram:
The ad propensity model uses proposed cost factors shown in the following diagram:
Performance curves of current and proposed cost factor scenarios are shown in the following
diagram:
You need to implement a model development strategy to determine a user's tendency to
respond to an ad.
Which technique should you use?
• A. Use a Relative Expression Split module to partition the data based on centroid
distance.
• B. Use a Relative Expression Split module to partition the data based on distance
travelled to the event.
• C. Use a Split Rows module to partition the data based on distance travelled to the
event.
• D. Use a Split Rows module to partition the data based on centroid distance.
Answer : A
Explanation:
Split Data partitions the rows of a dataset into two distinct sets.
The Relative Expression Split option in the Split Data module of Azure Machine Learning Studio
is helpful when you need to divide a dataset into training and testing datasets using a numerical
expression.
Relative Expression Split: Use this option whenever you want to apply a condition to a number
column. The number could be a date/time field, a column containing age or dollar amounts, or
even a percentage. For example, you might want to divide your data set depending on the cost
of the items, group people by age ranges, or separate data by a calendar date.
Scenario:
Local market segmentation models will be applied before determining a user's propensity to
respond to an advertisement.
The distribution of features across training and production data is not consistent.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data
Next Question
Question 167 ( Testlet 1 )
Case study -
Overview -
You are a data scientist in a company that provides data science for professional sporting
events. Models will use global and local market data to meet the following business goals:
Understand sentiment of mobile device users at sporting events based on audio from crowd
reactions.
Assess a user's tendency to respond to an advertisement.
Customize styles of ads served on mobile devices.
Use video to detect penalty events
Current environment -
Media used for penalty event detection will be provided by consumer devices. Media may
include images and videos captured during the sporting event and shared using social media.
The images and videos will have varying sizes and formats.
The data available for model building comprises of seven years of sporting event media. The
sporting event media includes; recorded video transcripts or radio commentary, and logs from
related social media feeds captured during the sporting events.
Crowd sentiment will include audio recordings submitted by event attendees in both mono and
stereo formats.
Advertisements -
During the initial weeks in production, the following was observed:
Ad response rates declined.
Drops were not consistent across ad styles.
The distribution of features across training and production data is not consistent.
Analysis shows that, of the 100 numeric features on user location and behavior, the 47 features
that come from location sources are being used as raw features. A suggested experiment to
remedy the bias and variance issue is to engineer 10 linearly uncorrelated features.
Initial data discovery shows a wide range of densities of target states in training data used for
crowd sentiment models.
All penalty detection models show inference phases using a Stochastic Gradient Descent (SGD)
are running too slow.
Audio samples show that the length of a catch phrase varies between 25%-47% depending on
region
The performance of the global penalty detection models shows lower variance but higher bias
when comparing training and validation sets. Before implementing any feature changes, you
must confirm the bias and variance using all training and validation cases.
Ad response models must be trained at the beginning of each event and applied during the
sporting event.
Market segmentation models must optimize for similar ad response history.
Sampling must guarantee mutual and collective exclusivity between local and global segmentation
models that share the same features.
Local market segmentation models will be applied before determining a user's propensity to
respond to an advertisement.
Ad response models must support non-linear boundaries of features.
The ad propensity model uses a cut threshold of 0.45, and retraining occurs if the weighted Kappa
deviates from 0.1 by +/- 5%.
The ad propensity model uses cost factors shown in the following diagram:
The ad propensity model uses proposed cost factors shown in the following diagram:
Performance curves of current and proposed cost factor scenarios are shown in the following
diagram:
You need to implement a new cost factor scenario for the ad response models as illustrated in
the performance curve exhibit.
Which technique should you use?
• A. Set the threshold to 0.5 and retrain if weighted Kappa deviates +/- 5% from 0.45.
• B. Set the threshold to 0.05 and retrain if weighted Kappa deviates +/- 5% from 0.5.
• C. Set the threshold to 0.2 and retrain if weighted Kappa deviates +/- 5% from 0.6.
• D. Set the threshold to 0.75 and retrain if weighted Kappa deviates +/- 5% from 0.15.
Answer : A
Explanation:
Scenario:
Performance curves of current and proposed cost factor scenarios are shown in the following
diagram:
The ad propensity model uses a cut threshold of 0.45, and retraining occurs if the weighted Kappa
deviates from 0.1 by +/- 5%.
Next Question
Question 168 ( Testlet 2 )
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as
you would like to complete each case. However, there may be additional case studies and
sections on this exam. You must manage your time to ensure that you are able to complete all
questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin
a new section, you cannot return to this section.
Overview -
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into
Europe and has asked you to investigate prices for private residences in major European cities.
You use Azure Machine Learning Studio to measure the median value of properties. You
produce a regression model to predict property prices by using the
Linear Regression and Bayesian Linear Regression modules.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not efficiently use compute resources in
hyperparameter tuning. You also are concerned that the model might prevent an increase in the
overall tuning time. Therefore, you must implement an early stopping criterion on models that
provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
Data visualization -
You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve
in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class
Decision Jungle modules with one another.
HOTSPOT -
You need to replace the missing data in the AccessibilityToHighway columns.
How should you configure the Clean Missing Data module? To answer, select the appropriate
options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: Replace using MICE -
With Replace using MICE, each variable with missing data is modeled conditionally using the other variables in the data before filling in the missing values, which matches the requirement for the AccessibilityToHighway column.
Box 2: Propagate -
Cols with all missing values indicates whether columns consisting entirely of missing values should be preserved in the output.
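An illustrative sketch only: the Studio module is Clean Missing Data with 'Replace using MICE'; scikit-learn's IterativeImputer performs a comparable chained-equations style imputation, modeling each column from the others (the array is made up):

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [5.0, 4.0, 9.0]])

imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)   # missing entries modeled from the other columns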
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-
missing-data
Next Question
Question 169 ( Testlet 2 )
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as
you would like to complete each case. However, there may be additional case studies and
sections on this exam. You must manage your time to ensure that you are able to complete all
questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin
a new section, you cannot return to this section.
Overview -
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into
Europe and has asked you to investigate prices for private residences in major European cities.
You use Azure Machine Learning Studio to measure the median value of properties. You
produce a regression model to predict property prices by using the
Linear Regression and Bayesian Linear Regression modules.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not efficiently use compute resources in
hyperparameter tuning. You also are concerned that the model might prevent an increase in the
overall tuning time. Therefore, you must implement an early stopping criterion on models that
provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
Data visualization -
You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve
in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class
Decision Jungle modules with one another.
DRAG DROP -
You need to produce a visualization for the diagnostic test evaluation according to the data
visualization requirements.
Which three modules should you recommend be used in sequence? To answer, move the
appropriate modules from the list of modules to the answer area and arrange them in the
correct order.
Select and Place:
Answer :
Explanation:
Next Question
Question 170 ( Testlet 2 )
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as
you would like to complete each case. However, there may be additional case studies and
sections on this exam. You must manage your time to ensure that you are able to complete all
questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin
a new section, you cannot return to this section.
Overview -
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into
Europe and has asked you to investigate prices for private residences in major European cities.
You use Azure Machine Learning Studio to measure the median value of properties. You
produce a regression model to predict property prices by using the
Linear Regression and Bayesian Linear Regression modules.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not efficiently use compute resources in
hyperparameter tuning. You are also concerned that the model might prevent an increase in the
overall tuning time. Therefore, you must implement an early stopping criterion on models that
provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
Data visualization -
You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve
in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class
Decision Jungle modules with one another.
You need to visually identify whether outliers exist in the Age column and quantify the outliers
before the outliers are removed.
Which three Azure Machine Learning Studio modules should you use? Each correct answer
presents part of the solution.
NOTE: Each correct selection is worth one point.
• A. Create Scatterplot
• B. Summarize Data
• C. Clip Values
• D. Replace Discrete Values
• E. Build Counting Transform
Answer : ABC
Explanation:
B: To have a global view, the Summarize Data module can be used. Add the module and connect
it to the dataset that needs to be visualized.
A: One way to quickly identify outliers visually is to create scatter plots.
C: The easiest way to treat the outliers in Azure ML is to use the Clip Values module. It can
identify and optionally replace data values that are above or below a specified threshold.
You can use the Clip Values module in Azure Machine Learning Studio, to identify and optionally
replace data values that are above or below a specified threshold. This is useful when you want
to remove outliers or replace them with a mean, a constant, or other substitute value.
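For readers who want to prototype the same workflow outside Studio, a minimal sketch in pandas/matplotlib is shown below; the sample values and the clipping threshold are illustrative assumptions, not part of the case study data.
import pandas as pd
import matplotlib.pyplot as plt
# Assumed sample data; in the scenario the Age column comes from the London and Paris datasets.
df = pd.DataFrame({"Age": [5, 7, 9, 11, 250, 8, 6, 300, 10, 12]})
# Summarize Data equivalent: descriptive statistics make extreme values easy to spot.
print(df["Age"].describe())
# Create Scatterplot equivalent: plot the values to visually identify outliers.
plt.scatter(range(len(df)), df["Age"])
plt.xlabel("Row index")
plt.ylabel("Age")
plt.show()
# Clip Values equivalent: cap values above an assumed threshold of 100 instead of removing rows.
df["AgeClipped"] = df["Age"].clip(upper=100)
print(df)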
Reference:
https://blogs.msdn.microsoft.com/azuredev/2017/05/27/data-cleansing-tools-in-azure-machine-learning/
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clip-values
Next Question
Overview -
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into
Europe and has asked you to investigate prices for private residences in major European cities.
You use Azure Machine Learning Studio to measure the median value of properties. You
produce a regression model to predict property prices by using the
Linear Regression and Bayesian Linear Regression modules.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not efficiently use compute resources in
hyperparameter tuning. You are also concerned that the model might prevent an increase in the
overall tuning time. Therefore, you must implement an early stopping criterion on models that
provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
Data visualization -
You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve
in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class
Decision Jungle modules with one another.
HOTSPOT -
You need to identify the methods for dividing the data according to the testing requirements.
Which properties should you select? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Scenario: Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Next Question
Question 172 ( Testlet 2 )
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as
you would like to complete each case. However, there may be additional case studies and
sections on this exam. You must manage your time to ensure that you are able to complete all
questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin
a new section, you cannot return to this section.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not efficiently use compute resources in
hyperparameter tuning. You are also concerned that the model might prevent an increase in the
overall tuning time. Therefore, you must implement an early stopping criterion on models that
provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
Data visualization -
You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve
in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class
Decision Jungle modules with one another.
HOTSPOT -
You need to configure the Edit Metadata module so that the structure of the datasets matches.
Which configuration options should you select? To answer, select the appropriate options in the
answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Next Question
Question 173 ( Testlet 2 )
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as
you would like to complete each case. However, there may be additional case studies and
sections on this exam. You must manage your time to ensure that you are able to complete all
questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin
a new section, you cannot return to this section.
Overview -
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into
Europe and has asked you to investigate prices for private residences in major European cities.
You use Azure Machine Learning Studio to measure the median value of properties. You
produce a regression model to predict property prices by using the
Linear Regression and Bayesian Linear Regression modules.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not efficiently use compute resources in
hyperparameter tuning. You are also concerned that the model might prevent an increase in the
overall tuning time. Therefore, you must implement an early stopping criterion on models that
provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
HOTSPOT -
You need to configure the Permutation Feature Importance module for the model training
requirements.
What should you do? To answer, select the appropriate options in the dialog box in the answer
area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: 500 -
For Random seed, type a value to use as seed for randomization. If you specify 0 (the default), a
number is generated based on the system clock.
A seed value is optional, but you should provide a value if you want reproducibility across runs
of the same experiment.
Here we must replicate the findings.
Box 2: Coefficient of Determination -
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-
reference/permutation-feature-importance
Next Question
Question 174 ( Testlet 2 )
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as
you would like to complete each case. However, there may be additional case studies and
sections on this exam. You must manage your time to ensure that you are able to complete all
questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin
a new section, you cannot return to this section.
Overview -
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into
Europe and has asked you to investigate prices for private residences in major European cities.
You use Azure Machine Learning Studio to measure the median value of properties. You
produce a regression model to predict property prices by using the
Linear Regression and Bayesian Linear Regression modules.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not efficiently use compute resources in
hyperparameter tuning. You are also concerned that the model might prevent an increase in the
overall tuning time. Therefore, you must implement an early stopping criterion on models that
provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
Data visualization -
You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve
in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class
Decision Jungle modules with one another.
• A. Mutual information
• B. Pearson's correlation
• C. Spearman correlation
• D. Fisher Linear Discriminant Analysis
Answer : C
Explanation:
Spearman's rank correlation coefficient assesses how well the relationship between two variables
can be described using a monotonic function.
Note: Both Spearman's and Kendall's can be formulated as special cases of a more general
correlation coefficient, and they are both appropriate in this scenario.
Scenario: The MedianValue and AvgRoomsInHouse columns both hold data in numeric format.
You need to select a feature selection algorithm to analyze the relationship between the two
columns in more detail.
Incorrect Answers:
B: The Spearman correlation between two variables is equal to the Pearson correlation between
the rank values of those two variables; while Pearson's correlation assesses linear relationships,
Spearman's correlation assesses monotonic relationships (whether linear or not).
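As an illustration of the difference (outside Studio), a short sketch using scipy.stats with assumed sample values for the two columns:
from scipy import stats
# Assumed illustrative samples for the two numeric columns in the case study.
median_value = [210.0, 315.5, 180.2, 400.9, 265.4, 300.1]
avg_rooms_in_house = [4.1, 6.3, 3.8, 7.9, 5.2, 6.0]
# Pearson measures linear association; Spearman measures monotonic association on ranks.
pearson_r, _ = stats.pearsonr(median_value, avg_rooms_in_house)
spearman_rho, _ = stats.spearmanr(median_value, avg_rooms_in_house)
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")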
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/feature-
selection-modules
Next Question
Question 175 ( Testlet 2 )
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as
you would like to complete each case. However, there may be additional case studies and
sections on this exam. You must manage your time to ensure that you are able to complete all
questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin
a new section, you cannot return to this section.
Overview -
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into
Europe and has asked you to investigate prices for private residences in major European cities.
You use Azure Machine Learning Studio to measure the median value of properties. You
produce a regression model to predict property prices by using the
Linear Regression and Bayesian Linear Regression modules.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not efficiently use compute resources in
hyperparameter tuning. You are also concerned that the model might prevent an increase in the
overall tuning time. Therefore, you must implement an early stopping criterion on models that
provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
Data visualization -
You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve
in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class
Decision Jungle modules with one another.
HOTSPOT -
You need to set up the Permutation Feature Importance module according to the model
training requirements.
Which properties should you select? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: Accuracy -
Scenario: You want to configure hyperparameters in the model learning process to speed the
learning phase by using hyperparameters. In addition, this configuration should cancel the
lowest performing runs at each evaluation interval, thereby directing effort and resources
towards models that are more likely to be successful.
Box 2: R-Squared -
Next Question
Overview -
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into
Europe and has asked you to investigate prices for private residences in major European cities.
You use Azure Machine Learning Studio to measure the median value of properties. You
produce a regression model to predict property prices by using the
Linear Regression and Bayesian Linear Regression modules.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not efficiently use compute resources in
hyperparameter tuning. You are also concerned that the model might prevent an increase in the
overall tuning time. Therefore, you must implement an early stopping criterion on models that
provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
Data visualization -
You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve
in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class
Decision Jungle modules with one another.
HOTSPOT -
You need to configure the Filter Based Feature Selection module based on the experiment
requirements and datasets.
How should you configure the module properties? To answer, select the appropriate options in
the dialog box in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: Mutual Information.
The mutual information score is particularly useful in feature selection because it maximizes the
mutual information between the joint distribution and target variables in datasets with many
dimensions.
Box 2: MedianValue -
MedianValue is the feature column; it is the predictor of the dataset.
Scenario: The MedianValue and AvgRoomsInHouse columns both hold data in numeric format.
You need to select a feature selection algorithm to analyze the relationship between the two
columns in more detail.
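As a rough illustration of the mutual information idea outside Studio, the dependency between a single feature and a numeric target can be estimated with scikit-learn; the sample values below are assumptions for demonstration only:
import numpy as np
from sklearn.feature_selection import mutual_info_regression
# Assumed illustrative samples; AvgRoomsInHouse as the feature, MedianValue as the target.
avg_rooms_in_house = np.array([[4.1], [6.3], [3.8], [7.9], [5.2], [6.0], [4.8], [7.1]])
median_value = np.array([210.0, 315.5, 180.2, 400.9, 265.4, 300.1, 240.7, 380.3])
# Higher scores indicate a stronger statistical dependency between feature and target.
mi_score = mutual_info_regression(avg_rooms_in_house, median_value, random_state=0)
print(mi_score)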
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/filter-
based-feature-selection
Next Question
Question 177 ( Testlet 2 )
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as
you would like to complete each case. However, there may be additional case studies and
sections on this exam. You must manage your time to ensure that you are able to complete all
questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin
a new section, you cannot return to this section.
Overview -
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into
Europe and has asked you to investigate prices for private residences in major European cities.
You use Azure Machine Learning Studio to measure the median value of properties. You
produce a regression model to predict property prices by using the
Linear Regression and Bayesian Linear Regression modules.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not efficiently use compute resources in
hyperparameter tuning. You are also concerned that the model might prevent an increase in the
overall tuning time. Therefore, you must implement an early stopping criterion on models that
provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
Data visualization -
You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve
in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class
Decision Jungle modules with one another.
• A. Mutual information
• B. Mood's median test
• C. Kendall correlation
• D. Permutation Feature Importance
Answer : C
Explanation:
In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's tau
coefficient (after the Greek letter τ), is a statistic used to measure the ordinal association
between two measured quantities.
It is a supported feature scoring method in the Azure Machine Learning Filter Based Feature Selection module.
Note: Both Spearman's and Kendall's can be formulated as special cases of a more general
correlation coefficient, and they are both appropriate in this scenario.
Scenario: The MedianValue and AvgRoomsInHouse columns both hold data in numeric format.
You need to select a feature selection algorithm to analyze the relationship between the two
columns in more detail.
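Analogous to the Spearman example earlier, Kendall's tau can be computed outside Studio with scipy.stats; the sample values are again assumptions for illustration:
from scipy import stats
# Assumed illustrative samples for the two numeric columns.
median_value = [210.0, 315.5, 180.2, 400.9, 265.4, 300.1]
avg_rooms_in_house = [4.1, 6.3, 3.8, 7.9, 5.2, 6.0]
# Kendall's tau measures ordinal (rank-based) association between the two columns.
tau, p_value = stats.kendalltau(median_value, avg_rooms_in_house)
print(f"Kendall tau = {tau:.3f}, p-value = {p_value:.3f}")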
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/feature-
selection-modules
Next Question
Question 178 ( Testlet 2 )
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as
you would like to complete each case. However, there may be additional case studies and
sections on this exam. You must manage your time to ensure that you are able to complete all
questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin
a new section, you cannot return to this section.
Overview -
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into
Europe and has asked you to investigate prices for private residences in major European cities.
You use Azure Machine Learning Studio to measure the median value of properties. You
produce a regression model to predict property prices by using the
Linear Regression and Bayesian Linear Regression modules.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not efficiently use compute resources in
hyperparameter tuning. You are also concerned that the model might prevent an increase in the
overall tuning time. Therefore, you must implement an early stopping criterion on models that
provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
Data visualization -
You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve
in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class
Decision Jungle modules with one another.
DRAG DROP -
You need to implement an early stopping criteria policy for model training.
Which three code segments should you use to develop the solution? To answer, move the
appropriate code segments from the list of code segments to the answer area and arrange them
in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the
correct orders you select.
Select and Place:
Answer :
Explanation:
You need to implement an early stopping criterion on models that provides savings without
terminating promising jobs.
Truncation selection cancels a given percentage of lowest performing runs at each evaluation
interval. Runs are compared based on their performance on the primary metric and the lowest
X% are terminated.
Example:
from azureml.train.hyperdrive import TruncationSelectionPolicy
early_termination_policy = TruncationSelectionPolicy(evaluation_interval=1, truncation_percentage=20, delay_evaluation=5)
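For context, a minimal sketch of how a truncation selection policy is typically attached to a hyperparameter tuning run with HyperDriveConfig; the workspace, training script, environment, compute target, parameter name, and metric name below are placeholders assumed for illustration:
from azureml.core import Workspace, Environment, ScriptRunConfig, Experiment
from azureml.train.hyperdrive import (HyperDriveConfig, RandomParameterSampling,
                                      TruncationSelectionPolicy, PrimaryMetricGoal, uniform)
ws = Workspace.from_config()          # assumes a local config.json for an existing workspace
env = Environment("hyperdrive-env")   # placeholder environment definition
# Placeholder training script that logs the primary metric (assumed to be named "r2_score").
src = ScriptRunConfig(source_directory=".", script="train.py",
                      compute_target="cpu-cluster", environment=env)
# Cancel the lowest-performing 20% of runs at each evaluation interval, starting at interval 5.
early_termination_policy = TruncationSelectionPolicy(evaluation_interval=1,
                                                     truncation_percentage=20,
                                                     delay_evaluation=5)
hyperdrive_config = HyperDriveConfig(
    run_config=src,
    hyperparameter_sampling=RandomParameterSampling({"--learning_rate": uniform(0.01, 0.1)}),
    policy=early_termination_policy,
    primary_metric_name="r2_score",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,
    max_concurrent_runs=4)
hyperdrive_run = Experiment(ws, "tune-regression").submit(hyperdrive_config)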
Incorrect Answers:
Bandit is a termination policy based on slack factor/slack amount and evaluation interval. The
policy early terminates any runs where the primary metric is not within the specified slack factor
/ slack amount with respect to the best performing training run.
Example:
from azureml.train.hyperdrive import BanditPolicy
early_termination_policy = BanditPolicy(slack_factor=0.1, evaluation_interval=1, delay_evaluation=5)
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-
hyperparameters
Next Question
Question 179 ( Testlet 2 )
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as
you would like to complete each case. However, there may be additional case studies and
sections on this exam. You must manage your time to ensure that you are able to complete all
questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin
a new section, you cannot return to this section.
Overview -
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into
Europe and has asked you to investigate prices for private residences in major European cities.
You use Azure Machine Learning Studio to measure the median value of properties. You
produce a regression model to predict property prices by using the
Linear Regression and Bayesian Linear Regression modules.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not efficiently use compute resources in
hyperparameter tuning. You are also concerned that the model might prevent an increase in the
overall tuning time. Therefore, you must implement an early stopping criterion on models that
provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
Data visualization -
You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve
in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class
Decision Jungle modules with one another.
DRAG DROP -
You need to implement early stopping criteria as stated in the model training requirements.
Which three code segments should you use to develop the solution? To answer, move the
appropriate code segments from the list of code segments to the answer area and arrange them
in the correct order.
NOTE: More than one order of answer choices is correct. You will receive the credit for any of
the correct orders you select.
Select and Place:
Answer :
Explanation:
Step 1: from azureml.train.hyperdrive
Step 2: import TruncationSelectionPolicy
Truncation selection cancels a given percentage of lowest performing runs at each evaluation
interval. Runs are compared based on their performance on the primary metric and the lowest
X% are terminated.
Scenario: You must configure hyperparameters in the model learning process to speed the
learning phase. In addition, this configuration should cancel the lowest performing runs at each
evaluation interval, thereby directing effort and resources towards models that are more likely to
be successful.
Step 3: early_termination_policy = TruncationSelectionPolicy(...)
Example:
from azureml.train.hyperdrive import TruncationSelectionPolicy
early_termination_policy = TruncationSelectionPolicy(evaluation_interval=1, truncation_percentage=20, delay_evaluation=5)
In this example, the early termination policy is applied at every interval starting at evaluation
interval 5. A run will be terminated at interval 5 if its performance at interval 5 is in the lowest
20% of performance of all runs at interval 5.
Incorrect Answers:
Median:
Median stopping is an early termination policy based on running averages of primary metrics
reported by the runs. This policy computes running averages across all training runs and
terminates runs whose performance is worse than the median of the running averages.
Slack:
Bandit is a termination policy based on slack factor/slack amount and evaluation interval. The
policy early terminates any runs where the primary metric is not within the specified slack factor
/ slack amount with respect to the best performing training run.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-
hyperparameters
Next Question
Question 180 ( Question Set 3 )
HOTSPOT -
You are a lead data scientist for a project that tracks the health and migration of birds. You
create a multi-image classification deep learning model that uses a set of labeled bird photos
collected by experts. You plan to use the model to develop a cross-platform mobile app that
predicts the species of bird captured by app users.
You must test and deploy the trained model as a web service. The deployed model must meet
the following requirements:
✑ An authenticated connection must not be required for testing.
✑ The deployed model must perform with low latency during inferencing.
✑ The REST endpoints must be scalable and should have the capacity to handle a large number of
requests when multiple end users are using the mobile application.
You need to verify that the web service returns predictions in the expected JSON format when a
valid REST request is submitted.
Which compute resources should you use? To answer, select the appropriate options in the
answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: ds-workstation notebook VM
An authenticated connection must not be required for testing.
On a Microsoft Azure virtual machine (VM), including a Data Science Virtual Machine (DSVM),
you create local user accounts while provisioning the VM. Users then authenticate to the VM by
using these credentials.
Next Question
Answer : B
Explanation:
You can use Azure Machine Learning to deploy a GPU-enabled model as a web service.
Deploying a model on Azure Kubernetes Service (AKS) is one option.
The AKS cluster provides a GPU resource that is used by the model for inference.
Inference, or model scoring, is the phase where the deployed model is used to make predictions.
Using GPUs instead of CPUs offers performance advantages on highly parallelizable
computation.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-inferencing-gpus
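A minimal sketch of this kind of GPU-enabled AKS deployment with the Azure ML Python SDK is shown below; the model name, entry script, environment, and AKS cluster name are assumptions for illustration, not values from the question:
from azureml.core import Workspace, Environment
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import AksWebservice
ws = Workspace.from_config()                         # assumes an existing workspace config
model = Model(ws, name="my-gpu-model")               # assumed registered model name
# Assumed entry script and environment containing the GPU-enabled scoring dependencies.
inference_config = InferenceConfig(entry_script="score.py",
                                   environment=Environment("gpu-inference-env"))
# Request a GPU on the AKS cluster for low-latency, scalable inferencing.
deployment_config = AksWebservice.deploy_configuration(gpu_cores=1, memory_gb=4)
aks_target = ws.compute_targets["gpu-aks-cluster"]   # assumed existing GPU-enabled AKS cluster
service = Model.deploy(ws, "gpu-inference-service", [model],
                       inference_config, deployment_config,
                       deployment_target=aks_target)
service.wait_for_deployment(show_output=True)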
Next Question
Question 182 ( Question Set 3 )
You create a batch inference pipeline by using the Azure ML SDK. You run the pipeline by using
the following code:
from azureml.pipeline.core import Pipeline
from azureml.core.experiment import Experiment
pipeline = Pipeline(workspace=ws, steps=[parallelrun_step])
pipeline_run = Experiment(ws, 'batch_pipeline').submit(pipeline)
You need to monitor the progress of the pipeline execution.
What are two possible ways to achieve this goal? Each correct answer presents a complete
solution.
NOTE: Each correct selection is worth one point.
• E. Run the following code and monitor the console output from the PipelineRun object:
Answer : DE
Explanation:
A batch inference job can take a long time to finish. This example monitors progress by using a
Jupyter widget. You can also manage the job's progress by using:
✑ Azure Machine Learning Studio.
✑ Console output from the PipelineRun object.
from azureml.widgets import RunDetails
RunDetails(pipeline_run).show()
pipeline_run.wait_for_completion(show_output=True)
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-parallel-run-
step#monitor-the-parallel-run-job
Next Question
Question 183 ( Question Set 3 )
You train and register a model in your Azure Machine Learning workspace.
You must publish a pipeline that enables client applications to use the model for batch
inferencing. You must use a pipeline with a single ParallelRunStep step that runs a Python
inferencing script to get predictions from the input data.
You need to create the inferencing script for the ParallelRunStep pipeline step.
Which two functions should you include? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
• A. run(mini_batch)
• B. main()
• C. batch()
• D. init()
• E. score(mini_batch)
Answer : AD
Reference:
https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-
azureml/machine-learning-pipelines/parallel-run
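The referenced sample shows that a ParallelRunStep entry script needs an init() function and a run(mini_batch) function. A minimal, hypothetical sketch of such a script (the registered model name "my-model" and the CSV input format are assumptions):
import os
import joblib
import pandas as pd
from azureml.core.model import Model

def init():
    global model
    # Called once per worker process when the batch scoring job starts.
    model = joblib.load(Model.get_model_path("my-model"))

def run(mini_batch):
    # For a FileDataset input, mini_batch is a list of file paths to process.
    results = []
    for file_path in mini_batch:
        df = pd.read_csv(file_path)
        predictions = model.predict(df)
        results.append(f"{os.path.basename(file_path)}: {list(predictions)}")
    return results  # one result entry per processed file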
Next Question
Question 184 ( Question Set 3 )
You deploy a model as an Azure Machine Learning real-time web service using the following
code.
• A. service.get_logs()
• B. service.state
• C. service.serialize()
• D. service.update_deployment_state()
Answer : A
Explanation:
You can print out detailed Docker engine log messages from the service object. You can view
the log for ACI, AKS, and Local deployments. The following example demonstrates how to print
the logs.
# if you already have the service object handy
print(service.get_logs())
# if you only know the name of the service (note there might be multiple services with the same name but different version number)
print(ws.webservices['mysvc'].get_logs())
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-deployment
Next Question
Question 185 ( Question Set 3 )
HOTSPOT -
You deploy a model in Azure Container Instance.
You must use the Azure Machine Learning SDK to call the model API.
You need to invoke the deployed model using native SDK classes and methods.
How should you complete the command? To answer, select the appropriate options in the
answer areas.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: from azureml.core.webservice import Webservice
The following code shows how to use the SDK to update the model, environment, and entry script for a web service on Azure Container Instances:
from azureml.core import Environment
from azureml.core.webservice import Webservice
from azureml.core.model import Model, InferenceConfig
Box 2: predictions = service.run(input_json)
Example: The following code demonstrates sending data to the service:
import json
test_sample = json.dumps({'data': [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
]})
test_sample = bytes(test_sample, encoding='utf8')
prediction = service.run(input_data=test_sample)
print(prediction)
Reference:
https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/how-to-deploy-azure-container-
instance https://docs.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-
deployment
Next Question
• A. Save the model locally as a .pt file, and deploy the model as a local web service.
• B. Deploy the model on a computer that is configured to use the default Azure Machine Learning conda environment.
• C. Register the model with a .pt file extension and the default version property.
• D. Register the model, specifying the model_framework and model_framework_version
properties.
Answer : D
Explanation:
Registering the model with the model_framework and model_framework_version properties records the framework (for example, PyTorch) and framework version the model was trained with, and enables framework-aware capabilities such as no-code deployment for supported frameworks.
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-train-
core/azureml.train.dnn.pytorch?view=azure-ml-py
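A hedged sketch of registering the trained model with framework metadata (the model path, name, framework string, and version shown here are assumptions, not values from the question):
from azureml.core.model import Model

model = Model.register(
    workspace=ws,
    model_path="outputs/model.pt",       # local path to the trained model file (assumed)
    model_name="bird-classifier",        # placeholder registered-model name
    model_framework="PyTorch",           # value of the model_framework property (assumed string)
    model_framework_version="1.8.0"      # value of the model_framework_version property (assumed)
)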
Next Question
Question 187 ( Question Set 3 )
You train a machine learning model.
You must deploy the model as a real-time inference service for testing. The service requires low
CPU utilization and less than 48 MB of RAM. The compute target for the deployed service must
initialize automatically while minimizing cost and administrative overhead.
Which compute target should you use?
Answer : A
Explanation:
Azure Container Instances (ACI) is suitable only for small models that are less than 1 GB in size.
Use it for low-scale, CPU-based workloads that require less than 48 GB of RAM.
Note: Microsoft recommends using single-node Azure Kubernetes Service (AKS) clusters for
dev-test of larger models.
Reference:
https://docs.microsoft.com/id-id/azure/machine-learning/how-to-deploy-and-where
Next Question
Question 188 ( Question Set 3 )
You register a model that you plan to use in a batch inference pipeline.
The batch inference pipeline must use a ParallelRunStep step to process files in a file dataset.
The script that the ParallelRunStep step runs must process six input files each time the inferencing function is called.
You need to configure the pipeline.
Which configuration setting should you specify in the ParallelRunConfig object for the ParallelRunStep step?
• A. process_count_per_node= "6"
• B. node_count= "6"
• C. mini_batch_size= "6"
• D. error_threshold= "6"
Answer : C
Explanation:
mini_batch_size -
For FileDataset input, this field is the number of files the user script can process in one run() call. Because the inferencing function must process six input files per call, set mini_batch_size to "6". (For TabularDataset input, this field is instead the approximate size of data the user script can process in one run() call, for example 1024, 1024KB, 10MB, or 1GB.)
Incorrect Answers:
A: process_count_per_node -
The number of processes executed on each node (optional; the default value is the number of cores on the node).
B: node_count -
The number of nodes in the compute target used for running the ParallelRunStep; it does not control how many files each run() call receives.
D: error_threshold -
The number of record failures for TabularDataset and file failures for FileDataset that should be ignored during processing. If the error count goes above this value, the job is aborted.
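A minimal sketch of a ParallelRunConfig that applies this setting (the script, environment, and compute target names are placeholders, not values from the question):
from azureml.pipeline.steps import ParallelRunConfig

parallel_run_config = ParallelRunConfig(
    source_directory="scripts",
    entry_script="batch_scoring.py",   # the inferencing script with init() and run(mini_batch)
    mini_batch_size="6",               # six input files per run() call
    error_threshold=10,
    output_action="append_row",
    environment=batch_env,             # an assumed, previously defined Environment
    compute_target=compute_target,     # an assumed, previously attached compute cluster
    node_count=2
)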
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-contrib-pipeline-
steps/azureml.contrib.pipeline.steps.parallelrunconfig?view=azure-ml-py
Next Question
Question 189 ( Question Set 3 )
You deploy a real-time inference service for a trained model.
The deployed model supports a business-critical application, and it is important to be able to
monitor the data submitted to the web service and the predictions the data generates.
You need to implement a monitoring solution for the deployed model using minimal
administrative effort.
What should you do?
Answer : B
Explanation:
Configure logging with Azure Machine Learning studio
You can also enable Azure Application Insights from Azure Machine Learning studio. When
you're ready to deploy your model as a web service, use the following steps to enable
Application Insights:
1. Sign in to the studio at https://ml.azure.com.
2. Go to Models and select the model you want to deploy.
3. Select +Deploy.
4. Populate the Deploy model form.
5. Expand the Advanced menu.
6. Select Enable Application Insights diagnostics and data collection.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-enable-app-insights
Next Question
Question 190 ( Question Set 3 )
HOTSPOT -
You use Azure Machine Learning to train and register a model.
You must deploy the model into production as a real-time web service to an inference cluster
named service-compute that the IT department has created in the
Azure Machine Learning workspace.
Client applications consuming the deployed web service must be authenticated based on their
Azure Active Directory service principal.
You need to write a script that uses the Azure Machine Learning SDK to deploy the model. The
necessary modules have been imported.
How should you complete the code? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: AksCompute -
Example:
aks_target = AksCompute(ws,"myaks")
# If deploying to a cluster configured for dev/test, ensure that it was created with enough
# cores and memory to handle this deployment configuration. Note that memory is also used by
# things such as dependencies and AML components.
deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
service = Model.deploy(ws, "myservice", [model], inference_config, deployment_config,
aks_target)
Box 2: AksWebservice -
Box 3: token_auth_enabled=Yes -
Whether or not token auth is enabled for the Webservice.
Note: A Service principal defined in Azure Active Directory (Azure AD) can act as a principal on
which authentication and authorization policies can be enforced in
Azure Databricks.
The Azure Active Directory Authentication Library (ADAL) can be used to programmatically get
an Azure AD access token for a user.
Incorrect Answers:
auth_enabled (bool): Whether or not to enable key auth for this Webservice. Defaults to True.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-azure-kubernetes-
service https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/aad/service-
prin-aad-token
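A hedged sketch of the resulting deployment code (the service name and the previously defined model and inference_config objects are assumptions):
from azureml.core.compute import AksCompute
from azureml.core.model import Model
from azureml.core.webservice import AksWebservice

aks_target = AksCompute(ws, "service-compute")   # the inference cluster created by the IT department
deployment_config = AksWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    auth_enabled=False,         # turn off key-based authentication
    token_auth_enabled=True     # require an Azure AD token on each request
)
service = Model.deploy(ws, "fraud-service", [model], inference_config, deployment_config, aks_target)
service.wait_for_deployment(show_output=True)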
Next Question
• A. Create a new compute cluster by using larger VM sizes for the nodes, redeploy the
web service to that cluster, and update the DNS registration for the service endpoint to
point to the new cluster.
• B. Increase the node count of the compute cluster where the web service is deployed.
• C. Increase the minimum node count of the compute cluster where the web service is
deployed.
• D. Increase the VM size of nodes in the compute cluster where the web service is
deployed.
Answer : B
Explanation:
The Azure Machine Learning SDK does not provide support for scaling an AKS cluster. To scale the nodes in the cluster, use the UI for your AKS cluster in the Azure Machine Learning studio. You can only change the node count, not the VM size of the cluster.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-kubernetes
Next Question
Question 192 ( Question Set 3 )
You use Azure Machine Learning designer to create a real-time service endpoint. You have a
single Azure Machine Learning service compute resource.
You train the model and prepare the real-time pipeline for deployment.
You need to publish the inference pipeline as a web service.
Which compute type should you use?
Answer : B
Explanation:
Azure Kubernetes Service (AKS) can be used for real-time inference.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target
Next Question
Question 193 ( Question Set 3 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You train and register a machine learning model.
You plan to deploy the model as a real-time web service. Applications must use key-based
authentication to use the model.
You need to deploy the web service.
Solution:
Create an AciWebservice instance.
Set the value of the ssl_enabled property to True.
Deploy the model to the service.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Instead, use auth_enabled = TRUE. Setting ssl_enabled only secures the endpoint with TLS; it does not enable the required key-based authentication.
Note: Key-based authentication.
Web services deployed on AKS have key-based auth enabled by default. ACI-deployed services have key-based auth disabled by default, but you can enable it by setting auth_enabled = TRUE when creating the ACI web service. The following is an example of creating an ACI deployment configuration with key-based auth enabled (R SDK):
deployment_config <- aci_webservice_deployment_config(cpu_cores = 1, memory_gb = 1, auth_enabled = TRUE)
Reference:
https://azure.github.io/azureml-sdk-for-r/articles/deploying-models.html
Next Question
Question 194 ( Question Set 3 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You train and register a machine learning model.
You plan to deploy the model as a real-time web service. Applications must use key-based
authentication to use the model.
You need to deploy the web service.
Solution:
Create an AciWebservice instance.
Set the value of the auth_enabled property to True.
Deploy the model to the service.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : A
Explanation:
Key-based authentication.
Web services deployed on AKS have key-based auth enabled by default. ACI-deployed services have key-based auth disabled by default, but you can enable it by setting auth_enabled = TRUE when creating the ACI web service. The following is an example of creating an ACI deployment configuration with key-based auth enabled (R SDK):
deployment_config <- aci_webservice_deployment_config(cpu_cores = 1, memory_gb = 1, auth_enabled = TRUE)
Reference:
https://azure.github.io/azureml-sdk-for-r/articles/deploying-models.html
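The equivalent Python SDK deployment is sketched below (the service name and the previously defined model and inference_config objects are assumptions):
from azureml.core.model import Model
from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, auth_enabled=True)
service = Model.deploy(ws, "keyauth-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)

# Retrieve the authentication keys that client applications must present.
primary_key, secondary_key = service.get_keys()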
Next Question
Question 195 ( Question Set 3 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You train and register a machine learning model.
You plan to deploy the model as a real-time web service. Applications must use key-based
authentication to use the model.
You need to deploy the web service.
Solution:
Create an AciWebservice instance.
Set the value of the auth_enabled property to False.
Set the value of the token_auth_enabled property to True.
Deploy the model to the service.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Instead, use auth_enabled = TRUE. The proposed solution enables token-based (Azure AD) authentication rather than the required key-based authentication.
Note: Key-based authentication.
Web services deployed on AKS have key-based auth enabled by default. ACI-deployed services have key-based auth disabled by default, but you can enable it by setting auth_enabled = TRUE when creating the ACI web service. The following is an example of creating an ACI deployment configuration with key-based auth enabled (R SDK):
deployment_config <- aci_webservice_deployment_config(cpu_cores = 1, memory_gb = 1, auth_enabled = TRUE)
Reference:
https://azure.github.io/azureml-sdk-for-r/articles/deploying-models.html
Next Question
• A. service.state
• B. service.get_logs()
• C. service.serialize()
• D. service.environment
Answer : B
Explanation:
The first step in debugging errors is to get your deployment logs. In Python: service.get_logs()
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-deployment
Next Question
Question 197 ( Question Set 3 )
You use the Azure Machine Learning Python SDK to define a pipeline that consists of multiple
steps.
When you run the pipeline, you observe that some steps do not run. The cached output from a
previous run is used instead.
You need to ensure that every step in the pipeline is run, even if the parameters and contents of
the source directory have not changed since the previous run.
What are two possible ways to achieve this goal? Each correct answer presents a complete
solution.
NOTE: Each correct selection is worth one point.
• A. Use a PipelineData object that references a datastore other than the default datastore.
• B. Set the regenerate_outputs property of the pipeline to True.
• C. Set the allow_reuse property of each step in the pipeline to False.
• D. Restart the compute cluster where the pipeline experiment is configured to run.
• E. Set the outputs property of each step in the pipeline to True.
Answer : BC
Explanation:
B: If regenerate_outputs is set to True, a new submit will always force generation of all step
outputs, and disallow data reuse for any step of this run. Once this run is complete, however,
subsequent runs may reuse the results of this run.
C: Keep the following in mind when working with pipeline steps, input/output data, and step
reuse.
✑ If data used in a step is in a datastore and allow_reuse is True, then changes to the data won't be detected. If the data is uploaded as part of the snapshot (under the step's source_directory), though this is not recommended, then the hash will change and will trigger a rerun.
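A short sketch combining both options (the step, script, and compute names are placeholders):
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

step = PythonScriptStep(
    name="train",
    script_name="train.py",
    source_directory="./scripts",
    compute_target="cpu-cluster",
    allow_reuse=False            # option C: this step always re-runs
)
pipeline = Pipeline(workspace=ws, steps=[step])
# Option B: force regeneration of all step outputs for this submission.
pipeline_run = Experiment(ws, "pipeline_test").submit(pipeline, regenerate_outputs=True)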
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-pipeline-
core/azureml.pipeline.core.pipelinestep
https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-
azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting- started.ipynb
Next Question
Question 198 ( Question Set 3 )
You train a model and register it in your Azure Machine Learning workspace. You are ready to
deploy the model as a real-time web service.
You deploy the model to an Azure Kubernetes Service (AKS) inference cluster, but the
deployment fails because an error occurs when the service runs the entry script that is
associated with the model deployment.
You need to debug the error by iteratively modifying the code and reloading the service,
without requiring a re-deployment of the service for each code update.
What should you do?
• A. Modify the AKS service deployment configuration to enable application insights and
re-deploy to AKS.
• B. Create an Azure Container Instances (ACI) web service deployment configuration and
deploy the model on ACI.
• C. Add a breakpoint to the first line of the entry script and redeploy the service to AKS.
• D. Create a local web service deployment configuration and deploy the model to a local
Docker container.
• E. Register a new version of the model and update the entry script to load the new
version of the model from its registered path.
Answer : D
Explanation:
A local web service deployment runs the model in a Docker container on your local computer. After you modify the entry script, you can call the service's reload() method to pick up the change without rebuilding the image or re-deploying the service, which makes it the fastest way to iterate on entry-script errors. ACI and AKS deployments require an update or re-deployment for every code change.
The recommended and most up-to-date approach for model deployment is the Model.deploy() API using an Environment object as an input parameter. The basic deployment tasks are:
1. Register the model in the workspace model registry.
2. Define the inference configuration:
a) Create an Environment object based on the dependencies you specify in the environment yaml file, or use one of the curated environments.
b) Create an inference configuration (InferenceConfig object) based on the environment and the scoring script.
3. Deploy the model locally for debugging, and then to Azure Container Instances (ACI) or Azure Kubernetes Service (AKS) for production.
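A hedged sketch of the local debugging loop (the service name, port, and sample payload are assumptions; model and inference_config are assumed to exist already):
from azureml.core.model import Model
from azureml.core.webservice import LocalWebservice

deployment_config = LocalWebservice.deploy_configuration(port=8890)
service = Model.deploy(ws, "local-debug-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)

# Edit the entry script, then reload the service without re-deploying it.
service.reload()
print(service.run('{"data": [[1, 2, 3, 4]]}'))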
Next Question
Question 199 ( Question Set 3 )
You use Azure Machine Learning designer to create a training pipeline for a regression model.
You need to prepare the pipeline for deployment as an endpoint that generates predictions
asynchronously for a dataset of input data values.
What should you do?
Answer : C
Explanation:
You must first convert the training pipeline into a real-time inference pipeline. This process
removes training modules and adds web service inputs and outputs to handle requests.
Incorrect Answers:
A: Use the Enter Data Manually module to create a small dataset by typing values.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-designer-automobile-price-
deploy https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-
reference/enter-data-manually
Next Question
Question 200 ( Question Set 3 )
You retrain an existing model.
You need to register the new version of a model while keeping the current version of the model
in the registry.
What should you do?
• A. Register a model with a different name from the existing model and a custom
property named version with the value 2.
• B. Register the model with the same name as the existing model.
• C. Save the new model in the default datastore with the same name as the existing
model. Do not register the new model.
• D. Delete the existing model and register the new one with the same name.
Answer : B
Explanation:
Model version: A version of a registered model. When a new model is added to the Model
Registry, it is added as Version 1. Each model registered to the same model name increments
the version number.
Reference:
https://docs.microsoft.com/en-us/azure/databricks/applications/mlflow/model-registry
Next Question
• A. Specify a different name for the model each time you register it.
• B. Register the model with the same name each time regardless of accuracy, and always
use the latest version of the model in the batch inferencing pipeline.
• C. Specify the model framework version when registering the model, and only register
subsequent models if this value is higher.
• D. Specify a property named accuracy with the accuracy metric as a value when
registering the model, and only register subsequent models if their accuracy is higher
than the accuracy property value of the currently registered model.
• E. Specify a tag named accuracy with the accuracy metric as a value when registering the
model, and only register subsequent models if their accuracy is higher than the accuracy
tag value of the currently registered model.
Answer : CE
Explanation:
E: Using tags, you can track useful information such as the name and version of the machine
learning library used to train the model. Note that tags must be alphanumeric.
Reference:
https://notebooks.azure.com/xavierheriat/projects/azureml-getting-started/html/how-to-use-
azureml/deployment/register-model-create-image-deploy-service/ register-model-create-
image-deploy-service.ipynb
Next Question
Question 202 ( Question Set 3 )
You are a data scientist working for a hotel booking website company. You use the Azure
Machine Learning service to train a model that identifies fraudulent transactions.
You must deploy the model as an Azure Machine Learning real-time web service using the
Model.deploy method in the Azure Machine Learning SDK. The deployed web service must
return real-time predictions of fraud based on transaction data input.
You need to create the script that is specified as the entry_script parameter for the
InferenceConfig class used to deploy the model.
What should the entry script do?
Answer : C
Explanation:
The entry script receives data submitted to a deployed web service and passes it to the model. It
then takes the response returned by the model and returns that to the client. The script is
specific to your model. It must understand the data that the model expects and returns.
The two things you need to accomplish in your entry script are:
✑ Loading your model (using a function called init())
✑ Running your model on input data (using a function called run())
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where
Next Question
Question 203 ( Question Set 3 )
DRAG DROP -
You use Azure Machine Learning to deploy a model as a real-time web service.
You need to create an entry script for the service that ensures that the model is loaded when the
service starts and is used to score new data as it is received.
Which functions should you include in the script? To answer, drag the appropriate functions to
the correct actions. Each function may be used once, more than once, or not at all. You may
need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
Answer :
Explanation:
Box 1: init()
The entry script has only two required functions, init() and run(data). These functions are used to
initialize the service at startup and run the model using request data passed in by a client. The
rest of the script handles loading and running the model(s).
Box 2: run()
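A minimal, hypothetical scoring script showing both functions (the registered model name and the expected JSON payload shape are assumptions):
import json
import joblib
import numpy as np
from azureml.core.model import Model

def init():
    global model
    # Runs once when the service container starts: load the registered model.
    model = joblib.load(Model.get_model_path("fraud-model"))

def run(raw_data):
    # Runs for every scoring request: parse the JSON payload, predict, and return a JSON-serializable result.
    data = np.array(json.loads(raw_data)["data"])
    predictions = model.predict(data)
    return predictions.tolist()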
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-existing-model
Next Question
Question 204 ( Question Set 3 )
You develop and train a machine learning model to predict fraudulent transactions for a hotel
booking website.
Traffic to the site varies considerably. The site experiences heavy traffic on Monday and Friday
and much lower traffic on other days. Holidays are also high web traffic days.
You need to deploy the model as an Azure Machine Learning real-time web service endpoint on
compute that can dynamically scale up and down to support demand.
Which deployment compute option should you use?
Answer : D
Explanation:
Azure Machine Learning compute cluster is a managed-compute infrastructure that allows you
to easily create a single or multi-node compute. The compute is created within your workspace
region as a resource that can be shared with other users in your workspace. The compute scales
up automatically when a job is submitted, and can be put in an Azure Virtual Network.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-sdk
Next Question
Question 205 ( Question Set 3 )
You use the designer to create a training pipeline for a classification model. The pipeline uses a
dataset that includes the features and labels required for model training.
You create a real-time inference pipeline from the training pipeline. You observe that the
schema for the generated web service input is based on the dataset and includes the label
column that the model predicts. Client applications that use the service must not be required to
submit this value.
You need to modify the inference pipeline to meet the requirement.
What should you do?
• A. Add a Select Columns in Dataset module to the inference pipeline after the dataset
and use it to select all columns other than the label.
• B. Delete the dataset from the training pipeline and recreate the real-time inference
pipeline.
• C. Delete the Web Service Input module from the inference pipeline.
• D. Replace the dataset in the inference pipeline with an Enter Data Manually module that
includes data for the feature columns but not the label column.
Answer : A
Explanation:
By default, the Web Service Input module expects the same data schema as the module output that connects to the same downstream port. You can remove the target variable column in the inference pipeline by using the Select Columns in Dataset module. Make sure that the output of the Select Columns in Dataset module (with the target variable column removed) is connected to the same port as the output of the Web Service Input module.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-designer-automobile-price-
deploy
Next Question
Answer : D
Explanation:
You need to create an inferencing cluster.
Next Question
Question 207 ( Question Set 3 )
You create an Azure Machine Learning workspace named ML-workspace. You also create an
Azure Databricks workspace named DB-workspace. DB-workspace contains a cluster named DB-
cluster.
You must use DB-cluster to run experiments from notebooks that you import into DB-
workspace.
You need to use ML-workspace to track MLflow metrics and artifacts generated by experiments
running on DB-cluster. The solution must minimize the need for custom code.
What should you do?
Answer : B
Explanation:
Connect your Azure Databricks and Azure Machine Learning workspaces:
Linking your ADB workspace to your Azure Machine Learning workspace enables you to track
your experiment data in the Azure Machine Learning workspace.
To link your ADB workspace to a new or existing Azure Machine Learning workspace
1. Sign in to Azure portal.
2. Navigate to your ADB workspace's Overview page.
3. Select the Link Azure Machine Learning workspace button on the bottom right.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow-azure-databricks
Next Question
Question 208 ( Question Set 3 )
HOTSPOT -
You create an Azure Machine Learning workspace.
You need to detect data drift between a baseline dataset and a subsequent target dataset by
using the DataDriftDetector class.
How should you complete the code segment? To answer, select the appropriate options in the
answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: create_from_datasets -
The create_from_datasets method creates a new DataDriftDetector object from a baseline
tabular dataset and a target time series dataset.
Box 2: backfill -
The backfill method runs a backfill job over a given specified start and end date.
Syntax: backfill(start_date, end_date, compute_target=None, create_compute_target=False)
Incorrect Answers:
List and update do not have datetime parameters.
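A hedged sketch of both calls (the monitor name, compute name, frequency, and the baseline_dataset/target_dataset variables are assumptions):
from datetime import datetime
from azureml.datadrift import DataDriftDetector

monitor = DataDriftDetector.create_from_datasets(
    ws,
    "drift-monitor",
    baseline_dataset,        # registered baseline tabular dataset
    target_dataset,          # registered target time series dataset
    compute_target="cpu-cluster",
    frequency="Week",
    feature_list=None
)

# Run drift analysis retrospectively over a chosen date range.
backfill_run = monitor.backfill(datetime(2021, 1, 1), datetime(2021, 6, 1))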
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-
datadrift/azureml.datadrift.datadriftdetector(class)
Next Question
Question 209 ( Question Set 3 )
You are planning to register a trained model in an Azure Machine Learning workspace.
You must store additional metadata about the model in a key-value format. You must be able to
add new metadata and modify or delete metadata after creation.
You need to register the model.
Which parameter should you use?
• A. description
• B. model_framework
• C. tags
• D. properties
Answer : C
Explanation:
azureml.core.Model.tags:
A dictionary of key-value tags for the model. Tags are mutable: you can add new tags and modify or remove existing tags after the model is registered (for example, with the add_tags and remove_tags methods).
In contrast, azureml.core.Model.properties cannot be changed after registration; new key-value pairs can only be added, so properties do not meet the requirement to modify or delete metadata.
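A hedged sketch of working with tags (the model path, names, and tag values are placeholders):
from azureml.core.model import Model

model = Model.register(
    workspace=ws,
    model_path="outputs/model.pkl",
    model_name="my-model",
    tags={"accuracy": "0.92", "owner": "data-science-team"}
)

# Tags can be modified or removed after registration.
model.add_tags({"stage": "production"})
model.remove_tags(["owner"])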
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model
Next Question
Question 210 ( Question Set 3 )
You have a Python script that executes a pipeline. The script includes the following code:
from azureml.core import Experiment
pipeline_run = Experiment(ws, 'pipeline_test').submit(pipeline)
You want to test the pipeline before deploying the script.
You need to display the pipeline run details written to the STDOUT output when the pipeline
completes.
Which code segment should you add to the test script?
• A. pipeline_run.get.metrics()
• B. pipeline_run.wait_for_completion(show_output=True)
• C. pipeline_param = PipelineParameter(name="stdout", default_value="console")
• D. pipeline_run.get_status()
Answer : B
Explanation:
wait_for_completion: Wait for the completion of this run. Returns the status object after the wait.
Syntax: wait_for_completion(show_output=False, wait_post_processing=False,
raise_on_error=True)
Parameter: show_output -
Indicates whether to show the run output on sys.stdout.
Next Question
• A. Enable data drift monitoring for the model and its training dataset.
• B. Score the model against some test data with known label values and use the results to
calculate a confusion matrix.
• C. Use the Hyperdrive library to test the model with multiple hyperparameter values.
• D. Use the interpretability package to generate an explainer for the model.
• E. Add tags to the model registration indicating the names of the features in the training
dataset.
Answer : D
Explanation:
When you compute model explanations and visualize them, you're not limited to an existing
model explanation for an automated ML model. You can also get an explanation for your model
with different test data. The steps in this section show you how to compute and visualize
engineered feature importance based on your test data.
Incorrect Answers:
A: In the context of machine learning, data drift is the change in model input data that leads to
model performance degradation. It is one of the top reasons where model accuracy degrades
over time, thus monitoring data drift helps detect model performance issues.
B: A confusion matrix is used to describe the performance of a classification model. Each row
displays the instances of the true, or actual class in your dataset, and each column represents
the instances of the class that was predicted by the model.
C: Hyperparameters are adjustable parameters you choose for model training that guide the
training process. The HyperDrive package helps you automate choosing these parameters.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-
interpretability-automl
Next Question
Question 212 ( Question Set 4 )
HOTSPOT -
You write code to retrieve an experiment that is run from your Azure Machine Learning
workspace.
The run used the model interpretation support in Azure Machine Learning to generate and
upload a model explanation.
Business managers in your organization want to see the importance of the features in the
model.
You need to print out the model features and their relative importance in an output that looks
similar to the following.
How should you complete the code? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: from_run_id -
from_run_id(workspace, experiment_name, run_id)
Create the client with factory method given a run ID.
Returns an instance of the ExplanationClient.
Parameters -
✑ Workspace Workspace - An object that represents a workspace.
✑ experiment_name str - The name of an experiment.
✑ run_id str - A GUID that represents a run.
Box 2: list_model_explanations -
list_model_explanations returns a dictionary of metadata for all model explanations available.
Returns -
A dictionary of explanation metadata such as id, data type, explanation method, model type,
and upload time, sorted by upload time
Box 3: explanation -
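A hedged sketch of retrieving and printing the feature importance values (the experiment name and run_id variable are assumptions; the contrib namespace cited below exposes an equivalent client):
from azureml.interpret import ExplanationClient

client = ExplanationClient.from_run_id(ws, "bird-classification", run_id)
explanation = client.download_model_explanation()

# Print each feature with its relative (global) importance.
for feature, importance in explanation.get_feature_importance_dict().items():
    print(f"{feature}: {importance}")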
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-contrib-
interpret/azureml.contrib.interpret.explanation.explanation_client.explanationclient?view=azure-
ml-py
Next Question
Question 213 ( Question Set 4 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You train a classification model by using a logistic regression algorithm.
You must be able to explain the model's predictions by calculating the importance of each
feature, both as an overall global relative importance value and as a measure of local
importance for a specific set of predictions.
You need to create an explainer that you can use to retrieve the required global and local
feature importance values.
Solution: Create a MimicExplainer.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : A
Explanation:
Mimic explainer is based on the idea of training global surrogate models to mimic blackbox models. A global surrogate model is an intrinsically interpretable model that is trained to approximate the predictions of any black box model as accurately as possible. Data scientists can interpret the surrogate model to draw conclusions about the black box model. Because the surrogate model is itself interpretable, it can be used to retrieve both overall global feature importance values and local feature importance values for a specific set of predictions, which satisfies the requirement.
Note: Permutation Feature Importance Explainer (PFI) would not be a suitable alternative here, because it explains only the overall behavior of the model and does not explain individual predictions.
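A minimal sketch of creating a mimic explainer and retrieving both kinds of importance values (the LightGBM surrogate and the x_train/x_test variable names are assumptions):
from interpret.ext.blackbox import MimicExplainer
from interpret.ext.glassbox import LGBMExplainableModel

# Train an interpretable surrogate that mimics the logistic regression model.
explainer = MimicExplainer(model, x_train, LGBMExplainableModel)

# Overall global feature importance across a dataset.
global_explanation = explainer.explain_global(x_test)
print(global_explanation.get_feature_importance_dict())

# Local feature importance for a specific set of predictions.
local_explanation = explainer.explain_local(x_test[0:5])
print(local_explanation.local_importance_values)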
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-
interpretability
Next Question
Question 214 ( Question Set 4 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You train a classification model by using a logistic regression algorithm.
You must be able to explain the model's predictions by calculating the importance of each
feature, both as an overall global relative importance value and as a measure of local
importance for a specific set of predictions.
You need to create an explainer that you can use to retrieve the required global and local
feature importance values.
Solution: Create a TabularExplainer.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : A
Explanation:
TabularExplainer calls one of the three SHAP explainers underneath (TreeExplainer, DeepExplainer, or KernelExplainer) and automatically selects the most appropriate one for your use case. It provides both overall global feature importance values and local feature importance values for a specific set of predictions, so it satisfies the requirement.
Note: Permutation Feature Importance Explainer (PFI) can explain the overall behavior of any underlying model but does not explain individual predictions, so it would not meet the local importance requirement.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-
interpretability
Next Question
Question 215 ( Question Set 4 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You train a classification model by using a logistic regression algorithm.
You must be able to explain the model's predictions by calculating the importance of each
feature, both as an overall global relative importance value and as a measure of local
importance for a specific set of predictions.
You need to create an explainer that you can use to retrieve the required global and local
feature importance values.
Solution: Create a PFIExplainer.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Permutation Feature Importance Explainer (PFI): Permutation Feature Importance is a technique used to explain classification and regression models. At a high level, the way it works is by randomly shuffling data one feature at a time for the entire dataset and calculating how much the performance metric of interest changes. The larger the change, the more important that feature is. PFI can explain the overall behavior of any underlying model but does not explain individual predictions, so it cannot provide the required local feature importance values. Use a TabularExplainer or MimicExplainer instead.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-
interpretability
Next Question
Explanation:
Box 1: StandardScaler -
The StandardScaler assumes your data is normally distributed within each feature and will scale
them such that the distribution is now centred around 0, with a standard deviation of 1.
Example:
All features are now on the same scale relative to one another.
Notice that the skewness of the distribution is maintained but the 3 distributions are brought
into the same scale so that they overlap.
Box 3: Normalizer -
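A brief scikit-learn illustration of the difference (the sample array is made up):
import numpy as np
from sklearn.preprocessing import StandardScaler, Normalizer

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# StandardScaler: centre each feature (column) on 0 with unit standard deviation.
print(StandardScaler().fit_transform(X))

# Normalizer: rescale each sample (row) to unit norm, rather than scaling each feature.
print(Normalizer().fit_transform(X))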
Reference:
http://benalexkeen.com/feature-scaling-with-scikit-learn/
Next Question
Question 217 ( Question Set 4 )
You are determining if two sets of data are significantly different from one another by using
Azure Machine Learning Studio.
Estimated values in one set of data may be more than or less than reference values in the other
set of data. You must produce a distribution that has a constant
Type I error as a function of the correlation.
You need to produce the distribution.
Which type of distribution should you produce?
Answer : D
Explanation:
Choose a one-tail or two-tail test. The default is a two-tailed test. This is the most common type
of test, in which the expected distribution is symmetric around zero.
Example: Type I error of unpaired and paired two-sample t-tests as a function of the correlation.
The simulated random numbers originate from a bivariate normal distribution with a variance of
1.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/test-
hypothesis-using-t-test https://en.wikipedia.org/wiki/Student%27s_t-test
Next Question
Question 218 ( Question Set 4 )
DRAG DROP -
You are producing a multiple linear regression model in Azure Machine Learning Studio.
Several independent variables are highly correlated.
You need to select appropriate methods for conducting effective feature engineering on all the
data.
Which three actions should you perform in sequence? To answer, move the appropriate actions
from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Answer :
Explanation:
Step 1: Use the Filter Based Feature Selection module
Filter Based Feature Selection identifies the features in a dataset with the greatest predictive
power.
The module outputs a dataset that contains the best feature columns, as ranked by predictive
power. It also outputs the names of the features and their scores from the selected metric.
Step 2: Build a counting transform
A counting transform creates a transformation that turns count tables into features, so that you
can apply the transformation to multiple datasets.
Step 3: Test the hypothesis using t-Test
Reference:
https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/studio-module-reference/filter-
based-feature-selection https://docs.microsoft.com/en-us/azure/machine-learning/studio-
module-reference/build-counting-transform
Next Question
Question 219 ( Question Set 4 )
You are performing feature engineering on a dataset.
You must add a feature named CityName and populate the column value with the text London.
You need to add the new feature to the dataset.
Which Azure Machine Learning Studio module should you use?
Answer : B
Explanation:
Typical metadata changes might include marking columns as features.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/edit-
metadata
Next Question
Question 220 ( Question Set 4 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are creating a model to predict the price of a student's artwork depending on the following
variables: the student's length of education, degree type, and art form.
You start by creating a linear regression model.
You need to evaluate the linear regression model.
Solution: Use the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative
Absolute Error, Relative Squared Error, and the Coefficient of
Determination.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : A
Explanation:
The following metrics are reported for evaluating regression models. When you compare
models, they are ranked by the metric you select for evaluation.
Mean absolute error (MAE) measures how close the predictions are to the actual outcomes;
thus, a lower score is better.
Root mean squared error (RMSE) creates a single value that summarizes the error in the model.
By squaring the difference, the metric disregards the difference between over-prediction and
under-prediction.
Relative absolute error (RAE) is the relative absolute difference between expected and actual
values; relative because the mean difference is divided by the arithmetic mean.
Relative squared error (RSE) similarly normalizes the total squared error of the predicted values
by dividing by the total squared error of the actual values.
Mean Zero One Error (MZOE) indicates whether the prediction was correct or not. In other
words: ZeroOneLoss(x,y) = 1 when x!=y; otherwise 0.
Coefficient of determination, often referred to as R2, represents the predictive power of the
model as a value between 0 and 1. Zero means the model is random
(explains nothing); 1 means there is a perfect fit. However, caution should be used in
interpreting R2 values, as low values can be entirely normal and high values can be suspect.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-
model
Next Question
• A. Yes
• B. No
Answer : B
Explanation:
Those are metrics for evaluating classification models; instead, use Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Relative Squared Error, and the Coefficient of Determination.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-
model
Next Question
Question 222 ( Question Set 4 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are creating a model to predict the price of a student's artwork depending on the following
variables: the student's length of education, degree type, and art form.
You start by creating a linear regression model.
You need to evaluate the linear regression model.
Solution: Use the following metrics: Relative Squared Error, Coefficient of Determination,
Accuracy, Precision, Recall, F1 score, and AUC.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Relative Squared Error, Coefficient of Determination are good metrics to evaluate the linear
regression model, but the others are metrics for classification models.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-
model
Next Question
Question 223 ( Question Set 4 )
You are a data scientist creating a linear regression model.
You need to determine how closely the data fits the regression line.
Which metric should you review?
Answer : B
Explanation:
Coefficient of determination, often referred to as R2, represents the predictive power of the
model as a value between 0 and 1. Zero means the model is random
(explains nothing); 1 means there is a perfect fit. However, caution should be used in
interpreting R2 values, as low values can be entirely normal and high values can be suspect.
Incorrect Answers:
A: Root mean squared error (RMSE) creates a single value that summarizes the error in the
model. By squaring the difference, the metric disregards the difference between over-prediction
and under-prediction.
C: Recall is the fraction of all correct results returned by the model.
D: Precision is the proportion of true results over all positive results.
E: Mean absolute error (MAE) measures how close the predictions are to the actual outcomes;
thus, a lower score is better.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-
model
Next Question
Question 224 ( Question Set 4 )
You are creating a binary classification by using a two-class logistic regression model.
You need to evaluate the model results for imbalance.
Which evaluation metric should you use?
Answer : B
Explanation:
One can inspect the true positive rate vs. the false positive rate in the Receiver Operating Characteristic (ROC) curve and the corresponding Area Under the Curve (AUC) value. The closer this curve is to the upper left corner, the better the classifier's performance is (that is, maximizing the true positive rate while minimizing the false positive rate). Curves that are close to the diagonal of the plot result from classifiers that tend to make predictions that are close to random guessing.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio/evaluate-model-
performance#evaluating-a-binary-classification-model
Next Question
Question 225 ( Question Set 4 )
HOTSPOT -
You are developing a linear regression model in Azure Machine Learning Studio. You run an
experiment to compare different algorithms.
The following image displays the results dataset output:
Use the drop-down menus to select the answer choice that answers each question based on the
information presented in the image.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: Boosted Decision Tree Regression
Mean absolute error (MAE) measures how close the predictions are to the actual outcomes;
thus, a lower score is better.
Box 2:
Online Gradient Descent: If you want the algorithm to find the best parameters for you, set
Create trainer mode option to Parameter Range. You can then specify multiple values for the
algorithm to try.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-
model https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-
reference/linear-regression
Next Question
Explanation:
In decision trees, the depth of the tree determines the variance. A complicated decision tree (e.g.
deep) has low bias and high variance.
Note: In statistics and machine learning, the bias-variance tradeoff is the property of a set of
predictive models whereby models with a lower bias in parameter estimation have a higher
variance of the parameter estimates across samples, and vice versa. Increasing the bias will
decrease the variance. Increasing the variance will decrease the bias.
Reference:
https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-trade-off-in-
machine-learning/
Next Question
Question 227 ( Question Set 4 )
DRAG DROP -
You have a model with a large difference between the training and validation error values.
You must create a new model and perform cross-validation.
You need to identify a parameter set for the new model using Azure Machine Learning Studio.
Which module you should use for each step? To answer, drag the appropriate modules to the
correct steps. Each module may be used once or more than once, or not at all. You may need to
drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
Answer :
Explanation:
Next Question
Question 228 ( Question Set 4 )
HOTSPOT -
You are analyzing the asymmetry in a statistical distribution.
The following image contains two density curves that show the probability distribution of two
datasets.
Use the drop-down menus to select the answer choice that answers each question based on the
information presented in the graphic.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Next Question
Question 229 ( Question Set 4 )
You are a data scientist building a deep convolutional neural network (CNN) for image
classification.
The CNN model you build shows signs of overfitting.
You need to reduce overfitting and converge the model to an optimal fit.
Which two actions should you perform? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
Answer : BD
Explanation:
B: Weight regularization provides an approach to reduce the overfitting of a deep learning
neural network model on the training data and improve the performance of the model on new
data, such as the holdout test set.
Keras provides a weight regularization API that allows you to add a penalty for weight size to
the loss function.
Three different regularizer instances are provided; they are:
✑ L1: Sum of the absolute weights.
✑ L2: Sum of the squared weights.
✑ L1L2: Sum of the absolute and the squared weights.
D: Because a fully connected layer occupies most of the parameters, it is prone to overfitting.
One method to reduce overfitting is dropout. At each training stage, individual nodes are either
"dropped out" of the net with probability 1-p or kept with probability p, so that a reduced
network is left; incoming and outgoing edges to a dropped-out node are also removed.
By avoiding training all nodes on all training data, dropout decreases overfitting.
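A short Keras sketch combining both techniques for a small CNN (the layer sizes, input shape, and hyperparameter values are illustrative assumptions):
from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    # L2 weight regularization adds a penalty for large weights to the loss function.
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3),
                  kernel_regularizer=regularizers.l2(0.01)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu", kernel_regularizer=regularizers.l2(0.01)),
    # Dropout randomly disables nodes during training to reduce overfitting.
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])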
Reference:
https://machinelearningmastery.com/how-to-reduce-overfitting-in-deep-learning-with-weight-
regularization/ https://en.wikipedia.org/wiki/Convolutional_neural_network
Next Question
Question 230 ( Question Set 4 )
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You are creating a model to predict the price of a student's artwork depending on the following
variables: the student's length of education, degree type, and art form.
You start by creating a linear regression model.
You need to evaluate the linear regression model.
Solution: Use the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative
Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC.
Does the solution meet the goal?
• A. Yes
• B. No
Answer : B
Explanation:
Accuracy, Precision, Recall, F1 score, and AUC are metrics for evaluating classification models.
Note: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error are OK for the
linear regression model.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-
model
Next Question
Answer : ABD
Explanation:
Next Question
Question 232 ( Question Set 4 )
HOTSPOT -
You train a classification model by using a decision tree algorithm.
You create an estimator by running the following Python code. The variable feature_names is a list of all feature names, and class_names is a list of all class names.
from interpret.ext.blackbox import TabularExplainer
explainer = TabularExplainer(model, x_train, features=feature_names, classes=class_names)
You need to explain the predictions made by the model for all classes by determining the
importance of all features.
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
Box 1: Yes -
TabularExplainer calls one of the three SHAP explainers underneath (TreeExplainer,
DeepExplainer, or KernelExplainer).
Box 2: Yes -
To make your explanations and visualizations more informative, you can choose to pass in
feature names and output class names if doing classification.
Box 3: No -
TabularExplainer automatically selects the most appropriate explainer for your use case, but you can also call each of its three underlying explainers (TreeExplainer, DeepExplainer, or KernelExplainer) directly.
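For illustration only, a minimal sketch of using the explainer from the question to get the importance of all features (x_test is an assumed held-out evaluation set):
# Global (all-classes) explanation over the evaluation data
global_explanation = explainer.explain_global(x_test)
# Dictionary mapping feature names to their overall importance values
print(global_explanation.get_feature_importance_dict())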
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-
interpretability-aml
Next Question
Question 233 ( Question Set 4 )
DRAG DROP -
You have several machine learning models registered in an Azure Machine Learning workspace.
You must use the Fairlearn dashboard to assess fairness in a selected model.
Which three actions should you perform in sequence? To answer, move the appropriate actions
from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Answer :
Explanation:
Step 1: Select a model feature to be evaluated.
Step 2: Select a binary classification or regression model.
Register your models within Azure Machine Learning. For convenience, store the results in a
dictionary, which maps the id of the registered model (a string in name:version format) to the
predictor itself.
Example:
model_dict = {}
lr_reg_id = register_model("fairness_logistic_regression", lr_predictor)
model_dict[lr_reg_id] = lr_predictor
svm_reg_id = register_model("fairness_svm", svm_predictor)
model_dict[svm_reg_id] = svm_predictor
Step 3: Select a metric to be measured
Precompute fairness metrics.
Create a dashboard dictionary using Fairlearn's metrics package.
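For illustration only, a minimal sketch of precomputing a disaggregated metric with Fairlearn's metrics package (fairlearn 0.7+ API assumed; y_test, y_pred, and the sensitive-feature series A_test are assumed to exist, and the metric choice is arbitrary):
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame

metric_frame = MetricFrame(metrics=accuracy_score,
                           y_true=y_test,
                           y_pred=y_pred,
                           sensitive_features=A_test)
print(metric_frame.overall)    # metric on the whole test set
print(metric_frame.by_group)   # metric broken down per sensitive-feature group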
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-fairness-
aml
Next Question
Question 234 ( Question Set 4 )
HOTSPOT -
A biomedical research company plans to enroll people in an experimental medical treatment
trial.
You create and train a binary classification model to support selection and admission of patients
to the trial. The model includes the following features: Age,
Gender, and Ethnicity.
The model returns different performance metrics for people from different ethnic groups.
You need to use Fairlearn to mitigate and minimize disparities for each category in the Ethnicity
feature.
Which technique and constraint should you use? To answer, select the appropriate options in
the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
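For context only, a sketch of one possible Fairlearn mitigation approach, a reduction-based grid search with a parity constraint; the estimator, constraint choice, and variable names are illustrative assumptions rather than a statement of the intended answer:
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import GridSearch, DemographicParity

# Train a sweep of mitigated predictors, each trading accuracy against disparity
sweep = GridSearch(LogisticRegression(max_iter=1000),
                   constraints=DemographicParity(),
                   grid_size=20)
sweep.fit(X_train, y_train, sensitive_features=A_train["Ethnicity"])
mitigated_predictors = sweep.predictors_  # pick the one with acceptable disparity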
Next Question
Question 235 ( Testlet 3 )
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as
you would like to complete each case. However, there may be additional case studies and
sections on this exam. You must manage your time to ensure that you are able to complete all
questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin
a new section, you cannot return to this section.
Overview -
You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into
Europe and has asked you to investigate prices for private residences in major European cities.
You use Azure Machine Learning Studio to measure the median value of properties. You
produce a regression model to predict property prices by using the
Linear Regression and Bayesian Linear Regression modules.
Datasets -
There are two datasets in CSV format that contain property details for two cities, London and
Paris. You add both files to Azure Machine Learning Studio as separate datasets to the starting
point for an experiment. Both datasets contain the following columns:
An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format,
whereas the larger London dataset contains the MedianValue in numerical format.
Data issues -
Missing values -
The AccessibilityToHighway column in both datasets contains missing values. The missing data
must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
Columns in each dataset contain missing and null values. The datasets also contain many
outliers. The Age column has a high proportion of outliers. You need to remove the rows that
have outliers in the Age column. The MedianValue and AvgRoomsInHouse columns both hold
data in numeric format. You need to select a feature selection algorithm to analyze the
relationship between the two columns in more detail.
Model fit -
The model shows signs of overfitting. You need to produce a more refined regression model
that reduces the overfitting.
Experiment requirements -
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance. In each case, the predictor of the dataset is the
column named MedianValue. You must ensure that the datatype of the MedianValue column of
the Paris dataset matches the structure of the London dataset.
You must prioritize the columns of data for predicting the outcome. You must use non-
parametric statistics to measure relationships.
You must use a feature selection algorithm to analyze the relationship between the
MedianValue and AvgRoomsInHouse columns.
Model training -
Hyperparameters -
You must configure hyperparameters in the model learning process to speed the learning phase.
In addition, this configuration should cancel the lowest performing runs at each evaluation
interval, thereby directing effort and resources towards models that are more likely to be
successful.
You are concerned that the model might not use compute resources efficiently during hyperparameter tuning. You are also concerned about preventing an increase in the overall tuning time. Therefore, you must implement an early stopping criterion on models that provides savings without terminating promising jobs.
Testing -
You must produce multiple partitions of a dataset based on sampling using the Partition and
Sample module in Azure Machine Learning Studio.
Cross-validation -
You must create three equal partitions for cross-validation. You must also configure the cross-
validation process so that the rows in the test and training datasets are divided evenly by
properties that are near each city's main river. You must complete this task before the data goes
through the sampling process.
Data visualization -
You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve
in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class
Decision Jungle modules with one another.
DRAG DROP -
You need to correct the model fit issue.
Which three actions should you perform in sequence? To answer, move the appropriate actions
from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Answer :
Explanation:
Next Question
Answer :
Explanation:
A Deep Learning Virtual Machine is a pre-configured environment for deep learning using GPU
instances.
Next Question
Question 238 ( Question Set 5 )
You need to implement a Data Science Virtual Machine (DSVM) that supports the Caffe2 deep
learning framework.
Which of the following DSVM should you create?
Answer : C
Explanation:
Caffe2 is supported by Data Science Virtual Machine for Linux.
Microsoft offers Linux editions of the DSVM on Ubuntu 16.04 LTS and CentOS 7.4.
However, only the DSVM on Ubuntu is preconfigured for Caffe2.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-
machine/overview
Next Question
Question 239 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You have been tasked with employing a machine learning model, which makes use of a
PostgreSQL database and needs GPU processing, to forecast prices.
You are preparing to create a virtual machine that has the necessary tools built into it.
You need to make use of the correct virtual machine type.
Recommendation: You make use of a Geo AI Data Science Virtual Machine (Geo-DSVM)
Windows edition.
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : B
Explanation:
The Azure Geo AI Data Science VM (Geo-DSVM) delivers geospatial analytics capabilities from
Microsoft's Data Science VM. Specifically, this VM extends the AI and data science toolkits in the
Data Science VM by adding ESRI's market-leading ArcGIS Pro Geographic Information System.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-
machine/overview
Next Question
Question 240 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You have been tasked with employing a machine learning model, which makes use of a
PostgreSQL database and needs GPU processing, to forecast prices.
You are preparing to create a virtual machine that has the necessary tools built into it.
You need to make use of the correct virtual machine type.
Recommendation: You make use of a Deep Learning Virtual Machine (DLVM) Windows edition.
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : B
Explanation:
The DLVM is a template on top of the DSVM image. The packages, GPU drivers, and so on are all included in the DSVM image; the DLVM mainly adds convenience during creation, because DLVMs can only be created on GPU VM instances in Azure.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-
machine/overview
Next Question
• A. Yes
• B. No
Answer : A
Explanation:
In the DSVM, your training models can use deep learning algorithms on hardware that's based
on graphics processing units (GPUs).
PostgreSQL is available for the following operating systems: Linux (all recent distributions), macOS (64-bit installers available for OS X version 10.6 and newer), and Windows (64-bit installers available; tested on the latest versions and back to Windows 2012 R2).
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-
machine/overview
Next Question
Question 242 ( Question Set 5 )
DRAG DROP -
You have been tasked with moving data into Azure Blob Storage for the purpose of supporting
Azure Machine Learning.
Which of the following can be used to complete your task? Answer by dragging the correct
options from the list to the answer area.
Select and Place:
Answer :
Explanation:
You can move data to and from Azure Blob storage using different technologies:
✑ Azure Storage Explorer
✑ AzCopy
✑ Python
✑ SSIS
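For illustration only, a minimal Python sketch of uploading a local file to Azure Blob Storage with the azure-storage-blob package (the connection string, container, and file names are placeholders):
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")
blob = service.get_blob_client(container="training-data", blob="iris.csv")
with open("iris.csv", "rb") as data:
    blob.upload_blob(data, overwrite=True)  # copy the local CSV into the container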
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/move-
azure-blob
Next Question
Question 243 ( Question Set 5 )
HOTSPOT -
Complete the sentence by selecting the correct option in the answer area.
Hot Area:
Answer :
Explanation:
Use the Convert to ARFF module in Azure Machine Learning Studio to convert datasets and results in Azure Machine Learning to the attribute-relation file format (ARFF) used by the Weka toolset.
The ARFF data specification for Weka supports multiple machine learning tasks, including data
preprocessing, classification, and feature selection. In this format, data is organized by entities
and their attributes, and is contained in a single text file.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-
arff
Next Question
Question 244 ( Question Set 5 )
You have been tasked with designing a deep learning model, which accommodates the most
recent edition of Python, to recognize language.
You have to include a suitable deep learning framework in the Data Science Virtual Machine
(DSVM).
Which of the following actions should you take?
Answer : B
Reference:
https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-
explained.html
Next Question
Question 245 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You have been tasked with evaluating your model on a partial data sample via k-fold cross-
validation.
You have already configured a k parameter as the number of splits. You now have to configure
the k parameter for the cross-validation with the usual value choice.
Recommendation: You configure the use of the value k=3.
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : B
Next Question
• A. Yes
• B. No
Answer : A
Explanation:
Leave One Out (LOO) cross-validation
Setting K = n (the number of observations) yields n-fold cross-validation and is called leave-one-out cross-validation (LOO), a special case of the K-fold approach.
LOO CV is sometimes useful but typically doesn't shake up the data enough. The estimates from each fold are highly correlated and hence their average can have high variance.
This is why the usual choice is K=5 or 10. It provides a good compromise for the bias-variance tradeoff.
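For illustration only, a minimal scikit-learn sketch of the usual choice k=5 (the dataset and estimator are arbitrary):
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)  # 5-fold cross-validation
print(scores.mean(), scores.std())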
Next Question
Question 247 ( Question Set 5 )
You construct a machine learning experiment via Azure Machine Learning Studio.
You would like to split data into two separate datasets.
Which of the following actions should you take?
Answer : D
Explanation:
The Group Data into Bins module supports multiple options for binning data. You can customize
how the bin edges are set and how values are apportioned into the bins.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-
data-into-bins
Next Question
Question 248 ( Question Set 5 )
You have been tasked with creating a new Azure pipeline via the Machine Learning designer.
You have to make sure that the pipeline trains a model using data in a comma-separated values (CSV) file that is published on a website. A dataset for this file does not exist.
Data from the CSV file must be ingested into the designer pipeline with the least amount of administrative effort possible.
Which of the following actions should you take?
Explanation:
The preferred way to provide data to a pipeline is a Dataset object. The Dataset object points to
data that lives in or is accessible from a datastore or at a Web
URL. The Dataset class is abstract, so you will create an instance of either a FileDataset (referring to one or more files) or a TabularDataset that's created from one or more files with delimited columns of data.
Example:
from azureml.core import Dataset
iris_tabular_dataset = Dataset.Tabular.from_delimited_files([(def_blob_store, 'train-dataset/iris.csv')])
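Because the question describes a CSV published on a website, the same call also accepts a public web URL directly; a sketch with a placeholder URL:
from azureml.core import Dataset

web_path = "https://contoso.example/data/train.csv"  # placeholder URL for the published CSV
csv_dataset = Dataset.Tabular.from_delimited_files(path=web_path)
df = csv_dataset.to_pandas_dataframe()  # materialize for inspection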
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline
Next Question
Question 249 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You are in the process of creating a machine learning model. Your dataset includes rows with
null and missing values.
You plan to make use of the Clean Missing Data module in Azure Machine Learning Studio to
detect and fix the null and missing values in the dataset.
Recommendation: You make use of the Replace with median option.
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : B
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-
missing-data
Next Question
Question 250 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You are in the process of creating a machine learning model. Your dataset includes rows with
null and missing values.
You plan to make use of the Clean Missing Data module in Azure Machine Learning Studio to
detect and fix the null and missing values in the dataset.
Recommendation: You make use of the Custom substitution value option.
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : B
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-
missing-data
Next Question
• A. Yes
• B. No
Answer : A
Explanation:
Remove entire row: Completely removes any row in the dataset that has one or more missing
values. This is useful if the missing value can be considered randomly missing.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-
missing-data
Next Question
Question 252 ( Question Set 5 )
You need to consider the underlined segment to establish whether it is accurate.
To transform a categorical feature into a binary indicator, you should make use of the Clean
Missing Data module.
Select "No adjustment required" if the underlined segment is accurate. If the underlined segment is inaccurate, select the accurate option.
• A. No adjustment required.
• B. Convert to Indicator Values
• C. Apply SQL Transformation
• D. Group Categorical Values
Answer : B
Explanation:
Use the Convert to Indicator Values module in Azure Machine Learning Studio. The purpose of
this module is to convert columns that contain categorical values into a series of binary indicator
columns that can more easily be used as features in a machine learning model.
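For illustration only, a conceptually similar operation outside Studio using pandas (the column and values are made up):
import pandas as pd

df = pd.DataFrame({"Color": ["Red", "Blue", "Red", "Green"]})
indicators = pd.get_dummies(df, columns=["Color"])  # one binary indicator column per category
print(indicators)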
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-
indicator-values
Next Question
Question 253 ( Question Set 5 )
You need to consider the underlined segment to establish whether it is accurate.
To improve the amount of low incidence cases in a dataset, you should make use of the SMOTE
module.
Select "No adjustment required" if the underlined segment is accurate. If the underlined segment is inaccurate, select the accurate option.
• A. No adjustment required.
• B. Remove Duplicate Rows
• C. Join Data
• D. Edit Metadata
Answer : A
Explanation:
Use the SMOTE module in Azure Machine Learning Studio to increase the number of
underrepresented cases in a dataset used for machine learning. SMOTE is a better way of
increasing the number of rare cases than simply duplicating existing cases.
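For illustration only, a sketch of the SMOTE idea outside Studio using the imbalanced-learn package on a synthetic dataset:
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), Counter(y_res))  # the rare class is synthetically oversampled, not duplicated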
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
Next Question
Question 254 ( Question Set 5 )
HOTSPOT -
You need to consider the underlined segment to establish whether it is accurate.
Hot Area:
Answer :
Explanation:
The box-plot algorithm can be used to display outliers.
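For illustration only, a small pandas sketch (requires matplotlib) showing how a box plot exposes an outlier:
import pandas as pd

ages = pd.Series([23, 25, 27, 29, 31, 33, 35, 120])  # 120 is an outlier
ages.plot.box()  # the outlier appears as a point beyond the whiskers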
Reference:
https://medium.com/analytics-vidhya/what-is-an-outliers-how-to-detect-and-remove-them-
which-algorithm-are-sensitive-towards-outliers-2d501993d59
Next Question
Question 255 ( Question Set 5 )
You are planning to host practical training to acquaint learners with data visualization creation
using Python. Learner devices are able to connect to the internet.
Learner devices are currently NOT configured for Python development. Also, learners are unable
to install software on their devices as they lack administrator permissions. Furthermore, they are
unable to access Azure subscriptions.
It is imperative that learners are able to execute Python-based data visualization code.
Which of the following actions should you take?
Answer : C
Reference:
https://notebooks.azure.com/
Next Question
Explanation:
Replace using Probabilistic PCA: Compared to other options, such as Multiple Imputation using
Chained Equations (MICE), this option has the advantage of not requiring the application of
predictors for each column. Instead, it approximates the covariance for the full dataset.
Therefore, it might offer better performance for datasets that have missing values in many
columns.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-
missing-data
Next Question
Question 257 ( Question Set 5 )
You have recently concluded the construction of a binary classification machine learning model.
You are currently assessing the model. You want to make use of a visualization that allows for
precision to be used as the measurement for the assessment.
Which of the following actions should you take?
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-
ml#confusion-matrix
Next Question
Question 258 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You have been tasked with evaluating your model on a partial data sample via k-fold cross-
validation.
You have already configured a k parameter as the number of splits. You now have to configure
the k parameter for the cross-validation with the usual value choice.
Recommendation: You configure the use of the value k=1.
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : B
Next Question
Question 259 ( Question Set 5 )
DRAG DROP -
You are in the process of constructing a regression model.
You would like to make it a Poisson regression model. To achieve your goal, the label values need to meet certain conditions.
Which of the following are relevant conditions with regard to the label data? Answer by dragging the correct options from the list to the answer area.
Select and Place:
Answer :
Explanation:
Poisson regression is intended for use in regression models that are used to predict numeric
values, typically counts. Therefore, you should use this module to create your regression model
only if the values you are trying to predict fit the following conditions:
✑ The response variable has a Poisson distribution.
✑ Counts cannot be negative. The method will fail outright if you attempt to use it with
negative labels.
✑ A Poisson distribution is a discrete distribution; therefore, it is not meaningful to use this
method with non-whole numbers.
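For illustration only, a scikit-learn sketch that checks the label conditions above on synthetic count data and fits a Poisson regressor (the data is made up):
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.poisson(lam=np.exp(X @ np.array([0.3, -0.2, 0.1])))  # non-negative whole-number counts

assert (y >= 0).all() and np.issubdtype(y.dtype, np.integer)  # the label conditions
model = PoissonRegressor().fit(X, y)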
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/poisson-
regression
Next Question
Question 260 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You are in the process of carrying out feature engineering on a dataset.
You want to add a feature to the dataset and fill the column value.
Recommendation: You must make use of the Group Categorical Values Azure Machine Learning
Studio module.
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : B
Next Question
• A. Yes
• B. No
Answer : B
Next Question
Question 262 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You are in the process of carrying out feature engineering on a dataset.
You want to add a feature to the dataset and fill the column value.
Recommendation: You must make use of the Edit Metadata Azure Machine Learning Studio
module.
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : A
Explanation:
Typical metadata changes might include marking columns as features.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/edit-
metadata https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-
reference/join-data https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-
reference/group-categorical-values
Next Question
Question 263 ( Question Set 5 )
You have been tasked with ascertaining if two sets of data differ considerably. You will make use
of Azure Machine Learning Studio to complete your task.
You plan to perform a paired t-test.
Which of the following are conditions that must apply to use a paired t-test? (Choose all that
apply.)
Answer : BC
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/test-
hypothesis-using-t-test
Next Question
Question 264 ( Question Set 5 )
You want to train a classification model using data located in a comma-separated values (CSV)
file.
The classification model will be trained via the Automated Machine Learning interface using the
Classification task type.
You have been informed that only linear models need to be assessed by Automated Machine Learning.
Which of the following actions should you take?
Answer : C
Reference:
https://econml.azurewebsites.net/spec/estimation/dml.html
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-automated-ml-for-ml-
models
Next Question
Question 265 ( Question Set 5 )
You are preparing to train a regression model via automated machine learning. The data available to you has features with missing values, as well as categorical features with few discrete values.
You want to make sure that automated machine learning is configured as follows:
✑ missing values must be automatically imputed.
✑ categorical features must be encoded as part of the training task.
Which of the following actions should you take?
• A. You should make use of the featurization parameter with the 'auto' value pair.
• B. You should make use of the featurization parameter with the 'off' value pair.
• C. You should make use of the featurization parameter with the 'on' value pair.
• D. You should make use of the featurization parameter with the 'FeaturizationConfig'
value pair.
Answer : A
Explanation:
featurization: str or FeaturizationConfig
Values: 'auto' / 'off' / FeaturizationConfig
Indicator for whether the featurization step should be done automatically or not, or whether customized featurization should be used.
Column types are automatically detected. Based on the detected column type, preprocessing/featurization is done as follows:
✑ Categorical: target encoding, one-hot encoding, drop high-cardinality categories, impute missing values.
✑ Numeric: impute missing values, cluster distance, weight of evidence.
✑ DateTime: several features such as day, seconds, minutes, hours, etc.
✑ Text: bag of words, pre-trained word embedding, text target encoding.
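For illustration only, a sketch of how the parameter might be set in an AutoMLConfig (training_data, compute_target, and the label column name are assumptions):
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(task="regression",
                             training_data=training_data,
                             label_column_name="Price",   # hypothetical label column
                             featurization="auto",        # impute missing values, encode categoricals
                             compute_target=compute_target,
                             primary_metric="normalized_root_mean_squared_error")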
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-train-automl-
client/azureml.train.automl.automlconfig.automlconfig
Next Question
Answer : C
Explanation:
Mean absolute error (MAE) measures how close the predictions are to the actual outcomes;
thus, a lower score is better.
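For illustration only, a small worked example of MAE (the values are made up):
from sklearn.metrics import mean_absolute_error

y_true = [3.0, 2.5, 4.0, 7.0]
y_pred = [2.5, 3.0, 4.5, 6.0]
print(mean_absolute_error(y_true, y_pred))  # (0.5 + 0.5 + 0.5 + 1.0) / 4 = 0.625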
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-
reference/boosted-decision-tree-regression https://docs.microsoft.com/en-us/azure/machine-
learning/studio-module-reference/evaluate-model https://docs.microsoft.com/en-
us/azure/machine-learning/studio-module-reference/linear-regression
Next Question
Question 267 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You have been tasked with constructing a machine learning model that translates text from one language into another.
The machine learning model must be constructed and trained to learn the sequence of the text.
Recommendation: You make use of Convolutional Neural Networks (CNNs).
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : B
Next Question
Question 268 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You have been tasked with constructing a machine learning model that translates text from one language into another.
The machine learning model must be constructed and trained to learn the sequence of the text.
Recommendation: You make use of Generative Adversarial Networks (GANs).
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : B
Next Question
Question 269 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You have been tasked with constructing a machine learning model that translates text from one language into another.
The machine learning model must be constructed and trained to learn the sequence of the text.
Recommendation: You make use of Recurrent Neural Networks (RNNs).
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : A
Explanation:
Note: RNNs are designed to take sequences of text as inputs or return sequences of text as outputs, or both. They're called recurrent because the network's hidden layers have a loop in which the output and cell state from each time step become inputs at the next time step. This recurrence serves as a form of memory.
It allows contextual information to flow through the network so that relevant outputs from
previous time steps can be applied to network operations at the current time step.
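For illustration only, a toy Keras sketch of recurrence over a token sequence (vocabulary size and dimensions are arbitrary; this is not a full encoder-decoder translation model):
from tensorflow.keras import layers, models

vocab_size, embed_dim = 10000, 128
model = models.Sequential([
    layers.Embedding(vocab_size, embed_dim),
    layers.LSTM(256, return_sequences=True),  # hidden state is carried from one time step to the next
    layers.TimeDistributed(layers.Dense(vocab_size, activation="softmax")),  # one prediction per time step
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")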
Reference:
https://towardsdatascience.com/language-translation-with-rnns-d84d43b40571
Next Question
Question 270 ( Question Set 5 )
DRAG DROP -
You have been tasked with evaluating the performance of a binary classification model that you
created.
You need to choose evaluation metrics to achieve your goal.
Which of the following are the metrics you would choose? Answer by dragging the correct
options from the list to the answer area.
Select and Place:
Answer :
Explanation:
The evaluation metrics available for binary classification models are: Accuracy, Precision, Recall,
F1 Score, and AUC.
Note: A very natural question is: 'Out of the individuals that the model predicted as positive, how many were classified correctly (TP)?'
This question can be answered by looking at the Precision of the model, which is the proportion of predicted positives that are classified correctly.
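For illustration only, computing the same metrics with scikit-learn on made-up labels and scores:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3]   # predicted probabilities
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]     # thresholded class predictions

print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred),
      roc_auc_score(y_true, y_prob))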
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio/evaluate-model-performance
Next Question
Answer :
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/two-class-
neural-network
Next Question
Question 272 ( Question Set 5 )
You make use of Azure Machine Learning Studio to create a binary classification model.
You are preparing to carry out a parameter sweep of the model to tune hyperparameters. You
have to make sure that the sweep allows for every possible combination of hyperparameters to
be iterated. Also, the computing resources needed to carry out the sweep must be reduced.
Which of the following actions should you take?
• A. You should consider making use of the Selective grid sweep mode.
• B. You should consider making use of the Measured grid sweep mode.
• C. You should consider making use of the Entire grid sweep mode.
• D. You should consider making use of the Random grid sweep mode.
Answer : D
Explanation:
D: Maximum number of runs on random grid: This option also controls the number of iterations
over a random sampling of parameter values, but the values are not generated randomly from
the specified range; instead, a matrix is created of all possible combinations of parameter values
and a random sampling is taken over the matrix. This method is more efficient and less prone to
regional oversampling or undersampling.
If you are training a model that supports an integrated parameter sweep, you can also set a
range of seed values to use and iterate over the random seeds as well. This is optional, but can
be useful for avoiding bias introduced by seed selection.
C: Entire grid: When you select this option, the module loops over a grid predefined by the
system, to try different combinations and identify the best learner. This option is useful for cases
where you don't know what the best parameter settings might be and want to try all possible
combination of values.
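For a rough analogy outside Studio, scikit-learn exposes the same two ideas; the estimator and parameter grid below are arbitrary assumptions:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

param_grid = {"n_estimators": [50, 100, 200], "learning_rate": [0.01, 0.1, 0.3]}

# Entire grid: every combination is evaluated (9 candidates here)
grid = GridSearchCV(GradientBoostingClassifier(), param_grid, cv=3)

# Random sweep over the grid: only n_iter combinations are sampled, reducing compute
random_sweep = RandomizedSearchCV(GradientBoostingClassifier(), param_grid, n_iter=4, cv=3, random_state=0)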
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/tune-
model-hyperparameters
Next Question
Question 273 ( Question Set 5 )
You are in the process of constructing a deep convolutional neural network (CNN). The CNN will
be used for image classification.
You notice that the CNN model you constructed displays hints of overfitting.
You want to make sure that overfitting is minimized, and that the model is converged to an
optimal fit.
Which of the following is TRUE with regards to achieving your goal?
• A. You have to add an additional dense layer with 512 input units, and reduce the
amount of training data.
• B. You have to add L1/L2 regularization, and reduce the amount of training data.
• C. You have to reduce the amount of training data and make use of training data
augmentation.
• D. You have to add L1/L2 regularization, and make use of training data augmentation.
• E. You have to add an additional dense layer with 512 input units, and add L1/L2
regularization.
Answer : B
Explanation:
B: Weight regularization provides an approach to reduce the overfitting of a deep learning
neural network model on the training data and improve the performance of the model on new
data, such as the holdout test set.
Keras provides a weight regularization API that allows you to add a penalty for weight size to
the loss function.
Three different regularizer instances are provided; they are:
✑ L1: Sum of the absolute weights.
✑ L2: Sum of the squared weights.
✑ L1L2: Sum of the absolute and the squared weights.
Because a fully connected layer occupies most of the parameters, it is prone to overfitting. One
method to reduce overfitting is dropout. At each training stage, individual nodes are either
"dropped out" of the net with probability 1-p or kept with probability p, so that a reduced
network is left; incoming and outgoing edges to a dropped-out node are also removed.
By avoiding training all nodes on all training data, dropout decreases overfitting.
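For illustration only, a minimal Keras sketch of training-data augmentation with random image transformations (the layer choices are arbitrary):
from tensorflow.keras import layers, models

augmentation = models.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomTranslation(0.1, 0.1),
])
# Placed in front of the CNN, this generates varied copies of each training image on the fly,
# e.g. model = models.Sequential([augmentation, conv_base, classifier_head])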
Reference:
https://machinelearningmastery.com/how-to-reduce-overfitting-in-deep-learning-with-weight-
regularization/ https://en.wikipedia.org/wiki/Convolutional_neural_network
Next Question
Question 274 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You are planning to make use of Azure Machine Learning designer to train models.
You need to choose a suitable compute type.
Recommendation: You choose Attached compute.
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : B
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-
studio
Next Question
Question 275 ( Question Set 5 )
This question is included in a number of questions that depicts the identical set-up. However,
every question has a distinctive result. Establish if the recommendation satisfies the
requirements.
You are planning to make use of Azure Machine Learning designer to train models.
You need to choose a suitable compute type.
Recommendation: You choose Inference cluster.
Will the requirements be satisfied?
• A. Yes
• B. No
Answer : B
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-
studio
Next Question
• A. Yes
• B. No
Answer : A
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-
studio
Next Question
Question 277 ( Question Set 5 )
You are making use of the Azure Machine Learning designer to construct an experiment.
After dividing a dataset into training and testing sets, you configure the algorithm to be Two-
Class Boosted Decision Tree.
You are preparing to ascertain the Area Under the Curve (AUC).
Which of the following is a sequential combination of the modules required to achieve your goal?
Answer : A
Next Question