dp-100 C86e3e2d5342

Certy IQ
Premium exam material

Get certification quickly with the CertyIQ Premium exam material.
Everything you need to prepare, learn & pass your certification exam easily. Lifetime free updates
First attempt guaranteed success.
https://www.CertyIQ.com
Microsoft
(DP-100)
Designing and Implementing a Data Science Solution on Azure

(beta)
Total: 484 Questions

Link: https://certyiq.com/papers?provider=microsoft&exam=dp-100
Question: 1 CertyIQ
DRAG DROP -
You are planning to host practical training to acquaint staff with Docker for Windows.
Staff devices must support the installation of Docker.
Which of the following are requirements for this installation? Answer by dragging the correct options from the list
to the answer area.
Select and Place:
Answer:
Explanation:
Reference:
https://docs.docker.com/toolbox/toolbox_install_windows/
https://blogs.technet.microsoft.com/canitpro/2015/09/08/step-by-step-enabling-hyper-v-for-use-on-windows
-10/ https://docs.docker.com/docker-for-windows/install/
" target="_blank" style="word-break: break-all;">
Question: 2 CertyIQ
HOTSPOT -
Complete the sentence by selecting the correct option in the answer area.
Hot Area:
Answer:
Explanation:
A Deep Learning Virtual Machine is a pre-configured environment for deep learning using GPU instances.
Question: 3 CertyIQ
You need to implement a Data Science Virtual Machine (DSVM) that supports the Caffe2 deep learning framework.
Which of the following DSVM should you create?
A. Windows Server 2012 DSVM

B. Windows Server 2016 DSVM
C. Ubuntu 16.04 DSVM
D. CentOS 7.4 DSVM
Answer: C
Explanation:
Caffe2 is supported by Data Science Virtual Machine for Linux.
Microsoft offers Linux editions of the DSVM on Ubuntu 16.04 LTS and CentOS 7.4.
However, only the DSVM on Ubuntu is preconfigured for Caffe2.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview
Question: 4 CertyIQ
This question is included in a number of questions that depicts the identical set-up. However, every question has a
distinctive result. Establish if the recommendation satisfies the requirements.
You have been tasked with employing a machine learning model, which makes use of a PostgreSQL database and
needs GPU processing, to forecast prices.
You are preparing to create a virtual machine that has the necessary tools built into it.
You need to make use of the correct virtual machine type.
Recommendation: You make use of a Geo AI Data Science Virtual Machine (Geo-DSVM) Windows edition.
Will the requirements be satisfied?
A. Yes
B. No
Answer: B
Explanation:
The Azure Geo AI Data Science VM (Geo-DSVM) delivers geospatial analytics capabilities from Microsoft's
Data Science VM. Specifically, this VM extends the AI and data science toolkits in the Data Science VM by
adding ESRI's market-leading ArcGIS Pro Geographic Information System.
Reference:
Question: 5 CertyIQ
Recommendation: You make use of a Deep Learning Virtual Machine (DLVM) Windows edition.
A. Yes
B. No
Answer: A
Explanation:
1. I will also stand with a Yes.A Deep Learning Virtual Machine (DLVM) is a pre-configured virtual machine
image on Microsoft Azure that is optimized for training deep learning models and includes popular tools such
as TensorFlow, PyTorch, and Caffe2. The DLVM Windows edition includes support for GPU processing, making
it suitable for the task of running a machine learning model that requires GPU processing to forecast prices.
Additionally, the DLVM can be configured to use PostgreSQL as the database, satisfying the requirement for
a PostgreSQL database. Therefore, the recommendation to use a DLVM Windows edition will satisfy the
requirements.
Question: 6 CertyIQ
Recommendation: You make use of a Data Science Virtual Machine (DSVM) Windows edition.
A. Yes
B. No
Answer: A
Explanation:
In the DSVM, your training models can use deep learning algorithms on hardware that's based on graphics
processing units (GPUs).
PostgreSQL is available for the following operating systems: Linux (all recent distributions), 64-bit installers
available for macOS (OS X) version 10.6 and newer "
Windows (with installers available for 64-bit version; tested on latest versions and back to Windows 2012 R2.
References:
Question: 7 CertyIQ
DRAG DROP -
You have been tasked with moving data into Azure Blob Storage for the purpose of supporting Azure Machine
Learning.
Which of the following can be used to complete your task? Answer by dragging the correct options from the list to
the answer area.
Select and Place:
Answer:
Explanation:
You can move data to and from Azure Blob storage using different technologies:
✑ Azure Storage-Explorer
✑ AzCopy
✑ Python
✑ SSIS
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/move-azure-blob
Question: 8 CertyIQ
HOTSPOT -
Hot Area:
Answer:
Explanation:
Use the Convert to ARFF module in Azure Machine Learning Studio, to convert datasets and results in Azure
Machine Learning to the attribute-relation file format used by the Weka toolset. This format is known as
ARFF.
The ARFF data specification for Weka supports multiple machine learning tasks, including data
preprocessing, classification, and feature selection. In this format, data is organized by entities and their
attributes, and is contained in a single text file.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-arff
Question: 9 CertyIQ
You have been tasked with designing a deep learning model, which accommodates the most recent edition of
Python, to recognize language.
You have to include a suitable deep learning framework in the Data Science Virtual Machine (DSVM).
Which of the following actions should you take?
A. You should consider including Rattle.

B. You should consider including TensorFlow.
C. You should consider including Theano.
D. You should consider including Chainer.
Answer: B
Explanation:
Reference:
https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-explained.html
Question: 10 CertyIQ
You have been tasked with evaluating your model on a partial data sample via k-fold cross-validation.
You have already configured a k parameter as the number of splits. You now have to configure the k parameter for
the cross-validation with the usual value choice.
Recommendation: You configure the use of the value k=3.
A. Yes
B. No
Answer: B
Explanation:
Usual choice is key word here and usual choice is K=5 or 10. So answer is B. You can use as many splits as you
want. It all depends on the data. Train/test/validate is basically 3 splits that are just swapped around. 3 is
perfectly fine.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/cross-validate-model
A. Yes
B. No
Answer: A
Explanation:
Leave One Out (LOO) cross-validation
Setting K = n (the number of observations) yields n-fold and is called leave-one out cross-validation (LOO), a
special case of the K-fold approach.
LOO CV is sometimes useful but typically doesn't shake up the data enough. The estimates from each fold are
highly correlated and hence their average can have high variance.
This is why the usual choice is K=5 or 10. It provides a good compromise for the bias-variance tradeoff.
You construct a machine learning experiment via Azure Machine Learning Studio.
You would like to split data into two separate datasets.
A. You should make use of the Split Data module.

B. You should make use of the Group Categorical Values module.
C. You should make use of the Clip Values module.
D. You should make use of the Group Data into Bins module.
Answer: A
Explanation:
The ans selected as B is selected based on old Machine Learning Studio but now azure machine learning is
used to perform any task and according to that A is right.
Reference:
https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/split-data
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/cross-
validate-model
You have been tasked with creating a new Azure pipeline via the Machine Learning designer.
You have to makes sure that the pipeline trains a model using data in a comma-separated values (CSV) file that is
published on a website. A dataset for the file for this file does not exist.
Data from the CSV file must be ingested into the designer pipeline with the least amount of administrative effort
as possible.
A. You should make use of the Convert to TXT module.

B. You should add the Copy Data object to the pipeline.
C. You should add the Import Data object to the pipeline.
D. You should add the Dataset object to the pipeline.
Answer: C
Explanation:
Both dataset and import data modules can do the job, but dataset module requires one to register the dataset
first before using it in designer, so violating the least admin efforts. Import data component is used to import
data from data sources such as web URLs with minimum effort. Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/component-reference/import-data
You are in the process of creating a machine learning model. Your dataset includes rows with null and missing
values.
You plan to make use of the Clean Missing Data module in Azure Machine Learning Studio to detect and fix the null
and missing values in the dataset.
Recommendation: You make use of the Replace with median option.
A. Yes
B. No
Answer: A
Explanation:
This is an incomplete question. We don't know what type of data it is. Continuous or categorical. If it's
continuous then it's A else its B, Clean Missing Data module has the option to replace with median
values.
Recommendation: You make use of the Custom substitution value option.
A. Yes
B. No
Answer: A
Explanation:
Clean Missing Data module also provides "Custom substitution value" cleaning modebecause we can use '0'
for numeric and 'na' for text column.
values.
Recommendation: You make use of the Remove entire row option.
A. Yes
B. No
Answer: A
Explanation:
Remove entire row: Completely removes any row in the dataset that has one or more missing values. This is
useful if the missing value can be considered randomly missing.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data
You need to consider the underlined segment to establish whether it is accurate.
To transform a categorical feature into a binary indicator, you should make use of the Clean Missing Data module.
Select `No adjustment required` if the underlined segment is accurate. If the underlined segment is inaccurate,
select the accurate option.
A. No adjustment required.
B. Convert to Indicator Values
C. Apply SQL Transformation
D. Group Categorical Values
Answer: B
Explanation:
Use the Convert to Indicator Values module in Azure Machine Learning Studio. The purpose of this module is
to convert columns that contain categorical values into a series of binary indicator columns that can more
easily be used as features in a machine learning model.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-indicator-valu
es
To improve the amount of low incidence cases in a dataset, you should make use of the SMOTE module.
Select `No adjustment required` if the underlined segment is accurate. If the underlined segment is inaccurate,
select the accurate option.
A. No adjustment required.
B. Remove Duplicate Rows
C. Join Data
D. Edit Metadata
Answer: A
Explanation:
Use the SMOTE module in Azure Machine Learning Studio to increase the number of underrepresented cases
in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than
simply duplicating existing cases.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
HOTSPOT -
Hot Area:
Answer:
Explanation:
The box-plot algorithm can be used to display outliers.
Reference:
https://medium.com/analytics-vidhya/what-is-an-outliers-how-to-detect-and-remove-them-which-algorithm-a
re-sensitive-towards-outliers-2d501993d59
You are planning to host practical training to acquaint learners with data visualization creation using Python.
Learner devices are able to connect to the internet.
Learner devices are currently NOT configured for Python development. Also, learners are unable to install
software on their devices as they lack administrator permissions. Furthermore, they are unable to access Azure
subscriptions.
It is imperative that learners are able to execute Python-based data visualization code.
A. You should consider configuring the use of Azure Container Instance.

B. You should consider configuring the use of Azure BatchAI.
C. You should consider configuring the use of Azure Notebooks.
D. You should consider configuring the use of Azure Kubernetes Service.
Answer: C
Explanation:
Azure Notebooks are accessible by any device that can connect to the internet. The other options require
installation of additional software, which is not permitted in this scenario.
Reference:
https://notebooks.azure.com/
HOTSPOT -
Hot Area:
Answer:
Explanation:
Replace using Probabilistic PCA: Compared to other options, such as Multiple Imputation using Chained
Equations (MICE), this option has the advantage of not requiring the application of predictors for each column.
Instead, it approximates the covariance for the full dataset. Therefore, it might offer better performance for
datasets that have missing values in many columns.
Reference:
You have recently concluded the construction of a binary classification machine learning model.
You are currently assessing the model. You want to make use of a visualization that allows for precision to be used
as the measurement for the assessment.
A. You should consider using Venn diagram visualization.

B. You should consider using Receiver Operating Characteristic (ROC) curve visualization.
C. You should consider using Box plot visualization.
D. You should consider using the Binary classification confusion matrix visualization.
Answer: D
Explanation:
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#confusion-matri
x
A. Yes
B. No
Answer: B
Explanation:
k can be 5 or 10More specifically, k MUST be within: 1 < k <= N
DRAG DROP -
You are in the process of constructing a regression model.
You would like to make it a Poisson regression model. To achieve your goal, the feature values need to meet
certain conditions.
Which of the following are relevant conditions with regards to the label data? Answer by dragging the correct
options from the list to the answer area.
Select and Place:
Answer:
Explanation:
Poisson regression is intended for use in regression models that are used to predict numeric values, typically
counts. Therefore, you should use this module to create your regression model only if the values you are
trying to predict fit the following conditions:
✑ The response variable has a Poisson distribution.
✑ Counts cannot be negative. The method will fail outright if you attempt to use it with negative labels.
✑ A Poisson distribution is a discrete distribution; therefore, it is not meaningful to use this method with non-
whole numbers.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/poisson-regression
You are in the process of carrying out feature engineering on a dataset.
You want to add a feature to the dataset and fill the column value.
Recommendation: You must make use of the Group Categorical Values Azure Machine Learning Studio module.
A. Yes
B. No
Answer: B
Explanation:
The typical use for grouping categorical values is to merge multiple string values into a single new level.
Recommendation: You must make use of the Join Data Azure Machine Learning Studio module.
A. Yes
B. No
Answer: B
Explanation:
Add Columns need to be used. Join data is needed only for database style joins join is used for datasets not for
columns b is correct
Reference:
https://docs.microsoft.com/bs-cyrl-ba/azure/machine-learning/component-reference/add-columns
https://docs.microsoft.com/bs-cyrl-ba/azure/machine-learning/component-reference/join-data
Recommendation: You must make use of the Edit Metadata Azure Machine Learning Studio module.
A. Yes
B. No
Answer: B
Explanation:
Edit meta data cannot add a new column, it can change properties of existing column
You have been tasked with ascertaining if two sets of data differ considerably. You will make use of Azure Machine
Learning Studio to complete your task.
You plan to perform a paired t-test.
Which of the following are conditions that must apply to use a paired t-test? (Choose all that apply.)
A. All scores are independent from each other.

B. You have a matched pairs of scores.
C. The sampling distribution of d is normal.
D. The sampling distribution of x1- x2 is normal.
Answer: BC
Explanation:
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/test-hypothesis-using-t-t
est
You want to train a classification model using data located in a comma-separated values (CSV) file.
The classification model will be trained via the Automated Machine Learning interface using the Classification task
type.
You have been informed that only linear models need to be assessed by the Automated Machine Learning.
A. You should disable deep learning.

B. You should enable automatic featurization.
C. You should disable automatic featurization.
D. You should set the task type to Forecasting.
Answer: A
Explanation:
Disabling automatic featurization does not cause the models evaluated to be linear only. Automatic
featurization has the following steps listed how-to-configure-auto-features#automatic-featurization The only
way to force linear algorithms to be evaluated is to use the blocked algorithms. list.
You are preparing to train a regression model via automated machine learning. The data available to you has
features with missing values, as well as categorical features with little discrete values.
You want to make sure that automated machine learning is configured as follows:
✑ missing values must be automatically imputed.
✑ categorical features must be encoded as part of the training task.
A. You should make use of the featurization parameter with the 'auto' value pair.
B. You should make use of the featurization parameter with the 'off' value pair.
C. You should make use of the featurization parameter with the 'on' value pair.
D. You should make use of the featurization parameter with the 'FeaturizationConfig' value pair.
Answer: A
Explanation:
Featurization str or FeaturizationConfig
Values: 'auto' / 'off' / FeaturizationConfig
Indicator for whether featurization step should be done automatically or not, or whether customized
featurization should be used.
Column type is automatically detected. Based on the detected column type preprocessing/featurization is
done as follows:
Categorical: Target encoding, one hot encoding, drop high cardinality categories, impute missing values.
Numeric: Impute missing values, cluster distance, weight of evidence.
DateTime: Several features such as day, seconds, minutes, hours etc.
Text: Bag of words, pre-trained Word embedding, text target encoding.

Reference:
https://docs.microsoft.com/en-us/python/api/azureml-train-automl-
You make use of Azure Machine Learning Studio to develop a linear regression model. You perform an experiment
to assess various algorithms.
Which of the following is an algorithm that reduces the variances between actual and predicted values?
A. Fast Forest Quantile Regression

B. Poisson Regression
C. Boosted Decision Tree Regression
D. Linear Regression
Answer: D
Explanation:
Linear regression minimizes the sum of squares, i.e. it minimizes S=Sum[(y_i-f_i)^2, i,1,n ], where y_i is the
actual value and f_i the predicted value (we can see it as the average).
Since the variance of y_i is S/n, minimizing S is equivalent to minimizing the variance.
You have been tasked with constructing a machine learning model that translates language text into a different
language text.
The machine learning model must be constructed and trained to learn the sequence of the.
Recommendation: You make use of Convolutional Neural Networks (CNNs).
A. Yes
B. No
Answer: B
Explanation:
Use Reccurent Neural Network (RNN) for translations.
language text.
Recommendation: You make use of Generative Adversarial Networks (GANs).
A. Yes
B. No
Answer: B
Explanation:
GANs used for image translation
language text.
Recommendation: You make use of Recurrent Neural Networks (RNNs).
A. Yes
B. No
Answer: A
Explanation:
Note: RNNs are designed to take sequences of text as inputs or return sequences of text as outputs, or both.
They're called recurrent because the network's hidden layers have a loop in which the output and cell state
from each time step become inputs at the next time step. This recurrence serves as a form of memory.
It allows contextual information to flow through the network so that relevant outputs from previous time
steps can be applied to network operations at the current time step.
Reference:
https://towardsdatascience.com/language-translation-with-rnns-d84d43b40571
DRAG DROP -
You have been tasked with evaluating the performance of a binary classification model that you created.
You need to choose evaluation metrics to achieve your goal.
Which of the following are the metrics you would choose? Answer by dragging the correct options from the list to
the answer area.
Select and Place:
Answer:
Explanation:
The evaluation metrics available for binary classification models are: Accuracy, Precision, Recall, F1 Score,
and AUC.
Note: A very natural question is: 'Out of the individuals whom the model, how many were classified correctly
(TP)?'
This question can be answered by looking at the Precision of the model, which is the proportion of positives
that are classified correctly.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio/evaluate-model-performance
DRAG DROP -
You build a binary classification model using the Azure Machine Learning Studio Two-Class Neural Network
module.
You are preparing to configure the Tune Model Hyperparameters module for the purpose of tuning accuracy for
the model.
Which of the following are valid parameters for the Two-Class Neural Network module? Answer by dragging the
correct options from the list to the answer area.
Select and Place:
Answer:
Explanation:
Random seed is a parameter for binary classification, but I do not understand "Hyperparameters"
Reference
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/two-class-neural-
network
You make use of Azure Machine Learning Studio to create a binary classification model.
You are preparing to carry out a parameter sweep of the model to tune hyperparameters. You have to make sure
that the sweep allows for every possible combination of hyperparameters to be iterated. Also, the computing
resources needed to carry out the sweep must be reduced.
A. You should consider making use of the Selective grid sweep mode.
B. You should consider making use of the Measured grid sweep mode.
C. You should consider making use of the Entire grid sweep mode.
D. You should consider making use of the Random grid sweep mode.
Answer: D
Explanation:
Maximum number of runs on random grid: This option also controls the number of iterations over a random
sampling of parameter values, but the values are not generated randomly from the specified range; instead, a
matrix is created of all possible combinations of parameter values and a random sampling is taken over the
matrix. This method is more efficient and less prone to regional oversampling or undersampling.
If you are training a model that supports an integrated parameter sweep, you can also set a range of seed
values to use and iterate over the random seeds as well. This is optional, but can be useful for avoiding bias
introduced by seed selection.
C: Entire grid: When you select this option, the module loops over a grid predefined by the system, to try
different combinations and identify the best learner. This option is useful for cases where you don't know
what the best parameter settings might be and want to try all possible combination of values.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/tune-model-hyperparam
eters
You are in the process of constructing a deep convolutional neural network (CNN). The CNN will be used for image
classification.
You notice that the CNN model you constructed displays hints of overfitting.
You want to make sure that overfitting is minimized, and that the model is converged to an optimal fit.
Which of the following is TRUE with regards to achieving your goal?
A. You have to add an additional dense layer with 512 input units, and reduce the amount of training data.
B. You have to add L1/L2 regularization, and reduce the amount of training data.
C. You have to reduce the amount of training data and make use of training data augmentation.
D. You have to add L1/L2 regularization, and make use of training data augmentation.
E. You have to add an additional dense layer with 512 input units, and add L1/L2 regularization.
Answer: D
Explanation:
Discarded because that is increasing the complexity of the architecture. B, C are suggesting reducing the
amount of data. D will generate more data for the CNN to be able to generalize more.
You are planning to make use of Azure Machine Learning designer to train models.
You need choose a suitable compute type.
Recommendation: You choose Attached compute.
A. Yes
B. No
Answer: A
Explanation:
Because we can use databricks or vm as attached compute for training purposes
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-studio
Recommendation: You choose Inference cluster.
A. Yes
B. No
Answer: B
Explanation:
Reference:
Recommendation: You choose Compute cluster.
A. Yes
B. No
Answer: A
Explanation:
Says even Compute Instance is possible Answer
Reference:
You are making use of the Azure Machine Learning to designer construct an experiment.
After dividing a dataset into training and testing sets, you configure the algorithm to be Two-Class Boosted
Decision Tree.
You are preparing to ascertain the Area Under the Curve (AUC).
Which of the following is a sequential combination of the models required to achieve your goal?
A. Train, Score, Evaluate.

B. Score, Evaluate, Train.
C. Evaluate, Export Data, Train.
D. Train, Score, Export Data.
Answer: A
Explanation:
Train,score,evaluate
DRAG DROP
-
You create an Azure Machine Learning workspace.
You must implement dedicated compute for model training in the workspace by using Azure Synapse compute
resources. The solution must attach the dedicated compute and start an Azure Synapse session.
You need to implement the computer resources.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of
actions to the answer area and arrange them in the correct order.
Answer:
You create an Azure Machine Learning workspace. You are preparing a local Python environment on a laptop
computer. You want to use the laptop to connect to the workspace and run experiments.
You create the following config.json file.
You must use the Azure Machine Learning SDK to interact with data and experiments in the workspace.
You need to configure the config.json file to connect to the workspace from the Python environment.
Which two additional parameters must you add to the config.json file in order to connect to the workspace? Each
correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. login
B. resource_group
C. subscription_id
D. key
E. region
Answer: BC
Explanation:
To use the same workspace in multiple environments, create a JSON configuration file. The configuration file
saves your subscription (subscription_id), resource
(resource_group), and workspace name so that it can be easily loaded.
The following sample shows how to create a workspace.
from azureml.core import Workspace
ws = Workspace.create(name='myworkspace',
subscription_id='<azure-subscription-id>',
resource_group='myresourcegroup',
create_resource_group=True,
location='eastus2'
)
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace
You create a batch inference pipeline by using the Azure ML SDK. You configure the pipeline parameters by
executing the following code:
You need to obtain the output from the pipeline execution.

Where will you find the output?
A. the digit_identification.py script

B. the debug log
C. the Activity Log in the Azure portal for the Machine Learning workspace
D. the Inference Clusters tab in Machine Learning studio
E. a file named parallel_run_step.txt located in the output folder
Answer: E
Explanation:
output_action (str): How the output is to be organized. Currently supported values are 'append_row' and
'summary_only'.
✑ 'append_row' " All values output by run() method invocations will be aggregated into one unique file named
parallel_run_step.txt that is created in the output location.
'summary_only'
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-contrib-pipeline-steps/azureml.contrib.pipeline.steps.p
arallelrunconfig
You write a Python script that processes data in a comma-separated values (CSV) file.
You plan to run this script as an Azure Machine Learning experiment.
The script loads the data and determines the number of rows it contains using the following code:
You need to record the row count as a metric named row_count that can be returned using the get_metrics method
of the Run object after the experiment run completes.
Which code should you use?
A. run.upload_file('row_count', './data.csv')
B. run.log('row_count', rows)
C. run.tag('row_count', rows)
D. run.log_table('row_count', rows)
E. run.log_row('row_count', rows)
Answer: B
Explanation:
Log a numerical or string value to the run with the given name using log(name, value, description=''). Logging
a metric to a run causes that metric to be stored in the run record in the experiment. You can log the same
metric multiple times within a run, the result being considered a vector of that metric.
Example: run.log("accuracy", 0.95)
Incorrect Answers:
E: Using log_row(name, description=None, **kwargs) creates a metric with multiple columns as described in
kwargs. Each named parameter generates a column with the value specified. log_row can be called once to
log an arbitrary tuple, or multiple times in a loop to generate a complete table.
Example: run.log_row("Y over X", x=1, y=0.4)
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run
You are developing a hands-on workshop to introduce Docker for Windows to attendees.
You need to ensure that workshop attendees can install Docker on their devices.
Which two prerequisite components should attendees install on the devices? Each correct answer presents part of
the solution.
A. Microsoft Hardware-Assisted Virtualization Detection Tool

B. Kitematic
C. BIOS-enabled virtualization
D. VirtualBox
E. Windows 10 64-bit Professional
Answer: CE
Explanation:
C: Make sure your Windows system supports Hardware Virtualization Technology and that virtualization is
enabled.
Ensure that hardware virtualization support is turned on in the BIOS settings. For example:
E: To run Docker, your machine must have a 64-bit operating system running Windows 7 or higher.
Reference:
https://docs.docker.com/toolbox/toolbox_install_windows/
https://blogs.technet.microsoft.com/canitpro/2015/09/08/step-by-step-enabling-hyper-v-for-use-on-windows
-10/
Your team is building a data engineering and data science development environment.
The environment must support the following requirements:
✑ support Python and Scala
✑ compose data storage, movement, and processing services into automated data pipelines
✑ the same tool should be used for the orchestration of both data engineering and data science
✑ support workload isolation and interactive workloads
✑ enable scaling across a cluster of machines
You need to create the environment.
What should you do?
A. Build the environment in Apache Hive for HDInsight and use Azure Data Factory for orchestration.
B. Build the environment in Azure Databricks and use Azure Data Factory for orchestration.
C. Build the environment in Apache Spark for HDInsight and use Azure Container Instances for orchestration.
D. Build the environment in Azure Databricks and use Azure Container Instances for orchestration.
Answer: B
Explanation:
In Azure Databricks, we can create two different types of clusters.
✑ Standard, these are the default clusters and can be used with Python, R, Scala and SQL
✑ High-concurrency
Azure Databricks is fully integrated with Azure Data Factory.
Incorrect Answers:
D: Azure Container Instances is good for development or testing. Not suitable for production workloads.
Reference:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/data-science-and-machi
ne-learning
DRAG DROP -
You are building an intelligent solution using machine learning models.
The environment must support the following requirements:
✑ Data scientists must build notebooks in a cloud environment
✑ Data scientists must use automatic feature engineering and model building in machine learning pipelines.
✑ Notebooks must be deployed to retrain using Spark instances with dynamic worker allocation.
✑ Notebooks must be exportable to be version controlled locally.
You need to create the environment.
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list of
actions to the answer area and arrange them in the correct order.
Select and Place:
Answer:
Explanation:
Create Azure Databricks cluster -> Install Azure ML SDK for Python -> Create and exec Jupyter notebook
using AutoML -> Export Jupyter to local env. That because you need auto feature engineering provided by
autoML
You plan to build a team data science environment. Data for training models in machine learning pipelines will be
over 20 GB in size.
You have the following requirements:
✑ Models must be built using Caffe2 or Chainer frameworks.
✑ Data scientists must be able to use a data science environment to build the machine learning pipelines and train
models on their personal devices in both connected and disconnected network environments.
Personal devices must support updating machine learning pipelines when connected to a network.
You need to select a data science environment.
Which environment should you use?
A. Azure Machine Learning Service

B. Azure Machine Learning Studio
C. Azure Databricks
D. Azure Kubernetes Service (AKS)
Answer: A
Explanation:
The Data Science Virtual Machine (DSVM) is a customized VM image on Microsoft's Azure cloud built
specifically for doing data science. Caffe2 and Chainer are supported by DSVM.
DSVM integrates with Azure Machine Learning.
Incorrect Answers:
B: Use Machine Learning Studio when you want to experiment with machine learning models quickly and
easily, and the built-in machine learning algorithms are sufficient for your solutions.
Reference:
You are implementing a machine learning model to predict stock prices.
The model uses a PostgreSQL database and requires GPU processing.
You need to create a virtual machine that is pre-configured with the required tools.
What should you do?
A. Create a Data Science Virtual Machine (DSVM) Windows edition.

B. Create a Geo Al Data Science Virtual Machine (Geo-DSVM) Windows edition.
C. Create a Deep Learning Virtual Machine (DLVM) Linux edition.
D. Create a Deep Learning Virtual Machine (DLVM) Windows edition.
Answer: C
Explanation:
1. Only Windows and Linux Ubuntu two options when talking about machine / deep learning VMs; unless very
specific MS / Windows products, e.g., SQL Server, otherwise always select Linux Ubuntu as your answer
You are developing deep learning models to analyze semi-structured, unstructured, and structured data types.
You have the following data available for model building:
✑ Video recordings of sporting events
✑ Transcripts of radio commentary about events
✑ Logs from related social media feeds captured during sporting events
You need to select an environment for creating the model.
Which environment should you use?
A. Azure Cognitive Services

B. Azure Data Lake Analytics
C. Azure HDInsight with Spark MLib
D. Azure Machine Learning Studio
Answer: A
Explanation:
Azure Cognitive Services expand on Microsoft's evolving portfolio of machine learning APIs and enable
developers to easily add cognitive features " such as emotion and video detection; facial, speech, and vision
recognition; and speech and language understanding " into their applications. The goal of Azure Cognitive
Services is to help developers create applications that can see, hear, speak, understand, and even begin to
reason. The catalog of services within Azure Cognitive
Services can be categorized into five main pillars - Vision, Speech, Language, Search, and Knowledge.
Reference:
https://docs.microsoft.com/en-us/azure/cognitive-services/welcome
You must store data in Azure Blob Storage to support Azure Machine Learning.
You need to transfer the data into Azure Blob Storage.
What are three possible ways to achieve the goal? Each correct answer presents a complete solution.
A. Bulk Insert SQL Query

B. AzCopy
C. Python script
D. Azure Storage Explorer
E. Bulk Copy Program (BCP)
Answer: BCD
Explanation:
You can move data to and from Azure Blob storage using different technologies:
✑ Azure Storage-Explorer
✑ AzCopy
✑ Python
✑ SSIS
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/move-azure-blob
You are moving a large dataset from Azure Machine Learning Studio to a Weka environment.
You need to format the data for the Weka environment.
Which module should you use?
A. Convert to CSV
B. Convert to Dataset
C. Convert to ARFF
D. Convert to SVMLight
Answer: C
Explanation:
Use the Convert to ARFF module in Azure Machine Learning Studio, to convert datasets and results in Azure
Machine Learning to the attribute-relation file format used by the Weka toolset. This format is known as
ARFF.
The ARFF data specification for Weka supports multiple machine learning tasks, including data
preprocessing, classification, and feature selection. In this format, data is organized by entites and their
attributes, and is contained in a single text file.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-arff
You plan to create a speech recognition deep learning model.
The model must support the latest version of Python.
You need to recommend a deep learning framework for speech recognition to include in the Data Science Virtual
Machine (DSVM).
What should you recommend?
A. Rattle
B. TensorFlow
C. Weka
D. Scikit-learn
Answer: B
Explanation:
TensorFlow is an open-source library for numerical computation and large-scale machine learning. It uses
Python to provide a convenient front-end API for building applications with the framework
TensorFlow can train and run deep neural networks for handwritten digit classification, image recognition,
word embeddings, recurrent neural networks, sequence- to-sequence models for machine translation, natural
language processing, and PDE (partial differential equation) based simulations.
Incorrect Answers:
A: Rattle is the R analytical tool that gets you started with data analytics and machine learning.
C: Weka is used for visual data mining and machine learning software in Java.
D: Scikit-learn is one of the most useful libraries for machine learning in Python. It is on NumPy, SciPy and
matplotlib, this library contains a lot of efficient tools for machine learning and statistical modeling including
classification, regression, clustering and dimensionality reduction.
Reference:
https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-explained.html
You plan to use a Data Science Virtual Machine (DSVM) with the open source deep learning frameworks Caffe2
and PyTorch.
You need to select a pre-configured DSVM to support the frameworks.
What should you create?
A.Data Science Virtual Machine for Windows 2012

B.Data Science Virtual Machine for Linux (CentOS)
C.Geo AI Data Science Virtual Machine with ArcGIS
D.Data Science Virtual Machine for Windows 2016
E.Data Science Virtual Machine for Linux (Ubuntu)
Answer: E
Explanation:
Caffe2 and PyTorch is supported by Data Science Virtual Machine for Linux.
Microsoft offers Linux editions of the DSVM on Ubuntu 16.04 LTS and CentOS 7.4.
Only the DSVM on Ubuntu is preconfigured for Caffe2 and PyTorch.
Incorrect Answers:
D: Caffe2 and PytOCH are only supported in the Data Science Virtual Machine for Linux.
Reference:
HOTSPOT -
You are performing sentiment analysis using a CSV file that includes 12,000 customer reviews written in a short
sentence format. You add the CSV file to Azure
Machine Learning Studio and configure it as the starting point dataset of an experiment. You add the Extract N-
Gram Features from Text module to the experiment to extract key phrases from the customer review column in the
dataset.
You must create a new n-gram dictionary from the customer review text and set the maximum n-gram size to
trigrams.
What should you select? To answer, select the appropriate options in the answer area.
Hot Area:
Answer:
Explanation:
Vocabulary mode: Create -
For Vocabulary mode, select Create to indicate that you are creating a new list of n-gram features.
N-Grams size: 3 -
For N-Grams size, type a number that indicates the maximum size of the n-grams to extract and store. For
example, if you type 3, unigrams, bigrams, and trigrams will be created.
Weighting function: Leave blank -

The option, Weighting function, is required only if you merge or update vocabularies. It specifies how terms in
the two vocabularies and their scores should be weighted against each other.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/extract-n-gram-features-
from-text
You are developing a data science workspace that uses an Azure Machine Learning service.
You need to select a compute target to deploy the workspace.
What should you use?
A. Azure Data Lake Analytics

B. Azure Databricks
C. Azure Container Service
D. Apache Spark for HDInsight
Answer: C
Explanation:
Azure Container Instances can be used as compute target for testing or development. Use for low-scale CPU-
based workloads that require less than 48 GB of
RAM.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where
You are solving a classification task.
The dataset is imbalanced.
You need to select an Azure Machine Learning Studio module to improve the classification accuracy.
A. Permutation Feature Importance

B. Filter Based Feature Selection
C. Fisher Linear Discriminant Analysis
D. Synthetic Minority Oversampling Technique (SMOTE)
Answer: D
Explanation:
Use the SMOTE module in Azure Machine Learning Studio (classic) to increase the number of
underrepresented cases in a dataset used for machine learning.
SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.
You connect the SMOTE module to a dataset that is imbalanced. There are many reasons why a dataset might
be imbalanced: the category you are targeting might be very rare in the population, or the data might simply
be difficult to collect. Typically, you use SMOTE when the class you want to analyze is underrepresented.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
DRAG DROP -
You configure a Deep Learning Virtual Machine for Windows.
You need to recommend tools and frameworks to perform the following:
✑ Build deep neural network (DNN) models
✑ Perform interactive data exploration and visualization
Which tools and frameworks should you recommend? To answer, drag the appropriate tools to the correct tasks.
Each tool may be used once, more than once, or not at all. You may need to drag the split bar between panes or
scroll to view content.
Select and Place:
Answer:
Explanation:
Be Microsoft Cognitive Kit. Vowpal Wabbit is for Machine Learning. I will go with microsoft cognitive toolkit
for DNN:
PowerBI Desktop -Power BI Desktop is a powerful visual data exploration and interactive reporting toolBI is a
name given to a modern approach to business decision making in which users are empowered to find, explore,
and share insights from data across the enterprise.
Reference:
https://docs.microsoft.com/en-us/cognitive-toolkit/examples#c-examples
You use Azure Machine Learning Studio to build a machine learning experiment.
You need to divide data into two distinct datasets.
A. Assign Data to Clusters

B. Load Trained Model
C. Partition and Sample
D. Tune Model-Hyperparameters
Answer: C
Explanation:
Partition and Sample with the Stratified split option outputs multiple datasets, partitioned using the rules you
specified.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample
DRAG DROP -
You are creating an experiment by using Azure Machine Learning Studio.
You must divide the data into four subsets for evaluation. There is a high degree of missing values in the data. You
must prepare the data for analysis.
You need to select appropriate methods for producing the experiment.
Which three modules should you run in sequence? To answer, move the appropriate actions from the list of actions
to the answer area and arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you
select.
Select and Place:
Answer:
Explanation:
The Clean Missing Data module in Azure Machine Learning Studio, to remove, replace, or infer missing values.
Incorrect Answers:
✑ Latent Direchlet Transformation: Latent Dirichlet Allocation module in Azure Machine Learning Studio, to
group otherwise unclassified text into a number of categories. Latent Dirichlet Allocation (LDA) is often used
in natural language processing (NLP) to find texts that are similar. Another common term is topic modeling.
✑ Build Counting Transform: Build Counting Transform module in Azure Machine Learning Studio, to analyze
training data. From this data, the module builds a count table as well as a set of count-based features that can
be used in a predictive model.
Missing Value Scrubber: The Missing Values Scrubber module is deprecated.
✑ Feature hashing: Feature hashing is used for linguistics, and works by converting unique tokens into
integers.
✑ Replace discrete values: the Replace Discrete Values module in Azure Machine Learning Studio is used to
generate a probability score that can be used to represent a discrete value. This score can be useful for
understanding the information value of the discrete values.
Reference:
HOTSPOT -
You are retrieving data from a large datastore by using Azure Machine Learning Studio.
You must create a subset of the data for testing purposes using a random sampling seed based on the system
clock.
You add the Partition and Sample module to your experiment.
You need to select the properties for the module.
Which values should you select? To answer, select the appropriate options in the answer area.
Hot Area:
Answer:
Explanation:
Box 1: Sampling -
Create a sample of data -

This option supports simple random sampling or stratified random sampling. This is useful if you want to
create a smaller representative sample dataset for testing.
1. Add the Partition and Sample module to your experiment in Studio, and connect the dataset.
2. Partition or sample mode: Set this to Sampling.
3. Rate of sampling. See box 2 below.
Box 2: 0 -
3. Rate of sampling. Random seed for sampling: Optionally, type an integer to use as a seed value.
This option is important if you want the rows to be divided the same way every time. The default value is 0,
meaning that a starting seed is generated based on the system clock. This can lead to slightly different
results each time you run the experiment.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample
You are creating a machine learning model. You have a dataset that contains null rows.
You need to use the Clean Missing Data module in Azure Machine Learning Studio to identify and resolve the null
and missing data in the dataset.
Which parameter should you use?
A. Replace with mean

B. Remove entire column
C. Remove entire row
D. Hot Deck
E. Custom substitution value
F. Replace with mode
Answer: C
Explanation:
Remove entire row: Completely removes any row in the dataset that has one or more missing values. This is
useful if the missing value can be considered randomly missing.
Reference:
HOTSPOT -
The finance team asks you to train a model using data in an Azure Storage blob container named finance-data.
You need to register the container as a datastore in an Azure Machine Learning workspace and ensure that an
error will be raised if the container does not exist.
How should you complete the code? To answer, select the appropriate options in the answer area.
Hot Area:
Answer:
Explanation:
Box 1: register_azure_blob_container
Register an Azure Blob Container to the datastore.
Box 2: create_if_not_exists = False
Create the file share if it does not exist, defaults to False.
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore
You plan to provision an Azure Machine Learning Basic edition workspace for a data science project.
You need to identify the tasks you will be able to perform in the workspace.
Which three tasks will you be able to perform? Each correct answer presents a complete solution.
A. Create a Compute Instance and use it to run code in Jupyter notebooks.

B. Create an Azure Kubernetes Service (AKS) inference cluster.
C. Use the designer to train a model by dragging and dropping pre-defined modules.
D. Create a tabular dataset that supports versioning.
E. Use the Automated Machine Learning user interface to train a model.
Answer: ABD
Explanation:
Incorrect Answers:
C, E: The UI is included the Enterprise edition only.
Reference:
https://azure.microsoft.com/en-us/pricing/details/machine-learning/
HOTSPOT -
A coworker registers a datastore in a Machine Learning services workspace by using the following code:
You need to write code to access the datastore from a notebook.
How should you complete the code segment? To answer, select the appropriate options in the answer area.
Hot Area:
Answer:
Explanation:
Box 1: DataStore -
To get a specific datastore registered in the current workspace, use the get() static method on the Datastore
class:
# Get a named datastore from the current workspace
datastore = Datastore.get(ws, datastore_name='your datastore name')
Box 2: ws -
Box 3: demo_datastore -
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data
A set of CSV files contains sales records. All the CSV files have the same data schema.
Each CSV file contains the sales record for a particular month and has the filename sales.csv. Each file is stored in
a folder that indicates the month and year when the data was recorded. The folders are in an Azure blob container
for which a datastore has been defined in an Azure Machine Learning workspace. The folders are organized in a
parent folder named sales to create the following hierarchical structure:
At the end of each month, a new folder with that month's sales file is added to the sales folder.
You plan to use the sales data to train a machine learning model based on the following requirements:
✑ You must define a dataset that loads all of the sales data to date into a structure that can be easily converted to
a dataframe.
✑ You must be able to create experiments that use only data that was created before a specific previous month,
ignoring any data that was added after that month.
✑ You must register the minimum number of datasets possible.
You need to register the sales data as a dataset in Azure Machine Learning service workspace.
What should you do?
A. Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-
yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset each month, replacing the
existing dataset and specifying a tag named month indicating the month and year it was registered. Use this
dataset for all experiments.
B. Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv', register the
dataset with the name sales_dataset and a tag named month indicating the month and year it was registered,
and use this dataset for all experiments.
C. Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-
yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with
appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for
experiments.
D. Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-
yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version and with a
tag named month indicating the month and year it was registered. Use this dataset for all experiments,
identifying the version to be used based on the month tag as necessary.
Answer: D
Explanation:
B does not allow you to get the data from before a specific month. With D you create only one dataset with
multiple versions (1 version per month). Similar example in 'Versioning best practice': the solution. Because we
dont want to add data before specific month. So the option B adding everything is not desirable.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-version-track-datasets
DRAG DROP -
An organization uses Azure Machine Learning service and wants to expand their use of machine learning.
You have the following compute environments. The organization does not want to create another compute
environment.
You need to determine which compute environment to use for the following scenarios.
Which compute types should you use? To answer, drag the appropriate compute environments to the correct
scenarios. Each compute environment may be used once, more than once, or not at all. You may need to drag the
split bar between panes or scroll to view content.
Select and Place:
Answer:
Explanation:
mlc_cluster, aks_cluster
HOTSPOT -
You create an Azure Machine Learning compute target named ComputeOne by using the STANDARD_D1 virtual
machine image.
ComputeOne is currently idle and has zero active nodes.
You define a Python variable named ws that references the Azure Machine Learning workspace. You run the
following Python code:
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
Hot Area:
Answer:
Explanation:
Box 1: no
Box 2: Yes -
Box 3: yes
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.computetarget
HOTSPOT -
You are developing a deep learning model by using TensorFlow. You plan to run the model training workload on an
Azure Machine Learning Compute Instance.
You must use CUDA-based model training.
You need to provision the Compute Instance.
Which two virtual machines sizes can you use? To answer, select the appropriate virtual machine sizes in the
answer area.
Hot Area:
Answer:
Explanation:
CUDA is a parallel computing platform and programming model developed by Nvidia for general computing
on its own GPUs (graphics processing units). CUDA enables developers to speed up compute-intensive
applications by harnessing the power of GPUs for the parallelizable part of the computation.
Reference:
https://www.infoworld.com/article/3299703/what-is-cuda-parallel-programming-for-gpus.html
DRAG DROP -
You are analyzing a raw dataset that requires cleaning.
You must perform transformations and manipulations by using Azure Machine Learning Studio.
You need to identify the correct modules to perform the transformations.
Which modules should you choose? To answer, drag the appropriate modules to the correct scenarios. Each
module may be used once, more than once, or not at all.
You may need to drag the split bar between panes or scroll to view content.
Select and Place:
Answer:
Explanation:
Box 1: Clean Missing Data -
Box 2: SMOTE -
Use the SMOTE module in Azure Machine Learning Studio to increase the number of underepresented cases
in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than
simply duplicating existing cases.
Box 3: Convert to Indicator Values
Use the Convert to Indicator Values module in Azure Machine Learning Studio. The purpose of this module is
to convert columns that contain categorical values into a series of binary indicator columns that can more
easily be used as features in a machine learning model.
Box 4: Remove Duplicate Rows -
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote https://docs.micro
soft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-indicator-values
Note: This question is part of a series of questions that present the same scenario. Each question in the series
contains a unique solution that might meet the stated goals. Some question sets might have more than one correct
solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not
appear in the review screen.
You are using Azure Machine Learning Studio to perform feature engineering on a dataset.
You need to normalize values to produce a feature column grouped into bins.
Solution: Apply an Entropy Minimum Description Length (MDL) binning mode.
Does the solution meet the goal?
A. Yes
B. No
Answer: B
Explanation:
Apply a Quantiles normalization with a QuantileIndex normalization. you can specify the following binning
modes: - Entropy MDL - Quantiles - Equal Width - Custom Edges - Equal Width with Custom Start and Stop
From all of these, the only binning mode which supports normalization is Quantiles. In particular, Entropy MDL
does NOT support normalization.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-data-into-bins
HOTSPOT -
You are preparing to use the Azure ML SDK to run an experiment and need to create compute. You run the
following code:
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
Hot Area:
Answer:
Explanation:
Box 1: No -
If a compute cluster already exists it will be used.
Box 2: No
Box 3: Yes -
Low Priority VMs use Azure's excess capacity and are thus cheaper but risk your run being pre-empted.
Box 4: No -
Need to use training_compute.delete() to deprovision and delete the AmlCompute target.
Reference:
https://notebooks.azure.com/azureml/projects/azureml-getting-started/html/how-to-use-
azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb https://docs.microsoft.com/en-
us/python/api/azureml-core/azureml.core.compute.computetarget
You are a data scientist using Azure Machine Learning Studio.
You need to normalize values to produce an output column into bins to predict a target column.
Solution: Apply a Quantiles normalization with a QuantileIndex normalization.
A. Yes
B. No
Answer: A
Explanation:
If you select the Quantiles binning mode, use the Quantile normalization option to determine how values are
normalized prior to sorting into quantiles. Note that normalizing values transforms the values, but does not
affect the final number of bins. For an example, see Effects of Different Normalization Methods.
The following normalization types are supported:
Percent: Values are normalized within the range [0,100]
PQuantile: Values are normalized within the range [0,1]
QuantileIndex: Values are normalized within the range [1,number of bins]
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Scale and Reduce sampling mode.
A. Yes
B. No
Answer: B
Explanation:
Instead use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Note: SMOTE is used to increase the number of underepresented cases in a dataset used for machine
learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing
cases.
Incorrect Answers:
Common data tasks for the Scale and Reduce sampling mode include clipping, binning, and normalizing
numerical values.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote https://docs.micro
soft.com/en-us/azure/machine-learning/studio-module-reference/data-transformation-scale-and-reduce
You are analyzing a dataset by using Azure Machine Learning Studio.
You need to generate a statistical summary that contains the p-value and the unique count for each feature
column.
Which two modules can you use? Each correct answer presents a complete solution.
A. Computer Linear Correlation

B. Export Count Table
C. Execute Python Script
D. Convert to Indicator Values
E. Summarize Data
Answer: BE
Explanation:
The Export Count Table module is provided for backward compatibility with experiments that use the Build
Count Table (deprecated) and Count Featurizer
(deprecated) modules.
E: Summarize Data statistics are useful when you want to understand the characteristics of the complete
dataset. For example, you might need to know:
✑ How many missing values are there in each column?
✑ How many unique values are there in a feature column?
✑ What is the mean and standard deviation for each column?
✑ The module calculates the important scores for each column, and returns a row of summary statistics for
each variable (data column) provided as input.
Incorrect Answers:
A: The Compute Linear Correlation module in Azure Machine Learning Studio is used to compute a set of
Pearson correlation coefficients for each possible pair of variables in the input dataset.
C: With Python, you can perform tasks that aren't currently supported by existing Studio modules such as:
Visualizing data using matplotlib
Using Python libraries to enumerate datasets and models in your workspace
Reading, loading, and manipulating data from sources not supported by the Import Data module
D: The purpose of the Convert to Indicator Values module is to convert columns that contain categorical
values into a series of binary indicator columns that can more easily be used as features in a machine learning
model.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/export-count-table https:
//docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/summarize-data
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the
feature set.
You need to analyze a full dataset to include all values.
Solution: Use the Last Observation Carried Forward (LOCF) method to impute the missing data points.
A. Yes
B. No
Answer: A
Explanation:
Absolutely possible to replace data with LOCF to keep the same number of rows and columns of the original
dataset! That is the only requirements for the question!
HOTSPOT -
You are creating a machine learning model in Python. The provided dataset contains several numerical columns
and one text column. The text column represents a product's category. The product category will always be one of
the following:
✑ Bikes
✑ Cars
✑ Vans
✑ Boats
You are building a regression model using the scikit-learn Python package.
You need to transform the text data to be compatible with the scikit-learn Python package.
How should you complete the code segment? To answer, select the appropriate options in the answer area.
Hot Area:
Answer:
Explanation:
Box 1: pandas as df -
Pandas takes data (like a CSV or TSV file, or a SQL database) and creates a Python object with rows and
columns called data frame that looks very similar to table in a statistical software (think Excel or SPSS for
example.
Box 2: transpose[ProductCategoryMapping]
Reshape the data from the pandas Series to columns.
Reference:
https://datascienceplus.com/linear-regression-in-python/
Thank you
Thank you for being so interested in the premium exam material.
I'm glad to hear that you found it informative and helpful.
But Wait
I wanted to let you know that there is more content available in the full version.
The full paper contains additional sections and information that you may find helpful,
and I encourage you to download it to get a more comprehensive and detailed view of
all the subject matter.
Download Full Version Now
Total: 484 Questions

Link: https://certyiq.com/papers?provider=microsoft&exam=dp-100

dp-100 C86e3e2d5342

Uploaded by

Copyright:

Available Formats

dp-100 C86e3e2d5342

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

dp-100 C86e3e2d5342

Uploaded by

Copyright:

Available Formats

Certy IQ

Premium exam material

Designing and Implementing a Data Science Solution on Azure

Total: 484 Questions

A. Windows Server 2012 DSVM

A. You should consider including Rattle.

Leave One Out (LOO) cross-validation

A. You should make use of the Split Data module.

A. You should make use of the Convert to TXT module.

A. You should consider configuring the use of Azure Container Instance.

A. You should consider using Venn diagram visualization.

k can be 5 or 10More specifically, k MUST be within: 1 < k <= N

A. All scores are independent from each other.

A. You should disable deep learning.

Featurization str or FeaturizationConfig

Values: 'auto' / 'off' / FeaturizationConfig

Numeric: Impute missing values, cluster distance, weight of evidence.

DateTime: Several features such as day, seconds, minutes, hours etc.

Text: Bag of words, pre-trained Word embedding, text target encoding.

A. Fast Forest Quantile Regression

Use Reccurent Neural Network (RNN) for translations.

GANs used for image translation

Because we can use databricks or vm as attached compute for training purposes

Says even Compute Instance is possible Answer

A. Train, Score, Evaluate.

You create an Azure Machine Learning workspace.

You need to implement the computer resources.

You need to obtain the output from the pipeline execution.

A. the digit_identification.py script

A. Microsoft Hardware-Assisted Virtualization Detection Tool

A. Azure Machine Learning Service

DSVM integrates with Azure Machine Learning.

A. Create a Data Science Virtual Machine (DSVM) Windows edition.

A. Azure Cognitive Services

A. Bulk Insert SQL Query

A.Data Science Virtual Machine for Windows 2012

Weighting function: Leave blank -

A. Azure Data Lake Analytics

A. Permutation Feature Importance

A. Assign Data to Clusters

Create a sample of data -

A. Replace with mean

A. Create a Compute Instance and use it to run code in Jupyter notebooks.

Box 4: Remove Duplicate Rows -

If a compute cluster already exists it will be used.

Need to use training_compute.delete() to deprovision and delete the AmlCompute target.

The following normalization types are supported:

Percent: Values are normalized within the range [0,100]

PQuantile: Values are normalized within the range [0,1]

QuantileIndex: Values are normalized within the range [1,number of bins]

A. Computer Linear Correlation

Download Full Version Now

Total: 484 Questions

You might also like