Machine Learning - Specialty (MLS-C01)

Sample Exam Questions

1) A Machine Learning team has several large CSV datasets in Amazon S3. Historically, models built
with the Amazon SageMaker Linear Learner algorithm have taken hours to train on similar-sized
datasets. The team’s leaders need to accelerate the training process.

What can a Machine Learning Specialist do to address this concern?

A. Use Amazon SageMaker Pipe mode.
B. Use Amazon Machine Learning to train the models.
C. Use Amazon Kinesis to stream the data to Amazon SageMaker.
D. Use AWS Glue to transform the CSV dataset to the JSON format.

2) A term frequency–inverse document frequency (tf–idf) matrix using both unigrams and bigrams is
built from a text corpus consisting of the following two sentences:

1. Please call the number below.

2. Please do not call us.

What are the dimensions of the tf–idf matrix?

A. (2, 16)
B. (2, 8)
C. (2, 10)
D. (8, 10)

3) A company is setting up a system to manage all of the datasets it stores in Amazon S3. The company
would like to automate running transformation jobs on the data and maintaining a catalog of the
metadata concerning the datasets. The solution should require the least amount of setup and
maintenance.

Which solution will allow the company to achieve its goals?

A. Create an Amazon EMR cluster with Apache Hive installed. Then, create a Hive metastore and a script to
run transformation jobs on a schedule.
B. Create an AWS Glue crawler to populate the AWS Glue Data Catalog. Then, author an AWS Glue ETL
job, and set up a schedule for data transformation jobs.
C. Create an Amazon EMR cluster with Apache Spark installed. Then, create an Apache Hive metastore
and a script to run transformation jobs on a schedule.
D. Create an AWS Data Pipeline that transforms the data. Then, create an Apache Hive metastore and a
script to run transformation jobs on a schedule.

4) A Data Scientist is working on optimizing a model during the training process by varying multiple
parameters. The Data Scientist observes that, during multiple runs with identical parameters, the loss
function converges to different, yet stable, values.

What should the Data Scientist do to improve the training process?

A. Increase the learning rate. Keep the batch size the same.
B. Reduce the batch size. Decrease the learning rate.
C. Keep the batch size the same. Decrease the learning rate.
D. Do not change the learning rate. Increase the batch size.

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved | aws.amazon.com 1|Page

5) A Data Scientist is evaluating different binary classification models. A false positive result is 5 times
more expensive (from a business perspective) than a false negative result.

The models should be evaluated based on the following criteria:

1. Must have a recall rate of at least 80%
2. Must have a false positive rate of 10% or less
3. Must minimize business costs

After creating each binary classification model, the Data Scientist generates the corresponding
confusion matrix.

Which confusion matrix represents the model that satisfies the requirements?

A. TN = 91, FP = 9
FN = 22, TP = 78
B. TN = 99, FP = 1
FN = 21, TP = 79
C. TN = 96, FP = 4
FN = 10, TP = 90
D. TN = 98, FP = 2
FN = 18, TP = 82

6) A Data Scientist uses logistic regression to build a fraud detection model. While the model accuracy
is 99%, 90% of the fraud cases are not detected by the model.

What action will definitively help the model detect more than 10% of fraud cases?

A. Using undersampling to balance the dataset
B. Decreasing the class probability threshold
C. Using regularization to reduce overfitting
D. Using oversampling to balance the dataset

7) A company is interested in building a fraud detection model. Currently, the Data Scientist does not
have a sufficient amount of information due to the low number of fraud cases.

Which method is MOST likely to detect the GREATEST number of valid fraud cases?

A. Oversampling using bootstrapping
B. Undersampling
C. Oversampling using SMOTE
D. Class weight adjustment

8) A Machine Learning Engineer is preparing a data frame for a supervised learning task with the
Amazon SageMaker Linear Learner algorithm. The ML Engineer notices the target label classes are
highly imbalanced and multiple feature columns contain missing values. The proportion of missing
values across the entire data frame is less than 5%.

What should the ML Engineer do to minimize bias due to missing values?

A. Replace each missing value by the mean or median across non-missing values in the same row.
B. Delete observations that contain missing values because these represent less than 5% of the data.
C. Replace each missing value by the mean or median across non-missing values in the same column.
D. For each feature, approximate the missing values using supervised learning based on other features.

9) A company has collected customer comments on its products and uses decision trees to classify them as
safe or unsafe. The training dataset has the following features: id, date, full review, full review
summary, and a binary safe/unsafe tag. During training, any data sample with missing features was
dropped. A few samples in the test set were found to be missing the full review text field.

For this use case, which is the most effective course of action to address test data samples with
missing features?

A. Drop the test samples with missing full review text fields, and then run through the test set.
B. Copy the summary text fields and use them to fill in the missing full review text fields, and then run
through the test set.
C. Use an algorithm that handles missing data better than decision trees.
D. Generate synthetic data to fill in the fields that are missing data, and then run through the test set.

10) An insurance company needs to automate claim compliance reviews because human reviews are
expensive and error-prone. The company has a large set of claims and a compliance label for each.
Each claim consists of a few sentences in English, many of which contain complex related
information. Management would like to use Amazon SageMaker built-in algorithms to design a
machine learning supervised model that can be trained to read each claim and predict if the claim is
compliant or not.

Which approach should be used to extract features from the claims to be used as inputs for the
downstream supervised task?

A. Derive a dictionary of tokens from claims in the entire dataset. Apply one-hot encoding to tokens found in
each claim of the training set. Send the derived feature space as inputs to an Amazon SageMaker built-
in supervised learning algorithm.
B. Apply Amazon SageMaker BlazingText in Word2Vec mode to claims in the training set. Send the derived
feature space as inputs for the downstream supervised task.
C. Apply Amazon SageMaker BlazingText in classification mode to labeled claims in the training set to
derive features for the claims that correspond to the compliant and non-compliant labels, respectively.
D. Apply Amazon SageMaker Object2Vec to claims in the training set. Send the derived feature space as
inputs for the downstream supervised task.

Answers
1) A – Amazon SageMaker Pipe mode streams the data directly to the container, which improves the
performance of training jobs. (Refer to this link for supporting information.) In Pipe mode, your training job
streams data directly from Amazon S3. Streaming can provide faster start times for training jobs and
better throughput. With Pipe mode, you can also reduce the size of the Amazon EBS volumes for your
training instances. B would not apply in this scenario. C is a streaming ingestion solution, but is not
applicable in this scenario. D only changes the data format and does not speed up training.

2) A – There are 2 sentences, 8 unique unigrams, and 8 unique bigrams, so the result would be (2, 16). The
phrases are “Please call the number below” and “Please do not call us.” Each word individually (unigram)
is “Please,” “call,” “the,” “number,” “below,” “do,” “not,” and “us.” The unique bigrams are “Please call,”
“call the,” “the number,” “number below,” “Please do,” “do not,” “not call,” and “call us.” The tf–idf
vectorizer is described at this link.
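The column count can be checked with a minimal, dependency-free sketch. It assumes the usual conventions (lowercasing, a token pattern of two or more word characters, and a smoothed idf); these are illustrative choices, not part of the question. scikit-learn's `TfidfVectorizer(ngram_range=(1, 2))` would produce the same shape, although its values also include L2 row normalization by default.

```python
import math
import re

# Same regex as scikit-learn's default token_pattern: words of 2+ word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def ngrams(tokens, n):
    """Return the space-joined n-grams of a token sequence."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_matrix(docs, ngram_range=(1, 2)):
    """Build a (documents x terms) tf-idf matrix from raw strings."""
    doc_terms = []
    for doc in docs:
        tokens = TOKEN_PATTERN.findall(doc.lower())
        terms = []
        for n in range(ngram_range[0], ngram_range[1] + 1):
            terms.extend(ngrams(tokens, n))
        doc_terms.append(terms)

    # Vocabulary: every distinct unigram and bigram in the corpus.
    vocab = sorted({term for terms in doc_terms for term in terms})
    n_docs = len(docs)
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {}
    for term in vocab:
        df = sum(term in terms for terms in doc_terms)
        idf[term] = math.log((1 + n_docs) / (1 + df)) + 1
    # Raw term count times idf, without row normalization.
    matrix = [[terms.count(t) * idf[t] for t in vocab] for terms in doc_terms]
    return vocab, matrix


corpus = ["Please call the number below.", "Please do not call us."]
vocab, matrix = tfidf_matrix(corpus)
print((len(matrix), len(vocab)))  # (2, 16)
```

One row per sentence and one column per distinct unigram or bigram gives the (2, 16) shape.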

3) B – AWS Glue is the correct answer because this option requires the least amount of setup and
maintenance since it is serverless, and it does not require management of the infrastructure. Refer to this
link for supporting information. A, C, and D are all solutions that can solve the problem, but require more
steps for configuration, and require higher operational overhead to run and maintain.

4) B – The loss function is most likely highly non-convex, with multiple local minima in which training is
getting stuck. Decreasing the batch size adds noise to the gradient estimates, which can help the optimizer
escape local minima and saddle points. Decreasing the learning rate would prevent overshooting the global
loss function minimum. Refer to the paper at this link for an explanation.
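The learning-rate half of this explanation can be seen in a deterministic toy: plain gradient descent on the convex function f(x) = x², whose minimum is at x = 0. The function and the learning-rate values are assumptions chosen for clarity, not the model in the question, and the sketch does not illustrate the batch-size half, which depends on gradient noise.

```python
def gradient_descent(lr, x0=1.0, steps=20):
    """Plain gradient descent on f(x) = x**2, whose minimum is at x = 0."""
    x = x0
    for _ in range(steps):
        grad = 2 * x  # derivative of x**2
        x -= lr * grad
    return x


# Hypothetical learning rates: too large, borderline, and small.
for lr in (1.1, 1.0, 0.1):
    print(lr, gradient_descent(lr))
```

With lr = 1.1 each step overshoots by more than it corrects and the iterate grows; with lr = 1.0 it bounces between 1 and -1 without settling; with lr = 0.1 it shrinks steadily toward the minimum.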

5) D – The following calculations are required:

TP = True Positive
FP = False Positive
FN = False Negative
TN = True Negative

Recall = TP / (TP + FN)

False Positive Rate (FPR) = FP / (FP + TN)

Cost = 5 * FP + FN

Option A: Recall = 78 / (78 + 22) = 0.78; FPR = 9 / (9 + 91) = 0.09; Cost = 5 * 9 + 22 = 67
Option B: Recall = 79 / (79 + 21) = 0.79; FPR = 1 / (1 + 99) = 0.01; Cost = 5 * 1 + 21 = 26
Option C: Recall = 90 / (90 + 10) = 0.90; FPR = 4 / (4 + 96) = 0.04; Cost = 5 * 4 + 10 = 30
Option D: Recall = 82 / (82 + 18) = 0.82; FPR = 2 / (2 + 98) = 0.02; Cost = 5 * 2 + 18 = 28

Options A and B fail the recall requirement (0.78 and 0.79). Options C and D both have a recall of at least
80% and an FPR of 10% or less, but D has the lower cost (28 versus 30). For supporting information, refer to this link.
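The same screening can be written as a short sketch: filter the candidates by the two hard constraints, then pick the cheapest survivor. The function names are illustrative; the confusion-matrix counts, thresholds, and the 5:1 cost ratio come from the question.

```python
def evaluate(tn, fp, fn, tp, fp_cost=5, fn_cost=1):
    """Return (recall, false positive rate, business cost) for one model."""
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    cost = fp_cost * fp + fn_cost * fn
    return recall, fpr, cost


# Confusion matrices from the question, as (TN, FP, FN, TP).
candidates = {
    "A": (91, 9, 22, 78),
    "B": (99, 1, 21, 79),
    "C": (96, 4, 10, 90),
    "D": (98, 2, 18, 82),
}


def pick(candidates, min_recall=0.80, max_fpr=0.10):
    """Cheapest model meeting both constraints, or None if none qualifies."""
    feasible = {}
    for name, counts in candidates.items():
        recall, fpr, cost = evaluate(*counts)
        if recall >= min_recall and fpr <= max_fpr:
            feasible[name] = cost
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


print(pick(candidates))  # D
```

Applying the constraints before minimizing cost is what rules out B, which is the cheapest model overall but misses the recall requirement.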

6) B – Decreasing the class probability threshold makes the model more sensitive and, therefore, marks
more cases as the positive class, which is fraud in this case. This will increase the likelihood of fraud
detection. However, it comes at the price of lowering precision. This is covered in the Discussion section
of the paper at this link.
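A minimal sketch of the threshold trade-off, using hypothetical predicted fraud probabilities (the scores below are invented for illustration). Lowering the threshold can only keep or raise recall, because every case flagged at the higher threshold is still flagged at the lower one; precision may fall.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when scores >= threshold are flagged as fraud (label 1)."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model outputs: five fraud cases (1) followed by five legitimate ones (0).
scores = [0.55, 0.35, 0.30, 0.20, 0.15, 0.45, 0.27, 0.10, 0.05, 0.02]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

for threshold in (0.5, 0.25):
    print(threshold, precision_recall(scores, labels, threshold))
```

On this toy data, moving the threshold from 0.5 to 0.25 raises recall from 0.2 to 0.6 while precision drops from 1.0 to 0.6.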

7) C – When the minority class has too few examples, the Synthetic Minority Over-sampling Technique (SMOTE)
adds new information by creating synthetic data points for the minority class, rather than repeating existing
points as bootstrapping does. This technique would be the most effective in this scenario. Refer to Section 4.2
at this link for supporting information.
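The core SMOTE idea can be sketched in a few lines: pick a minority sample, pick one of its k nearest minority-class neighbors, and create a new point at a random position on the segment between them. This is a simplified illustration with hypothetical two-feature fraud points; a production implementation such as imbalanced-learn's `SMOTE` class handles many details this sketch omits.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    minority: list of numeric feature tuples from the minority (fraud) class.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbors of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # The new point lies on the segment between base and neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical fraud cases with two numeric features each.
fraud = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
print(smote(fraud, n_new=3))
```

Unlike bootstrapping, which only duplicates existing rows, each synthetic point is new but stays inside the region already occupied by the minority class.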

8) D – Use supervised learning to predict missing values based on the values of other features. Different
supervised learning approaches might have different performances, but any properly implemented
supervised learning approach should provide the same or better approximation than mean or median
approximation, as proposed in responses A and C. Supervised learning applied to the imputation of
missing values is an active field of research. Refer to this link for an example.
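A minimal sketch of regression-based imputation, assuming a single fully observed predictor column and hypothetical data. Real pipelines typically model each feature from all the others; scikit-learn's `IterativeImputer` (enabled through `sklearn.experimental.enable_iterative_imputer`) is one such tool.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_by_regression(rows, target, predictor):
    """Fill missing `target` values (None) from a regression on `predictor`.

    Assumes the `predictor` column itself has no missing values.
    """
    observed = [r for r in rows if r[target] is not None]
    intercept, slope = fit_line(
        [r[predictor] for r in observed], [r[target] for r in observed]
    )
    return [
        {**r, target: intercept + slope * r[predictor]} if r[target] is None else r
        for r in rows
    ]


# Hypothetical rows: feature "b" is missing in the last row.
rows = [
    {"a": 1, "b": 2.0}, {"a": 2, "b": 4.0}, {"a": 3, "b": 6.0},
    {"a": 4, "b": 8.0}, {"a": 5, "b": None},
]
filled = impute_by_regression(rows, target="b", predictor="a")
print(filled[-1])  # regression fills 10.0; the column mean would give 5.0
```

Because "b" depends on "a" in this toy data, the regression recovers the trend, while column-mean imputation (option C) would pull the value toward the center of the observed data.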

9) B – In this case, a full review summary usually contains the most descriptive phrases of the entire review
and is a valid stand-in for the missing full review text field. For supporting information, refer to page 1627
at this link, and to these two links.
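A minimal sketch of option B applied to test samples before scoring. The field names `full_review` and `summary` are hypothetical stand-ins for the dataset's columns.

```python
def fill_missing_review(sample):
    """Use the summary as a stand-in when the full review text is missing."""
    if not sample.get("full_review"):
        return {**sample, "full_review": sample.get("summary", "")}
    return sample


# Hypothetical test samples; the first is missing its full review text.
test_set = [
    {"id": 1, "summary": "Charger overheats", "full_review": None},
    {"id": 2, "summary": "Works well", "full_review": "Works well, no issues."},
]
prepared = [fill_missing_review(s) for s in test_set]
print(prepared)
```

Samples that already have a full review pass through unchanged, so only the incomplete rows are affected.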

10) D – Amazon SageMaker Object2Vec generalizes the Word2Vec embedding technique for words to more
complex objects, such as sentences and paragraphs. Because the supervised learning task operates at the level
of whole claims, for which labels exist, and no labels are available at the word level, Object2Vec needs to
be used instead of Word2Vec. For supporting information, refer to these two links.
