MLA-C01 Exam
Certified Machine Learning
Engineer - Associate
https://www.dumpsvibe.com/amazon/mla-c01-dumps.html
Question: 1
You are a machine learning engineer at a fintech company tasked with developing and deploying an
end-to-end machine learning workflow for fraud detection. The workflow involves multiple steps,
including data extraction, preprocessing, feature engineering, model training, hyperparameter tuning,
and deployment. The company requires the solution to be scalable, support complex dependencies
between tasks, and provide robust monitoring and versioning capabilities. Additionally, the workflow
needs to integrate seamlessly with existing AWS services.
Which deployment orchestrator is the MOST SUITABLE for managing and automating your ML
workflow?
A. Use AWS Step Functions to build a serverless workflow that integrates with SageMaker for model
training and deployment, ensuring scalability and fault tolerance
B. Use AWS Lambda functions to manually trigger each step of the ML workflow, enabling flexible
execution without needing a predefined orchestration tool
C. Use Amazon SageMaker Pipelines to orchestrate the entire ML workflow, leveraging its built-in
integration with SageMaker features like training, tuning, and deployment
D. Use Apache Airflow to define and manage the workflow with custom DAGs (Directed Acyclic
Graphs), integrating with AWS services through operators and hooks
Answer: C
Explanation:
Correct option:
Use Amazon SageMaker Pipelines to orchestrate the entire ML workflow, leveraging its built-
in integration with SageMaker features like training, tuning, and deployment
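To make the idea concrete, here is a minimal, hypothetical sketch of such a pipeline using the SageMaker Python SDK. The role ARN, container image URIs, script name, and S3 paths are placeholders rather than values from the scenario.
```python
# Illustrative sketch only: a two-step SageMaker Pipeline (preprocess -> train).
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.processing import ScriptProcessor, ProcessingOutput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

processor = ScriptProcessor(
    image_uri="<preprocessing-image-uri>",   # placeholder container image
    command=["python3"],
    instance_type="ml.m5.xlarge",
    instance_count=1,
    role=role,
)
preprocess = ProcessingStep(
    name="Preprocess",
    processor=processor,
    code="preprocess.py",                    # your preprocessing script
    outputs=[ProcessingOutput(output_name="train", source="/opt/ml/processing/train")],
)

estimator = Estimator(
    image_uri="<training-image-uri>",        # placeholder container image
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/models/",    # placeholder bucket
)
train = TrainingStep(
    name="Train",
    estimator=estimator,
    inputs={"train": TrainingInput(
        preprocess.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri)},
)

pipeline = Pipeline(name="fraud-detection-pipeline", steps=[preprocess, train])
pipeline.upsert(role_arn=role)   # create or update the pipeline definition
pipeline.start()                 # kick off an execution
```
Additional steps (hyperparameter tuning, model registration, deployment) can be appended to the same step list, which is what gives SageMaker Pipelines its end-to-end, ML-native orchestration.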
Incorrect options:
Use Apache Airflow to define and manage the workflow with custom DAGs (Directed Acyclic
Graphs), integrating with AWS services through operators and hooks - Apache Airflow is a
powerful orchestration tool that allows you to define complex workflows using custom DAGs. However,
it requires significant setup and maintenance, and while it can integrate with AWS services, it does not
provide the seamless, built-in integration with SageMaker that SageMaker Pipelines offers.
via - https://aws.amazon.com/managed-workflows-for-apache-airflow/
Use AWS Step Functions to build a serverless workflow that integrates with SageMaker for
model training and deployment, ensuring scalability and fault tolerance - AWS Step Functions
is a serverless orchestration service that can integrate with SageMaker and other AWS services.
However, it is more general-purpose and lacks some of the ML-specific features, such as model
lineage tracking and hyperparameter tuning, that are built into SageMaker Pipelines.
Use AWS Lambda functions to manually trigger each step of the ML workflow, enabling
flexible execution without needing a predefined orchestration tool - AWS Lambda is useful for
triggering specific tasks, but manually managing each step of a complex ML workflow without a
comprehensive orchestration tool is not scalable or maintainable. It does not provide the task
dependency management, monitoring, and versioning required for an end-to-end ML workflow.
References:
https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html
https://aws.amazon.com/managed-workflows-for-apache-airflow/
Question: 2
You are tasked with building a predictive model for customer lifetime value (CLV) using Amazon
SageMaker. Given the complexity of the model, it’s crucial to optimize hyperparameters to achieve the
best possible performance. You decide to use SageMaker’s automatic model tuning (hyperparameter
optimization) with Random Search strategy to fine-tune the model. You have a large dataset, and the
tuning job involves several hyperparameters, including the learning rate, batch size, and dropout rate.
During the tuning process, you observe that some of the trials are not converging effectively, and the
results are not as expected. You suspect that the hyperparameter ranges or the strategy you are using
may need adjustment.
Which of the following approaches is MOST LIKELY to improve the effectiveness of the hyperparameter
tuning process?
A. Decrease the number of total trials but increase the number of parallel jobs to speed up the tuning
process
B. Switch from the Random Search strategy to the Bayesian Optimization strategy and narrow the
range of critical hyperparameters
C. Use the Grid Search strategy with a wide range for all hyperparameters and increase the number of
total trials
D. Increase the number of hyperparameters being tuned and widen the range for all hyperparameters
Answer: B
Explanation:
Correct option:
Switch from the Random Search strategy to the Bayesian Optimization strategy and narrow the
range of critical hyperparameters
When you’re training machine learning models, each dataset and model needs a different set of
hyperparameters, which are a kind of variable. The only way to determine these is through multiple
experiments, where you pick a set of hyperparameters and run them through your model. This is called
hyperparameter tuning. In essence, you're training your model sequentially with different sets of
hyperparameters. This process can be manual, or you can pick one of several automated
hyperparameter tuning methods.
Bayesian Optimization is a technique based on Bayes’ theorem, which describes the probability of an
event occurring related to current knowledge. When this is applied to hyperparameter optimization, the
algorithm builds a probabilistic model from a set of hyperparameters that optimizes a specific metric. It
uses regression analysis to iteratively choose the best set of hyperparameters.
Random Search selects groups of hyperparameters randomly on each iteration. It works well when a
relatively small number of the hyperparameters primarily determine the model outcome.
Bayesian Optimization is more efficient than Random Search for hyperparameter tuning, especially when
dealing with complex models and large hyperparameter spaces. It learns from previous trials to predict
the best set of hyperparameters, thus focusing the search more effectively. Narrowing the range of
critical hyperparameters can further improve the chances of finding the optimal values, leading to better
model convergence and performance.
via - https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html
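As a hedged illustration, the sketch below shows how a SageMaker tuning job might be switched to the Bayesian strategy with narrowed ranges for the critical hyperparameters, using the SageMaker Python SDK. The image URI, role ARN, metric name, ranges, and S3 paths are assumed placeholders.
```python
# Illustrative sketch: Bayesian hyperparameter tuning with narrowed ranges.
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

estimator = Estimator(
    image_uri="<training-image-uri>",                                  # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",      # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-4, 1e-2, scaling_type="Logarithmic"),  # narrowed
    "dropout_rate": ContinuousParameter(0.1, 0.4),                                  # narrowed
    "batch_size": IntegerParameter(32, 256),
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",     # placeholder metric name
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    strategy="Bayesian",                        # instead of "Random"
    max_jobs=30,
    max_parallel_jobs=3,
)
tuner.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/validation/"})
```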
Incorrect options:
Increase the number of hyperparameters being tuned and widen the range for all
hyperparameters - Increasing the number of hyperparameters and widening the range without any
strategic approach can lead to a more extensive search space, which could cause the tuning process to
become inefficient and less likely to converge on optimal values.
Decrease the number of total trials but increase the number of parallel jobs to speed up the
tuning process - Reducing the total number of trials might speed up the tuning process, but it also
reduces the chances of finding the best hyperparameters, especially if the model is complex. Increasing
parallel jobs can improve throughput but doesn't necessarily enhance the quality of the search.
Use the Grid Search strategy with a wide range for all hyperparameters and increase the number
of total trials - Grid Search works well, but it’s relatively tedious and computationally intensive,
especially with large numbers of hyperparameters. It is less efficient than Bayesian Optimization for
complex models. A wide range of hyperparameters without focus would result in more trials, but it is not
guaranteed to find the best values, especially with a larger search space.
References:
https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html
https://aws.amazon.com/what-is/hyperparameter-tuning/
https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html
Question: 3
A company stores its training datasets on Amazon S3 in the form of tabular data running into millions of
rows. The company needs to prepare this data for Machine Learning jobs. The data preparation involves
data selection, cleansing, exploration, and visualization using a single visual interface.
Which Amazon SageMaker service is the best fit for this requirement?
Correct option: Amazon SageMaker Data Wrangler
Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and
image data for ML from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process
of data preparation and feature engineering, and complete each step of the data preparation workflow
(including data selection, cleansing, exploration, visualization, and processing at scale) from a single
visual interface. You can use SQL to select the data that you want from various data sources and import
it quickly. Next, you can use the data quality and insights report to automatically verify data quality and
detect anomalies, such as duplicate rows and target leakage. SageMaker Data Wrangler contains over
300 built-in data transformations, so you can quickly transform data without writing code.
With the SageMaker Data Wrangler data selection tool, you can quickly access and select your tabular
and image data from various popular sources - such as Amazon Simple Storage Service (Amazon S3),
Amazon Athena, Amazon Redshift, AWS Lake Formation, Snowflake, and Databricks - and over 50
other third-party sources - such as Salesforce, SAP, Facebook Ads, and Google Analytics. You can also
write queries for data sources using SQL and import data directly into SageMaker from various file
formats, such as CSV, Parquet, JSON, and database tables.
How Data Wrangler works:
via - https://aws.amazon.com/sagemaker/data-wrangler/
Incorrect options:
Amazon SageMaker Clarify - SageMaker Clarify helps identify potential bias during data
preparation without writing code. You specify input features, such as gender or age, and SageMaker
Clarify runs an analysis job to detect potential bias in those features.
Amazon SageMaker Feature Store - Amazon SageMaker Feature Store is a fully managed,
purpose-built repository to store, share, and manage features for machine learning (ML) models.
Features are inputs to ML models used during training and inference.
Reference:
https://aws.amazon.com/sagemaker/data-wrangler/
Question: 4
Which of the following strategies best aligns with the defense-in-depth security approach for generative
AI applications on AWS?
Correct option:
Applying multiple layers of security measures including input validation, access controls, and continuous monitoring to address vulnerabilities
Incorrect options:
Relying solely on data encryption to protect the AI training data - Data encryption is crucial for
protecting data at rest and in transit, but it does not address other vulnerabilities such as input validation
or unauthorized access. A holistic security strategy is needed.
Using a single authentication mechanism for all users and services accessing the AI models -
Employing a single authentication mechanism is a weak security practice. Multiple authentication and
authorization mechanisms should be used to ensure robust access control.
Reference:
https://aws.amazon.com/blogs/machine-learning/architect-defense-in-depth-security-for-generative-ai-applications-using-the-owasp-top-10-for-llms/
Question: 5
Answer: D
Explanation:
Correct option:
Use AWS CloudFormation with nested stacks to automate the provisioning of SageMaker,
EC2, and RDS resources, and configure outputs from one stack as inputs to another to
enable communication between them
AWS CloudFormation with nested stacks allows you to modularize your infrastructure, making it easier to
manage and reuse components. By passing outputs from one stack as inputs to another, you can
automate the provisioning of resources while ensuring that all stacks can communicate effectively. This
approach also enables consistent and scalable deployments across environments.
via - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-nested-stacks.html
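For illustration only, the boto3 sketch below wires an output of one stack into a parameter of another imperatively; with nested stacks the same wiring is expressed declaratively in the parent template through the AWS::CloudFormation::Stack resource (mapping one child's Outputs to another child's Parameters). Stack names, output keys, and the template URL are placeholders.
```python
# Illustrative only: read an output from one stack and pass it as a parameter to another.
import boto3

cfn = boto3.client("cloudformation")

# Fetch an output (e.g., a subnet ID) exported by a networking stack (names are placeholders).
outputs = cfn.describe_stacks(StackName="network-stack")["Stacks"][0]["Outputs"]
subnet_id = next(o["OutputValue"] for o in outputs if o["OutputKey"] == "SubnetId")

# Feed it into the stack that provisions the SageMaker/EC2/RDS resources.
cfn.create_stack(
    StackName="ml-resources-stack",
    TemplateURL="https://my-bucket.s3.amazonaws.com/ml-resources.yaml",  # placeholder template
    Parameters=[{"ParameterKey": "SubnetId", "ParameterValue": subnet_id}],
    Capabilities=["CAPABILITY_NAMED_IAM"],
)
```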
Incorrect options:
Use AWS CDK (Cloud Development Kit) to define the infrastructure in a high-level programming
language, deploying each service as an independent stack without configuring inter-stack
communication - AWS CDK allows you to define infrastructure using high-level programming languages,
which is flexible and powerful. However, failing to configure inter-stack communication would lead to a
disjointed deployment, where services may not function together as required.
Manually provision the SageMaker, EC2, and RDS resources using the AWS Management Console,
ensuring that communication is established by manually updating security groups and networking
configurations - Manually provisioning resources through the AWS Management Console is error-prone
and not scalable. It lacks the automation and repeatability that infrastructure as code provides, making it
unsuitable for managing complex ML solutions that require seamless communication between multiple
resources.
Use AWS Elastic Beanstalk to deploy the entire ML solution, relying on its built-in environment
management to handle the provisioning and communication between resources automatically -
AWS Elastic Beanstalk is a managed service for deploying applications, but it is not designed for
orchestrating complex ML workflows with multiple resource types like SageMaker, EC2, and RDS. It also
lacks fine-grained control over resource provisioning and inter-stack communication.
Reference:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-nested-stacks.html
Question: 6
You are a data scientist at a healthcare startup tasked with developing a machine learning model to
predict the likelihood of patients developing a specific chronic disease within the next five years. The
dataset available includes patient demographics, medical history, lab results, and lifestyle factors, but it
is relatively small, with only 1,000 records. Additionally, the dataset has missing values in some critical
features, and the class distribution is highly imbalanced, with only 5% of patients labeled as having
developed the disease.
Given the data limitations and the complexity of the problem, which of the following approaches is the
MOST LIKELY to determine the feasibility of an ML solution and guide your next steps?
A. Proceed with training a deep neural network (DNN) model using the available data, as DNNs can
handle small datasets by learning complex patterns
B. Increase the dataset size by generating synthetic data and then train a simple logistic regression
model to avoid overfitting
C. Conduct exploratory data analysis (EDA) to understand the data distribution, address missing
values, and assess the class imbalance before determining if an ML solution is feasible
D. Immediately apply an oversampling technique to balance the dataset, then train an XGBoost model
to maximize performance on the minority class
Answer: C
Explanation:
Correct option:
Conduct exploratory data analysis (EDA) to understand the data distribution, address missing
values, and assess the class imbalance before determining if an ML solution is feasible
Conducting exploratory data analysis (EDA) is the most appropriate first step. EDA allows you to
understand the data distribution, identify and address missing values, and assess the extent of the class
imbalance. This process helps determine whether the available data is sufficient to build a reliable model
and what preprocessing steps might be necessary.
According to The State of Data Science 2020 survey, data management, exploratory data analysis (EDA), feature selection, and feature engineering account for more than 66% of a data scientist's time (see the following diagram).
[Diagram: survey breakdown of a data scientist's time - data loading (19%), data cleansing, data visualization (21%), model selection, model training and scoring (12%), and deploying models (11%)]
The same survey highlights that the top three biggest roadblocks to deploying a model in production are managing
dependencies and environments, security, and skill gaps (see the following diagram).
via - https://aws.amazon.com/blogs/machine-learning/exploratory-data-analysis-feature-engineering-and-operationalizing-your-data-flow-into-your-ml-pipeline-with-amazon-sagemaker-data-wrangler/
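A minimal EDA sketch along these lines, assuming a pandas DataFrame loaded from a hypothetical file with a hypothetical target column named developed_disease, might look like this:
```python
# Hedged sketch of the first EDA checks described above (file/column names are placeholders).
import pandas as pd

df = pd.read_csv("patients.csv")                                   # ~1,000 records

print(df.shape)                                                    # how much data is there really?
print(df.isnull().sum().sort_values(ascending=False).head(10))     # missing values per feature
print(df["developed_disease"].value_counts(normalize=True))        # class imbalance (~95/5 split)
print(df.describe(include="all").T.head(20))                       # distributions / obvious outliers
```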
Incorrect options:
Proceed with training a deep neural network (DNN) model using the available data, as DNNs
can handle small datasets by learning complex patterns - Training a deep neural network on a
small dataset is not advisable, as DNNs typically require large amounts of data to perform well and
avoid overfitting. Additionally, jumping directly to model training without assessing the data first may
lead to poor results.
Increase the dataset size by generating synthetic data and then train a simple logistic regression model to avoid overfitting - While generating synthetic data can help increase the dataset size, it may introduce biases if not done carefully. Additionally, without first understanding the data through EDA, you risk applying the wrong strategy or misinterpreting the results.
Immediately apply an oversampling technique to balance the dataset, then train an XGBoost
model to maximize performance on the minority class - Although oversampling can address
class imbalance, it’s important to first understand the underlying data issues through EDA.
Oversampling should not be the immediate next step without understanding the data quality, feature
importance, and potential need for feature engineering.
References:
https://aws.amazon.com/blogs/machine-learning/exploratory-data-analysis-feature-engineering-and-operationalizing-your-data-flow-into-your-ml-pipeline-with-amazon-sagemaker-data-wrangler/
https://aws.amazon.com/blogs/machine-learning/use-amazon-sagemaker-canvas-for-exploratory-data-analysis/
Question: 7
You are a data scientist at a marketing agency tasked with creating a sentiment analysis model to
analyze customer reviews for a new product. The company wants to quickly deploy a solution with
minimal training time and development effort. You decide to leverage a pre-trained natural language
processing (NLP) model and fine-tune it using a custom dataset of labeled customer reviews. Your
team has access to both Amazon Bedrock and SageMaker JumpStart.
Which approach is the MOST APPROPRIATE for fine-tuning the pre-trained model with your
custom dataset?
A. Use SageMaker JumpStart to create a custom container for your pre-trained model and
manually implement fine-tuning with TensorFlow
B. Use SageMaker JumpStart to deploy a pre-trained NLP model and use the built-in fine-
tuning functionality with your custom dataset to create a customized sentiment analysis
model
C. Use Amazon Bedrock to train a model from scratch using your custom dataset, as
Bedrock is optimized for training large models efficiently
D. Use Amazon Bedrock to select a foundation model from a third-party provider, then fine-
tune the model directly in the Bedrock interface using your custom dataset
Answer: B
Explanation:
Correct option:
Use SageMaker JumpStart to deploy a pre-trained NLP model and use the built-in fine-
tuning functionality with your custom dataset to create a customized sentiment analysis
model
Amazon Bedrock is the easiest way to build and scale generative AI applications with foundation
models. Amazon Bedrock is a fully managed service that offers a choice of high-performing
foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta,
Mistral AI, Stability AI, and
Amazon through a single API, along with a broad set of capabilities you need to build generative
AI applications with security, privacy, and responsible AI.
Amazon SageMaker JumpStart is a machine learning (ML) hub that can help you accelerate your
ML journey. With SageMaker JumpStart, you can evaluate, compare, and select FMs quickly based
on pre-defined quality and responsibility metrics to perform tasks like article summarization and
image generation. SageMaker JumpStart provides managed infrastructure and tools to accelerate
scalable, reliable, and secure model building, training, and deployment of ML models.
Fine-tuning trains a pretrained model on a new dataset without training from scratch. This process,
also known as transfer learning, can produce accurate models with smaller datasets and less
training time.
SageMaker JumpStart is specifically designed for scenarios like this, where you can quickly
deploy a pre-trained model and fine-tune it using your custom dataset. This approach allows you
to leverage existing NLP models, reducing both development time and computational resources
needed for training from scratch.
via - https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-fine-tune.html
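A hedged sketch of this approach with the SageMaker Python SDK's JumpStart estimator is shown below. The model ID, role ARN, instance type, and S3 path are placeholders, and the exact request/response format of the deployed endpoint depends on the chosen model.
```python
# Illustrative sketch: fine-tune a pre-trained JumpStart text-classification model.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="huggingface-tc-bert-base-uncased",   # placeholder JumpStart model ID
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    instance_type="ml.g5.xlarge",
)
estimator.fit({"training": "s3://my-bucket/reviews/train/"})   # labeled customer reviews

predictor = estimator.deploy()                                  # host the fine-tuned model
print(predictor.predict({"inputs": "The product exceeded my expectations!"}))
```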
Incorrect options:
Use Amazon Bedrock to select a foundation model from a third-party provider, then fine-tune
the model directly in the Bedrock interface using your custom dataset - Amazon Bedrock
provides access to foundation models from third-party providers, allowing for easy deployment and
integration into applications. However, as of now, Bedrock does not directly support fine-tuning these
models within its interface. Fine-tuning is better suited for SageMaker JumpStart in this scenario.
Use Amazon Bedrock to train a model from scratch using your custom dataset, as Bedrock
is optimized for training large models efficiently - Amazon Bedrock is not intended for training
models from scratch, especially not for scenarios where fine-tuning a pre-trained model would be
more efficient. Bedrock is optimized for deploying and scaling foundation models, not for raw model
training.
Use SageMaker JumpStart to create a custom container for your pre-trained model and
manually implement fine-tuning with TensorFlow - While it’s possible to create a custom
container and manually fine-tune a model, SageMaker JumpStart already offers an integrated
solution for fine-tuning pre-trained models without the need for custom containers or manual
implementation. This makes it a more efficient and straightforward option for the task at hand.
Reference:
https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-fine-tune.html
Question: 8
You are a machine learning engineer at a healthcare company responsible for developing and
deploying an end-to-end ML workflow for predicting patient readmission rates. The workflow
involves data preprocessing, model training, hyperparameter tuning, and deployment. Additionally,
the solution must support regular retraining of the model as new data becomes available, with
minimal manual intervention. You need to select the right solution to orchestrate this workflow
efficiently while ensuring scalability, reliability, and ease of management.
Given these requirements, which of the following options is the MOST SUITABLE for orchestrating
your ML workflow?
A. Implement the entire ML workflow using Amazon SageMaker Pipelines, which provides
integrated orchestration for data processing, model training, tuning, and deployment
B. Use AWS Step Functions to define and orchestrate each step of the ML workflow, integrate
with SageMaker for model training and deployment, and leverage AWS Lambda for data
preprocessing tasks
C. Leverage Amazon EC2 instances to manually execute each step of the ML workflow, use
Amazon RDS for storing intermediate results, and deploy the model using Amazon SageMaker
endpoints
D. Use AWS Glue for data preprocessing, Amazon SageMaker for model training and tuning,
and manually deploy the model to an Amazon EC2 instance for inference
Answer: A
Explanation:
Correct option:
Implement the entire ML workflow using Amazon SageMaker Pipelines, which provides
integrated orchestration for data processing, model training, tuning, and deployment
via - https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html
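As a small, hypothetical illustration of the "regular retraining with minimal manual intervention" requirement: once the workflow is defined as a SageMaker Pipeline, a retraining run can be started programmatically, or on a schedule via an Amazon EventBridge rule that targets the pipeline. The pipeline name below is a placeholder.
```python
# Illustrative sketch: trigger a retraining run of an existing SageMaker Pipeline.
import boto3

sm = boto3.client("sagemaker")
response = sm.start_pipeline_execution(
    PipelineName="readmission-retraining-pipeline",   # placeholder pipeline name
    PipelineExecutionDisplayName="weekly-retrain",
)
print(response["PipelineExecutionArn"])
```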
Incorrect options:
Use AWS Step Functions to define and orchestrate each step of the ML workflow, integrate
with SageMaker for model training and deployment, and leverage AWS Lambda for data
preprocessing tasks - AWS Step Functions is a powerful service for orchestrating workflows,
and it can integrate with SageMaker and Lambda. However, using Step Functions for the entire
ML workflow adds complexity since it requires coordinating multiple services, whereas
SageMaker Pipelines provides a more seamless, integrated solution for ML-specific workflows.
Leverage Amazon EC2 instances to manually execute each step of the ML workflow, use
Amazon RDS for storing intermediate results, and deploy the model using Amazon
SageMaker endpoints - Manually managing each step of the ML workflow using EC2 instances
and RDS is labor-intensive, prone to errors, and not scalable. It also lacks the automation and
orchestration capabilities needed for a robust ML workflow.
Use AWS Glue for data preprocessing, Amazon SageMaker for model training and tuning,
and manually deploy the model to an Amazon EC2 instance for inference - While using AWS
Glue for data preprocessing and SageMaker for training is possible, manually deploying the model
on EC2 lacks the orchestration and management features provided by SageMaker Pipelines. This
approach also misses out on the integrated tracking, automation, and scalability features offered by
SageMaker Pipelines.
Reference:
https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html
Question: 9
You are an ML engineer at a data analytics company tasked with training a deep learning model on
a large, computationally intensive dataset. The training job can tolerate interruptions and is expected
to run for several hours or even days, depending on the available compute resources. The company
has a limited budget for cloud infrastructure, so you need to minimize costs as much as possible.
Which strategy is the MOST EFFECTIVE for your ML training job while minimizing cost and ensuring
the job completes successfully?
A. Start the training job using only Spot Instances to minimize cost, and switch to On-
Demand instances manually if any Spot Instances are interrupted during training
B. Use Amazon SageMaker Managed Spot Training to dynamically allocate Spot Instances
for the training job, automatically retrying any interrupted instances via checkpoints
C. Deploy the training job on a fixed number of On-Demand EC2 instances to ensure stability,
and manually add Spot Instances as needed to speed up the job during off-peak hours
D. Use Amazon EC2 Auto Scaling to automatically add Spot Instances to the training job
based on demand, and configure the job to continue processing even if some Spot Instances
are interrupted
Answer: B
Explanation:
Correct option:
Use Amazon SageMaker Managed Spot Training to dynamically allocate Spot Instances for the
training job, automatically retrying any interrupted instances via checkpoints
Managed Spot Training uses Amazon EC2 Spot instances to run training jobs instead of on-demand
instances. You can specify which training jobs use spot instances and a stopping condition that
specifies how long SageMaker waits for a job to run using Amazon EC2 Spot instances. Spot
instances can be interrupted, causing jobs to take longer to start or finish. You can configure your
managed spot training job to use checkpoints. SageMaker copies checkpoint data from a local path
to Amazon S3. When the job is restarted, SageMaker copies the data from Amazon S3 back into
the local path. The training job can then resume from the last checkpoint instead of restarting.
via - https://aws.amazon.com/blogs/aws/managed-spot-training-save-up-to-90-on-your-amazon-sagemaker-training-jobs/
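A minimal sketch of enabling managed spot training with checkpointing via the SageMaker Python SDK is shown below; the image URI, role ARN, S3 paths, and time limits are placeholders.
```python
# Illustrative sketch: managed spot training with checkpoint-based resume.
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",                               # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",   # placeholder
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    use_spot_instances=True,          # run on EC2 Spot capacity
    max_run=172800,                   # max training time in seconds
    max_wait=259200,                  # >= max_run; includes time spent waiting for Spot capacity
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",   # resume from here after interruptions
    output_path="s3://my-bucket/models/",
)
estimator.fit({"train": "s3://my-bucket/train/"})
```
The training script itself must write and read checkpoints from the local checkpoint path so that interrupted jobs can resume rather than restart.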
Incorrect options:
Use Amazon EC2 Auto Scaling to automatically add Spot Instances to the training job
based on demand, and configure the job to continue processing even if some Spot
Instances are interrupted - Amazon EC2 Auto Scaling can add Spot Instances based on
demand, but it does not provide the same level of automation and resilience as SageMaker
Managed Spot Training, especially for ML-specific workloads where Spot interruptions need to be
handled gracefully.
Deploy the training job on a fixed number of On-Demand EC2 instances to ensure stability,
and manually add Spot Instances as needed to speed up the job during off-peak hours -
Using a fixed number of On-Demand EC2 instances provides stability, but manually adding Spot
Instances introduces complexity and may not fully optimize costs. Automating this process with
SageMaker is more efficient.
Start the training job using only Spot Instances to minimize cost, and switch to On-
Demand instances manually if any Spot Instances are interrupted during training - Starting
with only Spot Instances minimizes costs, but manually switching to On-Demand instances
increases the risk of delays and interruptions if Spot capacity becomes unavailable. SageMaker
Managed Spot Training offers a more reliable and automated solution.
References:
https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html
https://aws.amazon.com/blogs/aws/managed-spot-training-save-up-to-90-on-your-amazon-sagemaker-training-jobs/
Question: 10
Your data science team is working on developing a machine learning model to predict customer
churn. The dataset that you are using contains hundreds of features, but you suspect that not all
of these features are equally important for the model's accuracy. To improve the model's
performance and
reduce its complexity, the team wants to focus on selecting only the most relevant features that
contribute significantly to minimizing the model's error rate.
Which feature engineering process should your team apply to select a subset of features that
are the most relevant towards minimizing the error rate of the trained model?
A. Feature extraction
B. Feature creation
C. Feature transformation
D. Feature selection
Answer: D
Explanation:
Correct option:
Feature selection
Feature selection is the process of selecting a subset of extracted features. This is the subset that is
relevant and contributes to minimizing the error rate of a trained model. Feature importance score
and correlation matrix can be factors in selecting the most relevant features for model training.
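As a hedged, framework-level illustration (scikit-learn rather than an AWS-specific API), feature selection on the churn dataset might look like the following; the file and column names are placeholders, and numeric, non-missing features are assumed.
```python
# Illustrative sketch: keep the k features most informative about the churn label.
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif

df = pd.read_csv("churn.csv")                       # placeholder dataset
X, y = df.drop(columns=["churned"]), df["churned"]  # placeholder target column

selector = SelectKBest(score_func=mutual_info_classif, k=20)
selector.fit(X, y)
print(sorted(X.columns[selector.get_support()]))    # the selected feature subset
```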
Incorrect options:
Feature creation - Feature creation refers to the creation of new features from existing data to help
with better predictions. Examples of feature creation include: one-hot-encoding, binning, splitting,
and calculated features.
Feature transformation - Feature transformation and imputation include steps for replacing missing features or features that are not valid. Some techniques include: forming Cartesian products of features, non-linear transformations (such as binning numeric variables into categories), and creating domain-specific features.
Feature extraction - Feature extraction involves reducing the amount of data to be processed using dimensionality reduction techniques. These techniques include: Principal Components Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA).
Reference:
https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/feature-engineering.html
Question: 11
You are a DevOps engineer at a tech company that is building a scalable microservices-based
application. The application is composed of several containerized services, each responsible for
different parts of the application, such as user authentication, data processing, and
recommendation systems.
The company wants to standardize and automate the deployment and management of its
infrastructure using Infrastructure as Code (IaC). You need to choose between AWS
CloudFormation and AWS Cloud Development Kit (CDK) for defining the infrastructure.
Additionally, you must decide on the appropriate AWS container service to manage and deploy
these microservices efficiently.
Given the requirements, which combination of IaC option and container service is MOST
SUITABLE for this scenario, and why?
A. Use AWS CloudFormation with YAML templates for infrastructure automation and deploy
the containerized microservices using Amazon Lightsail Containers to simplify management
and reduce costs
B. Use AWS CloudFormation to define and deploy the infrastructure as code, and Amazon ECR
(Elastic Container Registry) with Fargate for running the containerized microservices without
needing to manage the underlying servers
C. Use AWS CDK for infrastructure as code, allowing you to define the infrastructure in a high-
level programming language, and deploy the containerized microservices using Amazon EKS
(Elastic Kubernetes Service) for advanced orchestration and scalability
D. Use AWS CDK with Amazon ECS on EC2 instances to combine the flexibility of
programming languages with direct control over the underlying server infrastructure for the
microservices
Answer: C
Explanation:
Correct option:
Use AWS CDK for infrastructure as code, allowing you to define the infrastructure in a high-
level programming language, and deploy the containerized microservices using Amazon
EKS (Elastic Kubernetes Service) for advanced orchestration and scalability
AWS CDK offers the flexibility of using high-level programming languages (e.g., Python,
JavaScript) to define infrastructure, making it easier to manage complex infrastructure setups
programmatically.
via - https://docs.aws.amazon.com/cdk/v2/guide/home.html
Amazon EKS is designed for running containerized microservices with Kubernetes, providing
advanced orchestration, scalability, and integration with CI/CD pipelines. This combination is ideal
for a microservices-based application with complex deployment and scaling needs.
via - https://docs.aws.amazon.com/eks/latest/userguide/what-is-eks.html
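A minimal, hypothetical AWS CDK (Python) sketch of defining an EKS cluster for the microservices is shown below; the Kubernetes version and node capacity are illustrative choices, not requirements from the scenario.
```python
# Illustrative CDK v2 sketch: an EKS cluster defined in a high-level language.
from aws_cdk import App, Stack
from aws_cdk import aws_eks as eks
from constructs import Construct

class MicroservicesStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        eks.Cluster(
            self, "AppCluster",
            version=eks.KubernetesVersion.V1_27,   # pick a currently supported version
            default_capacity=2,                    # two managed worker nodes to start with
        )

app = App()
MicroservicesStack(app, "MicroservicesStack")
app.synth()
```
Individual microservices would then be deployed onto the cluster as Kubernetes workloads, which is where EKS's orchestration and scaling capabilities come in.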
Incorrect options:
Use AWS CloudFormation to define and deploy the infrastructure as code, and Amazon ECR
(Elastic Container Registry) with Fargate for running the containerized microservices without
needing to manage the underlying servers - AWS CloudFormation is powerful for defining
infrastructure using JSON or YAML. However, Amazon ECR is an AWS managed container image registry - it stores and versions container images rather than orchestrating how they run, and Fargate is a serverless compute engine that still relies on ECS or EKS for orchestration. This combination therefore does not, on its own, provide the advanced orchestration and scalability that this microservices scenario calls for.
Use AWS CloudFormation with YAML templates for infrastructure automation and deploy the
containerized microservices using Amazon Lightsail Containers to simplify management and
reduce costs - AWS CloudFormation with YAML templates is suitable for traditional IaC, but
Amazon Lightsail Containers is better for simple, low-cost container deployments. It may lack the
scalability and orchestration features required for a complex microservices architecture.
Use AWS CDK with Amazon ECS on EC2 instances to combine the flexibility of programming
languages with direct control over the underlying server infrastructure for the microservices - AWS
CDK combined with Amazon ECS on EC2 gives more control over the underlying infrastructure but
adds complexity in managing the servers. For a microservices-based application, this might
introduce unnecessary overhead compared to using managed services like Fargate or EKS.
References:
https://docs.aws.amazon.com/cdk/v2/guide/home.html
https://docs.aws.amazon.com/eks/latest/userguide/what-is-eks.html
Question: 12
A company uses a generative model to analyze animal images in the training dataset to record
variables like different ear shapes, eye shapes, tail features, and skin patterns.
Which of the following tasks can the generative model perform?
A. The model can classify multiple species of animals such as cats, dogs, etc
B. The model can recreate new animal images that were not in the training dataset
C. The model can identify any image from the training dataset
D. The model can classify a single species of animals such as cats
Answer: B
Explanation:
Correct option:
The model can recreate new animal images that were not in the training dataset
Generative artificial intelligence (generative AI) is a type of AI that can create new content and
ideas, including conversations, stories, images, videos, and music. AI technologies attempt to
mimic human intelligence in nontraditional computing tasks like image recognition, natural language
processing (NLP), and translation.
Generative models can analyze animal images to record variables like different ear shapes, eye
shapes, tail features, and skin patterns. They learn features and their relations to understand what
different animals look like in general. They can then recreate new animal images that were not in
the training set.
via - https://aws.amazon.com/what-is/generative-ai/
Incorrect options:
Traditional machine learning models were discriminative or focused on classifying data points.
They attempted to determine the relationship between known and unknown factors. For example,
they look at images—known data like pixel arrangement, line, color, and shape—and map them to
words—the unknown factor. Only discriminative models can act as single-class classifiers or multi-
class classifiers. Therefore, both these options are incorrect.
The model can identify any image from the training dataset - This option has been added as a
distractor. A generative model is not an image-matching algorithm. It cannot identify an image from
the training dataset.
Reference:
https://aws.amazon.com/what-is/generative-ai/
Question: 13
You are a Cloud Financial Manager at a SaaS company that uses various AWS services to run its
applications and machine learning workloads. Your management team has asked you to reduce
overall AWS spending while ensuring that critical applications remain highly available and
performant. To achieve this, you need to use AWS cost analysis tools to monitor spending, identify
cost-saving opportunities, and optimize resource utilization across the organization.
Which of the following actions can you perform using the appropriate AWS cost analysis tools to
achieve your goal of reducing costs and optimizing AWS resource utilization? (Select two)
A. Use AWS Cost Explorer to analyze historical spending patterns, identify cost trends, and
forecast future costs to help with budgeting and planning
B. Use AWS Cost Explorer to automatically delete unused resources across your AWS
environment, ensuring that no unnecessary costs are incurred
C. Leverage AWS Trusted Advisor to receive recommendations for cost optimization,
such as identifying underutilized or idle resources, and reserved instance purchasing
opportunities
D. Use AWS Cost Explorer to set custom budgets for cost and usage to govern costs across
your organization and receive alerts when costs exceed your defined thresholds
E. Leverage AWS Trusted Advisor to directly modify and reconfigure resources based on
cost optimization recommendations without manual intervention
Answer: A, C
Explanation:
Correct options:
Use AWS Cost Explorer to analyze historical spending patterns, identify cost trends, and forecast
future costs to help with budgeting and planning
AWS Cost Explorer allows you to analyze your past AWS spending, identify cost trends, and
forecast future costs based on historical data. This tool is valuable for budgeting and financial
planning, helping you make informed decisions about resource allocation and cost
management.
Leverage AWS Trusted Advisor to receive recommendations for cost optimization, such as
identifying underutilized or idle resources, and reserved instance purchasing opportunities
via - https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-what-is.html
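As a hedged sketch, cost and usage data can also be pulled programmatically through the Cost Explorer API (the boto3 "ce" client) as a starting point for trend analysis; the date range below is a placeholder.
```python
# Illustrative sketch: monthly unblended cost, grouped by service.
import boto3

ce = boto3.client("ce")
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-07-01"},   # placeholder range
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)
for period in response["ResultsByTime"]:
    costs = [(g["Keys"][0], g["Metrics"]["UnblendedCost"]["Amount"]) for g in period["Groups"]]
    print(period["TimePeriod"]["Start"], costs)
```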
Incorrect options:
Use AWS Cost Explorer to set custom budgets for cost and usage to govern costs across your organization and receive alerts when costs exceed your defined thresholds - You can only use AWS Budgets to set custom budgets for cost and usage to govern costs across your organization and receive alerts when costs exceed your defined thresholds.
Use AWS Cost Explorer to automatically delete unused resources across your AWS environment, ensuring that no unnecessary costs are incurred - AWS Cost Explorer does not have the capability to automatically delete unused resources. It is a cost analysis tool that helps you visualize and understand your costs but does not manage or modify your AWS resources directly.
Leverage AWS Trusted Advisor to directly modify and reconfigure resources based on cost
optimization recommendations without manual intervention - AWS Trusted Advisor provides
recommendations but does not automatically modify or reconfigure resources. Changes must be
made manually based on the insights provided by the tool.
References:
https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-what-is.html
https://aws.amazon.com/aws-cost-management/aws-cost-explorer/
https://docs.aws.amazon.com/awssupport/latest/user/trusted-advisor.html
Question: 14
You are a Machine Learning Operations (MLOps) Engineer at a large technology company that
runs multiple machine learning workloads across different environments. Your company has a
variety of ML use cases, including continuous real-time predictions, scheduled batch processing for
weekly model retraining, and small-scale experimentation with multiple hyperparameter tuning jobs
that can tolerate failure.
Which of the following strategies represents the best use of spot instances, on-demand instances,
and reserved instances for different machine learning workloads, considering the requirements for
cost optimization, reliability, and performance? (Select two)
A. Use on-demand instances for hyperparameter tuning jobs where interruptions can be tolerated
B. Use on-demand instances for real-time predictions that require high availability
C. Use reserved instances for real-time predictions that require high availability
D. Use reserved instances for scheduled batch processing for model retraining
E. Use spot instances for hyperparameter tuning jobs where interruptions can be tolerated
Answer: C, E
Explanation:
Correct options:
Use spot instances for hyperparameter tuning jobs where interruptions can be tolerated
Spot instances are well-suited for hyperparameter tuning jobs where interruptions are acceptable, as they offer significant cost savings but may be terminated by AWS if the capacity is needed elsewhere.
Use reserved instances for real-time predictions that require high availability
Reserved instances are the best choice for real-time predictions that require high availability, as they ensure that the necessary resources are always available while optimizing costs over time.
via - https://aws.amazon.com/ec2/pricing/
Incorrect options:
Use on-demand instances for real-time predictions that require high availability - Although on-demand instances can certainly be used for real-time predictions that require high availability, reserved instances are a better fit because they are more cost-effective for this steady, always-on workload.
Use on-demand instances for hyperparameter tuning jobs where interruptions can be tolerated - Although on-demand instances can certainly be used for hyperparameter tuning jobs where interruptions can be tolerated, spot instances are a better fit because they are more cost-effective.
Use reserved instances for scheduled batch processing for model retraining - Because the model retraining runs only on a weekly schedule, on-demand instances are a better fit than reserved instances.
References:
https://aws.amazon.com/ec2/pricing/
https://aws.amazon.com/ec2/pricing/reserved-instances/
https://aws.amazon.com/ec2/spot/
Question: 15
You are a Senior ML Engineer at a global logistics company that heavily relies on machine learning
models for optimizing delivery routes, predicting demand, and detecting anomalies in real-time. The
company is rapidly expanding, and you are tasked with building a maintainable, scalable, and cost-
effective ML infrastructure that can handle increasing data volumes and evolving model
requirements. You must implement best practices to ensure that the infrastructure can support
ongoing development, deployment, monitoring, and scaling of multiple models across different
regions.
Which of the following strategies should you implement to create a maintainable, scalable, and
cost-effective ML infrastructure for your company using AWS services? (Select three)
A. Provision fixed resources for each model to avoid unexpected costs, ensuring that the
infrastructure is always available for each model
B. Store all model artifacts and data in Amazon CodeCommit for version control and
managing changes over time
C. Use a monolithic architecture to manage all machine learning models in a single
environment, simplifying management and reducing overhead
D. Store all model artifacts and data in Amazon S3, and use versioning to manage changes over time, ensuring that models can be easily rolled back if needed
Correct options:
Implement a microservices-based architecture with Amazon SageMaker endpoints, where each model is deployed independently, allowing for isolated scaling and updates
Deploying each model behind its own SageMaker endpoint keeps the models loosely coupled, so each one can be scaled, updated, or rolled back independently without affecting the others.
Store all model artifacts and data in Amazon S3, and use versioning to manage changes over
time, ensuring that models can be easily rolled back if needed
Storing model artifacts and data in Amazon S3 with versioning is a good practice for maintaining
model history and enabling rollbacks.
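A minimal sketch of enabling versioning on the artifact bucket with boto3 is shown below; the bucket name is a placeholder.
```python
# Illustrative sketch: turn on S3 versioning for the model-artifact bucket.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="my-ml-model-artifacts",                 # placeholder bucket name
    VersioningConfiguration={"Status": "Enabled"},  # keep prior object versions for rollback
)
```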
Incorrect options:
Use a monolithic architecture to manage all machine learning models in a single environment,
simplifying management and reducing overhead - A monolithic architecture can simplify
management in the short term but becomes difficult to maintain and scale as the number of models
and services grows. It also limits flexibility in updating or scaling individual models, leading to
potential bottlenecks and higher costs.
Provision fixed resources for each model to avoid unexpected costs, ensuring that the infrastructure is always available for each model - Provisioning fixed resources for each model may lead to underutilization or overprovisioning, resulting in higher costs. Dynamic resource allocation, such as using auto-scaling or spot instances, is generally more cost-effective and scalable.
Store all model artifacts and data in Amazon CodeCommit for version control and managing
changes over time - Amazon CodeCommit is the right fit for code-specific version control. You
should not use CodeCommit to store model related data.
References:
https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-deployment.html
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-overview.html
https://aws.amazon.com/codecommit/
Thank You for trying MLA-C01 PDF Demo
https://www.dumpsvibe.com/amazon/mla-c01-dumps.html