21 Machine Learning Design Patterns Interview Questions (ANSWERED) | MLStack.Cafe
What are some approaches that you can take to implement the Ensemble Design Pattern?
Answer
Ensemble design patterns are meta-algorithms that combine several machine learning submodels as a technique to decrease bias and/or variance and improve overall model performance. The idea is that combining multiple submodels helps to improve the machine learning results. The main methods in ensemble learning are bagging, boosting, and stacking.
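As an illustration, here is a minimal sketch of a voting ensemble built with scikit-learn; the base models and hyperparameters are illustrative choices, not requirements of the pattern:

```python
# A minimal sketch of an ensemble (voting) model using scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Combine several submodels; soft voting averages their predicted probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100)),
        ("svc", SVC(probability=True)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```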
What are some methods you know for rebalancing a dataset using the Rebalancing Design Pattern?
Answer
The Rebalancing Design Pattern provides various approaches for handling datasets that are inherently imbalanced. By this we mean datasets where one label makes up the majority of the dataset, leaving far fewer examples of other labels. Some methods to address this are downsampling the majority class, upsampling (or generating synthetic examples of) the minority class, and applying class weights during training.
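For instance, here is a minimal sketch of one such method, downsampling the majority class, with pandas and scikit-learn; the column names and class ratio are made up for illustration:

```python
# A minimal sketch of rebalancing by downsampling the majority class.
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({
    "feature": range(1000),
    "label": [0] * 950 + [1] * 50,   # heavily imbalanced labels
})

majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Downsample the majority class to the size of the minority class.
majority_downsampled = resample(
    majority, replace=False, n_samples=len(minority), random_state=42
)
balanced = pd.concat([majority_downsampled, minority])
print(balanced["label"].value_counts())
```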
What are the benefits of using the Workflow Pipeline Design Pattern?
Answer
As ML practitioners, we can often find our daily routine following some or all of the standard ML workflow steps: data collection, data validation, feature engineering, model training, model evaluation, and deployment. The Workflow Pipeline design pattern makes each of these steps an explicit, containerized task in an orchestrated pipeline, so individual steps can be reused, scheduled, and scaled, and the end-to-end process stays reproducible.
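A minimal, framework-agnostic sketch of the idea follows, with each workflow step as an isolated, reusable function; in practice these would typically be containerized tasks in an orchestrator such as Kubeflow Pipelines, TFX, or Airflow, which is an assumption about tooling rather than part of the pattern:

```python
# A minimal sketch of a workflow pipeline: each step is a separate function,
# and an orchestration function runs them in order.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def collect_data():
    X, y = load_iris(return_X_y=True)
    return train_test_split(X, y, random_state=42)


def train_model(X_train, y_train):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model


def evaluate_model(model, X_test, y_test):
    return accuracy_score(y_test, model.predict(X_test))


def run_pipeline():
    # Each step could be re-run, swapped out, or reused independently.
    X_train, X_test, y_train, y_test = collect_data()
    model = train_model(X_train, y_train)
    print("accuracy:", evaluate_model(model, X_test, y_test))


run_pipeline()
```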
What is the difference between Multiclass Classification models and Multilabel models?
Answer
Multiclass classification problems: each example belongs to exactly one of three or more mutually exclusive classes, so the model typically uses a softmax output layer whose probabilities sum to 1.
Multilabel models: a single example can belong to several classes at once, so each class gets its own independent sigmoid output and the labels are multi-hot encoded.
When would you use the Hashed Feature Design Pattern?
Answer
The Hashed Feature design pattern is used to address three possible problems associated with categorical features: incomplete vocabulary, model size due to high cardinality, and cold start.
For example, consider a model whose categorical inputs include hospital names and physician IDs. After the model is placed into production, new hospitals might be built and new physicians hired. The model will be unable to make predictions for these, and so a separate serving infrastructure would be required to handle such cold-start problems. Hashing each categorical value into a fixed number of buckets avoids this, because any new value still maps to one of the existing buckets.
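A minimal sketch of the hashing step, assuming a simple MD5-based bucket function and an arbitrary bucket count of 10:

```python
# A minimal sketch of the Hashed Feature pattern: map a categorical string to
# one of a fixed number of hash buckets. The bucket count is an illustrative
# assumption; real systems tune it to trade collisions against model size.
import hashlib

NUM_BUCKETS = 10

def hashed_feature(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministically hash a categorical value into a bucket index."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

# Unseen values (e.g., a newly built hospital) still map to a valid bucket.
for hospital in ["St. Mary Hospital", "City General", "Brand New Clinic"]:
    print(hospital, "->", hashed_feature(hospital))
```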
For what problems would you use the Neutral Class Design Pattern in Machine Learning?
Answer
The Neutral Class Design Pattern involves introducing a third class (a neutral class) when trying to solve a binary classification problem.
The need for a neutral class also arises with models that attempt to predict customer satisfaction. If the training data consists of survey responses where customers grade their experience on a scale of 1 to 10, it might be helpful to bucket the ratings into three categories: 1 to 4 as bad, 8 to 10 as good, and 5 to 7 as neutral. If, instead, we attempt to train a binary classifier by thresholding at 6, the model will spend too much effort trying to get essentially neutral responses correct.
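A minimal sketch of the bucketing described above (the thresholds follow the 1-4 / 5-7 / 8-10 split from the example):

```python
# A minimal sketch of the Neutral Class pattern: bucket 1-10 satisfaction
# scores into bad / neutral / good instead of thresholding into two classes.
def satisfaction_label(rating: int) -> str:
    """Map a 1-10 survey rating to a three-class label."""
    if rating <= 4:
        return "bad"
    if rating >= 8:
        return "good"
    return "neutral"

ratings = [2, 5, 6, 7, 9, 10]
print([satisfaction_label(r) for r in ratings])
# ['bad', 'neutral', 'neutral', 'neutral', 'good', 'good']
```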
How does the Feature Cross Design Pattern work in Machine Learning?
Answer
The Feature Cross design pattern helps models learn relationships
between inputs faster by explicitly making each combination of input
values a separate feature.
Consider a dataset (such as the classic XOR-style layout of blue and orange dots) where we can't draw a single straight line that neatly separates the two classes. To solve this nonlinear problem we can create a feature cross named x3 by crossing x1 and x2: x3 = x1 * x2.
We can treat this newly minted x3 feature cross just like any other feature. The linear formula becomes y = b + w1*x1 + w2*x2 + w3*x3. A linear algorithm can learn a weight w3 for x3 just as it would for x1 and x2. In other words, although x3 encodes nonlinear information, you don't need to change how the linear model trains to determine the value of w3.
In this way, feature crosses provide a way to have the ML model learn relationships between the features faster. While more complex models like neural networks and trees can learn feature crosses on their own, using feature crosses explicitly can allow us to get away with training just a linear model. Consequently, feature crosses can speed up model training (less expensive) and reduce model complexity (less training data is needed).
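A minimal sketch on synthetic XOR-style data, showing that a plain logistic regression only separates the classes once the crossed feature x3 = x1 * x2 is added:

```python
# A minimal sketch of a feature cross: a linear model can't fit XOR-style data
# on x1 and x2 alone, but adding x3 = x1 * x2 makes it linearly separable.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
x1 = rng.choice([-1.0, 1.0], size=1000)
x2 = rng.choice([-1.0, 1.0], size=1000)
y = (x1 * x2 > 0).astype(int)                     # XOR-like target

X_plain = np.column_stack([x1, x2])
X_crossed = np.column_stack([x1, x2, x1 * x2])    # add the feature cross x3

print("without cross:", LogisticRegression().fit(X_plain, y).score(X_plain, y))
print("with cross:   ", LogisticRegression().fit(X_crossed, y).score(X_crossed, y))
```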
What Design Patterns can you use to ensure Reproducibility of Machine Learning jobs?
Answer
Transform: captures the data preparation dependencies from the model training pipeline so that they can be reproduced during serving.
Repeatable Splitting: captures the way data is split among training, validation, and test datasets to ensure that an example used in training is never used for evaluation or testing, even as the dataset grows (see the sketch after this list).
Bridged Schema: looks at how to ensure reproducibility when the training dataset is a hybrid of data conforming to different schemas.
Workflow Pipeline: captures all the steps in the machine learning process to ensure that as the model is retrained, parts of the pipeline can be reused.
Feature Store: addresses reproducibility and reusability of features across different machine learning jobs.
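As a concrete illustration of Repeatable Splitting, here is a minimal sketch that hashes a stable key (a date column is assumed here purely for illustration) so every record always falls into the same split, even as new data arrives:

```python
# A minimal sketch of Repeatable Splitting: hash a stable key so each row
# always lands in the same train/valid/test split.
import hashlib

def assign_split(key: str) -> str:
    """Deterministically assign a record to a split based on its key."""
    bucket = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 10
    if bucket < 8:
        return "train"      # ~80%
    if bucket == 8:
        return "valid"      # ~10%
    return "test"           # ~10%

for date in ["2021-01-01", "2021-01-02", "2021-01-03"]:
    print(date, "->", assign_split(date))
```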
What problems are solved by the Transform Design Pattern?
Answer
The problem is that the inputs to a machine learning model are not the same as the features the model actually uses in its computations. In a text classification model, for example, the inputs are the raw text documents, while the features are the numerical embedding representations of this text. The Transform design pattern keeps inputs, features, and the transformations between them explicitly captured, so the same transformations can be reproduced at serving time and training-serving skew is avoided.
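A minimal sketch of this idea using a scikit-learn Pipeline, with TF-IDF standing in for the embedding step so that the transformation is saved with the model and reused at serving time:

```python
# A minimal sketch of the Transform idea: the text-to-features transformation
# is captured together with the model in one artifact.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

docs = ["great product", "terrible service", "loved it", "awful experience"]
labels = [1, 0, 1, 0]

model = Pipeline([
    ("transform", TfidfVectorizer()),        # input (raw text) -> features
    ("classifier", LogisticRegression()),    # features -> prediction
])
model.fit(docs, labels)

# At serving time the raw input is passed in; the captured transform is reused.
print(model.predict(["fantastic service"]))
```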
What are some trade-offs when using Embeddings in Machine Learning?
Answer
An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words. The main trade-off is the choice of the embedding dimension: a smaller dimension keeps the model compact and fast but loses information, while a larger dimension captures more relationships at the cost of more parameters, more training data, and slower training. The ideal dimensionality is best treated as a hyperparameter to tune. However, if we're in a hurry, there are two rules of thumb that we could take: use roughly the fourth root of the number of unique categorical elements, or roughly 1.6 times the square root of the number of unique elements.
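A minimal Keras sketch of an embedding layer, sizing the embedding with the fourth-root heuristic; the vocabulary size, sequence length, and surrounding architecture are illustrative assumptions:

```python
# A minimal sketch of an embedding layer compressing a large, sparse
# vocabulary into a small dense representation.
import tensorflow as tf

vocab_size = 10_000
embedding_dim = round(vocab_size ** 0.25)   # fourth-root heuristic -> 10

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100,)),    # sequences of 100 word indices
    # Map each sparse word index (0..vocab_size-1) to a dense 10-d vector.
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```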
What does the Bridged Schema Design Pattern do?
Answer
The Bridged Schema Design Pattern provides ways to adapt the data
used to train a model from its older, original data schema to newer, better
data.
For example, assume that we are training a regression model and one of the (categorical) inputs is called payment_type. In the older training data, this has been recorded as cash or card. However, the newer training data provides more detail on the type of card (gift_card, debit_card, credit_card) that was used.
What the bridged schema does is find a representation (schema) for the input (in this example, payment_type) that works for both the older and newer data, using the observed frequencies of the card types in the newer data. In general, this can be done in two ways:
Probabilistic method: each older card record is imputed as one of the newer card types at random, with probabilities proportional to the frequencies observed in the newer data.
Static method: with this approach, we one-hot encode the input using the newer schema. For the running example, payment_type is one-hot encoded into a 4-dimensional vector, since the cardinality of the newer data is four.
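A minimal sketch of the static method, where the frequencies 0.1 / 0.3 / 0.6 for the newer card types are hypothetical values chosen only for illustration:

```python
# A minimal sketch of static bridging: older records labelled just "card" are
# encoded as the frequency distribution of the newer card types.
NEW_SCHEMA = ["cash", "gift_card", "debit_card", "credit_card"]
CARD_FREQUENCIES = {"gift_card": 0.1, "debit_card": 0.3, "credit_card": 0.6}

def bridge_payment_type(value: str) -> list[float]:
    """Encode an old- or new-schema payment_type into the 4-d bridged schema."""
    if value == "card":  # old schema: spread over the newer card categories
        return [0.0] + [CARD_FREQUENCIES[c] for c in NEW_SCHEMA[1:]]
    return [1.0 if value == category else 0.0 for category in NEW_SCHEMA]

print(bridge_payment_type("cash"))         # [1.0, 0.0, 0.0, 0.0]
print(bridge_payment_type("card"))         # [0.0, 0.1, 0.3, 0.6]
print(bridge_payment_type("credit_card"))  # [0.0, 0.0, 0.0, 1.0]
```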
What is the difference between the Transform and Feature Store Design Patterns in Machine Learning?
Answer
Transform Design Pattern:
Suppose that after training your model several times, you realize that all your predicted rainfall amounts are off from the real values: the model says it will rain 0.2 cm, but it actually rained 0.4 cm for the same set of features.
Why would you use Checkpoints for a ML pipeline?
Answer
In Checkpoints, we store the full state of the model periodically so that
we have partially trained models available. These partially trained models
can serve as the final model (in the case of early stopping) or as the
starting points for continued training (in the cases of machine failure and
fine-tuning).
Checkpoints are useful to deal with complex models. The more complex
a model is (for example, the more layers and nodes a neural network has),
the larger the dataset that is needed to train it effectively. This is because
more complex models tend to have more tunable parameters. As model
sizes increase, the time it takes to fit one batch of examples also
increases. As the data size increases (and assuming batch sizes are
fixed), the number of batches also increases. Therefore, in terms of
computational complexity, this double whammy means that training will
take a long time. When training takes this long, the chances of machine failure are uncomfortably high. If there is a problem, we'd like to be able to resume from an intermediate point using a checkpoint, instead of starting again from the very beginning.
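A minimal Keras sketch of checkpointing after every epoch; the data, architecture, and file paths are illustrative:

```python
# A minimal sketch of checkpointing: save the model state at the end of every
# epoch so training can resume after a failure (or stop early).
import os
import numpy as np
import tensorflow as tf

os.makedirs("checkpoints", exist_ok=True)

X_train = np.random.rand(500, 20).astype("float32")     # placeholder data
y_train = np.random.randint(0, 2, size=(500, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Write a full checkpoint of the model at the end of every epoch.
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath="checkpoints/epoch_{epoch:02d}.h5", save_freq="epoch"
)
model.fit(X_train, y_train, epochs=3, callbacks=[checkpoint_cb])

# After a machine failure (or for early stopping / fine-tuning), resume from
# the latest partially trained model instead of starting over.
restored = tf.keras.models.load_model("checkpoints/epoch_03.h5")
```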
How would you implement the Multilabel Design Pattern in Machine Learning?
Answer
The solution is to use the sigmoid activation function in our final output
layer. Rather than generating an array where all values sum to 1 (as in
softmax), each individual value in a sigmoid array is a float between 0
and 1. That is to say, when implementing the Multilabel design pattern, our labels need to be multi-hot encoded. The length of the multi-hot array corresponds to the number of classes in our model, and each output in this label array will be a sigmoid value.
For example, suppose that we are building a classifier model and our training dataset includes images with more than one animal: cats, dogs, and
rabbits. The sigmoid output for an image that contained a cat and a dog
but not a rabbit might look like the following: [.92, .85, .11] . This output
means the model is 92% confident the image contains a cat, 85%
confident it contains a dog, and 11% confident it contains a rabbit.
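A minimal Keras sketch of such a multilabel setup, with random placeholder data standing in for real image features:

```python
# A minimal sketch of the Multilabel pattern: one independent sigmoid output
# per class, binary cross-entropy loss, and multi-hot labels.
import numpy as np
import tensorflow as tf

NUM_CLASSES = 3  # cat, dog, rabbit

X = np.random.rand(100, 64).astype("float32")          # placeholder image features
y = np.random.randint(0, 2, size=(100, NUM_CLASSES))   # multi-hot labels, e.g. [1, 1, 0]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),
    tf.keras.layers.Dense(32, activation="relu"),
    # One independent sigmoid per class, instead of a softmax across classes.
    tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=2, verbose=0)

# Each output is an independent probability, e.g. [[0.92, 0.85, 0.11]].
print(model.predict(X[:1]))
```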
How would you implement the Two-Phase Predictions Design Pattern?
Answer
1. Building the offline model: We start with a smaller model that can
be deployed on-device. The idea is that the model has a simpler task,
such that it can accomplish this task on-device with relatively high
accuracy. It should be small enough that it can be loaded on a mobile
device for quick inference without relying on internet connectivity.
2. Building the cloud model: Then we build a more complex model, deploy it in the cloud, and trigger it only when needed, for example when the user asks for something more complex (see the sketch below). Depending on the use case, this second model could take many different forms.
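A minimal, framework-agnostic sketch of how the two phases fit together; the wake-phrase scenario and the call_cloud_model helper are hypothetical placeholders:

```python
# A minimal sketch of Two-Phase Predictions: a small on-device model handles
# the cheap first phase, and the heavier cloud model is called only when that
# phase fires.
def on_device_wake_phrase_detected(audio_clip) -> bool:
    """Phase 1: tiny local model, e.g. a quantized binary classifier."""
    # ... run the small model locally; return True if the wake phrase is heard
    return True  # stubbed for illustration

def call_cloud_model(audio_clip) -> str:
    """Phase 2: send the clip to a complex model served in the cloud."""
    # ... e.g. an HTTPS request to a hosted speech-to-intent model
    return "turn on the living room lights"  # stubbed for illustration

def handle_audio(audio_clip):
    if on_device_wake_phrase_detected(audio_clip):   # cheap, offline check
        return call_cloud_model(audio_clip)          # expensive, online step
    return None                                      # nothing to do

print(handle_audio(audio_clip=b"..."))
```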
When would you need to implement the Transfer Learning Design Pattern?
Answer
With the Transfer Learning design pattern, we can take a model that has been trained on the same type of data for a similar task and apply it to a specialized task using our own custom data.
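A minimal Keras sketch of this, assuming an image task; the choice of MobileNetV2 as the pre-trained base and the input size are illustrative:

```python
# A minimal sketch of transfer learning: reuse a network pre-trained on
# ImageNet as a frozen feature extractor and train only a small new head.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet"
)
base.trainable = False   # freeze the pre-trained weights

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # new task-specific head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# model.fit(our_custom_dataset, epochs=5)   # train only the new head on our data
```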
When would you need to use the Two-Phase Predictions Design Pattern?
Answer
The Two-Phase Predictions Design Pattern is useful when we cannot always rely on end users having reliable internet connections. In such situations, models are deployed at the edge, meaning they are loaded on a user's device and don't require an internet connection to generate predictions. Given device constraints, models deployed on the edge typically need to be smaller than models deployed in the cloud, and consequently require balancing trade-offs between model complexity and size, update frequency, accuracy, and low latency.
There are various scenarios where we’d want our model deployed on an
edge device:
In these cases, we’d still want our application to work, and even if we have
internet connectivity, it may be slow and expensive to continuously
generate predictions from a model deployed in the cloud, so the Two-Phase Predictions design pattern provides a way to deal with such cases.
When would you need to use the Continued Model Evaluation Design Pattern?
Answer
The Continued Model Evaluation design pattern handles the common
problem of needing to detect and take action when a deployed model is no
longer fit-for-purpose.
The world is dynamic, but developing a machine learning model usually creates a static model from historical data. This means that once the model goes into production, it can start to degrade and its predictions can grow increasingly unreliable. Two of the main reasons models degrade over time are concept drift and data drift: concept drift occurs when the relationship between the model's inputs and the target changes, while data drift occurs when the distribution of the incoming data shifts away from the data the model was trained on.
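A minimal sketch of continued evaluation, assuming ground-truth labels eventually arrive for recent predictions; the metric, threshold, and data are illustrative:

```python
# A minimal sketch of Continued Model Evaluation: periodically score recent
# predictions against ground truth and flag the model when quality drops.
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.85   # agreed minimum quality for the deployed model

def evaluate_recent_predictions(y_true_recent, y_pred_recent) -> None:
    accuracy = accuracy_score(y_true_recent, y_pred_recent)
    print(f"accuracy on recent traffic: {accuracy:.3f}")
    if accuracy < ACCURACY_THRESHOLD:
        # In a real pipeline this might page an engineer or trigger retraining.
        print("Model performance degraded -- trigger retraining / rollback.")

# Example: labels that arrived later vs. what the deployed model predicted.
evaluate_recent_predictions([1, 0, 1, 1, 0, 1], [1, 0, 0, 0, 0, 1])
```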