Machine Learning - XLSX - TF Questions

The document contains a series of true/false questions related to machine learning concepts, including AdaBoost, k-NN, SVMs, and Bayesian Networks. Each question is accompanied by explanations for both true and false answers, along with references to relevant chapters or videos. The content serves as a study guide for understanding key machine learning principles and their applications.


Machine Learning.xlsx

Each entry below gives the statement (Q), the answer marked in the sheet (True/False), the explanation, and, where given, a reference video or chapter.
Q: In AdaBoost, weights are uniformly initialized.
A: True. Each sample weight is initialized to 1/n.
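
A minimal Python sketch of the initialization step only (not a full AdaBoost implementation); the sample count is an arbitrary illustrative value:

    import numpy as np

    # AdaBoost starts with a uniform weight distribution over the n training samples
    n_samples = 10
    weights = np.full(n_samples, 1.0 / n_samples)   # each weight = 0.1
    print(weights.sum())                            # 1.0 -> the weights form a distribution
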
Q: Categorical data should be normalized before training a k-NN.
A: False. It is quantitative (numerical) data that should be normalized. (Ref: knn)
Q: The error of a 1-NN classifier on the training set is 0.
A: True. Trivially true, since d(x, x) = 0 holds for a metric, assuming there are no identical data points with different class labels. (Ref: knn)
Q: One-vs-all is an approach to solve multi-class problems for decision trees.
A: False. One-vs-all is not needed; decision trees can handle multi-class classification problems out of the box.
Q: Boosting ensembles can be easily parallelized.
A: False. There are dependencies between the models; training is sequential.
Q: One-hot encoding is used to transform numerical into categorical attributes.
A: False. It is the other way around: one-hot encoding transforms categorical/ordinal attributes into numerical 0/1 attributes (see the sketch below).
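
A small illustrative sketch using pandas; the "color" column and its values are hypothetical:

    import pandas as pd

    # One-hot encoding: the categorical attribute "color" becomes three indicator columns
    df = pd.DataFrame({"color": ["red", "green", "red", "blue"]})
    print(pd.get_dummies(df, columns=["color"]))
    # -> color_blue, color_green, color_red columns with 0/1 (or True/False,
    #    depending on the pandas version) values
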
Q: The Pearson coefficient has a value range from -1 to 1.
A: True. It is a normalized measure of covariance, and the normalization bounds it to the range -1 to 1.
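
A short sketch of why the range is bounded: Pearson r is the covariance divided by the product of the standard deviations (toy data, illustrative values only):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.0, 4.1, 5.9, 8.2])

    # covariance normalised by the two standard deviations
    r_manual = np.cov(x, y, bias=True)[0, 1] / (x.std() * y.std())
    r_numpy = np.corrcoef(x, y)[0, 1]
    print(r_manual, r_numpy)   # both close to 1.0, always within [-1, 1]
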
Q: Off-the-shelf is a transfer learning technique that uses the output of layers from a deep-learning architecture as input for a shallow model.
A: True.
Q: SVMs search for a decision boundary with the maximum margin.
A: True.
Q: SVMs always find a more optimal decision boundary (hyperplane) than Perceptrons.
A: Marked both True and False. For True: the SVM tries to find the best-fitting hyperplane, while the Perceptron stops at the first solution it finds; in the context of linearly separable data, optimality here is defined in terms of the margin between the classes. For False: be careful with the word "always"; it depends on the problem at hand and the kernel used in the SVM.
Q: In Bayesian Networks we assume that attributes are statistically independent given the class.
A: Marked both True and False. For True: it is an assumption that is made but does not necessarily hold (usually it does not); the implementation nevertheless works with it. For False (ChatGPT answer): no, Bayesian networks can represent dependencies between attributes even given the class; this independence assumption belongs to Naive Bayes, not to Bayesian networks in general.
Q: Majority voting is not used when k-NN is applied for regression.
A: True. Majority voting is used for classification tasks; for regression the average of the neighbours is used (see the sketch below).
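
A toy sketch of the two prediction rules; the neighbour labels and target values are made up for illustration:

    import numpy as np

    neighbour_class_labels = np.array([1, 0, 1])     # labels of the 3 nearest neighbours
    neighbour_targets = np.array([2.0, 3.5, 4.0])    # numeric targets of the same neighbours

    # Classification: majority vote over the neighbour labels
    values, votes = np.unique(neighbour_class_labels, return_counts=True)
    predicted_class = values[votes.argmax()]

    # Regression: average of the neighbours' target values
    predicted_value = neighbour_targets.mean()

    print(predicted_class, predicted_value)   # 1 and 3.1666...
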

Q: The chain rule does not simplify the calculation of probabilities for Bayesian networks.
A: False. The chain rule allows the joint probability to be expressed as a product of conditional probabilities (see the sketch below).
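
An illustrative sketch with a hypothetical three-variable network A -> B -> C and made-up probability tables:

    # Chain rule / factorization: P(A, B, C) = P(A) * P(B | A) * P(C | B)
    p_a = {True: 0.3, False: 0.7}
    p_b_given_a = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
    p_c_given_b = {True: {True: 0.6, False: 0.4}, False: {True: 0.1, False: 0.9}}

    def joint(a, b, c):
        # product of conditional probabilities along the network structure
        return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

    print(joint(True, True, False))   # 0.3 * 0.9 * 0.4 = 0.108
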
Q: Naive Bayes is a lazy learner.
A: False. Naive Bayes learns parameters at training time: the prior probabilities of the different classes and the likelihood of the different feature values for each class.
Q: Gradient descent is always more efficient than the Normal Equation (analytical approach) for linear regression.
A: False. It depends on the data; for small datasets the analytical approach might be faster. (Ref: Gradient Descent, Step-by-Step - YouTube)
Q: The Normal Equation (analytical approach) is always more efficient than gradient descent for linear regression.
A: False. It depends on the size of the dataset; for large, high-dimensional datasets the analytical solution becomes too expensive. (See the sketch below for both approaches.)
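
A sketch contrasting the two approaches on synthetic data; the learning rate, iteration count, and data sizes are arbitrary illustrative choices, not recommendations:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.c_[np.ones(100), rng.normal(size=(100, 1))]   # bias column + one feature
    true_theta = np.array([2.0, -3.0])
    y = X @ true_theta + rng.normal(scale=0.1, size=100)

    # Normal equation: closed-form solution, cost grows quickly with the number of features
    theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

    # Batch gradient descent: iterative, cheaper per step for very large / high-dimensional data
    theta_gd = np.zeros(2)
    lr = 0.1
    for _ in range(500):
        grad = 2.0 / len(y) * X.T @ (X @ theta_gd - y)
        theta_gd -= lr * grad

    print(theta_ne, theta_gd)   # both approach [2, -3]
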
Q: k-NN is based on the supervised paradigm.
A: True. Supervised learning means the data is labeled. (Ref: knn)
Q: k-NN is based on the unsupervised paradigm.
A: False. (Ref: knn)
Q: One-vs-all is an approach used by Naive Bayes.
A: False. Naive Bayes handles multiple classes directly; one-vs-all is used with inherently binary classifiers.
Q: Classification is a machine learning task where the target attribute is nominal.
A: True. (Ref: Intro)
Q: Decision trees can handle only binary classification problems.
A: False. They can also handle multi-class classification.
Q: A softmax function in MLPs transforms the activation to a range of -1...1.
A: False. Softmax outputs lie in the range (0, 1) and sum to 1; it is tanh that lies in the range (-1, 1). (See the sketch below.)
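
A small sketch comparing the two output ranges; the activation values are arbitrary:

    import numpy as np

    z = np.array([2.0, -1.0, 0.5])

    # softmax: values in (0, 1) that sum to 1
    exp_z = np.exp(z - z.max())
    softmax = exp_z / exp_z.sum()
    print(softmax, softmax.sum())

    # tanh: each value mapped into (-1, 1)
    print(np.tanh(z))
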
Q: Macro-averaging for classifier evaluation first calculates accuracy/precision/recall/... per class, before averaging across classes.
A: True. The performance measure is computed per class and then the mean is taken across all classes (see the sketch below).
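
A sketch of macro-averaged recall computed by hand on made-up labels; the same idea applies to precision or F1:

    import numpy as np

    y_true = np.array([0, 0, 0, 1, 1, 2])
    y_pred = np.array([0, 0, 1, 1, 1, 0])

    recalls = []
    for c in np.unique(y_true):
        mask = y_true == c
        recalls.append((y_pred[mask] == c).mean())   # recall for class c

    print(recalls)            # approximately [0.667, 1.0, 0.0]
    print(np.mean(recalls))   # macro-averaged recall: unweighted mean over classes
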
Q: The paired t-test is used when testing for statistical significance of results obtained with holdout validation.
A: False.
Q: The paired t-test is used when testing for statistical significance of results obtained with cross validation.
A: True.
Q: In a dataset the entropy is lowest when all classes have the same amount of samples.
A: False. Entropy is highest in that case. (Ref: Entropy (for data science) Clearly Explained!!! - YouTube)
Q: In a dataset the entropy is highest when all classes have the same amount of samples.
A: True. For two equally frequent classes the entropy is 1 bit (see the worked example below).
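
A worked sketch of the entropy formula on hypothetical class counts:

    import numpy as np

    # H = -sum(p * log2(p)) over the class proportions
    def entropy(counts):
        p = np.array(counts, dtype=float)
        p = p / p.sum()
        p = p[p > 0]                      # 0 * log(0) is treated as 0
        return -(p * np.log2(p)).sum()

    print(entropy([50, 50]))   # 1.0 bit -> highest for two equally frequent classes
    print(entropy([90, 10]))   # ~0.47   -> lower for a skewed distribution
    print(entropy([100, 0]))   # 0.0     -> lowest for a pure data set
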
Q: In AdaBoost, the weights are randomly initialised.
A: False. They are normally initialized to 1/n.
Q: Support Vector Machines with a linear kernel are particularly suitable for classification of very high-dimensional, sparse data.
A: True. (Original note: "not sure about the linear kernel?") (Ref: Support Vector Machines Part 1 (of 3): Main Ideas!!! - YouTube)
Q: If Naive Bayes is applied on a data set that also contains numeric attributes, then a probability density function must always be used.
A: True. Probabilities can be computed directly only for nominal values; a PDF is needed for numerical attributes.
Q: Model-based features used for meta-learning are extracted directly from the data set.
A: False. Features extracted directly from the dataset are the raw-data, statistical, and information-theoretic features. Model-based features, in contrast, are derived from models that have been trained on the dataset (or subsets of it), aiming to capture higher-level knowledge about the learning process itself.
Q: Majority voting is not used when k-NN is applied for regression.
A: True. Majority voting is used when k-NN is applied for classification tasks. (Ref: knn)
Q: For the Monte Carlo method in reinforcement learning, value estimates and policies are changed only on the completion of an episode.
A: True.
Q: Information gain is an unsupervised feature selection method.
A: False. Information gain is supervised; it is computed with respect to the class labels. (Ref: L3 - DecisionTree)
Q: Feature selection is primarily useful to improve the effectiveness of machine learning.
A: Marked both True and False. Not really: in theory, the more features you use, the more information the model has; selecting features primarily improves efficiency.
Q: Ordinal data does not allow distances to be computed between data points.
A: True. Ordinal data can be ordered, but distances between points cannot be computed (e.g., military ranks).
Q: The first model in gradient boosting is a zero rule model.
A: True. The first tree in gradient boosting is just the root node.
Q: PCA is a supervised feature selection method.
A: False. PCA does not use class labels. (Original note: check the last slides.)


Q: SVM with gradient descent always finds the optimal hyperplane.
A: False. Gradient descent can end up in a local minimum (not the global minimum).
Q: Gradient descent always finds the global optimum.
A: False. It can end up in a local minimum (not the global minimum).
Q: An RNN can be unrolled into an infinite fully connected network.
A: False. (GPT answer) If an RNN is unrolled, it remains finite; unrolling is a way to understand how information flows through an RNN during training.
Q: Pooling and convolution are operations related to RNNs.
A: False. They are related to CNNs.
Q: Random forest is a homogeneous ensemble method.
A: True. Homogeneous means all models are of the same class.
Q: If you use several weak learners h with boosting to get a classifier H, and all h are linear classifiers, then H is also a linear classifier.
A: False. Even if the weak learners are linear classifiers, such as decision stumps or decision trees with a single split, the final classifier H will in general be a non-linear classifier.
Q: A k-d tree can be used as a search space optimisation for k-NN.
A: True. K-d trees split the search space into smaller areas.
Q: Random Forests is a boosting ensemble technique.
A: False. No boosting is applied; the trees do not depend on previously grown trees.
Q: Back propagation is a method for training Multi-Layer Perceptrons.
A: True. Backpropagation is used to update the weights in an MLP after each forward pass.
Q: Suppose we have a neural network with ReLU activation functions, and we replace the ReLU activations by linear activations. Would this new neural network be able to approximate an XOR function? (Note: the network was able to approximate the XOR function with ReLU activations.)
A: False (no). An MLP with only linear activation functions can be reduced to a single-layer linear model and therefore cannot approximate XOR or other non-linear decision boundaries (see the sketch below).
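
A sketch showing that two stacked layers with identity (linear) activations are equivalent to one linear layer; the weight values are arbitrary examples:

    import numpy as np

    W1 = np.array([[1.0, -1.0], [0.5, 2.0]]); b1 = np.array([0.1, -0.2])
    W2 = np.array([[1.0, 1.0]]);              b2 = np.array([0.3])

    def two_layer_linear(x):
        # "hidden layer" with identity activation followed by an output layer
        return W2 @ (W1 @ x + b1) + b2

    # the equivalent single linear layer
    W_eff = W2 @ W1
    b_eff = W2 @ b1 + b2

    x = np.array([1.0, 0.0])
    print(two_layer_linear(x), W_eff @ x + b_eff)   # identical outputs
    # A single linear layer gives a linear decision boundary, which cannot realise XOR.
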
Q: The entropy of a data set is based solely on the relative frequencies of the data distribution, not on the absolute number of data points present.
A: True. E = -sum(p(x) * log p(x)) depends only on the proportions p(x).
Q: Support Vector Machines can by default only solve binary classification problems.
A: False. The original setup was binary, but with a one-vs-rest approach one can set up an SVM for multi-class classification.
Q: Naive Bayes usually gives good results for regression data sets.
A: False. It gives good results for classification tasks.
Q: Learning the structure of Bayesian networks is usually simpler than learning the probabilities.
A: False.
Q: Learning the structure of Bayesian networks is usually more complicated than learning the probabilities.
A: True.
Q: The mean absolute error (a performance metric used for regression) is less sensitive to outliers than MSE.
A: True. The error is greater when it is squared, so outliers weigh more heavily in MSE (see the sketch below).
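
A toy sketch of how a single outlier affects the two metrics; the values are made up:

    import numpy as np

    y_true = np.array([3.0, 5.0, 7.0, 9.0])
    y_pred = np.array([3.5, 4.5, 7.5, 9.5])            # small errors everywhere
    y_pred_outlier = np.array([3.5, 4.5, 7.5, 19.0])   # one large error

    for pred in (y_pred, y_pred_outlier):
        err = pred - y_true
        print("MAE:", np.abs(err).mean(), "MSE:", (err ** 2).mean())
    # MAE grows from 0.5 to ~2.9, while MSE grows from 0.25 to ~25.2
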
"Number of attributes of data set" is not a model based features that is used for meta-learning T
Q: Kernel projections can only be used in conjunction with support vector machines.
A: False. Kernel projection is a technique also used by SVMs, but it can be applied to a multitude of tasks.
Q: Suppose a convolutional neural network is trained on the ImageNet dataset and the trained model is then given a completely white image as input. The output probabilities for this input would be equal for all classes.
A: False. The model would favour classes whose images contain mostly white pixels.
Q: When learning an SVM with gradient descent, it is guaranteed to find the globally optimal hyperplane.
A: False. Not always; quadratic programming is used to solve for the parameters that define the hyperplane.
Q: State-of-the-art AutoML systems usually use grid search to find the best hyperparameters.
A: Marked both True and False. Grid search is probably not the state-of-the-art optimizer; other optimizers such as random search, Bayesian optimization or genetic algorithms are normally better suited, since they are often faster.
Q: Linear regression converges when performed on linearly separable data.
A: False.
Q: Linear regression converges when performed on data that is not linearly separable.
A: False.
Q: The Laplace corrector must be used when using Naive Bayes.
A: False. It is not always necessary to use Laplace smoothing with Naive Bayes; it depends on the specific application and the data you are working with (see the sketch below).
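
An illustrative sketch of add-one (Laplace) smoothing for one feature within one class; the counts are hypothetical:

    import numpy as np

    counts = np.array([0, 3, 7])   # occurrences of 3 feature values within one class
    n_values = len(counts)

    # Maximum-likelihood estimate: a zero count makes the whole product zero
    p_mle = counts / counts.sum()                         # [0.0, 0.3, 0.7]
    # Laplace smoothing: add 1 to every count to avoid zero probabilities
    p_laplace = (counts + 1) / (counts.sum() + n_values)  # [0.077, 0.308, 0.615]
    print(p_mle, p_laplace)
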
Q: Gradient boosting minimizes the residual of previous classifiers.
A: True. (Note: for AdaBoost this would be false.)
Q: Using error rate vs. entropy as the splitting criterion in decision trees leads to different results.
A: True.
Q: The depth of a decision tree can be larger than the number of training samples used to create the tree.
A: False. The depth of the tree is not larger than the number of samples.
Q: Is ensemble boosting easily parallelizable?
A: False (no). Later models are based on previous ones, so it is not that easy to parallelize.
Q: k-armed bandits choose the next action based on the expected future reward.
A: True. (Ref: Multi-Armed Bandit: Data Science Concepts - YouTube)
Q: A Bayesian network is a directed cyclic graph.
A: False. It is a directed acyclic graph.
Q: A decision tree can be converted into a rule set.
A: True.
