AI and Math_Python Multiple-Choice Questions
2. What happens if the learning rate in gradient descent is set too large?
A) Training will be very slow.
B) The algorithm may overshoot and diverge.
C) It ensures convergence to the minimum faster.
D) It has no effect on convergence.
Answer: B) The algorithm may overshoot and diverge.
Explanation: A very large learning rate causes each update to overshoot the optimum. In fact, if the learning rate exceeds a critical threshold, gradient descent will diverge (i.e., fail to converge) [2]. This can make the loss jump around or grow indefinitely instead of settling.
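For illustration, a minimal sketch (not from the source) of gradient descent on the toy objective f(w) = w^2, whose update is w ← w(1 − 2·lr), so any learning rate above 1.0 makes the iterates grow instead of shrink:

def gradient_descent(lr, steps=10, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w      # gradient of w**2 is 2*w
    return w

print(gradient_descent(lr=0.1))   # shrinks toward the minimum at 0
print(gradient_descent(lr=1.1))   # overshoots: the magnitude grows each step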
4. Why is cross-entropy loss often preferred over mean squared error (MSE) for classification with
a sigmoid output?
A) MSE leads to convex optimization, cross-entropy does not.
B) Cross-entropy simplifies to MSE in logistic regression.
C) Cross-entropy is convex in final weights, MSE with sigmoid may not be.
D) Cross-entropy always produces smaller loss values than MSE.
Answer: C) Cross-entropy is convex in final weights, MSE with sigmoid may not be.
Explanation: When using a sigmoid activation for binary classification, the cross-entropy loss is convex in the final layer’s parameters, whereas MSE combined with a sigmoid is not convex. This means MSE could get stuck in a local minimum, whereas cross-entropy provides a more direct gradient for learning [4]. Practically, cross-entropy loss tends to converge faster for classification tasks.
5. Which loss function is appropriate for a binary classification problem with a sigmoid output?
A) Mean Squared Error (MSE)
B) Hinge Loss
C) Binary Cross-Entropy (Log Loss)
D) Categorical Cross-Entropy
Answer: C) Binary Cross-Entropy (Log Loss).
Explanation: For binary classification (sigmoid output), binary cross-entropy (also called log loss) is commonly used. It measures the difference between predicted probabilities and actual binary labels [5]. MSE can be used but is less effective in this setting. Hinge loss is typical for SVMs, and categorical cross-entropy is meant for multi-class problems.
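A minimal sketch (not from the source) of binary cross-entropy for a single prediction, clipping the probability so the logarithms stay finite:

import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip the predicted probability away from 0 and 1 to avoid log(0).
    p = min(max(y_pred, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

print(binary_cross_entropy(1, 0.9))   # small loss: confident and correct
print(binary_cross_entropy(1, 0.1))   # large loss: confident but wrong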
6. Which loss function is commonly used to train a (linear) Support Vector Machine (SVM)?
A) Mean Squared Error
B) Binary Cross-Entropy
C) Hinge Loss
D) Categorical Cross-Entropy
Answer: C) Hinge Loss.
Explanation: SVMs are margin-based classifiers, and their objective uses the hinge loss. Hinge loss is defined as max(0, 1 − y·f(x)) for labels y ∈ {+1, −1} [6]. This loss penalizes points within the margin or misclassified points, enforcing a margin of at least 1 for correct classifications.
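A direct translation of that formula into Python (a minimal sketch, not from the source):

def hinge_loss(y, score):
    # y is the true label in {+1, -1}; score is the raw model output f(x).
    return max(0.0, 1.0 - y * score)

print(hinge_loss(+1, 2.5))   # 0.0 -> correct and outside the margin
print(hinge_loss(+1, 0.3))   # 0.7 -> correct but inside the margin
print(hinge_loss(-1, 0.3))   # 1.3 -> misclassified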
8. In a confusion matrix for a binary classifier, what does a false positive (Type I error) represent?
A) Model predicted positive, actual negative.
B) Model predicted negative, actual positive.
C) Both predicted and actual are positive.
D) Both predicted and actual are negative.
Answer: A) Model predicted positive, actual negative.
Explanation: A false positive (FP) occurs when the model predicts the positive class (e.g., “yes” or “1”) but the true label is negative (e.g., “no” or “0”). It is indeed the case of “predicted positive/actual negative” [8].
9. What is precision in a binary classification context?
A) TP / (TP + FN)
B) TP / (TP + FP)
C) (TP + TN) / (TP + FP + TN + FN)
D) FP / (FP + TN)
Answer: B) TP / (TP + FP).
Explanation: Precision measures how many of the positively predicted instances are actually positive. It is defined as true positives divided by all predicted positives (TP + FP) [9]. A high precision means most predicted positives are correct.
12. Why might one prefer F1 score over accuracy for imbalanced classification problems?
A) F1 ignores false positives completely.
B) F1 equally weighs precision and recall, capturing performance on the minority class.
C) Accuracy is always unreliable.
D) F1 only considers true positives.
Answer: B) F1 equally weighs precision and recall, capturing performance on the minority class.
Explanation: For imbalanced datasets, accuracy can be misleading because it may be high simply by predicting the majority class. F1 score balances precision and recall, providing a single measure of a model’s accuracy on the positive class. It “gives a better sense of the classifier’s performance, especially on skewed datasets” [12].
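A minimal sketch (with made-up counts, not from the source) computing precision, recall, F1, and accuracy from confusion-matrix counts:

# Hypothetical counts for an imbalanced problem: 1000 negatives, 50 positives.
tp, fp, fn, tn = 30, 10, 20, 990

precision = tp / (tp + fp)                       # 0.75
recall = tp / (tp + fn)                          # 0.60
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)       # ~0.97 despite missing 40% of positives

print(precision, recall, round(f1, 3), round(accuracy, 3))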
13. In a spam detection task where false positives (legitimate email marked spam) are costly,
which metric should be maximized?
A) Precision
B) Recall
C) Accuracy
D) AUC-ROC
Answer: A) Precision.
Explanation: In this scenario, we want to minimize false positives (legitimate emails incorrectly flagged). Maximizing precision (TP/(TP+FP)) ensures that when we predict spam, it is indeed spam [9]. This reduces the rate of false positives.
14. In a medical test where missing a true case (false negative) is critical, which metric should be
maximized?
A) Precision
B) Recall
C) Accuracy
D) Specificity
Answer: B) Recall.
Explanation: Here false negatives are very costly (missing a sick patient). We want to maximize recall (sensitivity = TP/(TP+FN)) to catch as many true positive cases as possible [10]. A high recall means few actual positives are missed.
18. How does the ReLU activation help with the vanishing gradient problem?
A) It bounds outputs, preventing overflow.
B) Its derivative is either 0 or 1, avoiding small gradients.
C) It is non-monotonic.
D) It normalizes the input distribution.
Answer: B) Its derivative is either 0 or 1, avoiding small gradients.
Explanation: Unlike sigmoid, ReLU’s derivative is 1 for positive inputs and 0 for negative. This means positive values propagate gradients effectively without diminishing. As one source notes, using ReLU “prevents the gradient from vanishing” because the gradient does not shrink towards zero for positive inputs [16].
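A minimal sketch (not from the source) of ReLU and its derivative for a single input:

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 for negative ones (the value at x == 0 is a convention).
    return 1.0 if x > 0 else 0.0

print(relu(2.3), relu_grad(2.3))     # 2.3 1.0 -> gradient passes through unchanged
print(relu(-1.5), relu_grad(-1.5))   # 0.0 0.0 -> gradient is blocked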
19. What does the softmax function do when applied to the output of a neural network?
A) Converts raw scores to a probability distribution over classes.
B) Scales values to the range [-1,1].
C) Shifts all values by their mean.
D) Selects the highest scoring class.
Answer: A) Converts raw scores to a probability distribution over classes.
Explanation: The softmax function exponentiates each score and normalizes by the sum of exponentials, resulting in values in (0,1) that sum to 1 [17]. This makes the outputs interpretable as class probabilities in multi-class classification.
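A minimal sketch (not from the source) of a numerically stable softmax:

import math

def softmax(scores):
    # Subtracting the max score avoids overflow and does not change the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)        # roughly [0.659, 0.242, 0.099], each in (0, 1)
print(sum(probs))   # 1.0 up to floating-point error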
Specifically, the environment provides a reward signal and new state, guiding the agent to improve its policy [20]. The agent’s goal is to maximize cumulative reward.
24. Which ensemble technique builds models sequentially, where each new model focuses on the
errors of the previous models?
A) Bagging
B) Boosting
C) Random Subspace
D) Cross-Validation
Answer: B) Boosting.
Explanation: Boosting trains an ensemble of “weak learners” sequentially. Each new model pays more attention (higher weight) to instances the previous models misclassified, thereby iteratively correcting errors [22]. This contrasts with bagging, which builds its models independently.
27. Which part of a Generative Adversarial Network (GAN) is responsible for creating new data
samples?
A) The discriminator
B) The convolutional layer
C) The generator
D) The loss function
Answer: C) The generator.
Explanation: In a GAN, there are two neural networks: the generator creates new synthetic data (e.g., images) that resemble the training data, while the discriminator tries to distinguish real from generated samples [25]. The generator’s goal is to produce outputs so realistic that the discriminator cannot tell them apart.
Logistic regression, SVM, and CART are discriminative, modeling P(Y|X) directly. (This follows the idea that generative models capture joint probabilities [7].)
32. If a model predicts 100% of instances as the positive class in a highly imbalanced dataset,
which metric will it appear deceptively high on?
A) Precision
B) Recall
C) Accuracy
D) F1 Score
Answer: C) Accuracy.
Explanation: In imbalanced data, predicting all samples as the majority class yields high accuracy
(since that class dominates), but precision/recall on the minority class is poor. This shows accuracy
can be misleading for imbalanced problems.
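A minimal sketch (with made-up numbers, not from the source) where the positive class happens to be the overwhelming majority:

# 950 positives, 50 negatives; the model predicts positive for every instance.
y_true = [1] * 950 + [0] * 50
y_pred = [1] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)   # 0.95 -- looks strong even though the minority class is never detected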
33. Which activation function would you choose to mitigate the vanishing gradient problem in a
deep network?
A) Sigmoid
B) Hyperbolic tangent (tanh)
C) ReLU or its variants
D) Linear (identity)
Answer: C) ReLU or its variants.
Explanation: ReLU (Rectified Linear Unit) is non-saturating for positive inputs, with a constant gradient of 1. This avoids the gradient shrinkage that plagues sigmoid/tanh. As noted, replacing sigmoid with ReLU “is the simplest solution to the vanishing gradient problem” [26].
34. In a binary classification with an extremely imbalanced class distribution, which loss function
is most suitable?
A) Regular (unweighted) cross-entropy
B) Mean Squared Error
C) Weighted or focal loss variant of cross-entropy
D) Hinge loss
Answer: C) Weighted or focal loss variant of cross-entropy.
Explanation: For extreme class imbalance, one often uses weighted cross-entropy or specialized
losses like focal loss to give more importance to the minority class. A standard unweighted loss (like
regular cross-entropy or MSE) would bias toward the majority class. (Focal loss, for example, down-
weights easy examples to focus on hard ones.)
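A minimal sketch (not from the source) of a class-weighted binary cross-entropy, where errors on the rare positive class are penalized more heavily:

import math

def weighted_bce(y_true, y_pred, pos_weight=10.0, eps=1e-12):
    # pos_weight > 1 makes mistakes on the (rare) positive class cost more.
    p = min(max(y_pred, eps), 1 - eps)
    return -(pos_weight * y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

print(weighted_bce(1, 0.1))   # missed positive: heavily penalized (about 23.0)
print(weighted_bce(0, 0.1))   # correct negative: small loss (about 0.11)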
35. Which method is a form of regularization that encourages model weights to become sparse
(many zeros)?
A) L1 regularization
B) L2 regularization
C) Dropout
D) Batch normalization
Answer: A) L1 regularization.
Explanation: L1 regularization adds the sum of absolute weights to the loss. This has the effect of pushing many weights exactly to zero, yielding sparse solutions. In contrast, L2 regularization (sum of squares) only shrinks weights towards zero but rarely makes them exactly zero [27].
36. What effect does L2 regularization (weight decay) have on model weights?
A) It sets all weights exactly to zero.
B) It encourages weights to become small (but usually nonzero).
C) It only affects biases.
D) It increases the magnitude of weights to prevent underfitting.
Answer: B) It encourages weights to become small (but usually nonzero).
Explanation: L2 regularization adds the squared norm of weights to the loss, causing weights to decay towards zero. However, unlike L1, L2 typically produces small weights rather than exact zeros [28].
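A minimal sketch (with made-up weights and a hypothetical strength lam, not from the source) contrasting the two penalty terms that get added to the training loss:

weights = [0.0, -0.5, 1.2, 0.0, 3.0]
lam = 0.01   # hypothetical regularization strength

l1_penalty = lam * sum(abs(w) for w in weights)   # L1: encourages exact zeros (sparsity)
l2_penalty = lam * sum(w ** 2 for w in weights)   # L2: shrinks weights, rarely to exactly zero

print(l1_penalty, l2_penalty)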
37. Which scenario describes internal covariate shift that batch normalization addresses?
A) Changing distribution of inputs to hidden layers during training.
B) Data labels changing during training.
C) Overfitting due to low training error.
D) Underfitting due to insufficient model capacity.
Answer: A) Changing distribution of inputs to hidden layers during training.
Explanation: Internal covariate shift refers to the changing distribution of layer inputs as the network parameters update. Batch normalization reduces this shift by keeping layer inputs normalized (zero mean and unit variance), thus stabilizing training [29].
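A minimal sketch (not from the source) of the core normalization step for one feature across a batch, ignoring the learnable scale (gamma) and shift (beta) parameters:

def batch_norm(batch, eps=1e-5):
    # Normalize a batch of activations to zero mean and (roughly) unit variance.
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

print(batch_norm([2.0, 4.0, 6.0, 8.0]))   # roughly [-1.34, -0.45, 0.45, 1.34]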
38. Why might one use an unsupervised dimensionality reduction technique before training a
supervised model?
A) To label new data.
B) To remove noise and reduce overfitting.
C) To increase the number of features.
D) To convert categorical to numerical features.
Answer: B) To remove noise and reduce overfitting.
Explanation: Unsupervised reduction (like PCA) can compress data by capturing most variance,
potentially filtering noise and reducing model complexity. This can improve generalization and
reduce overfitting by lowering dimensionality before training a supervised model.
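A minimal sketch (assuming scikit-learn is available; the data here is random and only illustrates the pipeline shape, not a real result):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))            # 200 samples, 50 noisy features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # labels depend on only two features

# Reduce to 10 principal components before fitting the classifier.
model = make_pipeline(PCA(n_components=10), LogisticRegression())
model.fit(X, y)
print(model.score(X, y))                  # training accuracy of the reduced model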
39. If your model is overfitting the training data, which of the following is a valid approach?
A) Remove regularization.
B) Increase model complexity (more layers, more neurons).
C) Add dropout or increase regularization (e.g., L2).
D) Use a larger learning rate to train faster.
Answer: C) Add dropout or increase regularization (e.g., L2).
Explanation: Overfitting indicates the model is too complex for the amount of data. To combat it, one can introduce or strengthen regularization (like L2 weight decay) or use dropout (randomly omitting units during training) to reduce co-adaptation [18]. Increasing model complexity or removing regularization would worsen overfitting.
Answer: B) Non-linearity enabling learning of complex functions.
Explanation: Activation functions (e.g., ReLU, tanh) introduce non-linear transformations to
neurons. This allows the network to approximate complex non-linear mappings. Without non-linear
activation, a deep network would collapse to a linear function regardless of depth.
3. What is the probability of getting exactly k successes in n independent Bernoulli trials with
success probability p?
A) p^k (1 − p)^(n−k)
B) C(n, k) p^(n−k) (1 − p)^k
C) C(n, k) p^k (1 − p)^(n−k)
D) n! / (k!(n − k)!) (without p factors)
Answer: C) C(n, k) p^k (1 − p)^(n−k).
Explanation: The probability of exactly k successes in n Bernoulli(p) trials is given by the binomial formula C(n, k) p^k (1 − p)^(n−k) [32], where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient.
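A quick check of the formula in Python (a minimal sketch, not from the source):

import math

def binomial_pmf(k, n, p):
    # P(exactly k successes in n Bernoulli(p) trials)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

print(binomial_pmf(2, 3, 0.5))   # 0.375, matching the coin-flip question below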
5. If a 3×3 matrix has eigenvalues 2, 3, and 4, what is its determinant?
A) 9
B) 24
C) 20
D) 6
Answer: B) 24.
Explanation: The determinant of a matrix equals the product of its eigenvalues (for an n×n matrix).
Here det(A) = 2 × 3 × 4 = 24.
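A quick numerical check (a minimal sketch, assuming NumPy is available):

import numpy as np

# A diagonal matrix with eigenvalues 2, 3, and 4; any matrix with these
# eigenvalues has the same determinant.
A = np.diag([2.0, 3.0, 4.0])
print(np.linalg.det(A))                 # 24.0 (up to floating-point error)
print(np.prod(np.linalg.eigvals(A)))    # also 24.0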
print(append_to(12))
print(append_to(42))
x = [1, 2, 3]
y = [1, 2, 3]
print(x == y)
print(x is y)
C) False then True
D) False then False
Answer: B) True then False.
Explanation: x == y checks value equality, which is True because both lists contain [1, 2, 3]. However, x is y checks object identity; x and y are two distinct list objects, so x is y is False [36].
P = [[0] * 3] * 3
P[0][0] = 5
print(P)
12. What is the result of 0.1 + 0.2 == 0.3 in Python, and why?
A) True, because 0.1 + 0.2 exactly equals 0.3.
B) False, because floating-point representations are imprecise.
C) True, because of automatic rounding.
D) False, because the == operator is broken in Python.
Answer: B) False, because floating-point representations are imprecise.
Explanation: In binary floating point (IEEE 754), numbers like 0.1 and 0.2 cannot be represented exactly. Thus 0.1 + 0.2 results in a number very close to, but not exactly, 0.3, making (0.1 + 0.2) == 0.3 evaluate to False [39].
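A minimal sketch (not from the source) showing the mismatch and the usual tolerance-based fix:

import math

print(0.1 + 0.2)                       # 0.30000000000000004
print(0.1 + 0.2 == 0.3)                # False
print(math.isclose(0.1 + 0.2, 0.3))    # True -- compare floats with a tolerance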
13. What is the formula for the sum of the first n positive integers (1 + 2 + … + n)?
A) n(n + 1)/2
B) n^2
C) n(n − 1)/2
D) n^2 + n
Answer: A) n(n + 1)/2.
Explanation: The well-known formula for the sum of the first n integers is 1 + 2 + ⋯ + n = n(n + 1)/2. This can be derived by pairing terms or via induction.
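A quick check in Python (a minimal sketch, not from the source):

n = 100
print(sum(range(1, n + 1)))   # 5050, by direct summation
print(n * (n + 1) // 2)       # 5050, from the closed-form formula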
14. What is the probability of getting exactly 2 heads in 3 fair coin flips?
A) 0.25
B) 0.375
C) 0.5
D) 0.75
Answer: B) 0.375.
Explanation: There are C(3, 2) = 3 ways to get 2 heads out of 3 flips, and each specific outcome has probability (0.5)^3 = 0.125. Thus the probability is 3 × 0.125 = 0.375.
for i in range(5):
    pass
print(i)
A) 4
B) 5
C) 0
D) Error, since i is not defined outside the loop.
Answer: A) 4 .
Explanation: In Python, the loop variable i remains defined after the loop ends, retaining its last
value. After range(5) , the last value assigned to i was 4, so print(i) outputs 4 .
print(func(3))
print(func(3, 4))
A) 3 then 7
B) 0 then 7
C) 3 then 4
D) Error, because y has a default.
Answer: A) 3 then 7 .
Explanation: Calling func(3) uses the default y=0 , so it returns 3+0=3 . Calling func(3,4)
overrides the default, returning 3+4=7 .
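The definition of func is not shown in this excerpt; a minimal definition consistent with the described behavior (a hypothetical reconstruction, not the original) would be:

def func(x, y=0):
    # y falls back to its default of 0 when only one argument is passed.
    return x + y

print(func(3))      # 3
print(func(3, 4))   # 7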
Explanation: Calling len() on an integer (a non-sequence) raises TypeError: object of type 'int' has no len() [40], since integers do not support the len operation.
[4] neural network - Loss Function for Probability Regression - Data Science Stack Exchange
https://datascience.stackexchange.com/questions/45285/loss-function-for-probability-regression
[24] The Limitations of Perceptron: Why it Struggles with XOR | by Aryan Rusia | Medium
https://medium.com/@aryanrusia8/the-limitations-of-perceptron-why-it-struggles-with-xor-21905d31f924
[30] Binomial coefficient - Wikipedia
https://en.wikipedia.org/wiki/Binomial_coefficient
[38] Python list multiplication: [[...]]*3 makes 3 lists which mirror each other when modified - Stack Overflow
https://stackoverflow.com/questions/6688223/python-list-multiplication-3-makes-3-lists-which-mirror-each-other-when