(Final) 600+ ML MCQ
Do Not Misuse
Owner – Asmit
I tried to combine MCQs from different sources on internet in one place so that it'll be easy to find questions and
searching in a PDF is very efficient and easy.
If I'm sharing this PDF with you then instead of taking it for granted have some respect for someone's efforts.
I included almost each and every question that I was able to find on internet.
It is practically impossible that you'll get each and every MCQ that exists in the world in this PDF cuz I'm not the one
making the questions,
and if you're intentionally making up a question and spreading hate about me because you can't find a specific
question in the PDF, then I don't fucking care cuz you didn't order me to make the PDF; I made the PDF for myself.
PEACE
MORE MCQ
Ans : D
Ans : D
3. p → ¬q is not a?
A. hack clause
B. horn clause
C. structural clause
D. system clause
Ans : B
A. STACK(A,B)
B. LIST(A,B)
C. QUEUE(A,B)
D. ARRAY(A,B)
Ans : A
A. bottom-up parser
B. top parser
C. top-down parser
D. bottom parser
Ans : C
A. System Unit
B. structural units.
C. data units
D. empirical units
Ans : B
A. Introduction
B. Analogy
C. Deduction
D. Memorization
Ans : A
A. Batch learning
B. Offline learning
C. Both A and B
D. None of the above
Ans : C
Ans : A
Ans : C
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Ans : D
Ans : A
A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to be overfit
D. None of the above
Ans : C
Ans : D
Ans : D
A. Stemming
B. Lemmatization
C. Stop Word Removal
D. None of the above
Ans : C
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. All of the above
Ans : D
Ans : A
Answer : A
Explanation: “Machine learning” is the autonomous
acquisition of knowledge through the use of
computer programs.
02. What is true about Machine Learning?
A. Machine Learning (ML) is the field of
computer science
B. ML is a type of artificial intelligence that
extracts patterns out of raw data by using an
algorithm or method
C. The main focus of ML is to allow computer
systems to learn from experience without being
explicitly programmed or human intervention
D. All of the above
Answer : D
Explanation: All the statements are true about
Machine Learning.
Answer : D
Explanation: Machine learning is a field of AI
consisting of learning algorithms that: Improve their
performance (P), At executing some task (T), Over
time with experience (E).
Answer : C
Explanation: Different learning methods in ML
do not include Introduction.
Answer : B
Explanation: Random Forest
Answer : B
Explanation: Entropy is a measure of the
randomness in the information being processed, so
the higher the entropy, the harder it is to draw any
conclusions from that information. Entropy is a
measure of disorder, impurity, unpredictability, or
uncertainty. Low entropy means less uncertainty
and high entropy means more uncertainty.
Answer : A
Explanation: Machine learning methods fall into
some broad categories. Based on human supervision,
they are: Supervised Learning, Unsupervised
Learning, Semi-supervised Learning, and
Reinforcement Learning.
Answer : C
Explanation: In language understanding, the levels
of knowledge do not include empirical knowledge.
Answer : D
Explanation: Maximum possible different examples
are the products of the possible values of each
attribute and the number of classes so the result
would be
3 * 2 * 2 * 2 * 3 = 72
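The count above can be checked with a couple of lines of Python. This is only a sketch: the attribute sizes (3, 2, 2, 2) and the 3 classes are read off the product in the explanation, and the variable names are illustrative.

```python
import math

# Number of possible values for each of the four attributes
# (taken from the product in the explanation above).
attribute_values = [3, 2, 2, 2]
# Number of classes (the last factor in the product).
num_classes = 3

# Every combination of attribute values can be paired with every class label.
max_examples = math.prod(attribute_values) * num_classes
print(max_examples)  # 72
```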
Answer : A
Explanation: First Normalize the data then PCA then
training.
Answer : D
Explanation: All of the above techniques are
different ways of imputing the missing or corrupted
data in a dataset.
Answer : D
Answer : B
Explanation: The categories that make up a model of
language do not include structural units.
Answer : D
Explanation: The density-based clustering methods
recognize clusters based on the density function
distribution of the data object. For clusters with
arbitrary shapes, these algorithms connect regions
with sufficiently high densities into clusters.
Answer : C
Explanation: Allowing a decision tree to split to a
granular degree makes it prone to learning every
training point extremely well, to the point of
perfect classification, which is overfitting.
Answer : C
Answer : A
Explanation: Among the above options, p → ¬q is
not a horn clause.
18. Which of the following techniques cannot be
used for normalization in text mining?
A. Stop Word Removal
B. Stemming
C. Lemmatization
D. None of the above
Answer : A
Explanation: Stemming and lemmatization are
techniques of keyword normalization; stop word
removal is not.
Answer : A
Explanation: Choose k to be the smallest value such
that at least 99% of the variance is retained. This
maintains the structure of the data while
reducing its dimension.
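The rule in this explanation can be sketched as follows. The sketch assumes the eigenvalues of the data's covariance matrix are already available (each one is the variance along a principal component); the helper name `smallest_k` and the example eigenvalues are made up for illustration.

```python
def smallest_k(eigenvalues, retained=0.99):
    """Return the smallest k whose top-k eigenvalues keep at least
    the requested fraction of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= retained:
            return k
    return len(ordered)


# Hypothetical eigenvalues, not taken from any real dataset.
example_eigenvalues = [5.0, 3.0, 1.5, 0.45, 0.04, 0.01]
k = smallest_k(example_eigenvalues)
print(k)  # 4: the first three components keep 95%, the first four keep 99.5%
```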
Answer: a
Explanation: Machine learning is the autonomous
acquisition of knowledge through the use of
computer programs.
2. Which of the following is not a factor that
affects the performance of a learner system?
Answer: d
Explanation: The factors that affect the performance
of a learner system do not include good data
structures.
Answer: c
Explanation: In language understanding, the levels
of knowledge do not include empirical
knowledge.
Answer: d
Explanation: The categories that make up a model of
language do not include structural units.
6. What is a top-down parser?
Answer: a
Explanation: A top-down parser begins by
hypothesizing a sentence (the symbol S) and
successively predicting lower level constituents until
individual preterminal symbols are written.
7. Among the following which is not a horn clause?
a) p
b) ¬p ∨ q
c) p → q
d) p → ¬q
Answer: d
Explanation: p → ¬q is not a horn clause.
Answer: d
Explanation: The robot arm action ‘STACK(A,B)’
specifies placing block A on block B.
Module 01
3. p → ¬q is not a?
A. hack clause
B. horn clause
C. structural clause
D. system clause
Answer : B
Explanation: p → ¬q is not a horn clause.
Answer: a
Explanation: Machine learning is the autonomous
acquisition of knowledge through the use of
computer programs.
Module 02
a) or gate
b) and gate
c) nor gate
d) nand gate
Answer: c
Explanation: Form the truth table of the above figure
by taking the inputs as 0 or 1.
45. When both inputs are 1, what will be the output
of the above figure?
a) 0
b) 1
c) either 0 or 1
d) z
Answer: a
Explanation: Check the truth table of nor gate.
46. When both inputs are different, what will be the
output of the above figure?
a) 0
b) 1
c) either 0 or 1
d) z
Answer: a
Explanation: Check the truth table of nor gate.
47. Which of the following models has the ability to
learn?
a) pitts model
b) rosenblatt perceptron model
c) both rosenblatt and pitts model
d) neither rosenblatt nor pitts
Answer: b
Explanation: Weights are fixed in pitts model but
adjustable in rosenblatt.
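To make "adjustable weights" concrete, here is a minimal sketch of the perceptron learning rule on the AND function. The learning rate, epoch count, zero initialisation, and all names are illustrative choices, not something the question specifies.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Adjust two weights and a bias with the perceptron update rule."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            prediction = step(w[0] * x1 + w[1] * x2 + b)
            error = target - prediction
            # Weights move only when the prediction is wrong.
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b


# Truth table of AND, a linearly separable function.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [step(weights[0] * x1 + weights[1] * x2 + bias)
               for (x1, x2), _ in and_samples]
print(predictions)  # [0, 0, 0, 1]
```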
48. When both inputs are 1, what will be the output
of the pitts model nand gate ?
a) 0
b) 1
c) either 0 or 1
d) z
Answer: a
Explanation: Check the truth table of simply a nand
gate.
49. When both inputs are different, what will be the
logical output of the figure of question 4?
a) 0
b) 1
c) either 0 or 1
d) z
Answer: a
Explanation: Check the truth table of nor gate.
50. Does the McCulloch-Pitts model have the ability
to learn?
a) yes
b) no
Answer: b
Explanation: Weights are fixed.
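The gate questions in this module can be checked with a small threshold unit whose weights are chosen by hand and never updated. This is a simplified sketch: it uses negative weights in place of the inhibitory inputs of the classic McCulloch-Pitts formulation, and the particular weights and thresholds are assumptions that happen to reproduce the NOR and NAND truth tables.

```python
def threshold_unit(inputs, weights, threshold):
    """Fire (1) when the weighted sum of the inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def nor_gate(a, b):
    # Fixed weights -1, -1 and threshold 0: fires only when both inputs are 0.
    return threshold_unit((a, b), (-1, -1), 0)


def nand_gate(a, b):
    # Same fixed weights, threshold -1: fires unless both inputs are 1.
    return threshold_unit((a, b), (-1, -1), -1)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "NOR:", nor_gate(a, b), "NAND:", nand_gate(a, b))
```

With these fixed values, NOR gives 0 when both inputs are 1 and when the inputs differ, and NAND gives 0 when both inputs are 1, matching the answers above.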
Module 03
A) Vertical offset
B) Perpendicular offset
C) Both, depending on the situation
D) None of above
answer: (A)
12) True-False: Overfitting is more likely when you
have a huge amount of data to train on?
A) TRUE
B) FALSE
answer: (B)
13) We can also compute the coefficient of linear
regression with the help of an analytical method
called “Normal Equation”. Which of the following
is/are true about Normal Equation?
1. We don’t have to choose the learning rate
2. It becomes slow when the number of features is very
large
3. There is no need to iterate
A) 1 and 2
B) 1 and 3
C) 2 and 3
D) 1,2 and 3
answer: (D)
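As an illustration of the three statements, here is a sketch of the Normal Equation for the one-feature model Y = A0 + A1X, where the 2x2 system can be solved by hand. There is no learning rate and no loop over iterations. With many features the same approach requires solving a much larger linear system, which is where the slowdown comes from. The function name and the toy data are made up for this example.

```python
def fit_line_normal_equation(xs, ys):
    """Solve the 2x2 normal equations for Y = A0 + A1*X in closed form."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Determinant of the matrix [[n, sum_x], [sum_x, sum_xx]].
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are identical; the line is not unique")
    a1 = (n * sum_xy - sum_x * sum_y) / det
    a0 = (sum_xx * sum_y - sum_x * sum_xy) / det
    return a0, a1


# Toy data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a0, a1 = fit_line_normal_equation(xs, ys)
print(a0, a1)
```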
14) Which of the following statements is true about
the sum of residuals of A and B?
The graphs below show two fitted regression lines (A &
B) on randomly generated data. Now, I want to find
the sum of residuals in both cases A and B.
Note:
The scale is the same in both graphs for both axes.
The X-axis is the independent variable and the Y-axis is
the dependent variable.
A) A has higher sum of residuals than B
B) A has lower sum of residual than B
C) Both have same sum of residuals
D) None of these
answer: (C)
15) Choose the option which describes bias in best
manner.
A) In case of very large x; bias is low
B) In case of very large x; bias is high
C) We can’t say about bias
D) None of these
answer: (B)
16) What will happen when you apply very large
penalty?
A) Some of the coefficients will become exactly
zero
B) Some of the coefficients will approach zero but
not become exactly zero
C) Both A and B depending on the situation
D) None of these
answer: (B)
17) What will happen when you apply very large
penalty in case of Lasso?
A) Some of the coefficients will become zero
B) Some of the coefficients will approach
zero but not become exactly zero
C) Both A and B depending on the situation
D) None of these
answer: (A)
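The contrast between questions 16 and 17 can be sketched numerically. Question 16 does not name its penalty; its answer matches the usual behaviour of an L2 (ridge) penalty, which is assumed here. The sketch also assumes orthonormal features, where both problems have closed-form solutions in terms of the unpenalised coefficient: ridge (objective ½‖y − Xw‖² + (λ/2)‖w‖²) rescales it, while lasso (objective ½‖y − Xw‖² + λ‖w‖₁) soft-thresholds it. Function names and numbers are illustrative.

```python
def ridge_shrink(w_ols, lam):
    """Ridge under orthonormal features: scale the coefficient toward zero."""
    return w_ols / (1.0 + lam)


def lasso_shrink(w_ols, lam):
    """Lasso under orthonormal features: soft-threshold the coefficient."""
    magnitude = max(abs(w_ols) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if w_ols > 0 else -magnitude


# Hypothetical unpenalised coefficients and penalty strength.
w_ols = [3.0, -0.5, 0.2]
lam = 1.0
ridge = [ridge_shrink(w, lam) for w in w_ols]
lasso = [lasso_shrink(w, lam) for w in w_ols]
print("ridge:", ridge)  # every coefficient shrinks but stays non-zero
print("lasso:", lasso)  # the two small coefficients become exactly zero
```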
18) Which of the following statement is true about
outliers in Linear regression?
A) Linear regression is sensitive to outliers
B) Linear regression is not sensitive to outliers
C) Can’t say
D) None of these
answer: (A)
19) Suppose you plotted a scatter plot between the
residuals and predicted values in linear regression
and you found that there is a relationship between
them. Which of the following conclusion do you
make about this situation?
A) Since there is a relationship, our model
is not good
B) Since there is a relationship, our model
is good
C) Can’t say
D) None of these
answer: (A)
20) What will happen when you fit degree 4
polynomial in linear regression?
A) There are high chances that degree 4 polynomial
will over fit the data
B) There are high chances that degree 4 polynomial
will under fit the data
C) Can’t say
D) None of these
answer: (A)
21) What will happen when you fit degree 2
polynomial in linear regression?
A) There are high chances that degree 2 polynomial will
over fit the data
B) There are high chances that degree 2 polynomial will
under fit the data
C) Can’t say
D) None of these
answer: (B)
22) In terms of bias and variance. Which of the
following is true when you fit degree 2 polynomial?
A) Bias will be high, variance will be high
B) Bias will be low, variance will be high
C) Bias will be high, variance will be low
D) Bias will be low, variance will be low
answer: (C)
23) Suppose l1, l2 and l3 are the three learning rates
for A,B,C respectively. Which of the following is true
about l1,l2 and l3?
A) l2 < l1 < l3
B) l1 > l2 > l3
C) l1 = l2 = l3
D) None of these
answer: (A)
24) Now we increase the training set size gradually.
As the training set size increases, what do you
expect will happen with the mean training error?
A) Increase
B) Decrease
C) Remain constant
D) Can’t Say
answer: (D)
25) What do you expect will happen with bias and
variance as you increase the size of training data?
A) Bias increases and Variance increases
B) Bias decreases and Variance increases
C) Bias decreases and Variance decreases
D) Bias increases and Variance decreases
E) Can’t Say
answer: (D)
26) What would be the root mean square training
error for this data if you run a Linear Regression
model of the form (Y = A0+A1X)?
A) Less than 0
B) Greater than zero
C) Equal to 0
D) None of these
answer: (C)
Question Context 27-28:
Suppose you have been given the following scenario
for training and validation error for Linear
Regression.
Module 05
MORE MCQ
Question Context
A feature F1 can take certain value: A, B, C, D, E, & F
and represents grade of students from a college.
1) Which of the following statement is true in
following case?
A) Feature F1 is an example of nominal variable.
B) Feature F1 is an example of ordinal variable.
C) It doesn’t belong to any of the above category.
D) Both of these
Solution: (B)
Ordinal variables are variables that have some
order in their categories. For example, grade A
should be considered a higher grade than grade B.
2) Which of the following is an example of a
deterministic algorithm?
A) PCA
B) K-Means
C) None of the above
Solution: (A) A deterministic algorithm is one
whose output does not change between runs.
PCA would give the same result if we ran it again, but
k-means would not.
A) 1 is tanh, 2 is ReLU and 3 is SIGMOID activation
functions.
B) 1 is SIGMOID, 2 is ReLU and 3 is tanh activation
functions.
C) 1 is ReLU, 2 is tanh and 3 is SIGMOID activation
functions.
D) 1 is tanh, 2 is SIGMOID and 3 is ReLU activation
functions.
Solution: (D)
The range of the SIGMOID function is (0, 1).
The range of the tanh function is (-1, 1).
The range of the RELU function is [0, infinity).
So Option D is the right answer.
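The three ranges can be spot-checked with plain Python. This is a sketch using the standard definitions of the functions; the sample inputs are arbitrary.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent, taken from the standard library."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


samples = [-5.0, -1.0, 0.0, 1.0, 5.0]
for x in samples:
    print(x, round(sigmoid(x), 4), round(tanh(x), 4), relu(x))
```

For these inputs the sigmoid outputs stay between 0 and 1, the tanh outputs stay between -1 and 1, and the ReLU outputs are never negative.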
8) Below are the 8 actual values of target variable in
the train file.
[0,0,0,1,1,1,1,1]
What is the entropy of the target variable?
A) -(5/8 log(5/8) + 3/8 log(3/8))
B) 5/8 log(5/8) + 3/8 log(3/8)
C) 3/8 log(5/8) + 5/8 log(3/8)
D) 5/8 log(3/8) – 3/8 log(5/8)
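Solution: (A) The formula for entropy is
$H = -\sum_i p_i \log p_i$, where $p_i$ is the fraction of values in class $i$.
Here 5 of the 8 values are 1 and 3 are 0, so $H = -(5/8 \log(5/8) + 3/8 \log(3/8))$.
So the answer is A.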
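The entropy of this target variable can be computed directly. The options do not state the base of the logarithm; base 2 (entropy in bits) is assumed in this sketch, and the function name is illustrative.

```python
import math
from collections import Counter


def entropy(labels, base=2):
    """Entropy of a list of class labels: -sum(p_i * log(p_i))."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


target = [0, 0, 0, 1, 1, 1, 1, 1]
h = entropy(target)
print(round(h, 4))  # about 0.9544 bits
```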
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1,2 and 3
Solution: (D)
Larger k value means less bias towards
overestimating the true expected error (as training
folds will be closer to the total dataset) and higher
running time (as you are getting closer to the limit
case: Leave-One-Out CV). We also need to consider
the variance between the k folds accuracy while
selecting the k.
1. 1<2<3<4
2. 1>2>3 > 4
3. 7<6<5<4
4. 7>6>5>4
A) 1 and 3
B) 2 and 3
C) 1 and 4
D) 2 and 4
Solution: (B)
From image 1 to 4 the correlation is decreasing (in absolute
value). But from image 4 to 7 the correlation is
increasing in absolute value, while the values are negative
(for example, 0, -0.3, -0.7, -0.99).
30) You can evaluate the performance of a binary
class classification problem using different metrics
such as accuracy, log-loss, F-Score. Let’s say, you are
using the log-loss function as evaluation metric.
Which of the following option is / are true for
interpretation of log-loss as an evaluation metric?
1. If a classifier is confident about an incorrect
classification, then log-loss will penalise it
heavily.
2. If, for a particular observation, the classifier
assigns a very small probability to the correct
class, then the corresponding contribution to the
log-loss will be very large.
3. The lower the log-loss, the better the model.
A) 1 and 3
B) 2 and 3
C) 1 and 2
D) 1,2 and 3
Solution: (D) The options are self-explanatory.
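A small sketch of binary log-loss shows the three statements numerically. The clipping constant `eps` and the example probabilities are assumptions added for illustration.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log-loss; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip so that log(0) never occurs.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# The true class is 1 in both cases.
confident_right = log_loss([1], [0.99])
confident_wrong = log_loss([1], [0.01])
print(round(confident_right, 4), round(confident_wrong, 4))
```

The confidently wrong prediction contributes a loss several hundred times larger than the confidently right one, which is the heavy penalty described in statements 1 and 2.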
33) Suppose you are given the below data and you
want to apply a logistic regression model for
classifying it in two given classes.
You are using logistic regression with L1
regularization, where C is the
regularization parameter and w1 & w2 are the
coefficients of x1 and x2.
Which of the following option is correct when you
increase the value of C from zero to a very large
value?
A) First w2 becomes zero and then w1 becomes zero
B) First w1 becomes zero and then w2 becomes zero
C) Both become zero at the same time
D) Both cannot be zero even after very large value
of C
Solution: (B)
By looking at the image, we see that even when
using only x2 we can efficiently perform classification.
So w1 will become 0 first. As the regularization
parameter increases further, w2 will come closer and
closer to 0.
34) Suppose we have a dataset which can be trained
with 100% accuracy with help of a decision tree of
depth 6. Now consider the points below and choose
the option based on these points.
Note: All other hyper parameters are same and
other factors are not affected.
1. Depth 4 will have high bias and low variance
2. Depth 4 will have low bias and low variance
A) Only 1
B) Only 2
C) Both 1 and 2
D) None of the above
Solution: (A) If you fit a decision tree of depth 4 on
such data, it is more likely to underfit the
data. In the case of underfitting you will have high
bias and low variance.
35) Which of the following options can be used to
get global minima in k-Means Algorithm?
1. Try to run algorithm for different centroid
initialization
2. Adjust number of iterations
3. Find out the optimal number of clusters
A) 2 and 3
B) 1 and 3
C) 1 and 2
D) All of above
Solution: (D) All of the options can be tuned to find
the global minima.
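Option 1 can be sketched with a tiny one-dimensional k-means that is run from several starting centroids, keeping the run with the lowest within-cluster sum of squares. Restarts improve the odds of reaching the global minimum rather than guaranteeing it. The data, the starting centroids, and the function name are made up for this example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data; returns final centroids and inertia."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    inertia = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, inertia


points = [1.0, 2.0, 9.0, 10.0, 20.0, 21.0]
# Two hypothetical initialisations: the first gets stuck in a poor solution.
initialisations = [[1.0, 20.0, 21.0], [1.0, 9.0, 20.0]]
runs = [kmeans_1d(points, init) for init in initialisations]
best_centroids, best_inertia = min(runs, key=lambda run: run[1])
print([run[1] for run in runs])  # inertia of each run
print(best_centroids, best_inertia)
```

Here the first initialisation converges to centroids 5.5, 20 and 21 with inertia 65, while the second finds 1.5, 9.5 and 20.5 with inertia 1.5, so comparing runs picks the better clustering.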
A) C1 = C2 = C3
B) C1 > C2 > C3
C) C1 < C2 < C3
D) None of these
Solution: (C)
C is the penalty parameter of the error term. It
controls the trade-off between a smooth decision
boundary and classifying the training points
correctly. For large values of C, the optimization will
choose a smaller-margin hyperplane.
MCQ
• Regression
• Classification
• Decision Tree
• Random Forest
Answer: Random Forest
• Regression
• Classification
• Random Forest
• Decision Tree
Answer: Random Forest
• Confusion matrix
• Cost-sensitive accuracy
• Area under the ROC curve
• All of the above
Answer: All of the above
8. Machine learning algorithms build a model based
on sample data, known as .................
• Training Data
• Transfer Data
• Data Training
• None of the above
Answer: Training Data
• Deep Learning
• Artificial Intelligence
• Data Learning
• None of the above
Answer: Artificial Intelligence
10. A Machine Learning technique that helps in
detecting the outliers in data.
• Clustering
• Classification
• Anomaly Detection
• All of the above
Answer: Anomaly Detection
• Geoffrey Hill
• Geoffrey Chaucer
• Geoffrey Everest Hinton
• None of the above
Answer: Geoffrey Everest Hinton
12. What is the most significant phase in a genetic
algorithm?
• Selection
• Mutation
• Crossover
• Fitness function
Answer: Crossover
• Physics
• Information Theory
• Neurostatistics
• Optimization Control
Answer: Neurostatistics
14. Machine Learning has various function
representations. Which of the following is not a
symbolic function representation?
• Decision Trees
• Rules in propositional logic
• Rules in first-order predicate logic
• Hidden-Markov Models (HMM)
Answer: Hidden-Markov Models (HMM)
• Deep Learning
• Machine Learning
• Artificial Intelligence
• None of the above
Answer: Machine Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• All of the above
Answer: All of the above
• PCA
• Naive Bayesian
• Linear Regression
• Decision Tree
Answer: PCA
• Reinforcement Learning
• Supervised Learning: Classification
• Unsupervised Learning: Regression
• None of the above
Answer: Reinforcement Learning
• Case-based
• Neural Network
• Linear Regression
• Support Vector Machines
Answer: Case-based
• Clustering
• Regression
• Classification
• All of the above
Answer: All of the above