RCS-080 Machine Learning MCQs
MCQ Questions
2. Which of the following is not a factor that affects the performance of a
learner system?
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
12. Supervised learning and unsupervised clustering both require at least one
A. hidden attribute.
B. output attribute.
C. input attribute.
D. categorical attribute.
14. A regression model in which more than one independent variable is used to
predict the dependent variable is called
A. a simple linear regression model
B. a multiple regression model
C. an independent model
D. none of the above
15. A term used to describe the case when the independent variables in a
multiple regression model are correlated is
A. regression
B. correlation
C. multicollinearity
D. none of the above
18. A measure of goodness of fit for the estimated regression equation is the
A. multiple coefficient of determination
B. mean square due to error
C. mean square due to regression
D. none of the above
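For Q18, here is a minimal sketch of the multiple coefficient of determination (R²), assuming the usual definition R² = 1 - SSE/SST. The function name `r_squared` and the sample values are hypothetical and chosen only for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # sum of squares due to error
    sst = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    return 1 - sse / sst

# Made-up observed values and fitted values from some regression equation
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
print(r_squared(y_true, y_pred))
```

A value close to 1 means the fitted equation explains most of the variation in y.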
25. Which statement is true about neural network and linear regression models?
A. Both models require input attributes to be numeric.
B. Both models require numeric attributes to range between 0 and 1.
C. The output of both models is a categorical attribute value.
D. Both techniques build models whose output is determined by a linear sum of
weighted input attribute values.
E. More than one of A, B, C or D is true.
27. The average positive difference between computed and desired outcome
values.
A. root mean squared error
B. mean squared error
C. mean absolute error
D. mean positive error
28. Selecting data so as to assure that each class is properly represented in both
the training and test set.
A. cross validation
B. stratification
C. verification
D. bootstrapping
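The idea behind stratification in Q28 can be sketched in plain Python: split the indices of each class separately so that every class keeps roughly the same share in the training and test sets. `stratified_split`, the 25% test fraction and the toy labels are assumptions for illustration, not a reference implementation.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                        # random order within each class
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = ["a"] * 8 + ["b"] * 4                  # class ratio 2:1
train, test = stratified_split(labels)
print([labels[i] for i in test])               # ['a', 'a', 'b']: ratio kept
```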
29. The standard error is defined as the square root of this computation.
A. The sample variance divided by the total number of sample instances.
B. The population variance divided by the total number of sample instances.
C. The sample variance divided by the sample mean.
D. The population variance divided by the sample mean.
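A quick numeric check of the definition in Q29, assuming the standard error of the mean is computed from the sample variance (`statistics.variance` divides by n - 1). The data values are made up.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: square root of (sample variance / n)."""
    return math.sqrt(statistics.variance(sample) / len(sample))

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(standard_error(sample))
```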
30. Data used to optimize the parameter settings of a supervised learner model.
A. training
B. test
C. verification
D. validation
32. The correlation between the number of years an employee has worked for a
company and the salary of the employee is 0.75. What can be said about
employee salary and years worked?
A. There is no relationship between salary and years worked.
B. Individuals that have worked for the company the longest have higher
salaries.
C. Individuals that have worked for the company the longest have lower
salaries.
D. The majority of employees have been with the company a long time.
E. The majority of employees have been with the company a short period of
time.
33. The correlation coefficient for two real-valued attributes is –0.85. What does
this value tell you?
A. The attributes are not linearly related.
B. As the value of one attribute increases the value of the second attribute also
increases.
C. As the value of one attribute decreases the value of the second attribute
increases.
D. The attributes show a curvilinear relationship.
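The Pearson correlation coefficient used in Q32 and Q33 can be written out by hand so that each term is visible. The data below are invented: because y tends to fall as x rises, the coefficient comes out strongly negative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 3]
print(round(pearson_r(xs, ys), 3))  # -0.988
```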
34. The average squared difference between classifier predicted output and
actual output.
A. mean squared error
B. root mean squared error
C. mean absolute error
D. mean relative error
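The error measures in Q27 and Q34 differ only in how each prediction error is aggregated. Below is a minimal sketch with made-up numbers; the helper name `error_metrics` is hypothetical.

```python
import math

def error_metrics(predicted, actual):
    """Return (MAE, MSE, RMSE) for paired predicted/actual values."""
    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / n   # average absolute ("positive") difference
    mse = sum(e * e for e in errors) / n    # average squared difference
    return mae, mse, math.sqrt(mse)         # RMSE is the square root of MSE

mae, mse, rmse = error_metrics([2.0, 3.0, 7.0], [1.0, 3.0, 4.0])
print(mae, mse, rmse)
```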
35. Simple regression assumes a _____ relationship between the input and
output attribute.
A. linear
B. quadratic
C. reciprocal
D. inverse
39. This technique associates a conditional probability value with each data
instance.
A. linear regression
B. logistic regression
C. simple regression
D. multiple linear regression
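Logistic regression (Q39) passes a weighted sum of the inputs through the sigmoid function, which yields a conditional probability P(y = 1 | x) for every instance. The weights and bias below are hypothetical values, not a fitted model.

```python
import math

def logistic_probability(x, weights, bias):
    """P(y = 1 | x) under a logistic regression model with the given parameters."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters; each instance gets its own conditional probability
weights, bias = [0.8, -0.5], -0.2
for x in ([1.0, 0.0], [0.0, 2.0], [0.25, 0.0]):
    print(x, round(logistic_probability(x, weights, bias), 3))
```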
40. This supervised learning technique can process both numeric and
categorical input attributes.
A. linear regression
B. Bayes classifier
C. logistic regression
D. backpropagation learning
42. This clustering algorithm merges and splits nodes to help modify
nonoptimal partitions.
A. agglomerative clustering
B. expectation maximization
C. conceptual clustering
D. K-Means clustering
43. This clustering algorithm initially assumes that each data instance represents
a single cluster.
A. agglomerative clustering
B. conceptual clustering
C. K-Means clustering
D. expectation maximization
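Here is a naive sketch of agglomerative clustering as described in Q43: every instance starts as its own cluster, and the two closest clusters are merged until the requested number remains. Single linkage and 1-D points are simplifying assumptions; practical implementations use other linkage criteria and faster data structures.

```python
def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering of 1-D points (naive sketch)."""
    clusters = [[p] for p in points]  # initially, every instance is its own cluster

    def linkage(a, b):
        # single linkage: distance between the closest pair of members
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # merge the two closest clusters (j > i)
    return clusters

print(agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], 3))  # [[1.0, 1.5], [5.0, 5.2], [9.0]]
```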
46. A _________ is a decision support tool that uses a tree-like graph or model
of decisions and their possible consequences, including chance event outcomes,
resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
50. Which of the following are decision tree nodes?
a) Decision Nodes
b) End Nodes
c) Chance Nodes
d) All of the mentioned
54. Which of the following are advantages of decision trees? Choose all that apply.
a) Possible scenarios can be added
b) Use a white box model: if a given result is provided by the model, the
explanation for it can be easily replicated
c) Worst, best and expected values can be determined for different scenarios
d) All of the mentioned
57. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear
with the constant of proportionality being equal to 2. The inputs are 4, 10, 5 and
20 respectively. The output will be:
a) 238
b) 76
c) 119
d) 123
Explanation: The output is found by multiplying the weights by their
respective inputs, summing the results and multiplying the sum by the constant
of proportionality of the linear transfer function. Therefore:
Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 238.
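The arithmetic in the explanation above can be checked with a few lines of Python; `linear_neuron` is a hypothetical helper, and `k` is the constant of proportionality of the linear transfer function.

```python
def linear_neuron(inputs, weights, k):
    """Output of a neuron whose transfer function is linear: f(net) = k * net."""
    net = sum(w * x for w, x in zip(weights, inputs))  # weighted sum of the inputs
    return k * net

print(linear_neuron([4, 10, 5, 20], [1, 2, 3, 4], k=2))  # 238
```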
58. What are the advantages of neural networks over conventional computers?
(i) They have the ability to learn by example
(ii) They are more fault tolerant
(iii) They are more suited for real-time operation due to their high
'computational' rates
a) (i) and (ii) are true
b) (i) and (iii) are true
c) Only (i)
d) All of the mentioned
59. What is back propagation?
a) It is another name given to the curvy function in the perceptron
b) It is the transmission of error back through the network to adjust the inputs
c) It is the transmission of error back through the network to allow weights
to be adjusted so that the network can learn
d) None of the mentioned
60. Which of the following is not a promise of artificial neural networks?
a) It can explain result
b) It can survive the failure of some nodes
c) It has inherent parallelism
d) It can handle noise
62. A perceptron adds up all the weighted inputs it receives, and if it exceeds a
certain value, it outputs a 1, otherwise it just outputs a 0.
a) True
b) False
c) Sometimes – it can also output intermediate values as well
d) Can't say
63. The network that involves backward links from the output to the input and
hidden layers is called ____
a) Self organizing maps
b) Perceptrons
c) Recurrent neural network
d) Multi layered perceptron
65. The process by which you become aware of messages through your senses is
called
a) Organization
b) Sensation
c) Interpretation-Evaluation
d) Perception
72. Supervised learning and unsupervised clustering both require at least one
A. hidden attribute.
B. output attribute.
C. input attribute.
D. categorical attribute
77. The average positive difference between computed and desired outcome
values.
A. root mean squared error
B. mean squared error
C. mean absolute error
D. mean positive error
78. Selecting data so as to assure that each class is properly represented in both
the training and test set.
A. cross validation
B. stratification
C. verification
D. bootstrapping
79. The standard error is defined as the square root of this computation.
A. The sample variance divided by the total number of sample instances.
B. The population variance divided by the total number of sample instances.
C. The sample variance divided by the sample mean.
D. The population variance divided by the sample mean.
80. Data used to optimize the parameter settings of a supervised learner model.
A. training
B. test
C. verification
D. validation
82. The correlation coefficient for two real-valued attributes is –0.85. What
does this value tell you?
A. The attributes are not linearly related.
B. As the value of one attribute increases the value of the second attribute also
increases.
C. As the value of one attribute decreases the value of the second attribute
increases.
D. The attributes show a curvilinear relationship.
83. The average squared difference between classifier predicted output and
actual output.
A. mean squared error
B. root mean squared error
C. mean absolute error
D. mean relative error
85. A statement about a population developed for the purpose of testing is
called:
(a) Hypothesis
(b) Hypothesis testing
(c) Level of significance
(d) Test-statistic
86. Any hypothesis which is tested for the purpose of rejection under the
assumption that it is true is called:
(a) Null hypothesis
(b) Alternative hypothesis
(c) Statistical hypothesis
(d) Composite hypothesis
88. Any statement whose validity is tested on the basis of a sample is called:
(a) Null hypothesis
(b) Alternative hypothesis
(c) Statistical hypothesis
(d) Simple hypothesis
90. A statement that is accepted if the sample data provide sufficient evidence
that the null hypothesis is false is called:
(a) Simple hypothesis
(b) Composite hypothesis
(c) Statistical hypothesis
(d) Alternative hypothesis
93. The probability of rejecting the null hypothesis when it is true is called:
(a) Level of confidence
(b) Level of significance
(c) Power of the test
(d) Difficult to tell
94. The dividing point between the region where the null hypothesis is rejected
and the region where it is not rejected is said to be:
(a) Critical region
(b) Critical value
(c) Acceptance region
(d) Significant region
95. If the critical region is located equally in both sides of the sampling
distribution of test-statistic, the test is called:
(a) One tailed
(b) Two tailed
(c) Right tailed
(d) Left tailed
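For a standard normal test statistic, the critical values behind Q94 and Q95 can be computed with Python's `statistics.NormalDist`. The significance level α = 0.05 is an assumed example: a two-tailed test places α/2 in each tail, while a right-tailed test places all of α in one tail.

```python
from statistics import NormalDist

alpha = 0.05           # assumed significance level
z = NormalDist()       # standard normal distribution of the test statistic under H0
two_tailed = z.inv_cdf(1 - alpha / 2)  # alpha/2 of the area in each tail
right_tailed = z.inv_cdf(1 - alpha)    # all of alpha in the right tail
print(round(two_tailed, 3), round(right_tailed, 3))  # 1.96 1.645
```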
96. A rule or formula that provides a basis for testing a null hypothesis is called:
(a) Test-statistic
(b) Population statistic
(c) Both of these
(d) None of the above
99. Suppose you are given an EM algorithm that finds maximum likelihood
estimates for a model with latent variables. You are asked to modify the
algorithm so that it finds MAP estimates instead. Which step or steps do you
need to modify:
A. Expectation
B. Maximization
C. No modification necessary
D. Both
100. A regression model in which more than one independent variable is used to
predict the dependent variable is called
A. a simple linear regression model
B. a multiple regression model
C. an independent model
D. none of the above
101. A term used to describe the case when the independent variables in a
multiple regression model are correlated is
A. regression
B. correlation
C. multicollinearity
D. none of the above
102. A multiple regression model has the form y = 2 + 3x1 + 4x2. As x1
increases by 1 unit (holding x2 constant), y will
A. increase by 3 units
B. decrease by 3 units
C. increase by 4 units
D. decrease by 4 units
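The interpretation in Q102 can be verified directly from the model equation: holding x2 fixed, increasing x1 by one unit changes y by the coefficient of x1. `predict` is a hypothetical helper wrapping the equation given in the question.

```python
def predict(x1, x2):
    """The regression equation from Q102: y = 2 + 3*x1 + 4*x2."""
    return 2 + 3 * x1 + 4 * x2

x2 = 5  # any fixed value of x2
print(predict(11, x2) - predict(10, x2))  # 3
```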
103. A multiple regression model has
A. only one independent variable
B. more than one dependent variable
C. more than one independent variable
D. none of the above
104. A measure of goodness of fit for the estimated regression equation is the
A. multiple coefficient of determination
B. mean square due to error
C. mean square due to regression
D. none of the above
107. Which statement is true about neural network and linear regression
models?
A. Both models require input attributes to be numeric.
B. Both models require numeric attributes to range between 0 and 1.
C. The output of both models is a categorical attribute value.
D. Both techniques build models whose output is determined by a linear sum of
weighted input attribute values.
E. More than one of A, B, C or D is true.
108. Simple regression assumes a __________ relationship
between the input attribute and output attribute.
A. linear
B. quadratic
C. reciprocal
D. inverse
110. The leaf nodes of a model tree are
A. averages of numeric output attribute values.
B. nonlinear regression equations.
C. linear regression equations.
D. sums of numeric output attribute values.
112. This technique associates a conditional probability value with
each data instance.
A. linear regression
B. logistic regression
C. simple regression
D. multiple linear regression
113. This supervised learning technique can process both numeric and
categorical input attributes.
A. linear regression
B. Bayes classifier
C. logistic regression
D. backpropagation learning
114. This clustering algorithm merges and splits nodes to help modify
non-optimal partitions.
A. agglomerative clustering
B. expectation maximization
C. conceptual clustering
D. K-Means clustering
115. This clustering algorithm initially assumes that each data instance
represents a single cluster.
A. agglomerative clustering
B. conceptual clustering
C. K-Means clustering
D. expectation maximization
118. We can get multiple local optimum solutions if we solve a linear
regression problem by minimizing the sum of squared errors using gradient
descent.
A. True
B. False
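Here is a small experiment related to Q118, assuming batch gradient descent on the mean squared error of a one-feature linear model. Because this loss is convex with a single minimum, different starting points should end at the same parameters. The data (lying exactly on y = 2x + 1), the learning rate and the step count are made-up choices.

```python
def gradient_descent(xs, ys, w, b, lr=0.05, steps=5000):
    """Batch gradient descent on the mean squared error of y ≈ w * x + b."""
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2 * sum(residuals) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # made-up data lying exactly on y = 2x + 1
for start in [(-10.0, 10.0), (0.0, 0.0), (25.0, -8.0)]:
    w, b = gradient_descent(xs, ys, *start)
    print(start, round(w, 4), round(b, 4))  # same (w, b) from every start
```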
119. When the feature space is larger, overfitting is more likely.
A. True
B. False
121. As the number of training examples goes to infinity, your model trained on
that data will have:
A. Lower variance
B. Higher variance
C. Same variance
122. As the number of training examples goes to infinity, your model trained on
that data will have:
A. Lower bias
B. Higher bias
C. Same bias
123. On which issues do biological networks prove to be superior to AI
networks?
a) robustness & fault tolerance
b) flexibility
c) collective computation
d) all of the mentioned
134. What is the feature of ANNs due to which they can deal with noisy, fuzzy,
inconsistent data?
a) associative nature of networks
b) distributive nature of networks
c) both associative & distributive
d) none of the mentioned
135. What kind of operations can neural networks perform?
a) serial
b) parallel
c) serial or parallel
d) none of the mentioned
139. The amount of output of one unit received by another unit depends on
what?
a) output unit
b) input unit
c) activation value
d) weight
165. If some of the output patterns in a pattern association problem are
identical, what does the problem shift to?
a) pattern storage problem
b) pattern classification problem
c) pattern mapping problem
d) none of the mentioned
168. In the case of patterns handled by feedback nets in a pattern recognition
task, what behaviour is expected?
a) accretive
b) interpolative
c) can be either accretive or interpolative
d) none of the mentioned
176. Does pattern association involve nonlinear units in a feedforward neural
network?
a) yes
b) no
178. Which feature does not belong to pattern mapping in feedforward neural
networks?
a) recall is direct
b) delta rule learning
c) non linear processing units
d) two layers
183. When two classes can be separated by a straight line, what are they known as?
a) linearly separable
b) linearly inseparable classes
c) may be separable or inseparable, it depends on system
d) none of the mentioned
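Q62 and Q183 can be illustrated together with a hand-written perceptron: a threshold unit that outputs 1 when its weighted sum exceeds θ, trained with the perceptron learning rule on the logical AND function, which is linearly separable. The learning rate of 1 and the zero initial weights and threshold are assumptions for this sketch.

```python
def perceptron_output(x, w, theta):
    """Outputs 1 if the weighted sum exceeds the threshold theta, otherwise 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0

def train_perceptron(data, epochs=20, lr=1.0):
    """Perceptron learning rule; converges when the classes are linearly separable."""
    w, theta = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in data:
            err = target - perceptron_output(x, w, theta)
            if err:
                errors += 1
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                theta -= lr * err  # the threshold moves opposite to a bias weight
        if errors == 0:            # a full pass with no mistakes: done
            break
    return w, theta

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # linearly separable
w, theta = train_perceptron(AND)
print([perceptron_output(x, w, theta) for x, _ in AND])  # [0, 0, 0, 1]
```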
190. Which 3 basic types of neural nets form the basic functional units,
among:
i) feedforward ii) loop iii) recurrent iv) feedback v) combination of
feedforward & feedback
a) i, ii, iii
b) i, ii, iv
c) i, iv, v
d) i, iii, v
192. The backpropagation law is also known as the generalized delta rule. Is this true?
a) yes
b) no
198. What are the general tasks that are performed with backpropagation
algorithm?
a) pattern mapping
b) function approximation
c) prediction
d) all of the mentioned
201. How can false minima be reduced in case of error in recall in feedback
neural networks?
a) by providing additional units
b) by using probabilistic update
c) can be either probabilistic update or using additional units
d) none of the mentioned
205. If the input is 'a(l) + e', where 'e' is the noise introduced, what is the
output in the case of an autoassociative feedback network?
a) a(l)
b) a(l) + e
c) could be either a(l) or a(l) + e
d) e
206. If the input is 'a(l) + e', where 'e' is the noise introduced, what is the
output if the system is accretive in nature?
a) a(l)
b) a(l) + e
c) could be either a(l) or a(l) + e
d) e
207. If the input is 'a(l) + e', where 'e' is the noise introduced, what is the
output if the system is interpolative in nature?
a) a(l)
b) a(l) + e
c) could be either a(l) or a(l) + e
d) e
208. What property should a feedback network have, to make it useful for
storing information?
a) accretive behaviour
b) interpolative behaviour
c) both accretive and interpolative behaviour
d) none of the mentioned
212. What is the advantage of basis function networks over multilayer
feedforward neural networks?
a) training of basis function is faster than MLFFNN
b) training of basis function is slower than MLFFNN
c) storing in basis function is faster than MLFFNN
d) none of the mentioned
214. Which of these robot applications can be implemented with a single-layer
feedforward network?
a) wall climbing
b) rotating arm and legs
c) gesture control
d) wall following
219. How many terms are required for building a Bayes model?
a) 1
b) 2
c) 3
d) 4
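Bayes' rule, P(A|B) = P(B|A) P(A) / P(B), can be sketched in a few lines. The probabilities below are hypothetical, and P(B) is obtained from the law of total probability.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical probabilities, for illustration only
p_a = 0.01              # prior P(A)
p_b_given_a = 0.9       # likelihood P(B|A)
p_b_given_not_a = 0.05  # P(B|not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # law of total probability
print(round(bayes_posterior(p_b_given_a, p_a, p_b), 4))
```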
223. How can the entries in the full joint probability distribution be calculated?
a) Using variables
b) Using information
c) Both Using variables & information
d) None of the mentioned
224. How can a Bayesian network be used to answer any query?
a) Full distribution
b) Joint distribution
c) Partial distribution
d) All of the mentioned
227. Which condition is used to influence a variable directly by all the others?
a) Partially connected
b) Fully connected
c) Local connected
d) None of the mentioned
228. What is the relationship between a node and its predecessors when
creating a Bayesian network?
a) Functionally dependent
b) Dependent
c) Conditionally independent
d) Both conditionally dependent & dependent
229. The k-NN algorithm does more computation at test time than at training time.
A) TRUE
B) FALSE
230. Which of the following distance metrics cannot be used in k-NN?
A) Manhattan
B) Minkowski
C) Tanimoto
D) Jaccard
E) Mahalanobis
F) All can be used
1. k-NN performs much better if all of the data have the same scale
2. k-NN works well with a small number of input variables (p), but struggles
when the number of inputs is very large
3. k-NN makes no assumptions about the functional form of the problem being
solved
A) 1 and 2
B) 1 and 3
C) Only 1
D) All of the above
233. Which of the following machine learning algorithms can be used for
imputing missing values of both categorical and continuous variables?
A) K-NN
B) Linear Regression
C) Logistic Regression
1. Hamming Distance
2. Euclidean Distance
3. Manhattan Distance
A) 1
B) 2
C) 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3
236. What is the Euclidean distance between the two data points A(1,3) and
B(2,3)?
A) 1
B) 2
C) 4
D) 8
sqrt( (1-2)^2 + (3-3)^2) = sqrt(1^2 + 0^2) = 1
237. What is the Manhattan distance between the two data points A(1,3) and
B(2,3)?
A) 1
B) 2
C) 4
D) 8
|1-2| + |3-3| = 1 + 0 = 1
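The two computations above can be checked in Python, together with the Hamming distance from the earlier list of distance measures. `minkowski` and `hamming` are hypothetical helper names. The Minkowski distance reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2. Hamming distance simply counts mismatching positions, which suits categorical attributes.

```python
def minkowski(a, b, p):
    """Minkowski distance; p = 1 gives Manhattan distance, p = 2 gives Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def hamming(a, b):
    """Number of positions at which two equal-length (e.g. categorical) vectors differ."""
    return sum(x != y for x, y in zip(a, b))

A, B = (1, 3), (2, 3)
print(minkowski(A, B, 2), minkowski(A, B, 1))  # 1.0 1.0
print(hamming(("red", "S"), ("red", "M")))      # 1
```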
238. Which of the following is true about k in k-NN in terms of bias?
A) When you increase k, the bias increases
B) When you decrease k, the bias increases
C) Can't say
D) None of these
239. Which of the following is true about k in k-NN in terms of variance?
A) When you increase k, the variance increases
B) When you decrease k, the variance increases
C) Can't say
D) None of these
240. If you find noise in the data, which of the following options would you
consider in k-NN?
A) I will increase the value of k
B) I will decrease the value of k
C) Noise cannot depend on the value of k
D) None of these
242. Two statements are given below. Which of them is/are true?
1. k-NN is a memory-based approach: the classifier immediately adapts as
we collect new training data.
2. The computational complexity for classifying new samples grows linearly
with the number of samples in the training dataset in the worst-case scenario.
A) 1
B) 2
C) 1 and 2
D) None of these
243. Which of the following values of k in the following graph would give the
lowest leave-one-out cross-validation accuracy?
A) 1
B) 2
C) 3
D) 5
244. A company has built a kNN classifier that gets 100% accuracy on training
data. When they deployed this model on the client side, it was found that the
model is not at all accurate. Which of the following might have gone wrong?
Note: The model was deployed successfully and no technical issues were found
on the client side except the model performance.
A) It is probably an overfitted model
B) It is probably an underfitted model
C) Can't say
D) None of these
245. You are given the following 2 statements. Which of them is/are true in
the case of k-NN?
1. If the value of k is very large, we may include points from other classes
in the neighborhood.
2. If the value of k is too small, the algorithm is very sensitive to noise
A) 1
B) 2
C) 1 and 2
D) None of these
246. Which of the following statements is true for k-NN classifiers?
248. In k-NN what will happen when you increase/decrease the value of k?
A) The boundary becomes smoother with increasing value of K
B) The boundary becomes smoother with decreasing value of K
C) Smoothness of the boundary doesn't depend on the value of K
D) None of these
249. Two statements are given below for the k-NN algorithm. Which of them
is/are true?
1. We can choose optimal value of k with the help of cross validation
2. Euclidean distance treats each feature as equally important
A) 1
B) 2
C) 1 and 2
D) None of these
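Several k-NN ideas above (the role of k, Euclidean distance, choosing k by cross-validation) can be tied together in a short sketch. `knn_predict`, `loocv_accuracy` and the toy dataset are assumptions, and `math.dist` requires Python 3.8 or later. Features are not rescaled here, so Euclidean distance weights both coordinates equally. With only four points per class, k = 7 means every leave-one-out neighbourhood contains four points of the other class and only three of its own, so the true class is always outvoted.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query` (Euclidean distance)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def loocv_accuracy(data, k):
    """Leave-one-out cross-validation: classify each point using all the others."""
    hits = sum(knn_predict(data[:i] + data[i + 1:], x, k) == y
               for i, (x, y) in enumerate(data))
    return hits / len(data)

# Made-up 2-D dataset: four points in each of two well-separated classes
data = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"), ((2, 2), "a"),
        ((6, 6), "b"), ((6, 7), "b"), ((7, 6), "b"), ((7, 7), "b")]
candidate_ks = (1, 3, 5, 7)
for k in candidate_ks:
    print(k, loocv_accuracy(data, k))
best_k = max(candidate_ks, key=lambda k: loocv_accuracy(data, k))
print("chosen k:", best_k)
```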
251. In reality, the null hypothesis may or may not be true, and a decision is
made to reject or not to reject it on the basis of the data obtained from a
sample.
a. True
b. False
252. The level of significance is the maximum probability of committing a type
II error.
a. True
b. False
254. Which value separates the critical region from the noncritical region in a
normal curve when testing the hypothesis?
a. computed value
b. t-value
c. z-value
d. critical value
261. What separates the critical region from the noncritical region?
a) critical value
b) computed value
c) z-score
264. If the test is two-tailed, the critical region, with an area equal to α, will be
on the left side of the mean.
a) True
b) False
265. A P-value indicates:
a) the probability that the null hypothesis is true
b) the probability of obtaining the results (or one more extreme) if the
null hypothesis is true
c) the probability that the alternative hypothesis is true
d) probability of a Type I error
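Option (b) can be made concrete for a z-test. Below is a minimal sketch using `statistics.NormalDist`, assuming a standard normal test statistic and a hypothetical observed value z = 2.1; the p-value is compared with an assumed significance level of 0.05.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """P(|Z| >= |z|) when Z is standard normal, i.e. assuming H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z_stat = 2.1   # hypothetical observed value of the test statistic
alpha = 0.05   # assumed significance level
p = two_tailed_p_value(z_stat)
print(round(p, 4), "reject H0" if p < alpha else "do not reject H0")
```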
273. All of the following are suitable problems for genetic algorithms EXCEPT
a) dynamic process control
b) pattern recognition with complex patterns
c) simulation of biological models
d) simple optimization with few variables
279. In which of the following types of learning does the teacher return
reward and punishment to the learner?
a) Active learning
b) Reinforcement learning
c) Supervised learning
d) Unsupervised learning
280. Decision trees are appropriate for the problems where ___________
a) Attributes are both numeric and nominal
b) Target function takes on a discrete number of values.
c) Data may have errors
d) All of the mentioned
284. Which of the following algorithms is not an example of an ensemble
learning algorithm?
A) Random Forest
B) Adaboost
C) Extra Trees
D) Gradient Boosting
E) Decision Trees
288. Which of the following statement(s) is/are true for Gradient Descent (GD)
and Stochastic Gradient Descent (SGD)?
1. In GD and SGD, you update a set of parameters in an iterative manner to
minimize the error function.
2. In SGD, you have to run through all the samples in your training set for a
single update of a parameter in each iteration.
3. In GD, you either use the entire data or a subset of training data to update
a parameter in each iteration.
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3
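The difference between the two update rules in Q288 can be sketched for a one-parameter model y ≈ w·x. The data (noise-free, true w = 3), learning rate and iteration counts are made-up choices. Batch GD uses the whole training set for every update, whereas SGD uses one randomly chosen sample per update.

```python
import random

def mse_grad(w, batch):
    """Gradient of mean squared error for the model y ≈ w * x over `batch`."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]  # made-up data, true w = 3

# Batch GD: every update uses the entire training set
w_gd = 0.0
for _ in range(200):
    w_gd -= 0.1 * mse_grad(w_gd, data)

# SGD: every update uses a single randomly chosen sample
rng = random.Random(0)
w_sgd = 0.0
for _ in range(200):
    w_sgd -= 0.1 * mse_grad(w_sgd, [rng.choice(data)])

print(round(w_gd, 4), round(w_sgd, 4))  # both end close to 3
```

Both loops update the parameter iteratively to reduce the error; they differ only in how much data each update sees.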
288. Which of the following hyperparameter(s), when increased, may cause a
random forest to overfit the data?
1. Number of Trees
2. Depth of Tree
3. Learning Rate
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3
289. Which of the following options is/are true for K-fold cross-validation?
1. Increase in K will result in higher time required to cross validate the
result.
2. Higher values of K will result in higher confidence on the cross-
validation result as compared to lower value of K.
3. If K=N, then it is called Leave one out cross validation, where N is the
number of observations.
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1,2 and 3
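Here is a minimal sketch of how K-fold cross-validation partitions the data (Q289), assuming contiguous folds without shuffling; `k_fold_indices` is a hypothetical helper. Each fold requires training one model, so the cost grows with K. Setting K equal to the number of observations gives one held-out observation per fold, which is leave-one-out cross-validation.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for K-fold cross-validation over n observations."""
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test  # one model is trained per fold
        start += size

print(list(k_fold_indices(5, 2)))
loo = list(k_fold_indices(5, 5))  # K = N: each test fold holds a single observation
print([test for _, test in loo])
```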
290. In ______ learning we can say that the output depends on the state of the
current input and the next input depends on the output of the previous input.
a) Supervised learning
b) Active learning
c) Unsupervised learning
d) Reinforcement learning
291. In _______ learning, decisions are dependent, so we give labels to
sequences of dependent decisions.
a) Supervised learning
b) Active learning
c) Unsupervised learning
d) Reinforcement learning
294. The goal of ______ is to find the hypothesis that best fits the training
examples.
a) concept learning
b) genetic algorithm
c) mutation
d) none of the above
295. The ________ only considers the positive examples and eliminates
negative examples.
a) List-Then-Eliminate algorithm
b) Find-S algorithm
c) Candidate Elimination algorithm
d) None of the above
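Find-S (Q295) can be sketched for conjunctive hypotheses over categorical attributes, where '?' means "any value". The attributes and examples below are invented for illustration; note that the negative example is skipped entirely.

```python
def find_s(examples):
    """Find-S: start from the first positive example and generalise; negatives are ignored."""
    h = None
    for attributes, label in examples:
        if label != "yes":
            continue  # Find-S ignores negative examples
        if h is None:
            h = list(attributes)  # most specific hypothesis covering the first positive
        else:
            # Replace any attribute value that disagrees with '?' (matches any value)
            h = [hv if hv == av else "?" for hv, av in zip(h, attributes)]
    return h

# Made-up examples: (sky, temperature, wind) -> play?
examples = [
    (("sunny", "warm", "strong"), "yes"),
    (("rainy", "cold", "strong"), "no"),
    (("sunny", "warm", "weak"), "yes"),
]
print(find_s(examples))  # ['sunny', 'warm', '?']
```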
296. The _________ initializes the version space to contain all hypotheses in H,
then eliminates the hypotheses that are inconsistent with the training examples.
a) List-Then-Eliminate algorithm
b) Find-S algorithm
c) Candidate Elimination algorithm
d) None of the above
297. The ________ incrementally builds the version space given a hypothesis
space H and a set E of examples.
a) List-Then-Eliminate algorithm
b) Find-S algorithm
c) Candidate Elimination algorithm
d) None of the above
298. The _______ of a learning algorithm is the set of assumptions that the
learner uses to predict outputs of given inputs that it has not encountered.
a) inductive bias
b) learning bias
c) deductive bias
d) Both a & b
305. In the intermediate steps of EM algorithm, the number of each base in each
column is determined and then converted to fractions.
a) True
b) False