(Final) 600+ ML MCQ

The document provides a 600+ question machine learning multiple choice question (MCQ) quiz. It states that the questions were compiled from various online sources to make them easy to find in one place. It asks users to have respect for the effort that went into compiling the questions and notes that it is impossible to include every possible MCQ. It provides a sample of MCQs with answers.

Uploaded by Asmit Yadav
Copyright © All Rights Reserved

For Revision Only

Do Not Misuse
Owner – Asmit

I tried to combine MCQs from different sources on the internet in one place so that questions are easy to find;
searching in a PDF is efficient and easy.
If I'm sharing this PDF with you, then instead of taking it for granted, have some respect for someone's efforts.

I included almost every question that I was able to find on the internet.
It is practically impossible for this PDF to contain every MCQ that exists in the world, because I'm not the one
making the questions,
and if you're intentionally making up a question and spreading hate about me because you can't find a specific
question in the PDF, then I don't fucking care, because you didn't order me to make the PDF; I made the PDF for myself.
PEACE

600+ Machine Learning MCQs

1. What is Machine Learning (ML)?


A. The autonomous acquisition of knowledge
through the use of manual programs
B. The selective acquisition of knowledge
through the use of computer programs
C. The selective acquisition of knowledge
through the use of manual programs
D. The autonomous acquisition of knowledge
through the use of computer programs
Correct option is D

2. Father of Machine Learning (ML)


A. Geoffrey Chaucer
B. Geoffrey Hill
C. Geoffrey Everest Hinton
D. None of the above
Correct option is C

3. Which is FALSE regarding regression?


A. It may be used for interpretation
B. It is used for prediction
C. It discovers causal relationships
D. It relates inputs to outputs
Correct option is C

4. Choose the correct option regarding machine learning (ML) and artificial intelligence (AI)
A. ML is a set of techniques that turns a dataset
into a software
B. AI is a software that can emulate the human
mind
C. ML is an alternate way of programming
intelligent machines
D. All of the above
Correct option is D

5. Which of the following is not a factor that affects the performance of a learner system?
A. Good data structures
B. Representation scheme used
C. Training scenario
D. Type of feedback
Correct option is A

6. In general, to have a well-defined learning problem, we must identify which of the following
A. The class of tasks
B. The measure of performance to be improved
C. The source of experience
D. All of the above
Correct option is D
7. Successful applications of ML
A. Learning to recognize spoken words
B. Learning to drive an autonomous vehicle
C. Learning to classify new astronomical
structures
D. Learning to play world-class backgammon
E. All of the above
Correct option is E

8. Which of the following is not a learning method?
A. Analogy
B. Introduction
C. Memorization
D. Deduction
Correct option is B

9. In language understanding, which of the following is not a level of knowledge?
A. Empirical
B. Logical
C. Phonological
D. Syntactic
Correct option is A

10. Designing a machine learning approach involves:
A. Choosing the type of training experience
B. Choosing the target function to be learned
C. Choosing a representation for the target
function
D. Choosing a function approximation
algorithm
E. All of the above
Correct option is E

11. Concept learning infers a ______-valued function from training examples of its input and output.
A. Decimal
B. Hexadecimal
C. Boolean
D. All of the above
Correct option is C

12. Which of the following is not a supervised learning method?
A. Naive Bayesian
B. PCA
C. Linear Regression
D. Decision Tree
Correct option is B

13. What is Machine Learning?
(i) Artificial Intelligence
(ii) Deep Learning
(iii) Data Statistics
A. Only (i)
B. (i) and (ii)
C. All
D. None
Correct option is B

14. What kind of learning algorithm is used for “facial identities or facial expressions”?
A. Prediction
B. Recognition Patterns
C. Generating Patterns
D. Recognizing Anomalies
Correct option is B

15. Which of the following is not a type of learning?
A. Unsupervised Learning
B. Supervised Learning
C. Semi-unsupervised Learning
D. Reinforcement Learning
Correct option is C
16. Real-Time decisions, Game AI, Learning Tasks, Skill Acquisition, and Robot Navigation are
applications of which of the following
A. Supervised Learning: Classification
B. Reinforcement Learning
C. Unsupervised Learning: Clustering
D. Unsupervised Learning: Regression
Correct option is B

17. Targeted marketing, Recommender Systems, and Customer Segmentation are
applications in which of the following
A. Supervised Learning: Classification
B. Unsupervised Learning: Clustering
C. Unsupervised Learning: Regression
D. Reinforcement Learning
Correct option is B
18. Fraud Detection, Image Classification,
Diagnostic, and Customer Retention are
applications in which of the following
A. Unsupervised Learning: Regression
B. Supervised Learning: Classification
C. Unsupervised Learning: Clustering
D. Reinforcement Learning
Correct option is B

19. Which of the following is not a symbolic function representation in Machine Learning?
A. Rules in propositional logic
B. Hidden-Markov Models (HMM)
C. Rules in first-order predicate logic
D. Decision Trees
Correct option is B
20. Which of the following is not a numerical function representation in Machine Learning?
A. Neural Network
B. Support Vector Machines
C. Case-based
D. Linear Regression
Correct option is C

21. The FIND-S algorithm starts from the most specific hypothesis and generalizes it by
considering only ______ examples
A. Negative
B. Positive
C. Negative or Positive
D. None of the above
Correct option is B

22. The FIND-S algorithm ignores ______ examples


A. Negative
B. Positive
C. Both
D. None of the above
Correct option is A
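
Revision aid for questions 21 and 22 (not part of the original question bank): a minimal sketch of FIND-S for conjunctive hypotheses over discrete attributes. The attribute values and the training set below are made up for illustration, and `None` is used here as a stand-in for the most specific hypothesis that matches nothing.

```python
def find_s(examples):
    """Return the most specific conjunctive hypothesis covering all positives.

    examples: list of (attribute_tuple, label) pairs, label True for positive.
    A '?' in the result means any value is acceptable for that attribute.
    """
    hypothesis = None  # most specific hypothesis: covers no instance yet
    for attributes, label in examples:
        if not label:
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # copy the first positive example
            continue
        for i, value in enumerate(attributes):
            if hypothesis[i] != value:
                hypothesis[i] = "?"  # generalize only where the example disagrees
    return hypothesis


# Hypothetical training data, for illustration only.
training = [
    (("Sunny", "Warm", "Normal", "Strong"), True),
    (("Sunny", "Warm", "High", "Strong"), True),
    (("Rainy", "Cold", "High", "Strong"), False),
    (("Sunny", "Warm", "High", "Weak"), True),
]
print(find_s(training))  # ['Sunny', 'Warm', '?', '?']
```

The negative example never changes the hypothesis, which is why FIND-S can return a wrong answer when the training set is inconsistent (question 26).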

23. The Candidate-Elimination Algorithm represents the ______.
A. Solution Space
B. Version Space
C. Elimination Space
D. All of the above
Correct option is B

24. Inductive learning is based on the knowledge that if something happens a lot it is likely to be
generally true
A. True
B. False
Correct option is A
25. Inductive learning takes examples and generalizes rather than starting
with ______ knowledge
A. Inductive
B. Existing
C. Deductive
D. None of these
Correct option is B

26. A drawback of FIND-S is that it assumes the consistency within the training set
A. True
B. False
Correct option is A

27. What strategies can help reduce overfitting in decision trees?
(i) Enforce a maximum depth for the tree
(ii) Enforce a minimum number of samples in leaf nodes
(iii) Pruning
(iv) Make sure each leaf node is one pure class
A. All
B. (i), (ii) and (iii)
C. (i), (iii), (iv)
D. None
Correct option is B

28. Which of the following is a widely used and effective machine learning algorithm based on
the idea of bagging?
A. Decision Tree
B. Random Forest
C. Regression
D. Classification
Correct option is B
29. To find the minimum or the maximum of a
function, we set the gradient to zero because
which of the following
A. Depends on the type of problem
B. The value of the gradient at extrema of a
function is always zero
C. Both (A) and (B)
D. None of these
Correct option is B

30. Which of the following is a disadvantage of decision trees?
A. Decision trees are prone to be overfit
B. Decision trees are robust to outliers
C. Factor analysis
D. None of the above
Correct option is A

31. What is a perceptron?


A. A single layer feed-forward neural network
with pre-processing
B. A neural network that contains feedback
C. A double layer auto-associative neural
network
D. An auto-associative neural network
Correct option is A

32. Which of the following is true for neural networks?
(i) The training time depends on the size of the network
(ii) Neural networks can be simulated on a conventional computer
(iii) Artificial neurons are identical in operation to biological ones
A. All
B. Only (ii)
C. (i) and (ii)
D. None
Correct option is C
33. What are the advantages of neural networks over conventional computers?
(i) They have the ability to learn by example
(ii) They are more fault tolerant
(iii) They are more suited for real time operation due to their high 'computational' rates
A. (i) and (ii)
B. (i) and (iii)
C. Only (i)
D. All
E. None
Correct option is D

34. What is Neuro software?


A. It is software used by Neurosurgeon
B. Designed to aid experts in real world
C. It is powerful and easy neural network
D. A software used to analyze neurons
Correct option is C
35. Which is true for neural networks?
A. Each node computes its weighted input
B. Node could be in excited state or non-excited state
C. It has set of nodes and connections
D. All of the above
Correct option is D

36. What is the objective of the backpropagation algorithm?
A. To develop learning algorithm for multilayer
feedforward neural network, so that network
can be trained to capture the mapping implicitly
B. To develop learning algorithm for multilayer
feedforward neural network
C. To develop learning algorithm for single
layer feedforward neural network
D. All of the above
Correct option is A
37. Which of the following is true?
Single layer associative neural networks do not have the ability to:
(i) Perform pattern recognition
(ii) Find the parity of a picture
(iii) Determine whether two or more shapes in a picture are connected or not
A. (ii) and (iii)
B. Only (ii)
C. All
D. None
Correct option is A

38. The backpropagation law is also known as the generalized delta rule
A. True
B. False
Correct option is A
38. Which of the following is true?
(i) On average, neural networks have higher computational rates than conventional computers.
(ii) Neural networks learn by example.
(iii) Neural networks mimic the way the human brain works.
A. All
B. (ii) and (iii)
C. (i), (ii) and (iii)
D. None
Correct option is A

39. What is true regarding the backpropagation rule?
A. Error in output is propagated backwards only
to determine weight updates
B. There is no feedback of signal at any stage
C. It is also called generalized delta rule
D. All of the above
Correct option is D

40. There is feedback in the final stage of backpropagation
A. True
B. False
Correct option is B

41. An auto-associative network is


A. A neural network that has only one loop
B. A neural network that contains feedback
C. A single layer feed-forward neural network
with pre-processing
D. A neural network that contains no loops
Correct option is B

42. A 3-input neuron has weights 1, 4 and 3. The transfer function is linear with the constant of
proportionality being equal to 3. The inputs are 4, 8 and 5 respectively. What will be the output?
A. 139
B. 153
C. 162
D. 160
Correct option is B
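
Worked check for question 42 (a revision aid, not part of the original question bank): the output is the constant of proportionality times the weighted sum of the inputs, 3 × (1·4 + 4·8 + 3·5) = 3 × 51 = 153. The function name below is only illustrative.

```python
def linear_neuron(weights, inputs, k):
    """Output of a neuron with a linear transfer function f(s) = k * s."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))  # 4 + 32 + 15 = 51
    return k * weighted_sum


output = linear_neuron([1, 4, 3], [4, 8, 5], 3)
print(output)  # 153
```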

43. Which of the following is true regarding the backpropagation rule?
A. Hidden layers output is not all important,
they are only meant for supporting input and
output layers
B. Actual output is determined by computing
the outputs of units for each hidden layer
C. It is a feedback neural network
D. None of the above
Correct option is B

44. What is back propagation?


A. It is another name given to the curvy
function in the perceptron
B. It is the transmission of error back through
the network to allow weights to be adjusted so
that the network can learn
C. It is another name given to the curvy
function in the perceptron
D. None of the above
Correct option is B

45. The general limitations of the back propagation rule is/are
A. Scaling
B. Slow convergence
C. Local minima problem
D. All of the above
Correct option is D

46. What is the meaning of generalized in the statement “backpropagation is a generalized
delta rule”?
A. Because delta is applied to only input and
output layers, thus making it more simple and
generalized
B. It has no significance
C. Because delta rule can be extended to
hidden layer units
D. None of the above
Correct option is C

47. Neural Networks are complex ______ functions with many parameters
A. Linear
B. Non linear
C. Discrete
D. Exponential
Correct option is B

48. The general tasks that are performed with the backpropagation algorithm
A. Pattern mapping
B. Prediction
C. Function approximation
D. All of the above
Correct option is D

49. Backpropagation learning is based on gradient descent along the error surface.
A. True
B. False
Correct option is A
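
Revision aid for question 49 (not part of the original question bank): a minimal sketch of gradient descent along an error surface, shown for a single linear unit rather than a full multilayer network. The data, learning rate and epoch count are arbitrary choices for illustration.

```python
def train_linear_unit(samples, learning_rate=0.1, epochs=2000):
    """Fit y = w * x + b by gradient descent on half the mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        # Partial derivatives of E = (1 / 2n) * sum((w*x + b - y)^2)
        grad_w = sum((w * x + b - y) * x for x, y in samples) / n
        grad_b = sum((w * x + b - y) for x, y in samples) / n
        # Step downhill on the error surface
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # points on y = 2x + 1
w, b = train_linear_unit(samples)
print(round(w, 3), round(b, 3))
```

Backpropagation applies the same downhill step to every weight in the network, using the chain rule to obtain the gradients for hidden units.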

50. In the backpropagation rule, how to stop the learning process?
A. No heuristic criteria exist
B. On basis of average gradient value
C. There is convergence involved
D. None of these
Correct option is B

51. Applications of NN (Neural Network)


A. Risk management
B. Data validation
C. Sales forecasting
D. All of the above
Correct option is D

52. The network that involves backward links from output to the input and hidden layers is
known as
A. Recurrent neural network
B. Self organizing maps
C. Perceptrons
D. Single layered perceptron
Correct option is A

53. Decision Tree is a display of an Algorithm?


A. True
B. False
Correct option is A
54. Which of the following is/are the decision
tree nodes?
A. End Nodes
B. Decision Nodes
C. Chance Nodes
D. All of the above
Correct option is D

55. End Nodes are represented by which of the following
A. Solar street light
B. Triangles
C. Circles
D. Squares
Correct option is B

56. Decision Nodes are represented by which of the following
A. Solar street light
B. Triangles
C. Circles
D. Squares
Correct option is D

57. Chance Nodes are represented by which of the following
A. Solar street light
B. Triangles
C. Circles
D. Squares
Correct option is C

58. Advantage of Decision Trees


A. Possible Scenarios can be added
B. Use a white box model, if given result is
provided by a model
C. Worst, best and expected values can be
determined for different scenarios
D. All of the above
Correct option is D

59. How many terms are required for building a Bayes model?
A. 1
B. 2
C. 3
D. 4
Correct option is C

60. Which of the following is the consequence between a node and its predecessors while
creating a Bayesian network?
A. Conditionally independent
B. Functionally dependent
C. Both Conditionally dependent & Dependent
D. Dependent
Correct option is A
61. What is needed to make probabilistic systems feasible in the world?
A. Feasibility
B. Reliability
C. Crucial robustness
D. None of the above
Correct option is C

62. Bayes rule can be used for:-


A. Solving queries
B. Increasing complexity
C. Answering probabilistic query
D. Decreasing complexity
Correct option is C

63. ______ provides ways and means of weighing up the desirability of goals and the likelihood of
achieving them
A. Utility theory
B. Decision theory
C. Bayesian networks
D. Probability theory
Correct option is A

64. Which of the following is provided by the Bayesian Network?
A. Complete description of the problem
B. Partial description of the domain
C. Complete description of the domain
D. All of the above
Correct option is C

65. Probability provides a way of summarizing the ______ that comes from our laziness and ignorance
A. Belief
B. Uncertainty
C. Joint probability distributions
D. Randomness
Correct option is B
66. The entries in the full joint probability
distribution can be calculated as
A. Using variables
B. Both Using variables & information
C. Using information
D. All of the above
Correct option is C

67. A causal chain (for example, smoking causes cancer) gives rise to:
A. Conditional Independence
B. Conditional Dependence
C. Both
D. None of the above
Correct option is A

68. The Bayesian network can be used to answer any query by using:
A. Full distribution
B. Joint distribution
C. Partial distribution
D. All of the above
Correct option is B

69. Bayesian networks allow compact specification of:
A. Joint probability distributions
B. Belief
C. Propositional logic statements
D. All of the above
Correct option is A

70. The compactness of the Bayesian network can be described by
A. Fully structured
B. Locally structured
C. Partially structured
D. All of the above
Correct option is B
71. The Expectation-Maximization Algorithm has
been used to identify conserved domains in
unaligned proteins only. State True or False.
A. True
B. False
Correct option is B

72. Which of the following is correct about Naive Bayes?
A. Assumes that all the features in a dataset are
independent
B. Assumes that all the features in a dataset are
equally important
C. Both
D. All of the above
Correct option is C

73. Which of the following is false regarding the EM Algorithm?
A. The alignment provides an estimate of the
base or amino acid composition of each column
in the site
B. The column-by-column composition of the
site already available is used to estimate the
probability of finding the site at any position in
each of the sequences
C. The row-by-column composition of the site
already available is used to estimate the
probability
D. None of the above
Correct option is C

74. Naïve Bayes Algorithm is a ______ learning algorithm.
A. Supervised
B. Reinforcement
C. Unsupervised
D. None of these
Correct option is A
75. The EM algorithm includes two repeated steps; step 2 is ______.
A. The normalization
B. The maximization step
C. The minimization step
D. None of the above
Correct option is B

76. Examples of Naïve Bayes Algorithm is/are


A. Spam filtration
B. Sentimental analysis
C. Classifying articles
D. All of the above
Correct option is D

77. In the intermediate steps of the “EM Algorithm”, the number of each base in each column is
determined and then converted to fractions
A. True
B. False
Correct option is A

78. Naïve Bayes algorithm is based on ______ and used for solving classification problems.
A. Bayes Theorem
B. Candidate elimination algorithm
C. EM algorithm
D. None of the above
Correct option is A

79. Types of Naïve Bayes Model:


A. Gaussian
B. Multinomial
C. Bernoulli
D. All of the above
Correct option is D

80. Disadvantages of Naïve Bayes Classifier:


A. Naive Bayes assumes that all features are
independent or unrelated, so it cannot learn the
relationship between features
B. It performs well in Multi-class predictions as
compared to the other algorithms
C. Naïve Bayes is one of the fast and easy ML
algorithms to predict a class of datasets
D. It is the most popular choice for text
classification problems.
Correct option is A

81. The benefit of Naïve Bayes:-


A. Naïve Bayes is one of the fast and easy ML
algorithms to predict a class of datasets
B. It is the most popular choice for text
classification problems.
C. It can be used for Binary as well as Multi-class classification
D. All of the above
Correct option is D
82. In which of the following types of sampling is the information gathered according to the opinion
of an expert?
A. Convenience sampling
B. Judgement sampling
C. Quota sampling
D. Purposive sampling
Correct option is B

83. Full form of MDL?


A. Minimum Description Length
B. Maximum Description Length
C. Minimum Domain Length
D. None of these
Correct option is A

84. For the analysis of ML algorithms, we need


A. Computational learning theory
B. Statistical learning theory
C. Both A & B
D. None of these
Correct option is C

85. PAC stands for

A. Probably Approximately Correct
B. Probably Approx Correct
C. Probably Approximate Computation
D. Probably Approx Computation
Correct option is A

86. The ______ of hypothesis h with respect to target concept c and distribution D is the
probability that h will misclassify an instance drawn at random according to D.
A. True Error
B. Type 1 Error
C. Type 2 Error
D. None of these
Correct option is A
87. Statement: True error defined over entire
instance space, not just training data
A. True
B. False
Correct option is A

88. What areas is CLT (computational learning theory) comprised of?


A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. All of these
Correct option is D

88. What area of CLT tells “How many examples do we need to find a good hypothesis?”
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is A

89. What area of CLT tells “How much computational power do we need to find a good
hypothesis?”
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is B

90. What area of CLT tells “How many mistakes will we make before finding a good hypothesis?”
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is C
91. (For question no. 9 and 10) Can we say that concepts described by conjunctions of Boolean
literals are PAC learnable?
A. Yes
B. No
Correct option is A

92. How large is the hypothesis space when we have n Boolean attributes?
A. |H| = 3^n
B. |H| = 2^n
C. |H| = 1^n
D. |H| = 4^n
Correct option is A

93. The VC dimension of hypothesis space H1 is larger than the VC dimension of hypothesis
space H2. Which of the following can be inferred from this?
A. The number of examples required for
learning a hypothesis in H1 is larger than the
number of examples required for H2
B. The number of examples required for
learning a hypothesis in H1 is smaller than the
number of examples required for H2
C. No relation to number of samples required
for PAC learning.
Correct option is A

94. For a particular learning task, if the requirement on the error parameter changes from
0.1 to 0.01, how many more samples will be required for PAC learning?
A. Same
B. 2 times
C. 1000 times
D. 10 times
Correct option is D
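
Worked check for questions 92 and 94 (a revision aid, not part of the original question bank): a minimal sketch assuming the standard sample-complexity bound for a consistent learner over a finite hypothesis space, m ≥ (1/ε)(ln|H| + ln(1/δ)). The values n = 10 and δ = 0.05 are arbitrary choices for illustration.

```python
import math


def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Smallest integer m with m >= (ln|H| + ln(1/delta)) / epsilon."""
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)


n = 10                       # number of Boolean attributes (illustrative)
hypothesis_count = 3 ** n    # each attribute: positive literal, negated literal, or absent

m_loose = pac_sample_bound(hypothesis_count, epsilon=0.1, delta=0.05)
m_tight = pac_sample_bound(hypothesis_count, epsilon=0.01, delta=0.05)
print(m_loose, m_tight)  # 140 1399
```

Because the bound scales with 1/ε, tightening the error parameter from 0.1 to 0.01 multiplies the required number of samples by about 10.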
95. Computational complexity of classes of
learning problems depends on which of the
following?
A. The size or complexity of the hypothesis
space considered by learner
B. The accuracy to which the target concept
must be approximated
C. The probability that the learner will output a
successful hypothesis
D. All of these
Correct option is D

96. The instance-based learner is a


A. Lazy-learner
B. Eager learner
C. Can't say
Correct option is A

97. When to consider nearest neighbour algorithms?
A. Instances map to points in R^n
B. Not more than 20 attributes per instance
C. Lots of training data
D. None of these
E. A, B & C
Correct option is E

98. What are the advantages of the nearest neighbour algorithm?
A. Training is very fast
B. Can learn complex target functions
C. Don't lose information
D. All of these
Correct option is D

99. What are the difficulties with the k-nearest neighbour algorithm?
A. Calculate the distance of the test case from
all training cases
B. Curse of dimensionality
C. Both A & B
D. None of these
Correct option is C

100. What if the target function is real valued in the kNN algorithm?
A. Calculate the mean of the k nearest
neighbours
B. Calculate the SD of the k nearest neighbours
C. None of these
Correct option is A
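
Revision aid for question 100 (not part of the original question bank): a minimal sketch of k-NN for a real-valued target, where the prediction is the mean of the targets of the k nearest training points. Euclidean distance and the toy data are assumptions made for illustration.

```python
import math


def knn_regress(training, query, k):
    """Predict a real value as the mean target of the k nearest neighbours."""
    # Sort training pairs (point, target) by Euclidean distance to the query.
    neighbours = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    return sum(target for _, target in neighbours) / k


# Hypothetical training data, for illustration only.
training = [
    ((1.0, 1.0), 10.0),
    ((2.0, 2.0), 20.0),
    ((3.0, 3.0), 30.0),
    ((10.0, 10.0), 100.0),
]
prediction = knn_regress(training, (2.0, 2.0), k=3)
print(prediction)  # 20.0
```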

101. What is/are true about Distance-weighted KNN?
A. The weight of the neighbour is considered
B. The distance of the neighbour is considered
C. Both A & B
D. None of these
Correct option is C
102. What is/are advantage(s) of Distance-weighted k-NN over k-NN?
A. Robust to noisy training data
B. Quite effective when a sufficiently large set of
training data is provided
C. Both A & B
D. None of these
Correct option is C

103. What is/are advantage(s) of Locally Weighted Regression?
A. Pointwise approximation of complex target
function
B. Earlier data has no influence on the new
ones
C. Both A & B
D. None of these
Correct option is C
104. In locally weighted regression (LWR), the quality of the result depends on
A. Choice of the function
B. Choice of the kernel function K
C. Choice of the hypothesis space H
D. All of these
Correct option is D

105. How many types of layers are there in radial basis function neural networks?
A. 3
B. 2
C. 1
D. 4
Correct option is A, Input layer, Hidden layer, and
Output layer

106. The neurons in the hidden layer contain Gaussian transfer functions whose outputs
are ______ proportional to the distance from the
centre of the neuron.
A. Directly
B. Inversely
C. Equal
D. None of these
Correct option is B
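
Revision aid for question 106 (not part of the original question bank): a minimal sketch of a Gaussian radial basis function, assuming the common form exp(-d² / (2σ²)) with a width parameter σ chosen here for illustration. The output is largest at the centre and falls off as the distance grows.

```python
import math


def gaussian_rbf(x, centre, sigma=1.0):
    """Gaussian transfer function: 1.0 at the centre, decaying with distance."""
    squared_distance = sum((a - c) ** 2 for a, c in zip(x, centre))
    return math.exp(-squared_distance / (2 * sigma ** 2))


centre = (0.0, 0.0)
at_centre = gaussian_rbf((0.0, 0.0), centre)
near = gaussian_rbf((0.5, 0.0), centre)
far = gaussian_rbf((3.0, 0.0), centre)
print(at_centre, near, far)
```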

107. PNN/GRNN networks have one neuron for each point in the training file, while RBF
networks have a variable number of neurons that
is usually
A. less than the number of training points
B. greater than the number of training points
C. equal to the number of training points
D. None of these
Correct option is A

108. Which network is more accurate when the size of the training set is small to medium?
A. PNN/GRNN
B. RBF
C. K-means clustering
D. None of these
Correct option is A

109. What is/are true about RBF network?


A. A kind of supervised learning
B. Design of NN as curve fitting problem
C. Use of multidimensional surface to
interpolate the test data
D. All of these
Correct option is D

110. Application of CBR


A. Design
B. Planning
C. Diagnosis
D. All of these
Correct option is D
111. What is/are advantages of CBR?
A. A local approx. is found for each test case
B. Knowledge is in a form understandable to
human
C. Fast to train
D. All of these
Correct option is D

112. In the k-NN algorithm, given a set of training examples and a value of k < size of training set (n),
the algorithm predicts the class of a test example to be the
A. Least frequent class among the classes of the k
closest training examples
B. Most frequent class among the classes of the k
closest training examples
C. Class of the closest point
D. Most frequent class among the classes of the
k farthest training examples.
Correct option is B
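
Revision aid for question 112 (not part of the original question bank): a minimal sketch of k-NN classification by majority vote among the k closest training examples. Euclidean distance and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_classify(training, query, k):
    """Return the most frequent class among the k nearest training examples."""
    neighbours = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data, for illustration only.
training = [
    ((0.0, 0.0), "a"),
    ((0.0, 1.0), "a"),
    ((1.0, 0.0), "b"),
    ((5.0, 5.0), "b"),
    ((6.0, 5.0), "b"),
]
label = knn_classify(training, (0.2, 0.2), k=3)
print(label)  # a
```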
113. Which of the following statements is true about PCA?
(i) We must standardize the data before applying PCA
(ii) We should select the principal components which explain the highest variance
(iii) We should select the principal components which explain the lowest variance
(iv) We can use PCA for visualizing the data in lower dimensions
A. (i), (ii) and (iv).
B. (ii) and (iv)
C. (iii) and (iv)
D. (i) and (iii)
Correct option is A

114. Genetic algorithm is a


A. Search technique used in computing to find
true or approximate solution to optimization
and search problem
B. Sorting technique used in computing to find
true or approximate solution to optimization
and sort problem
C. Both A & B
D. None of these
Correct option is A

115. GA techniques are inspired by


A. Evolutionary
B. Cytology
C. Anatomy
D. Ecology
Correct option is A

116. When would the genetic algorithm terminate?
A. Maximum number of generations has been
produced
B. Satisfactory fitness level has been reached
for the population
C. Both A & B
D. None of these
Correct option is C

117. The algorithm operates by iteratively updating a pool of hypotheses, called the ______
A. Population
B. Fitness
C. None of these
Correct option is A

118. What is the correct representation of GA?


A. GA(Fitness, Fitness_threshold, p)
B. GA(Fitness, Fitness_threshold, p, r )
C. GA(Fitness, Fitness_threshold, p, r, m)
D. GA(Fitness, Fitness_threshold)
Correct option is C
119. Genetic operators include
A. Crossover
B. Mutation
C. Both A & B
D. None of these
Correct option is C

120. The operator that produces two new offspring from two parent strings by copying
selected bits from each parent is called
A. Mutation
B. Inheritance
C. Crossover
D. None of these
Correct option is C
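
Revision aid for questions 119 and 120 (not part of the original question bank): a minimal sketch of single-point crossover and point mutation on bit strings. The parent strings, the crossover point and the random seed are arbitrary choices for illustration.

```python
import random


def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bit strings after position `point`."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


def mutate(bits, rng):
    """Flip one randomly chosen bit of the string."""
    index = rng.randrange(len(bits))
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


children = single_point_crossover("11101001000", "00001010101", 5)
print(children)  # ('11101010101', '00001001000')

rng = random.Random(0)  # fixed seed so the run is repeatable
mutant = mutate("0000", rng)
print(mutant)
```

Crossover recombines material from both parents, while mutation changes a single bit of one individual.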

121. Each schema represents the set of bit strings containing the indicated ______
A. 0s, 1s
B. only 0s
C. only 1s
D. 0s, 1s, *s
Correct option is D

122. 0*10 represents the set of bit strings that includes exactly
A. 0010, 0110
B. 0010, 0010
C. 0100, 0110
D. 0100, 0010
Correct option is A
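
Worked check for question 122 (a revision aid, not part of the original question bank): a minimal sketch that expands a schema by treating each `*` as a "don't care" position. The helper names are illustrative only.

```python
from itertools import product


def matches(schema, bits):
    """True if the bit string agrees with the schema at every non-* position."""
    return len(schema) == len(bits) and all(s in ("*", b) for s, b in zip(schema, bits))


def expand(schema):
    """List every bit string represented by the schema."""
    choices = [("0", "1") if s == "*" else (s,) for s in schema]
    return ["".join(combo) for combo in product(*choices)]


print(expand("0*10"))           # ['0010', '0110']
print(matches("0*10", "0100"))  # False
```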

123. If correct(h) is the percent of all training examples correctly classified by hypothesis h,
then the fitness function is equal to
A. Fitness(h) = (correct(h))^2
B. Fitness(h) = (correct(h))^3
C. Fitness(h) = correct(h)
D. Fitness(h) = (correct(h))^4
Correct option is A
124. Statement: In Genetic Programming, individuals
in the evolving population are computer
programs rather than bit strings
A. True
B. False
Correct option is A

125. In ______ evolution, evolution over many generations
was directly influenced by the experiences of
individual organisms during their lifetime
A. Baldwin
B. Lamarckian
C. Bayes
D. None of these
Correct option is B

126. Search through the hypothesis space cannot be characterized. Why?
A. Hypotheses are created by crossover and
mutation operators that allow radical changes
between successive generations
B. Hypotheses are not created by crossover and
mutation
C. None of these
Correct option is A

127. ILP stands for


A. Inductive Logical programming
B. Inductive Logic Programming
C. Inductive Logical Program
D. Inductive Logic Program
Correct option is B

128. What is/are the requirements for the Learn-One-Rule method?
A. Input: accepts a set of +ve and -ve training
examples.
B. Output: delivers a single rule that covers
many +ve examples and few -ve.
C. Output rule has a high accuracy but not
necessarily a high coverage
D. A & B
E. A, B & C
Correct option is E

129. ______ is any predicate (or its negation) applied to any set of terms.
A. Literal
B. Null
C. Clause
D. None of these
Correct option is A

130. A ground literal is a literal that

A. Contains only variables
B. Does not contain any functions
C. Does not contain any variables
D. Contains only functions
Correct option is C

131. ______ emphasizes learning feedback
that evaluates the learner’s performance
without providing standards of correctness in
the form of behavioural targets
A. Reinforcement learning
B. Supervised Learning
C. None of these
Correct option is A

132. Features of Reinforcement learning


A. Set of problems rather than a set of techniques
B. RL is training by reward and punishment
C. RL is learning from trial and error with the world
D. All of these
Correct option is D

133. Which type of feedback used by RL?


A. Purely Instructive feedback
B. Purely Evaluative feedback
C. Both A & B
D. None of these
Correct option is B

134. What is/are the problem solving methods for RL?
A. Dynamic programming
B. Monte Carlo Methods
C. Temporal-difference learning
D. All of these
Correct option is D

135. The FIND-S Algorithm

A. Starts from the most specific hypothesis
B. It considers negative examples
C. It considers both negative and positive
D. None of these
Correct option is A
136. The hypothesis space has a general-to-specific
ordering of hypotheses, and the search can be
efficiently organized by taking advantage of a
naturally occurring structure over the hypothesis
space
A. TRUE
B. FALSE
Correct option is A

137. The Version space is:

A. The subset of all hypotheses consistent with the training examples is called the
version space with respect to the hypothesis
space H and the training examples D, because it
contains all plausible versions of the target concept
B. The version space consists of only specific hypotheses
C. None of these
Correct option is A
138. The Candidate-Elimination Algorithm
A. The key idea in the Candidate-Elimination
algorithm is to output a
description of the set of all hypotheses
consistent with the training examples
B. Candidate-Elimination algorithm
computes the description of this set without
explicitly enumerating all of its members
C. This is accomplished by using the more-general-than
partial ordering and
maintaining a compact representation of the
set of consistent hypotheses
D. All of these
Correct option is D

139. Concept learning is basically acquiring the definition of a general category from given
sample positive and negative training examples
of the category
A. TRUE
B. FALSE
Correct option is A

140. The hypothesis h1 is more-general-than


hypothesis h2 ( h1 > h2) if and only if h1≥h2 is
true and h2≥h1 is false. We also say h2 is more-
specific-than h1
A. The statement is true
B. The statement is false
C. We cannot
D. None of these
Correct option is A

141. The List-Then-Eliminate Algorithm


A. The List-Then-Eliminate algorithm
initializes the version space to contain all
hypotheses in H, then eliminates any
hypothesis found inconsistent with any
training
B. The List-Then-Eliminate algorithm not
initializes to the version
C. None of these
Correct option is A

142. What will take place as the agent observes


its interactions with the world?
A. Learning
B. Hearing
C. Perceiving
D. Speech
Correct option is A

143. Which modifies the performance element so


that it makes better decisions?
A. Performance element
B. Changing element
C. Learning element
D. None of the mentioned
Correct option is C
144. Any hypothesis found to approximate the
target function well over a sufficiently large set
of training examples will also approximate the
target function well over other unobserved
example is called:
A. Inductive Learning Hypothesis
B. Null Hypothesis
C. Actual Hypothesis
D. None of these
Correct option is A

145. Feature of ANN in which ANN creates its own


organization or representation of information it
receives during learning time is
A. Adaptive Learning
B. Self Organization
C. What-If Analysis
D. Supervised Learning
Correct option is B
146. How does the decision tree reach its decision?
A. Single test
B. Two test
C. Sequence of test
D. No test
Correct option is C

147. Which of the following is a disadvantage of


decision trees?
A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to be overfit
D. None of the above
Correct option is C

148. Tree/Rule based classification algorithms


generate which rule to perform the
classification.
A. if-then.
B. then
C. do
D. Answer
Correct option is A

149. What is Gini Index?


A. It is a type of index structure
B. It is a measure of purity
C. None of the options
Correct option is B
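
In the decision-tree setting, the Gini index is usually computed as an impurity score for a set of class labels. The sketch below assumes that usage; the function name `gini_impurity` is a hypothetical helper. A score of 0 means the node is pure, and larger values mean more mixing of classes.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


pure = gini_impurity(["yes", "yes", "yes", "yes"])   # all one class
mixed = gini_impurity(["yes", "yes", "no", "no"])    # evenly split
print(pure, mixed)  # 0.0 0.5
```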

150. What is not a RNN in machine learning?


A. One output to many inputs
B. Many inputs to a single output
C. RNNs for nonsequential input
D. Many inputs to many outputs
Correct option is A

151. Which of the following sentences are correct


in reference to Information gain?
A. It is biased towards multi-valued
attributes
B. ID3 makes use of information gain
C. The approach used by ID3 is greedy
D. All of these
Correct option is D
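
The bias of information gain towards multi-valued attributes can be seen in a small sketch. The helper names and the toy data are assumptions for illustration. An attribute with a unique value per row (such as a row ID) splits the data into pure singletons and so gets the maximum gain, even though it is useless for prediction.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


labels = ["yes", "yes", "no", "no"]
row_id = [1, 2, 3, 4]            # unique value per row (multi-valued attribute)
windy = ["t", "f", "t", "f"]     # two-valued attribute unrelated to the label

gain_id = information_gain(row_id, labels)
gain_windy = information_gain(windy, labels)
print(gain_id, gain_windy)  # 1.0 0.0
```

Gain ratio, as used in C4.5, divides the gain by the split information to counter this effect.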

152. A Neural Network can answer


A. For Loop questions
B. what-if questions
C. IF-The-Else Analysis Questions
D. None of these
Correct option is B

153. Artificial neural network used for


A. Pattern Recognition
B. Classification
C. Clustering
D. All
Correct option is D

154. Which of the following are the advantage/s


of Decision Trees?
A. Possible Scenarios can be added
B. Use a white box model, If given result is
provided by a model
C. Worst, best and expected values can be
determined for different scenarios
D. All of the mentioned
Correct option is D

155. What is the mathematical likelihood that


something will occur?
A. Classification
B. Probability
C. Naïve Bayes Classifier
D. None of the other
Correct option is B

156. What does the Bayesian network provide?
A. Complete description of the domain
B. Partial description of the domain
C. Complete description of the problem
D. None of the mentioned
Correct option is A

157. Where can the Bayes rule be used?


A. Solving queries
B. Increasing complexity
C. Decreasing complexity
D. Answering probabilistic query
Correct option is D

158. How many terms are required for building a


Bayes model?
A. 2
B. 3
C. 4
D. 1
Correct option is B

159. What is needed to make probabilistic


systems feasible in the world?
A. Reliability
B. Crucial robustness
C. Feasibility
D. None of the mentioned
Correct option is B

160. It was shown that the Naive Bayesian


method
A. Can be much more accurate than the
optimal Bayesian method
B. Is always worse off than the optimal
Bayesian method
C. Can be almost optimal only when
attributes are independent
D. Can be almost optimal when some
attributes are dependent
Correct option is C

161. What is the consequence between a node


and its predecessors while creating Bayesian
network?
A. Functionally dependent
B. Dependant
C. Conditionally independent
D. Both Conditionally dependant &
Dependant
Correct option is C

162. How can the compactness of the Bayesian
network be described?
A. Locally structured
B. Fully structured
C. Partial structure
D. All of the mentioned
Correct option is A

163. How can the entries in the full joint probability
distribution be calculated?
A. Using variables
B. Using information
C. Both Using variables & information
D. None of the mentioned
Correct option is B

164. How can the Bayesian network be used to
answer any query?
A. Full distribution
B. Joint distribution
C. Partial distribution
D. All of the mentioned
Correct option is B

165. Sample Complexity is


A. The sample complexity is the number of
training-samples that we need to supply to
the algorithm, so that the function returned
by the algorithm is within an arbitrarily small
error of the best possible function, with
probability arbitrarily close to 1
B. How many training examples are needed
for learner to converge to a successful
hypothesis.
C. All of these
Correct option is C

166. PAC stands for


A. Probably Approximately Correct
B. Probability Applied Correctly
C. Partition Approximately Correct
Correct option is A

167. Which of the following will be true about k in


k-NN in terms of variance
A. When you increase the k the variance
will increase
B. When you decrease the k the variance
will increase
C. Can’t say
D. None of these
Correct option is B

168. Which of the following option is true about


k-NN algorithm?
A. It can be used for classification
B. It can be used for regression
C. It can be used in both classification and
regression
Correct option is C

169. In k-NN it is very likely to overfit due to the


curse of dimensionality. Which of the following
option would you consider to handle such
problem? 1). Dimensionality Reduction 2).
Feature selection
A. 1
B. 2
C. 1 and 2
D. None of these
Correct option is C

170. When you find noise in data which of the


following option would you consider in k- NN
A. I will increase the value of k
B. I will decrease the value of k
C. Noise can not be dependent on value of k
D. None of these
Correct option is A
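
A small sketch of why a larger k helps with noisy data, assuming one-dimensional points and majority voting. The function `knn_predict` and the data set (with one deliberately mislabeled point) are hypothetical. With k = 1 the prediction follows the noisy neighbour; with k = 3 the vote smooths it out.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query.

    train is a list of (x, label) pairs with scalar x.
    """
    neighbours = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


train = [
    (1.0, "a"), (1.2, "a"), (1.4, "a"),
    (1.3, "b"),                      # noisy (mislabeled) point inside cluster "a"
    (5.0, "b"), (5.2, "b"), (5.4, "b"),
]

pred_k1 = knn_predict(train, 1.31, k=1)  # follows the noisy point
pred_k3 = knn_predict(train, 1.31, k=3)  # two "a" votes outweigh the noise
print(pred_k1, pred_k3)  # b a
```

The same smoothing is what lowers variance and raises bias as k grows.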

171. Which of the following will be true about k in


k-NN in terms of Bias?
A. When you increase the k the bias will
increase
B. When you decrease the k the bias will
increase
C. Can't say
D. None of these
Correct option is A

172. What is used to mitigate overfitting in a test


set?
A. Overfitting set
B. Training set
C. Validation dataset
D. Evaluation set
Correct option is C

173. A radial basis function is a


A. Activation function
B. Weight
C. Learning rate
D. none
Correct option is A
174. Mistake Bound is
A. How many training examples are needed for
learner to converge to a successful hypothesis.
B. How much computational effort is needed
for a learner to converge to a successful
hypothesis
C. How many training examples will the learner
misclassify before converging to a successful
hypothesis
D. None of these
Correct option is C

175. All of the following are suitable problems for


genetic algorithms EXCEPT
A. dynamic process control
B. pattern recognition with complex
patterns
C. simulation of biological models
D. simple optimization with few variables
Correct option is D
176. Adding more basis functions in a linear
model... (Pick the most probable option)
A. Decreases model bias
B. Decreases estimation bias
C. Decreases variance
D. Doesn't affect bias and variance
Correct option is A

177. Which of these are types of crossover


A. Single point
B. Two point
C. Uniform
D. All of these
Correct option is D
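
The three crossover types can be sketched on bit strings as follows. The function names and the fixed cut points are assumptions for illustration; a real genetic algorithm would usually pick the cut points at random.

```python
import random


def single_point(p1, p2, point):
    """Swap the tails of the two parents after one cut point."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def two_point(p1, p2, lo, hi):
    """Swap the middle segment between two cut points."""
    return (p1[:lo] + p2[lo:hi] + p1[hi:],
            p2[:lo] + p1[lo:hi] + p2[hi:])


def uniform(p1, p2, rng):
    """Swap each position independently with probability 0.5."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < 0.5:
            a, b = b, a
        c1.append(a)
        c2.append(b)
    return "".join(c1), "".join(c2)


p1, p2 = "11111111", "00000000"
sp = single_point(p1, p2, 3)
tp = two_point(p1, p2, 2, 5)
un = uniform(p1, p2, random.Random(0))
print(sp)  # ('11100000', '00011111')
print(tp)  # ('11000111', '00111000')
```

In each case two parents produce two offspring, and every bit of a child is copied from one of the parents.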

178. A feature F1 can take certain value: A, B, C,


D, E, & F and represents grade of students from
a college. Which of the following statement is
true in following case?
A. Feature F1 is an example of nominal
B. Feature F1 is an example of ordinal
C. It doesn't belong to any of the above
category.
Correct option is B

179. You observe the following while fitting a


linear regression to the data: As you increase
the amount of training data, the test error
decreases and the training error increases. The
train error is quite low (almost what you expect
it to), while the test error is much higher than
the train error. What do you think is the main
reason behind this behaviour? Choose the most
probable option.
A. High variance
B. High model bias
C. High estimation bias
D. None of the above
Correct option is A
180. Genetic algorithms are heuristic methods
that do not guarantee an optimal solution to a
problem
A. TRUE
B. FALSE
Correct option is A

181. Which of the following statements about


regularization is not correct?
A. Using too large a value of lambda can
cause your hypothesis to underfit the data
B. Using too large a value of lambda can
cause your hypothesis to overfit the data
C. Using a very large value of lambda
cannot hurt the performance of your
hypothesis.
D. None of the above
Correct option is A
182. Consider the following: (a) Evolution (b)
Selection (c) Reproduction (d) Mutation Which
of the following are found in genetic algorithms?
A. All
B. a, b, c
C. a, b
D. b, d
Correct option is A

183. Genetic Algorithm are a part of


A. Evolutionary Computing
B. inspired by Darwin’s theory about
evolution – “survival of the fittest”
C. are adaptive heuristic search algorithm
based on the evolutionary ideas of natural
selection and genetics
D. All of the above
Correct option is D
184. Genetic algorithms belong to the family of
methods in the
A. artificial intelligence area
B. optimization
C. complete enumeration family of
methods
D. Non-computer based (human) solutions
area
Correct option is A

185. For a two player chess game, the


environment encompasses the opponent
A. True
B. False
Correct option is A

186. Which among the following is not a


necessary feature of a reinforcement learning
solution to a learning problem?
A. exploration versus exploitation dilemma
B. trial and error approach to learning
C. learning based on rewards
D. representation of the problem as a
Markov Decision Process
Correct option is D

187. Which of the following sentence is FALSE


regarding reinforcement learning
A. It relates inputs to
B. It is used for
C. It may be used for
D. It discovers causal relationships.
Correct option is D

188. The EM algorithm is guaranteed to never


decrease the value of its objective function on
any iteration
A. TRUE
B. FALSE
Correct option is A
189. Consider the following modification to the
tic-tac-toe game: at the end of game, a coin is
tossed and the agent wins if a head appears
regardless of whatever has happened in the
game. Can reinforcement learning be used to
learn an optimal policy of playing Tic-Tac-Toe in
this case?
A. Yes
B. No
Correct option is B

190. Out of the two repeated steps in EM


algorithm, the step 2 is _
A. the maximization step
B. the minimization step
C. the optimization step
D. the normalization step
Correct option is A
191. Suppose the reinforcement learning player
was greedy, that is, it always played the move
that brought it to the position that it rated the
best. Might it learn to play better, or worse,
than a non greedy player?
A. Worse
B. Better
Correct option is A

192. A chess agent trained by using


Reinforcement Learning can be trained by
playing against a copy of the same
A. True
B. False
Correct option is A

193. The EM iteration alternates between


performing an expectation (E) step, which
creates a function for the expectation of the log-
likelihood evaluated using the current estimate
for the parameters, and a maximization (M)
step, which computes parameters maximizing
the expected log-likelihood found on the E step.
A. TRUE
B. FALSE
Correct option is A
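
The E and M steps can be sketched for a two-component Gaussian mixture in one dimension. This is a simplified illustration under stated assumptions: the variance is fixed at 1, only the mixing weight and the two means are estimated, and the data is synthetic. The recorded log-likelihood values should never decrease from one iteration to the next.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def log_likelihood(data, weight, mu1, mu2, var=1.0):
    return sum(
        math.log(weight * normal_pdf(x, mu1, var)
                 + (1 - weight) * normal_pdf(x, mu2, var))
        for x in data
    )


def em_step(data, weight, mu1, mu2, var=1.0):
    # E step: responsibility of component 1 for each point
    resp = []
    for x in data:
        a = weight * normal_pdf(x, mu1, var)
        b = (1 - weight) * normal_pdf(x, mu2, var)
        resp.append(a / (a + b))
    # M step: re-estimate the mixing weight and the two means
    total = sum(resp)
    new_mu1 = sum(r * x for r, x in zip(resp, data)) / total
    new_mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / (len(data) - total)
    return total / len(data), new_mu1, new_mu2


# Synthetic data: two clusters centred near 0 and 4 (illustrative assumption)
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(4, 1) for _ in range(100)]

params = (0.5, -1.0, 1.0)  # initial weight, mu1, mu2
history = [log_likelihood(data, *params)]
for _ in range(20):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
```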

194. Expectation–maximization (EM) algorithm is


an
A. Iterative
B. Incremental
C. None
Correct option is A

195. Features that need to be identified for a
well-posed learning problem:
A. Class of tasks
B. Performance measure
C. Training experience
D. All of these
Correct option is D
196. A computer program that learns to play
checkers might improve its performance as:
A. Measured by its ability to win at the class
of tasks involving playing checkers
B. Experience obtained by playing games
against
C. Both a & b
D. None of these
Correct option is C

197. Learning symbolic representations of


concepts known as:
A. Artificial Intelligence
B. Machine Learning
C. Both a & b
D. None of these
Correct option is A
198. The field of study that gives computers the
capability to learn without being explicitly
programmed
A. Machine Learning
B. Artificial Intelligence
C. Deep Learning
D. Both a & b
Correct option is A

199. The autonomous acquisition of knowledge


through the use of computer programs is
called
A. Artificial Intelligence
B. Machine Learning
C. Deep learning
D. All of these
Correct option is B

200. Learning that enables massive quantities of


data is known as
A. Artificial Intelligence
B. Machine Learning
C. Deep learning
D. All of these
Correct option is B

201. Different learning methods do not include


A. Memorization
B. Analogy
C. Deduction
D. Introduction
Correct option is D

202. Types of learning used in machine


A. Supervised
B. Unsupervised
C. Reinforcement
D. All of these
Correct option is D
203. A computer program is said to learn from
experience E with respect to some class of tasks
T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience
A. Supervised learning problem
B. Un Supervised learning problem
C. Well posed learning problem
D. All of these
Correct option is C

204. Which of the following is a widely used and


effective machine learning algorithm based on
the idea of bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Correct option is D
205. How many types are available in machine
learning?
A. 1
B. 2
C. 3
D. 4
Correct option is C

205. A model can learn based on the rewards it


received for its previous action is known as:
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Concept learning
Correct option is C

206. A subset of machine learning that involves


systems that think and learn like humans using
artificial neural networks.
A. Artificial Intelligence
B. Machine Learning
C. Deep Learning
D. All of these
Correct option is C

207. A learning method in which a training data


contains a small amount of labeled data and a
large amount of unlabeled data is known
as
A. Supervised Learning
B. Semi Supervised Learning
C. Unsupervised Learning
D. Reinforcement Learning
Correct option is B

208. Methods used for the calibration in


Supervised Learning
A. Platt Calibration
B. Isotonic Regression
C. All of these
D. None of above
Correct option is C

209. The basic design issues for designing a


learning
A. Choosing the Training Experience
B. Choosing the Target Function
C. Choosing a Function Approximation
Algorithm
D. Estimating Training Values
E. All of these
Correct option is E

210. In Machine learning the module that must


solve the given performance task is known as:
A. Critic
B. Generalizer
C. Performance system
D. All of these
Correct option is C

211. A learning method that is used to solve a


particular computational program, multiple
models such as classifiers or experts are
strategically generated and combined is called
as
A. Supervised Learning
B. Semi Supervised Learning
C. Unsupervised Learning
D. Reinforcement Learning
E. Ensemble learning
Correct option is E

212. In a learning system the component that


takes as input the current hypothesis
(currently learned function) and outputs a new
problem for the Performance System to explore.
A. Critic
B. Generalizer
C. Performance system
D. Experiment generator
E. All of these
Correct option is D

213. Learning method that is used to improve the


classification, prediction, function
approximation etc of a model
A. Supervised Learning
B. Semi Supervised Learning
C. Unsupervised Learning
D. Reinforcement Learning
E. Ensemble learning
Correct option is E

214. In a learning system the component that


takes as input the history or trace of the game
and produces as output a set of training
examples of the target function is known as:
A. Critic
B. Generalizer
C. Performance system
D. All of these
Correct option is A

215. The most common issue when using ML is


A. Lack of skilled resources
B. Inadequate Infrastructure
C. Poor Data Quality
D. None of these
Correct option is C

216. How to ensure that your model is not over


fitting
A. Cross validation
B. Regularization
C. All of these
D. None of these
Correct option is C
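
Cross validation relies on splitting the data into folds so that every sample is held out exactly once. A minimal sketch of the index bookkeeping, assuming no shuffling; `k_fold_indices` is a hypothetical helper and not a library function.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, validation_indices) pairs for k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds


folds = k_fold_indices(10, 3)
for train, val in folds:
    print(len(train), val)
```

A model would be fitted on each `train` part and scored on the matching `val` part; a large gap between the two scores is a sign of overfitting.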
217. A way to ensemble multiple classifications or
regression
A. Stacking
B. Bagging
C. Blending
D. Boosting
Correct option is A

218. How well a model is going to generalize in


new environment is known as
A. Data Quality
B. Transparent
C. Implementation
D. None of these
Correct option is B

219. Common classes of problems in machine
learning are
A. Classification
B. Clustering
C. Regression
D. All of these
Correct option is D

220. Which of the following is a widely used and


effective machine learning algorithm based on
the idea of bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Correct option is D

221. Cost complexity pruning algorithm is used


in?
A. CART
B. C4.5
C. ID3
D. All of these
Correct option is A
222. Which one of these is not a tree based
learner?
A. CART
B. C4.5
C. ID3
D. Bayesian Classifier
Correct option is D

223. Which one of these is a tree based learner?


A. Rule based
B. Bayesian Belief Network
C. Bayesian classifier
D. Random Forest
Correct option is D

224. What is the approach of basic algorithm for


decision tree induction?
A. Greedy
B. Top Down
C. Procedural
D. Step by Step
Correct option is A

225. Which of the following classifications would


best suit the student performance classification
systems?
A. If-then analysis
B. Market-basket analysis
C. Regression analysis
D. Cluster analysis
Correct option is A

226. What are the two approaches to tree pruning?


A. Pessimistic pruning and Optimistic
pruning
B. Post pruning and Pre pruning
C. Cost complexity pruning and time
complexity pruning
D. None of these
Correct option is B

227. How will you counter over-fitting in decision


tree?
A. By pruning the longer rules
B. By creating new rules
C. Both 'By pruning the longer rules' and
'By creating new rules'
D. None of these
Correct option is A

228. Which of the following sentences are true?


A. In pre-pruning a tree is ‘pruned’ by
halting its construction early
B. A pruning set of class labeled tuples is
used to estimate cost
C. The best pruned tree is the one that
minimizes the number of encoding
D. All of these
Correct option is D
229. Which of the following is a disadvantage of
decision trees?
A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to be over fit
D. None of the above
Correct option is C

230. In which of the following scenario a gain


ratio is preferred over Information Gain?
A. When a categorical variable has very
large number of category
B. When a categorical variable has very
small number of category
C. Number of categories is not the
reason
D. None of these
Correct option is A
231. Major pruning techniques used in decision
tree are
A. Minimum error
B. Smallest tree
C. Both a & b
D. None of these
Correct option is C

232. What does the central limit theorem state?


A. If the sample size increases sampling
distribution must approach normal
distribution
B. If the sample size decreases then the
sample distribution must approach normal
distribution.
C. If the sample size increases then the
sampling distribution must approach an
exponential distribution
D. If the sample size decreases then the
sampling distribution must approach an
exponential distribution
Correct option is A
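
The central limit theorem can be checked with a short simulation. The sketch assumes draws from a uniform distribution on [0, 1) and a fixed seed; the sample size of 50 and the 2000 repetitions are arbitrary choices. The sample means cluster around 0.5 with a spread close to sigma / sqrt(n), and roughly 68% fall within one standard deviation, as expected for a normal distribution.

```python
import random
import statistics

rng = random.Random(42)


def sample_mean(n):
    """Mean of n independent draws from Uniform[0, 1)."""
    return statistics.fmean(rng.random() for _ in range(n))


means = [sample_mean(50) for _ in range(2000)]

centre = statistics.fmean(means)
spread = statistics.stdev(means)
# Uniform[0, 1) has variance 1/12, so the mean of 50 draws has std sqrt(1/12) / sqrt(50)
expected_spread = (1 / 12) ** 0.5 / 50 ** 0.5
within_one_sd = sum(abs(m - centre) <= spread for m in means) / len(means)

print(round(centre, 3), round(spread, 4), round(expected_spread, 4), within_one_sd)
```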

233. The difference between the sample value


expected and the estimated value of the
parameter is called as?
A. Bias
B. Error
C. Contradiction
D. Difference
Correct option is A

234. In which of the following types of sampling


the information is carried out under the opinion
of an expert?
A. Quota sampling
B. Convenience sampling
C. Purposive sampling
D. Judgment sampling
Correct option is D
235. Which of the following is a subset of
population?
A. Distribution
B. Sample
C. Data
D. Set
Correct option is B

236. The sampling error is defined as?


A. Difference between population and
parameter
B. Difference between sample and
parameter
C. Difference between population and
sample
D. Difference between parameter and
sample
Correct option is C
237. Machine learning is interested in the best
hypothesis h from some space H, given observed
training data D. Here best hypothesis means
A. Most general hypothesis
B. Most probable hypothesis
C. Most specific hypothesis
D. None of these
Correct option is B

238. Practical difficulties with Bayesian Learning :


A. Initial knowledge of many probabilities is
required
B. No consistent hypothesis
C. Hypotheses make probabilistic
predictions
D. None of these
Correct option is A

239. Bayes’ theorem states that the relationship


between the probability of the hypothesis
before getting the evidence P(H) and the
probability of the hypothesis after getting the
evidence P(H∣E) is
A. [P(E∣H)P(H)] / P(E)
B. [P(E∣H) P(E) ] / P(H)
C. [P(E) P(H) ] / P(E∣H)
D. None of these
Correct option is A

240. A doctor knows that Cold causes fever 50%


of the time. Prior probability of any patient
having cold is 1/50,000. Prior probability of any
patient having fever is 1/20. If a patient has
fever, what is the probability he/she has cold?
A. P(C/F)= 0.0003
B. P(C/F)=0.0004
C. P(C/F)= 0.0002
D. P(C/F)=0.0045
Correct option is C
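
The answer to question 240 follows directly from Bayes' theorem, P(C|F) = P(F|C) P(C) / P(F). A short check using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

p_fever_given_cold = Fraction(1, 2)    # cold causes fever 50% of the time
p_cold = Fraction(1, 50000)            # prior probability of cold
p_fever = Fraction(1, 20)              # prior probability of fever

# Bayes' theorem: P(C|F) = P(F|C) * P(C) / P(F)
p_cold_given_fever = p_fever_given_cold * p_cold / p_fever
print(p_cold_given_fever, float(p_cold_given_fever))  # 1/5000 0.0002
```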
241. Which of the following will be true about k in
K-Nearest Neighbor in terms of Bias?
A. When you increase the k the bias will
increase
B. When you decrease the k the bias will
increase
C. Can't say
D. None of these
Correct option is A

242. When you find noise in data which of the


following option would you consider in K-
Nearest Neighbor?
A. I will increase the value of k
B. I will decrease the value of k
C. Noise cannot be dependent on value of k
D. None of these
Correct option is A
243. In K-Nearest Neighbor it is very likely to
overfit due to the curse of dimensionality.
Which of the following option would you
consider to handle such problem?
1) Dimensionality Reduction
2) Feature selection
A. 1
B. 2
C. 1 and 2
D. None of these
Correct option is C

244. Radial basis functions is closely related to


distance-weighted regression, but it is
A. lazy learning
B. eager learning
C. concept learning
D. none of these
Correct option is B
245. Radial basis function networks provide a
global approximation to the target function,
represented by ______ of many local kernel
functions.
A. a series combination
B. a linear combination
C. a parallel combination
D. a non linear combination
Correct option is B

246. The most significant phase in a genetic


algorithm is
A. Crossover
B. Mutation
C. Selection
D. Fitness function
Correct option is A
247. The crossover operator produces two new
offspring from
A. Two parent strings, by copying selected
bits from each parent
B. One parent strings, by copying selected
bits from selected parent
C. Two parent strings, by copying selected
bits from one parent
D. None of these
Correct option is A

248. Mathematically characterize the evolution


over time of the population within a GA based
on the concept of
A. Schema
B. Crossover
C. Don't care
D. Fitness function
Correct option is A
249. In genetic algorithm process of selecting
parents which mate and recombine to create
off-springs for the next generation is known as:
A. Tournament selection
B. Rank selection
C. Fitness sharing
D. Parent selection
Correct option is D

250. Crossover operations are performed in


genetic programming by replacing
A. Randomly chosen sub tree of one parent
program by a sub tree from the other parent
program.
B. Randomly chosen root node tree of one
parent program by a sub tree from the other
parent program
C. Randomly chosen root node tree of one
parent program by a root node tree from the
other parent program
D. None of these
Correct option is A

MORE MCQ

1. What is true about Machine Learning?

A. Machine Learning (ML) is that field of computer


science
B. ML is a type of artificial intelligence that extract
patterns out of raw data by using an algorithm or
method.
C. The main focus of ML is to allow computer
systems learn from experience without being
explicitly programmed or human intervention.
D. All of the above

Ans : D

Explanation: All statements are true about Machine


Learning.
2. ML is a field of AI consisting of learning
algorithms that?

A. Improve their performance


B. At executing some task
C. Over time with experience
D. All of the above

Ans : D

Explanation: ML is a field of AI consisting of learning


algorithms that : Improve their performance (P), At
executing some task (T), Over time with experience
(E).

3. p → 0q is not a?

A. hack clause
B. horn clause
C. structural clause
D. system clause

Ans : B

Explanation: p → 0q is not a horn clause.

4. The action _______ of a robot arm specify to


Place block A on block B.

A. STACK(A,B)
B. LIST(A,B)
C. QUEUE(A,B)
D. ARRAY(A,B)

Ans : A

Explanation: The action 'STACK(A,B)' of a robot arm


specify to Place block A on block B.
5. A__________ begins by hypothesizing a sentence
(the symbol S) and successively predicting lower
level constituents until individual preterminal
symbols are written.

A. bottom-up parser
B. top parser
C. top-down parser
D. bottom parser

Ans : C

Explanation: A top-down parser begins by


hypothesizing a sentence (the symbol S) and
successively predicting lower level constituents until
individual preterminal symbols are written.
6. A model of language consists of the categories
which does not include ________.

A. System Unit
B. structural units.
C. data units
D. empirical units

Ans : B

Explanation: A model of language consists of the


categories which does not include structural units.

7. Different learning methods do not include?

A. Introduction
B. Analogy
C. Deduction
D. Memorization
Ans : A

Explanation: Different learning methods do not


include the introduction.

8. Training the model with all the data in one single
batch is known as?

A. Batch learning
B. Offline learning
C. Both A and B
D. None of the above

Ans : C

Explanation: we have end-to-end Machine Learning


systems in which we need to train the model in one
go by using whole available training data. Such kind
of learning method or algorithm is called Batch or
Offline learning.
9. Which of the following are ML methods?

A. based on human supervision


B. supervised Learning
C. semi-reinforcement Learning
D. All of the above

Ans : A

Explanation: The following are various ML methods


based on some broad categories : Based on human
supervision, Unsupervised Learning, Semi-
supervised Learning and Reinforcement Learning

10. In Model based learning methods, an iterative


process takes place on the ML models that are built
based on various model parameters, called ?
A. mini-batches
B. optimized parameters
C. hyperparameters
D. superparameters

Ans : C

Explanation: In Model based learning methods, an


iterative process takes place on the ML models that
are built based on various model parameters, called
hyperparameters.

11. Which of the following is a widely used and


effective machine learning algorithm based on the
idea of bagging?

A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Ans : D

Explanation: The Random Forest algorithm builds an


ensemble of Decision Trees, mostly trained with the
bagging method.

12. To find the minimum or the maximum of a


function, we set the gradient to zero because:

A. The value of the gradient at extrema of a function


is always zero
B. Depends on the type of problem
C. Both A and B
D. None of the above

Ans : A

Explanation: At an interior maximum or minimum of a
differentiable multivariable function, the gradient is
the zero vector, so setting the gradient to zero
locates the candidate extrema.
13. Which of the following is a disadvantage of
decision trees?

A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to be overfit
D. None of the above

Ans : C

Explanation: Allowing a decision tree to split to a


granular degree makes decision trees prone to
learning every point extremely well to the point of
perfect classification that is overfitting.

14. How do you handle missing or corrupted data in


a dataset?
A. Drop missing rows or columns
B. Replace missing values with mean/median/mode
C. Assign a unique category to missing values
D. All of the above

Ans : D

Explanation: All of the above techniques are


different ways of imputing the missing values.
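
The three options can be sketched on a toy column, using `None` to mark a missing value. The column names and values are made up for illustration.

```python
import statistics

ages = [25, None, 31, 40, None]
colours = ["red", None, "blue"]

# Option A: drop the rows with a missing value
dropped = [a for a in ages if a is not None]

# Option B: replace missing values with the mean of the observed ones
mean_age = statistics.fmean(dropped)
mean_imputed = [a if a is not None else mean_age for a in ages]

# Option C: give missing categorical values their own category
flagged = [c if c is not None else "missing" for c in colours]

print(dropped)       # [25, 31, 40]
print(mean_imputed)  # [25, 32.0, 31, 40, 32.0]
print(flagged)       # ['red', 'missing', 'blue']
```

Which option is appropriate depends on how much data is missing and whether the missingness itself carries information.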

15. When performing regression or classification,


which of the following is the correct way to
preprocess the data?

A. Normalize the data -> PCA -> training


B. PCA -> normalize PCA output -> training
C. Normalize the data -> PCA -> normalize PCA
output -> training
D. None of the above
Ans : A

Explanation: You need to always normalize the data


first. If not, PCA or other techniques that are used to
reduce dimensions will give different results.

16. Which of the following statements about


regularization is not correct?

A. Using too large a value of lambda can cause your


hypothesis to underfit the data.
B. Using too large a value of lambda can cause your
hypothesis to overfit the data
C. Using a very large value of lambda cannot hurt
the performance of your hypothesis.
D. None of the above

Ans : D

Explanation: A large value results in a large


regularization penalty and therefore, a strong
preference for simpler models, which can underfit
the data.
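
The underfitting effect of a very large lambda can be seen in the simplest case: a one-parameter linear model y ≈ w·x with an L2 penalty lambda·w². Minimizing the penalized squared error gives w = Σxy / (Σx² + lambda). The sketch below assumes this no-intercept setup and made-up data generated from y = 2x.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form slope for y ≈ w*x (no intercept) with penalty lam * w**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # exactly y = 2x

w_none = ridge_slope(xs, ys, 0.0)   # no regularization recovers the true slope
w_huge = ridge_slope(xs, ys, 1e6)   # a huge lambda shrinks the slope towards 0

print(w_none)  # 2.0
print(w_huge)  # close to 0: the model underfits
```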

17. Which of the following techniques can not be


used for normalization in text mining?

A. Stemming
B. Lemmatization
C. Stop Word Removal
D. None of the above

Ans : C

Explanation: Lemmatization and stemming are the


techniques of keyword normalization.

18. In which of the following cases will K-means


clustering fail to give good results?
1) Data points with outliers
2) Data points with different densities
3) Data points with nonconvex shapes

A. 1 and 2
B. 2 and 3
C. 1 and 3
D. All of the above

Ans : D

Explanation: K-means clustering algorithm fails to


give good results when the data contains outliers,
the density spread of data points across the data
space is different, and the data points follow
nonconvex shapes.

19. Which of the following is a reasonable way to


select the number of principal components "k"?

A. Choose k to be the smallest value so that at least


99% of the variance is retained.
B. Choose k to be 99% of m (k = 0.99*m, rounded to
the nearest integer).
C. Choose k to be the largest value so that 99% of
the variance is retained.
D. Use the elbow method.

Ans : A

Explanation: This will maintain the structure of the


data and also reduce its dimension.
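
Option A can be sketched as a cumulative sum over the eigenvalues of the covariance matrix, which measure the variance captured by each principal component. The function `smallest_k` and the eigenvalues below are hypothetical.

```python
def smallest_k(eigenvalues, threshold=0.99):
    """Smallest number of components whose variance share reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


# Made-up eigenvalues: the first three components hold 99.5% of the variance
k = smallest_k([6.0, 3.0, 0.95, 0.04, 0.01])
print(k)  # 3
```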

20. What is a sentence parser typically used for?

A. It is used to parse sentences to check if they are


utf-8 compliant.
B. It is used to parse sentences to derive their most
likely syntax tree structures.
C. It is used to parse sentences to assign POS tags to
all tokens.
D. It is used to check if sentences can be parsed into
meaningful tokens.
Ans : B

Explanation: Sentence parsers analyze a sentence


and automatically build a syntax tree.

01. What is Machine learning?


A. The autonomous acquisition of knowledge
through the use of computer programs
B. The autonomous acquisition of knowledge
through the use of manual programs
C. The selective acquisition of knowledge
through the use of computer programs
D. The selective acquisition of knowledge
through the use of manual programs

Answer : A
Explanation: “Machine learning” is the autonomous
acquisition of knowledge through the use of
computer programs.
02. What is true about Machine Learning?
A. Machine Learning (ML) is the field of
computer science
B. ML is a type of artificial intelligence that
extracts patterns out of raw data by using an
algorithm or method
C. The main focus of ML is to allow computer
systems to learn from experience without being
explicitly programmed and without human intervention
D. All of the above

Answer : D
Explanation: All the statements are true about
Machine Learning.

03. ML is a field of AI consisting of learning
algorithms that?
A. Improve their performance
B. At executing some task
C. Over time with experience
D. All of the above

Answer : D
Explanation: Machine learning is a field of AI
consisting of learning algorithms that: Improve their
performance (P), At executing some task (T), Over
time with experience (E).

04. Different learning methods do not include?
A. Memorization
B. Analogy
C. Introduction
D. Deduction

Answer : C
Explanation: The different learning methods in ML
do not include Introduction.

05. Which of the following is a widely used and
effective machine learning algorithm based on the
idea of bagging?
A. Decision Tree
B. Random Forest
C. Regression
D. Classification

Answer : B
Explanation: The Random Forest algorithm builds an
ensemble of Decision Trees, mostly trained with the
bagging method.

06. High entropy means that the partitions in
classification are
A. pure
B. not pure
C. useful
D. useless

Answer : B
Explanation: Entropy is a measure of the
randomness in the information being processed.
The higher the entropy, the harder it is to draw any
conclusions from that information. Entropy is a
measure of disorder, impurity, unpredictability, or
uncertainty, so low entropy means less uncertain
(purer partitions) and high entropy means more
uncertain (less pure partitions).
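
As a minimal sketch of this idea, the Shannon entropy of a partition can be computed from its class labels. The function name and the two example partitions below are illustrative assumptions, not part of the original question.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum of -p * log2(p) over the classes present in the partition.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical partitions: one pure, one evenly mixed between two classes.
pure_partition = ["yes"] * 8
mixed_partition = ["yes"] * 4 + ["no"] * 4

print(entropy(pure_partition))   # a pure partition has zero entropy
print(entropy(mixed_partition))  # an even two-class mix has 1 bit of entropy
```

The pure partition scores zero and the evenly mixed one scores the two-class maximum, which matches "high entropy means not pure".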

07. Which of the following are ML methods?
A. Based on human supervision
B. Supervised Learning
C. Semi-reinforcement Learning
D. All of the above

Answer : A
Explanation: The following are various Machine
learning methods based on some broad categories:
Based on human supervision, Unsupervised
Learning, Semi-supervised Learning, and
Reinforcement Learning.

08. In language understanding, the levels of
knowledge do not include?
A. Phonological
B. Syntactic
C. Empirical
D. Logical

Answer : C
Explanation: In language understanding, the levels
of knowledge do not include empirical knowledge.

09. A machine learning problem involves four
attributes plus a class. The attributes have 3, 2, 2,
and 2 possible values each. The class has 3 possible
values. How many maximum possible different
examples are there?
A. 12
B. 24
C. 48
D. 72

Answer : D
Explanation: The maximum number of different
examples is the product of the number of possible
values of each attribute and the number of classes,
so the result is
3 * 2 * 2 * 2 * 3 = 72
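
The count can be checked by multiplying the sizes and, as a cross-check, by enumerating every combination. This is only a small sketch; the variable names are illustrative.

```python
from itertools import product
from math import prod

attribute_sizes = [3, 2, 2, 2]  # possible values per attribute (from the question)
class_size = 3                  # possible class values (from the question)

# Product rule: one factor per attribute, plus one for the class.
total = prod(attribute_sizes) * class_size

# Cross-check by explicitly enumerating every (attributes..., class) combination.
examples = list(product(*(range(n) for n in attribute_sizes + [class_size])))

print(total, len(examples))  # 72 72
```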

10. When performing regression or classification,
which of the following is the correct way to
preprocess the data?
A. Normalize the data → PCA → training
B. PCA → normalize PCA output → training
C. Normalize the data → PCA → normalize PCA
output → training
D. None of the above

Answer : A
Explanation: First normalize the data, then apply
PCA, then train the model.

11. How do you handle missing or corrupted data in
a dataset?
A. Drop missing rows or columns
B. Replace missing values with
mean/median/mode
C. Assign a unique category to missing values
D. All of the above

Answer : D
Explanation: All of the above techniques are
different ways of handling the missing or corrupted
data in a dataset.

12. The most widely used metrics and tools to
assess a classification model are:
A. Confusion matrix
B. Cost-sensitive accuracy
C. Area under the ROC curve
D. All of the above

Answer : D

13. A model of language consists of the categories
which do not include?
A. Language units
B. Structural units
C. Role structure of units
D. System constraints

Answer : B
Explanation: A model of language consists of
categories which do not include structural units.

14. Suppose we would like to perform clustering on
spatial data such as the geometrical locations of
houses. We wish to produce clusters of many
different sizes and shapes. Which of the following
methods is the most appropriate?
A. Decision Trees
B. Model-based clustering
C. K-means clustering
D. Density-based clustering

Answer : D
Explanation: The density-based clustering methods
recognize clusters based on the density function
distribution of the data object. For clusters with
arbitrary shapes, these algorithms connect regions
with sufficiently high densities into clusters.

15. Which of the following is a disadvantage of
decision trees?
A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to be overfit
D. None of the above

Answer : C
Explanation: Allowing a decision tree to split to a
granular degree makes it prone to learning every
training point extremely well, to the point of
perfect classification, that is, overfitting.

16. Which of the following is true about Naive
Bayes?
A. Assumes that all the features in a dataset are
equally important
B. Assumes that all the features in a dataset are
independent
C. Both A and B
D. None of the above options

Answer : C

17. Among the following which is not a Horn clause?
A. p → ¬q
B. p
C. p → q
D. ¬p ∨ q

Answer : A
Explanation: p → ¬q is not a Horn clause from the
above options.
18. Which of the following techniques can not be
used for normalization in text mining?
A. Stop Word Removal
B. Stemming
C. Lemmatization
D. None of the above

Answer : A
Explanation: Stop word removal is not a
normalization technique, whereas lemmatization and
stemming are techniques of keyword normalization.

19. Which of the following is a reasonable way to
select the number of principal components “k”?
A. Choose k to be the smallest value so that at
least 99% of the variance is retained
B. Use the elbow method
C. Choose k to be 99% of m (k = 0.99*m,
rounded to the nearest integer)
D. Choose k to be the largest value so that 99%
of the variance is retained

Answer : A
Explanation: Choose k to be the smallest value so
that at least 99% of the variance is retained. This
will maintain the structure of the data and also
reduce its dimension.
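
A rough sketch of this selection rule follows. It assumes the eigenvalues of the covariance matrix (the variance captured by each principal component) are already available; the eigenvalues and the helper name `smallest_k` are hypothetical.

```python
def smallest_k(eigenvalues, retained=0.99):
    """Smallest k whose top-k eigenvalues retain at least `retained` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= retained:
            return k
    return len(ordered)

# Hypothetical eigenvalues for a 5-dimensional dataset.
eigenvalues = [6.0, 3.0, 0.95, 0.04, 0.01]
k = smallest_k(eigenvalues)
print(k)  # the first k components retain at least 99% of the variance
```

With these made-up values two components retain only 90% of the variance, so the rule keeps a third.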

20. In which of the following cases will K-means
clustering fail to give good results?
1. Data points with outliers
2. Data points with different densities
3. Data points with nonconvex shapes
A. 1&2
B. 1, 2, & 3
C. 2&3
D. 1&3
Answer : B
Explanation: K-means clustering algorithm of
Machine Learning fails to give good results when
the data contains outliers, the density spread of
data points across the data space is different, and
the data points form nonconvex shapes.

1. What is Machine learning?

a) The autonomous acquisition of knowledge
through the use of computer programs
b) The autonomous acquisition of knowledge
through the use of manual programs
c) The selective acquisition of knowledge through
the use of computer programs
d) The selective acquisition of knowledge through
the use of manual programs

Answer: a
Explanation: Machine learning is the autonomous
acquisition of knowledge through the use of
computer programs.
2. Which of the following is not a factor that affects
the performance of a learner system?

a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures

Answer: d
Explanation: Factors that affect the performance of
a learner system do not include good data
structures.

3. Different learning methods do not include?
a) Memorization
b) Analogy
c) Deduction
d) Introduction
Answer: d
Explanation: Different learning methods do not
include introduction.
4. In language understanding, the levels of
knowledge do not include?
a) Phonological
b) Syntactic
c) Empirical
d) Logical

Answer: c
Explanation: In language understanding, the levels
of knowledge do not include empirical
knowledge.

5. A model of language consists of the categories
which do not include?
a) Language units
b) Role structure of units
c) System constraints
d) Structural units

Answer: d
Explanation: A model of language consists of the
categories which do not include structural units.
6. What is a top-down parser?

a) Begins by hypothesizing a sentence (the symbol
S) and successively predicting lower level
constituents until individual preterminal symbols
are written
b) Begins by hypothesizing a sentence (the symbol
S) and successively predicting upper level
constituents until individual preterminal symbols
are written
c) Begins by hypothesizing lower level constituents
and successively predicting a sentence (the symbol
S)
d) Begins by hypothesizing upper level constituents
and successively predicting a sentence (the symbol
S)

Answer: a
Explanation: A top-down parser begins by
hypothesizing a sentence (the symbol S) and
successively predicting lower level constituents until
individual preterminal symbols are written.
7. Among the following which is not a Horn clause?
a) p
b) ¬p ∨ q
c) p → q
d) p → ¬q

Answer: d
Explanation: p → ¬q is not a Horn clause.

8. The action ‘STACK(A, B)’ of a robot arm specifies to
_______________
a) Place block B on Block A
b) Place blocks A, B on the table in that order
c) Place blocks B, A on the table in that order
d) Place block A on block B

Answer: d
Explanation: The action ‘STACK(A,B)’ of a robot arm
specifies to place block A on block B.
Module 01

1. What is true about Machine Learning?
A. Machine Learning (ML) is a field of computer
science
B. ML is a type of artificial intelligence that extracts
patterns out of raw data by using an algorithm or
method.
C. The main focus of ML is to allow computer
systems to learn from experience without being
explicitly programmed and without human intervention.
D. All of the above
Answer : D
Explanation: All statements are true about Machine
Learning.

2. ML is a field of AI consisting of learning
algorithms that?
A. Improve their performance
B. At executing some task
C. Over time with experience
D. All of the above
Answer : D
Explanation: ML is a field of AI consisting of learning
algorithms that : Improve their performance (P), At
executing some task (T), Over time with experience
(E).

3. p → ¬q is not a?
A. hack clause
B. horn clause
C. structural clause
D. system clause
Answer : B
Explanation: p → ¬q is not a Horn clause.

4. The action _______ of a robot arm specifies to
place block A on block B.
A. STACK(A,B)
B. LIST(A,B)
C. QUEUE(A,B)
D. ARRAY(A,B)
Answer : A
Explanation: The action ‘STACK(A,B)’ of a robot arm
specifies to place block A on block B.
5. A __________ begins by hypothesizing a sentence
(the symbol S) and successively predicting lower
level constituents until individual preterminal
symbols are written.
A. bottom-up parser
B. top parser
C. top-down parser
D. bottom parser
Answer : C
Explanation: A top-down parser begins by
hypothesizing a sentence (the symbol S) and
successively predicting lower level constituents until
individual preterminal symbols are written.

6. A model of language consists of the categories
which do not include ________.
A. System Unit
B. structural units.
C. data units
D. empirical units
Answer : B
Explanation: A model of language consists of the
categories which do not include structural units.

7. Different learning methods do not include?
A. Introduction
B. Analogy
C. Deduction
D. Memorization
Answer : A
Explanation: Different learning methods do not
include introduction.

8. Training the model with all the data in one single
batch is known as?
A. Batch learning
B. Offline learning
C. Both A and B
D. None of the above
Ans : C
Explanation: Some end-to-end Machine Learning
systems need to train the model in one go using
all the available training data. This kind of
learning method or algorithm is called Batch or
Offline learning.

9. Which of the following are ML methods?
A. based on human supervision
B. supervised Learning
C. semi-reinforcement Learning
D. All of the above
Ans : A
Explanation: The following are various ML methods
based on some broad categories: Based on human
supervision, Unsupervised Learning,
Semi-supervised Learning and Reinforcement Learning.

10. In Model based learning methods, an iterative
process takes place on the ML models that are built
based on various model parameters, called ?
A. mini-batches
B. optimized parameters
C. hyperparameters
D. superparameters
Answer : C
Explanation: In Model based learning methods, an
iterative process takes place on the ML models that
are built based on various model parameters, called
hyperparameters.

11. Which of the following is a widely used and
effective machine learning algorithm based on the
idea of bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Answer : D
Explanation: The Random Forest algorithm builds an
ensemble of Decision Trees, mostly trained with the
bagging method.
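
The bagging idea itself (bootstrap sampling plus majority voting) can be sketched as follows. This is not a Random Forest: the base learner is a toy one-feature threshold rule, and the data, seed, and function names are all illustrative assumptions.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (the 'bootstrap' in bagging)."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Toy base learner: threshold halfway between the two class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:
        # Degenerate sample: only one class was drawn, so always predict it.
        only = 0 if zeros else 1
        return lambda x: only
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: int(x > threshold)

def bagged_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
# Hypothetical 1-D dataset of (feature, label) pairs.
data = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1)]
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]

print(bagged_predict(models, 1.2), bagged_predict(models, 6.8))
```

Each model sees a different resampled dataset, and the ensemble answer is whatever most of them agree on.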

12. To find the minimum or the maximum of a
function, we set the gradient to zero because:
A. The value of the gradient at extrema of a function
is always zero
B. Depends on the type of problem
C. Both A and B
D. None of the above
Answer : A
Explanation: At an interior maximum or minimum of a
differentiable multivariable function, the gradient
is the zero vector, so setting the gradient to zero
gives the candidate points for the extrema.
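
A quick numerical check of this property, using a hypothetical function whose minimum is known in advance. The function `f` and the finite-difference helper are illustrative assumptions.

```python
def f(x, y):
    # Hypothetical bowl-shaped function with its minimum at (1, -2).
    return (x - 1) ** 2 + (y + 2) ** 2

def numerical_gradient(func, x, y, h=1e-6):
    """Central-difference approximation of the gradient of func at (x, y)."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return dfdx, dfdy

at_minimum = numerical_gradient(f, 1.0, -2.0)
elsewhere = numerical_gradient(f, 3.0, 0.0)

print(at_minimum)  # both components are numerically zero
print(elsewhere)   # away from the minimum the gradient is non-zero
```

A zero gradient marks a candidate only: saddle points also have zero gradient, so a second check is needed in general.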

13. Which of the following is a disadvantage of
decision trees?
A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to be overfit
D. None of the above
Answer : C
Explanation: Allowing a decision tree to split to a
granular degree makes it prone to learning every
training point extremely well, to the point of
perfect classification, that is, overfitting.

14. How do you handle missing or corrupted data in
a dataset?
A. Drop missing rows or columns
B. Replace missing values with mean/median/mode
C. Assign a unique category to missing values
D. All of the above
Answer : D
Explanation: All of the above techniques are
different ways of handling the missing values.
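
The three options can be sketched on a tiny hypothetical table, using plain Python dictionaries rather than any particular data library. The column names and values are made up for illustration.

```python
from statistics import mean

# Hypothetical rows; None marks a missing value.
rows = [
    {"age": 25, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 35, "city": None},
]

# Option A: drop every row that contains a missing value.
dropped = [r for r in rows if None not in r.values()]

# Option B: replace a missing numeric value with the column mean.
age_mean = mean(r["age"] for r in rows if r["age"] is not None)
mean_filled = [{**r, "age": age_mean if r["age"] is None else r["age"]} for r in rows]

# Option C: give missing categorical values their own category.
category_filled = [{**r, "city": r["city"] or "Unknown"} for r in rows]

print(dropped)
print(mean_filled)
print(category_filled)
```

Which option fits best depends on how much data is missing and why; the sketch only shows the mechanics.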

15. When performing regression or classification,
which of the following is the correct way to
preprocess the data?
A. Normalize the data -> PCA -> training
B. PCA -> normalize PCA output -> training
C. Normalize the data -> PCA -> normalize PCA
output -> training
D. None of the above
Answer : A
Explanation: Always normalize the data first. If
not, PCA and other dimensionality reduction
techniques are dominated by the features with the
largest scale and will give different results.
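
The effect of scale can be seen without running PCA at all, by comparing how much of the total variance each feature contributes before and after standardization. The two features below are hypothetical, and the sketch only illustrates the motivation, not a full PCA.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical features measured on very different scales.
height_m = [1.60, 1.70, 1.80, 1.90]
income = [30000.0, 45000.0, 52000.0, 80000.0]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Share of the total variance contributed by income, before scaling.
raw_share = pvariance(income) / (pvariance(income) + pvariance(height_m))

# The same share after standardizing both features.
scaled = [standardize(height_m), standardize(income)]
scaled_share = pvariance(scaled[1]) / sum(pvariance(col) for col in scaled)

print(raw_share)     # very close to 1: income dominates the total variance
print(scaled_share)  # about 0.5: both features now contribute equally
```

Since PCA looks for directions of greatest variance, the unscaled data would yield a first component that is almost entirely the income axis.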

16. Which of the following statements about
regularization is not correct?
A. Using too large a value of lambda can cause your
hypothesis to underfit the data.
B. Using too large a value of lambda can cause your
hypothesis to overfit the data
C. Using a very large value of lambda cannot hurt
the performance of your hypothesis.
D. None of the above
Answer : D
Explanation: A large value results in a large
regularization penalty and therefore, a strong
preference for simpler models, which can underfit
the data.
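
The underfitting effect of a large lambda can be sketched with one-weight ridge regression, whose minimizer has a simple closed form. The data and function names are illustrative assumptions.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)^2) + lam * w^2 for a single weight w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data generated by y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

for lam in (0.0, 1.0, 1000.0):
    w = ridge_weight(xs, ys, lam)
    print(lam, round(w, 4), round(mse(xs, ys, w), 4))
```

As lambda grows, the weight is pulled toward zero and the training error rises, which is the underfitting described above.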

17. Which of the following techniques cannot be
used for normalization in text mining?
A. Stemming
B. Lemmatization
C. Stop Word Removal
D. None of the above
Answer : C
Explanation: Lemmatization and stemming are the
techniques of keyword normalization.
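
The difference can be sketched with a toy example. The suffix rules below are a deliberately crude stand-in for a real stemmer, and the stop word list is a tiny illustrative assumption.

```python
STOP_WORDS = {"the", "is", "a", "of"}  # tiny illustrative list
SUFFIXES = ("ing", "ed", "s")          # toy rules, not a real stemming algorithm

def toy_stem(word):
    """Strip one known suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

tokens = ["the", "model", "learns", "learned", "learning"]

# Normalization: different surface forms collapse to one form.
normalized = [toy_stem(t) for t in tokens]

# Stop word removal: tokens are dropped, but none are rewritten.
filtered = [t for t in tokens if t not in STOP_WORDS]

print(normalized)
print(filtered)
```

Stemming maps the three variants of "learn" to a single form, while stop word removal only filters tokens out, which is why it is not a normalization technique.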

18. In which of the following cases will K-means
clustering fail to give good results?
1) Data points with outliers
2) Data points with different densities
3) Data points with nonconvex shapes
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. All of the above
Answer : D
Explanation: K-means clustering algorithm fails to
give good results when the data contains outliers,
the density spread of data points across the data
space is different, and the data points follow
nonconvex shapes.
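
The sensitivity to outliers comes from K-means representing each cluster by the mean of its points. A minimal sketch with hypothetical 2-D points:

```python
def centroid(points):
    """K-means represents each cluster by the mean of its points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

# Hypothetical tight cluster, then the same cluster with one distant outlier.
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
with_outlier = cluster + [(50.0, 50.0)]

print(centroid(cluster))       # (1.5, 1.5)
print(centroid(with_outlier))  # pulled far away from the four original points
```

A single extreme point moves the centroid well outside the original cluster, which distorts the assignments in later iterations.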

19. Which of the following is a reasonable way to
select the number of principal components “k”?
A. Choose k to be the smallest value so that at least
99% of the variance is retained.
B. Choose k to be 99% of m (k = 0.99*m, rounded to
the nearest integer).
C. Choose k to be the largest value so that 99% of
the variance is retained.
D. Use the elbow method.
Answer : A
Explanation: This will maintain the structure of the
data and also reduce its dimension.

20. What is a sentence parser typically used for?
A. It is used to parse sentences to check if they are
utf-8 compliant.
B. It is used to parse sentences to derive their most
likely syntax tree structures.
C. It is used to parse sentences to assign POS tags to
all tokens.
D. It is used to check if sentences can be parsed into
meaningful tokens.
Answer : B
Explanation: Sentence parsers analyze a sentence
and automatically build a syntax tree.

21. Which of the following is a widely used and
effective machine learning algorithm based on the
idea of bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Answer : D

22. To find the minimum or the maximum of a
function, we set the gradient to zero because:
A. The value of the gradient at extrema of a function
is always zero
B. Depends on the type of problem
C. Both A and B
D. None of the above
Answer : A

23. The most widely used metrics and tools to assess
a classification model are:
A. Confusion matrix
B. Cost-sensitive accuracy
C. Area under the ROC curve
D. All of the above
Answer : D

24. Which of the following is a good test dataset
characteristic?
A. Large enough to yield meaningful results
B. Is representative of the dataset as a whole
C. Both A and B
D. None of the above
Answer : C

25. Which of the following is a disadvantage of
decision trees?
A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to be overfit
D. None of the above
Answer : C

26. How do you handle missing or corrupted data in
a dataset?
A. Drop missing rows or columns
B. Replace missing values with mean/median/mode
C. Assign a unique category to missing values
D. All of the above
Answer : D
27. What is the purpose of performing cross-
validation?
A. To assess the predictive performance of the
models
B. To judge how the trained model performs outside
the sample on test data
C. Both A and B
Answer : C
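
A minimal sketch of how k-fold cross-validation splits the data, so that every sample is held out exactly once. The helper name and the sizes are illustrative; model training and scoring are left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k roughly equal contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

# Hypothetical setting: 10 samples, 3 folds.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Training on each `train` split and scoring on the matching `test` split gives k out-of-sample estimates, which are then usually averaged.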

28. Why is second order differencing in time series
needed?
A. To remove stationarity
B. To find the maxima or minima at the local point
C. Both A and B
D. None of the above
Answer : C

29. When performing regression or classification,
which of the following is the correct way to
preprocess the data?
A. Normalize the data → PCA → training
B. PCA → normalize PCA output → training
C. Normalize the data → PCA → normalize PCA
output → training
D. None of the above
Answer : A

30. Which of the following is an example of feature
extraction?
A. Constructing bag of words vector from an email
B. Applying PCA to project large high-dimensional
data
C. Removing stopwords in a sentence
D. All of the above
Answer : D

31. What is pca.components_ in Sklearn?
A. Set of all eigenvectors for the projection space
B. Matrix of principal components
C. Result of the multiplication matrix
D. None of the above options
Answer : A
32. Which of the following is true about Naive Bayes
?
A. Assumes that all the features in a dataset are
equally important
B. Assumes that all the features in a dataset are
independent
C. Both A and B
D. None of the above options
Answer : C

33. Which of the following statements about
regularization is not correct?
A. Using too large a value of lambda can cause your
hypothesis to underfit the data.
B. Using too large a value of lambda can cause your
hypothesis to overfit the data.
C. Using a very large value of lambda cannot hurt
the performance of your hypothesis.
D. None of the above
Answer : D
34. How can you prevent a clustering algorithm
from getting stuck in bad local optima?
A. Set the same seed value for each run
B. Use multiple random initializations
C. Both A and B
D. None of the above
Answer : B

35. Which of the following techniques can be used
for normalization in text mining?
A. Stemming
B. Lemmatization
C. Stop Word Removal
D. Both A and B
Answer : D

36. In which of the following cases will K-means
clustering fail to give good results?
1) Data points with outliers
2) Data points with different densities
3) Data points with nonconvex shapes
A. 1 and 2
B. 2 and 3
C. 1, 2, and 3
D. 1 and 3
Answer : C

37. Which of the following is a reasonable way to
select the number of principal components “k”?
A. Choose k to be the smallest value so that at least
99% of the variance is retained.
B. Choose k to be 99% of m (k = 0.99*m, rounded to
the nearest integer).
C. Choose k to be the largest value so that 99% of
the variance is retained.
D. Use the elbow method
Answer : A

38. You run gradient descent for 15 iterations with
α=0.3 and compute J(θ) after each iteration. You
find that the value of J(θ) decreases quickly and
then levels off. Based on this, which of the following
conclusions seems most plausible?
A. Rather than using the current value of α, use a
larger value of α (say α=1.0)
B. Rather than using the current value of α, use a
smaller value of α (say α=0.1)
C. α=0.3 is an effective choice of learning rate
D. None of the above
D. None of the above
Answer : C
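
The behaviour described in the question (cost dropping quickly, then levelling off) can be reproduced with a small sketch. The quadratic cost below is a hypothetical stand-in for J, chosen only so the example is self-contained.

```python
def cost(theta):
    # Hypothetical cost function J(theta), minimized at theta = 4.
    return (theta - 4.0) ** 2

def gradient(theta):
    # Derivative of the hypothetical cost above.
    return 2.0 * (theta - 4.0)

alpha = 0.3   # learning rate from the question
theta = 0.0   # arbitrary starting point
history = []
for _ in range(15):
    theta -= alpha * gradient(theta)
    history.append(cost(theta))

print([round(j, 4) for j in history])
```

The recorded costs shrink rapidly in the first few iterations and then flatten near zero, the pattern that suggests the learning rate is working well.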

39. What is a sentence parser typically used for?
A. It is used to parse sentences to check if they are
utf-8 compliant.
B. It is used to parse sentences to derive their most
likely syntax tree structures.
C. It is used to parse sentences to assign POS tags to
all tokens.
D. It is used to check if sentences can be parsed into
meaningful tokens.
Answer : B

40. Suppose you have trained a logistic regression
classifier and, for a new example x, it outputs a
prediction hθ(x) = 0.2. This means
A. Our estimate for P(y=1 | x)
B. Our estimate for P(y=0 | x)
C. Our estimate for P(y=1 | x)
D. Our estimate for P(y=0 | x)
Answer : B
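
For reference, the output of a logistic regression hypothesis is read as the estimated probability that y = 1, so the probability that y = 0 is its complement. A short sketch of that reading, with illustrative variable names:

```python
import math

def sigmoid(z):
    """Logistic function used by the logistic regression hypothesis."""
    return 1.0 / (1.0 + math.exp(-z))

h = 0.2          # the prediction from the question
p_y1 = h         # estimated P(y = 1 | x)
p_y0 = 1.0 - h   # estimated P(y = 0 | x)

# The logit (theta-transpose-x) that would produce this prediction.
z = math.log(h / (1.0 - h))

print(p_y1, p_y0)
print(sigmoid(z))
```

So a prediction of 0.2 corresponds to an estimated probability of 0.2 for y = 1 and 0.8 for y = 0.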
41. What is Machine learning?
a) The autonomous acquisition of knowledge
through the use of computer programs
b) The autonomous acquisition of knowledge
through the use of manual programs
c) The selective acquisition of knowledge through
the use of computer programs
d) The selective acquisition of knowledge through
the use of manual programs

Answer: a
Explanation: Machine learning is the autonomous
acquisition of knowledge through the use of
computer programs.

42. Which of the following is not a factor that affects
the performance of a learner system?
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
Answer: d
Explanation: Factors that affect the performance of
a learner system do not include good data
structures.

43. Different learning methods do not include?
a) Memorization
b) Analogy
c) Deduction
d) Introduction
Answer: d
Explanation: Different learning methods do not
include introduction.

44. In language understanding, the levels of
knowledge do not include?
a) Phonological
b) Syntactic
c) Empirical
d) Logical
Answer: c
Explanation: In language understanding, the levels
of knowledge do not include empirical
knowledge.
45. A model of language consists of the categories
which do not include?
a) Language units
b) Role structure of units
c) System constraints
d) Structural units
Answer: d
Explanation: A model of language consists of the
categories which do not include structural units.

46. What is a top-down parser?
a) Begins by hypothesizing a sentence (the symbol
S) and successively predicting lower level
constituents until individual preterminal symbols
are written
b) Begins by hypothesizing a sentence (the symbol
S) and successively predicting upper level
constituents until individual preterminal symbols
are written
c) Begins by hypothesizing lower level constituents
and successively predicting a sentence (the symbol
S)
d) Begins by hypothesizing upper level constituents
and successively predicting a sentence (the symbol
S)
Answer: a
Explanation: A top-down parser begins by
hypothesizing a sentence (the symbol S) and
successively predicting lower level constituents until
individual preterminal symbols are written.

47. Among the following which is not a Horn clause?
a) p
b) ¬p ∨ q
c) p → q
d) p → ¬q
Answer: d
Explanation: p → ¬q is not a Horn clause.

48. The action ‘STACK(A, B)’ of a robot arm specifies
to _______________
a) Place block B on Block A
b) Place blocks A, B on the table in that order
c) Place blocks B, A on the table in that order
d) Place block A on block B
Answer: d
Explanation: The action ‘STACK(A,B)’ of a robot arm
specifies to place block A on block B.

Module 02

1. Why do we need biological neural networks?
a) to solve tasks like machine vision & natural
language processing
b) to apply heuristic search methods to find
solutions of problem
c) to make smart human interactive & user friendly
system
d) all of the mentioned
Answer: d
Explanation: These are the basic aims that a neural
network achieves.
2. What is the trend in software nowadays?
a) to bring computers closer & closer to the user
b) to solve complex problems
c) to be task specific
d) to be versatile
Answer: a
Explanation: Software should be more interactive for
the user, so that it can understand the user's
problem in a better fashion.
3. What’s the main point of difference between
human & machine intelligence?
a) humans perceive everything as a pattern while
machines perceive it merely as data
b) humans have emotions
c) humans have more IQ & intellect
d) humans have sense organs
Answer: a
Explanation: Humans have emotions & thus form
different patterns on that basis, while a
machine (say, a computer) is dumb & everything is
just data to it.
4. What is auto-association task in neural networks?
a) find relation between 2 consecutive inputs
b) related to storage & recall task
c) predicting the future inputs
d) none of the mentioned
Answer: b
Explanation: This is the basic definition of auto-
association in neural networks.
5. Does pattern classification belong to the category
of non-supervised learning?
a) yes
b) no
Answer: b
Explanation: Pattern classification belongs to the
category of supervised learning.
6. In pattern mapping problem in neural nets, is
there any kind of generalization involved between
input & output?
a) yes
b) no
Answer: a
Explanation: The desired output is mapped closest
to the ideal output & hence there is generalisation
involved.
7. What is unsupervised learning?
a) features of group explicitly stated
b) number of groups may be known
c) neither the features nor the number of groups is known
d) none of the mentioned
Answer: c
Explanation: Basic definition of unsupervised
learning.
8. Do pattern classification & grouping involve the
same kind of learning?
a) yes
b) no
Answer: b
Explanation: Pattern classification involves
supervised learning while grouping is an
unsupervised one.
9. Is supervised learning needed for feature
mapping?
a) yes
b) no
Answer: b
Explanation: Feature mapping can be unsupervised,
so supervised learning is not a necessary condition.
10. Example of an unsupervised feature map?
a) text recognition
b) voice recognition
c) image recognition
d) none of the mentioned
Answer: b
Explanation: The same vowel may occur in
different contexts, & its features vary over
overlapping regions of different vowels.
11. Who was the inventor of the first
neurocomputer?
A. Dr. John Hecht-Nielsen
B. Dr. Robert Hecht-Nielsen
C. Dr. Alex Hecht-Nielsen
D. Dr. Steve Hecht-Nielsen
Answer : B
Explanation: The inventor of the first
neurocomputer was Dr. Robert Hecht-Nielsen.
12. How many types of Artificial Neural Networks are there?
A. 2
B. 3
C. 4
D. 5
Answer : A
Explanation: There are two Artificial Neural
Network topologies : FeedForward and Feedback.
13. In which ANN, loops are allowed?
A. FeedForward ANN
B. FeedBack ANN
C. Both A and B
D. None of the Above
Answer : B
Explanation: In a FeedBack ANN, loops are allowed.
They are used in content addressable memories.
14. What is the full form of BN in Neural Networks?
A. Bayesian Networks
B. Belief Networks
C. Bayes Nets
D. All of the above
Answer : D
Explanation: The full form BN is Bayesian networks
and Bayesian networks are also called Belief
Networks or Bayes Nets.
15. What is the name of the node which takes binary
values TRUE (T) and FALSE (F)?
A. Dual Node
B. Binary Node
C. Two-way Node
D. Ordered Node
Answer : B
Explanation: Boolean nodes : They represent
propositions, taking binary values TRUE (T) and
FALSE (F).
16. What is an auto-associative network?
A. a neural network that contains no loops
B. a neural network that contains feedback
C. a neural network that has only one loop
D. a single layer feed-forward neural network with
pre-processing
Answer : B
Explanation: An auto-associative network is
equivalent to a neural network that contains
feedback. The number of feedback paths(loops)
does not have to be one.
17. What is Neuro software?
A. A software used to analyze neurons
B. It is powerful and easy neural network
C. Designed to aid experts in real world
D. It is software used by Neurosurgeon
Answer : B
Explanation: Neuro software is powerful and easy
neural network.
18. Neural Networks are complex ______________
with many parameters.
A. Linear Functions
B. Nonlinear Functions
C. Discrete Functions
D. Exponential Functions
Answer : B
Explanation: With nonlinear activation functions,
neural networks are complex nonlinear functions
with many parameters.
19. Which of the following is not a promise of
artificial neural networks?
A. It can explain result
B. It can survive the failure of some nodes
C. It has inherent parallelism
D. It can handle noise
Answer : A
Explanation: An Artificial Neural Network (ANN)
cannot explain its results.
20. The output at each node is called_____.
A. node value
B. Weight
C. neurons
D. axons
Answer : A
Explanation: The output at each node is called its
activation or node value.
21. What is full form of ANNs?
A. Artificial Neural Node
B. AI Neural Networks
C. Artificial Neural Networks
D. Artificial Neural numbers
Answer : C
Explanation: Artificial Neural Networks is the full
form of ANNs.
22. In Feed Forward ANN, information flow is
_________.
A. unidirectional
B. bidirectional
C. multidirectional
D. All of the above
Answer : A
Explanation: In a Feed Forward ANN, the information
flow is unidirectional.
23. Which of the following is not a Machine
Learning strategy in ANNs?
A. Unsupervised Learning
B. Reinforcement Learning
C. Supreme Learning
D. Supervised Learning
Answer : C
Explanation: Supreme Learning is not a Machine
Learning strategy in ANNs.
24. Which of the following is an application of
Neural Networks?
A. Automotive
B. Aerospace
C. Electronics
D. All of the above
Answer : D
Explanation: All of the above are applications of
Neural Networks.
25. What is perceptron?
A. a single layer feed-forward neural network with
pre-processing
B. an auto-associative neural network
C. a double layer auto-associative neural network
D. a neural network that contains feedback
Answer : A
Explanation: The perceptron is a single layer feed-
forward neural network.
26. A 4-input neuron has weights 1, 2, 3 and 4. The
transfer function is linear with the constant of
proportionality being equal to 2. The inputs are 4, 3,
2 and 1 respectively. What will be the output?
A. 30
B. 40
C. 50
D. 60
Answer : B
Explanation: The output is found by multiplying the
weights with their respective inputs, summing the
results and multiplying by the constant of
proportionality of the transfer function.
Therefore: Output = 2 * (1*4 + 2*3 + 3*2 + 4*1) = 40.
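The same arithmetic as a short sketch (variable names are illustrative):

```python
weights = [1, 2, 3, 4]  # from the question
inputs = [4, 3, 2, 1]   # from the question
k = 2                   # constant of proportionality of the linear transfer function

# Weighted sum of the inputs, then the linear transfer function.
weighted_sum = sum(w * x for w, x in zip(weights, inputs))
output = k * weighted_sum

print(weighted_sum, output)  # 20 40
```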
27. What is back propagation?
A. It is another name given to the curvy function in
the perceptron
B. It is the transmission of error back through the
network to adjust the inputs
C. It is the transmission of error back through the
network to allow weights to be adjusted so that the
network can learn
D. None of the Above
Answer : C
Explanation: Back propagation is the transmission of
error back through the network to allow weights to
be adjusted so that the network can learn.
28. The network that involves backward links from
output to the input and hidden layers is called
_________
A. Self organizing map
B. Perceptrons
C. Recurrent neural network
D. Multi layered perceptron
Ans : C
Explanation: RNN (Recurrent neural network)
topology involves backward links from output to the
input and hidden layers.
29. The BN variables are composed of how many
dimensions?
A. 2
B. 3
C. 4
D. 5
Answer : A
Explanation: The BN variables are composed of two
dimensions : the range of propositions and the probability
assigned to each of the propositions.
30. The first artificial neural network was invented
in _____.
A. 1957
B. 1958
C. 1959
D. 1960
Ans : B
Explanation: The first artificial neural network was
invented in 1958.
31. Back propagation is a learning technique that
adjusts weights in the neural network by
propagating weight changes.
a. Forward from source to sink
b. Backward from sink to source
c. Forward from source to hidden nodes
d. Backward from sink to hidden nodes
Answer: b
Explanation: Backward from sink to source
32. Identify the following activation function :
φ(V) = Z + 1 / (1 + exp(–X * V + Y)),
where Z, X, Y are parameters
a. Step function
b. Ramp function
c. Sigmoid function
d. Gaussian function
Answer: c
Explanation: Sigmoid function
33. An artificial neuron receives n inputs x1, x2,
x3…………xn with weights w1, w2, ……….wn attached
to the input links. The weighted
sum_________________ is computed to be passed
on to a non-linear filter Φ called activation function
to release the output.
a. Σ wi
b. Σ xi
c. Σ wi + Σ xi
d. Σ wi* xi
Answer: d
Explanation: Σ wi* xi
34. Match the following knowledge representation
techniques with their applications:
List – I List – II
(a) Frames (i) Pictorial representation of objects,
their attributes and relationships
(b) Conceptual dependencies (ii) To describe real
world stereotype events
(c) Associative networks (iii) Record like structures
for grouping closely related knowledge
(d) Scripts (iv) Structures and primitives to represent
sentences
code:
abcd
a. (iii) (iv) (i) (ii)
b. (iii) (iv) (ii) (i)
c. (iv) (iii) (i) (ii)
d. (iv) (iii) (ii) (i)
Answer: a
Explanation:(iii) (iv) (i) (ii)
35. In propositional logic P ⇔ Q is equivalent to
(Where ~ denotes NOT):
a. ~ (P ˅ Q) ˄ ~ (Q ˅ P)
b. (~ P ˅ Q) ˄ (~ Q ˅ P)
c. (P ˅ Q) ˄ (Q ˅ P)
d. ~ (P ˅ Q) → ~ (Q ˅ P)
Answer: b
Explanation: (~ P ˅ Q) ˄ (~ Q ˅ P)
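The equivalence in question 35 can be confirmed by enumerating the truth table. This is a minimal sketch; the helper names iff and option_b are made up for the illustration.

```python
from itertools import product


def iff(p, q):
    """P <=> Q is true exactly when P and Q have the same truth value."""
    return p == q


def option_b(p, q):
    """(~P v Q) ^ (~Q v P), i.e. option b of the question."""
    return ((not p) or q) and ((not q) or p)


# Compare both formulas on all four assignments of P and Q.
equivalent = all(
    iff(p, q) == option_b(p, q)
    for p, q in product([False, True], repeat=2)
)
print(equivalent)
```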
36. Slots and facets are used in
a. Semantic Networks
b. Frames
c. Rules
d. All of these
Answer: b
Explanation: Frames
37. A neuron with 3 inputs has the weight vector
[0.2 -0.1 0.1]^T and a bias θ = 0. If the input vector is
X = [0.2 0.4 0.2]^T then the total input to the neuron
is:
a. 0.20
b. 1.0
c. 0.02
d. -1.0
Answer: c
Explanation: 0.02
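The total input in question 37 is the dot product of the weight vector and the input vector plus the bias. A small sketch follows; the function name total_input is illustrative, and because the bias is 0 here its sign convention does not affect the result.

```python
def total_input(weights, inputs, bias=0.0):
    """Weighted sum of the inputs plus the bias term."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Values from question 37.
net = total_input([0.2, -0.1, 0.1], [0.2, 0.4, 0.2], bias=0.0)
print(round(net, 2))  # 0.04 - 0.04 + 0.02
```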
38. Which of the following neural networks uses
supervised learning?
(A) Multilayer perceptron
(B) Self organizing feature map
(C) Hopfield network
a. (A) only
b. (B) only
c. (A) and (B) only
d. (A) and (C) only
Answer: a
Explanation: (A) only
39. Consider the following statements:
(a) If primal (dual) problem has a finite optimal
solution, then its dual (primal) problem has a finite
optimal solution.
(b) If primal (dual) problem has an unbounded
optimum solution, then its dual (primal) has no
feasible solution at all.
(c) Both primal and dual problems may be
infeasible.
Which of the following is correct?
a. (a) and (b) only
b. (a) and (c) only
c. (b) and (c) only
d. (a), (b) and (c)
Answer: d
Explanation:(a), (b) and (c)
40. Consider the following statements :
(a) Assignment problem can be used to minimize
the cost.
(b) Assignment problem is a special case of
transportation problem.
(c) Assignment problem requires that only one
activity be assigned to each resource.
Which of the following options is correct?
a. (a) and (b) only
b. (a) and (c) only
c. (b) and (c) only
d. (a), (b) and (c)
Answer: d
Explanation: (a), (b) and (c)
41. What is the name of the model in figure below?
a) Rosenblatt perceptron model
b) McCulloch-pitts model
c) Widrow’s Adaline model
d) None of the mentioned
Answer: b
Explanation: It is a general block diagram of
McCulloch-pitts model of neuron.
42. What is nature of function F(x) in the figure?
a) linear
b) non-linear
c) can be either linear or non-linear
d) none of the mentioned
Answer: b
Explanation: In this function, the independent
variable is an exponent in the equation hence non-
linear.
43. What does the character ‘b’ represent in the
above diagram?
a) bias
b) any constant value
c) a variable value
d) none of the mentioned
Answer: a
Explanation: More appropriate choice since bias is a
constant fixed value for any circuit model.
44. If ‘b’ in the figure below is the bias, then what
logic circuit does it represent?
a) or gate
b) and gate
c) nor gate
d) nand gate
Answer: c
Explanation: Form the truth table of above figure by
taking inputs as 0 or 1.
45. When both inputs are 1, what will be the output
of the above figure?
a) 0
b) 1
c) either 0 or 1
d) z
Answer: a
Explanation: Check the truth table of nor gate.
46. When both inputs are different, what will be the
output of the above figure?
a) 0
b) 1
c) either 0 or 1
d) z
Answer: a
Explanation: Check the truth table of nor gate.
47. Which of the following models has the ability to
learn?
a) pitts model
b) rosenblatt perceptron model
c) both rosenblatt and pitts model
d) neither rosenblatt nor pitts
Answer: b
Explanation: Weights are fixed in pitts model but
adjustable in rosenblatt.
48. When both inputs are 1, what will be the output
of the pitts model nand gate ?
a) 0
b) 1
c) either 0 or 1
d) z
Answer: a
Explanation: Check the truth table of simply a nand
gate.
49. When both inputs are different, what will be the
logical output of the figure of question 44?
a) 0
b) 1
c) either 0 or 1
d) z
Answer: a
Explanation: Check the truth table of nor gate.
50. Does McCulloch-pitts model have ability of
learning?
a) yes
b) no
Answer: b
Explanation: Weights are fixed.

Module 03

1. In descent methods, the particular choice of
search direction does not matter so much.
a. True.
b. False.
answer : b
2. In descent methods, the particular choice of line
search does not matter so much.
a. True.
b. False.
answer : a
3. When the gradient descent method is started
from a point near the solution, it will converge very
quickly.
a. True.
b. False.
answer : b
4. Newton’s method with step size $h=1$ always
works.
a. True.
b. False.
answer : b
5. When Newton’s method is started from a point
near the solution, it will converge very quickly.
a. True.
b. False.
answer : a
6. Using Newton’s method to minimize $f(Ty)$,
where $Ty=x$ and $T$ is nonsingular, can greatly
improve the convergence speed when $T$ is chosen
appropriately.
a. True.
b. False.
answer : b
7. If $f$ is self-concordant, its Hessian is Lipschitz
continuous.
a. True.
b. False.
answer : b
8. If the Hessian of $f$ is Lipschitz continuous, then
$f$ is self-concordant.
a. True.
b. False.
answer : b
9. Newton’s method should only be used to
minimize self-concordant functions.
a. True.
b. False.
answer : b
10. $f(x) = \exp x$ is self-concordant.
a. True.
b. False.
answer : b
11. $f(x) = -\log x$ is self-concordant.
a. True.
b. False.
answer : a
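Questions 10 and 11 can be sanity-checked numerically. For a function of one variable, self-concordance requires $|f'''(x)| \le 2 f''(x)^{3/2}$ on the whole domain. The sketch below only tests a few sample points, so it can expose a violation but is not a proof; the helper name is_self_concordant_at and the sample points are illustrative choices.

```python
import math


def is_self_concordant_at(d2, d3, x, rel_tol=1e-9):
    """Check |f'''(x)| <= 2 * f''(x)**1.5 at one point (with a small float tolerance)."""
    return abs(d3(x)) <= 2.0 * d2(x) ** 1.5 * (1.0 + rel_tol)


# f(x) = -log x: f''(x) = 1/x^2, f'''(x) = -2/x^3 (the bound holds with equality).
neg_log_ok = all(
    is_self_concordant_at(lambda x: 1.0 / x**2, lambda x: -2.0 / x**3, x)
    for x in [0.1, 1.0, 10.0]
)

# f(x) = exp x: f''(x) = f'''(x) = exp(x); the bound fails for sufficiently negative x.
exp_ok = all(
    is_self_concordant_at(math.exp, math.exp, x)
    for x in [-5.0, 0.0, 5.0]
)

print(neg_log_ok, exp_ok)
```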
12. Consider the problem of minimizing \[ f(x) =
(c^Tx)^4 + \sum_{i=1}^n w_i \exp x_i, \] over $x \in
\mathbf{R}^n$, where $w \succ 0$.
Newton’s method would probably require fewer
iterations than the gradient method, but each
iteration would be much more costly.
a. True.
b. False.
answer : b
13. Newton’s method is seldom used in machine
learning because
a. common loss functions are not self-concordant
b. Newton’s method does not work well on noisy
data
c. machine learning researchers don’t really
understand linear algebra
d. it is generally not practical to form or store the
Hessian in such problems, due to large problem size
answer : d
Module 04

1. In practice, Line of best fit or regression line is
found when _____________
a) Sum of residuals (∑(Y – h(X))) is minimum
b) Sum of the absolute value of residuals (∑|Y-h(X)|)
is maximum
c) Sum of the square of residuals (∑(Y-h(X))²) is
minimum
d) Sum of the square of residuals (∑(Y-h(X))²) is
maximum
Answer: c
Explanation: Here we penalize higher error value
much more as compared to the smaller one, such
that there is a significant difference between
making big errors and small errors, which makes it
easy to differentiate and select the best fit line.
2. If a Linear regression model fits perfectly, i.e., train
error is zero, then _____________________
a) Test error is also always zero
b) Test error is non zero
c) Couldn’t comment on Test error
d) Test error is equal to Train error
Answer: c
Explanation: Test Error depends on the test data. If
the Test data is an exact representation of train
data then test error is always zero. But this may not
be the case.
3. Which of the following metrics can be used for
evaluating regression models?
i) R Squared ii) Adjusted R Squared iii) F Statistics iv)
RMSE / MSE / MAE
a) ii and iv
b) i and ii
c) ii, iii and iv
d) i, ii, iii and iv
Answer: d
Explanation: These (R Squared, Adjusted R Squared,
F Statistics, RMSE / MSE / MAE) are some metrics
which you can use to evaluate your regression
model.
4. How many coefficients do you need to estimate
in a simple linear regression model (One
independent variable)?
a) 1
b) 2
c) 3
d) 4
Answer: b
Explanation: In simple linear regression, there is one
independent variable so 2 coefficients
(Y=a+bx+error).
5. In a simple linear regression model (One
independent variable), If we change the input
variable by 1 unit. How much output variable will
change?
a) by 1
b) no change
c) by intercept
d) by its slope
Answer: d
Explanation: For linear regression Y=a+bx+error. If we
neglect the error then Y=a+bx. If x increases by 1, then Y
= a+b(x+1), which implies Y=a+bx+b. So Y changes
by its slope b.
6. Function used for linear regression in R is
__________
a) lm(formula, data)
b) lr(formula, data)
c) lrm(formula, data)
d) regression.linear(formula, data)
Answer: a
Explanation: lm(formula, data) refers to a linear
model in which formula is an object of the class
“formula”, representing the relation between
variables. This formula is then applied to the
data to create a relationship model.
7. In syntax of linear model lm(formula,data,..), data
refers to ______
a) Matrix
b) Vector
c) Array
d) List
Answer: d
Explanation: The formula is just a symbolic description
of the relationship; it is evaluated against data. In R, the
data argument of lm() is an optional data frame, list or
environment, and a data frame is itself a list of
equal-length columns.
8. In the mathematical Equation of Linear
Regression Y = β1 + β2X + ϵ, (β1, β2) refers to
__________
a) (X-intercept, Slope)
b) (Slope, X-Intercept)
c) (Y-Intercept, Slope)
d) (slope, Y-Intercept)
Answer: c
Explanation: Y-intercept is β1 and X-intercept is –
(β1 / β2). Intercepts are defined for axis and formed
when the coordinates are on the axis.
9) You are given two variables V1 and V2 with the
following two characteristics.
1. If V1 increases then V2 also increases
2. If V1 decreases then V2 behavior is unknown
Looking at the above two characteristics, which of
the following options is correct for the Pearson
correlation between V1 and V2?
A) Pearson correlation will be close to 1
B) Pearson correlation will be close to -1
C) Pearson correlation will be close to 0
D) None of these
answer: (D)
10) Suppose Pearson correlation between V1 and V2
is zero. In such case, is it right to conclude that V1
and V2 do not have any relation between them?
A) TRUE
B) FALSE
answer: (B)
11) Which of the following offsets, do we use in
linear regression’s least square line fit? Suppose
horizontal axis is independent variable and vertical
axis is dependent variable.

A) Vertical offset
B) Perpendicular offset
C) Both, depending on the situation
D) None of above
answer: (A)
12) True- False: Overfitting is more likely when you
have huge amount of data to train?
A) TRUE
B) FALSE
answer: (B)
13) We can also compute the coefficients of linear
regression with the help of an analytical method
called “Normal Equation”. Which of the following
is/are true about Normal Equation?
1. We don’t have to choose the learning rate
2. It becomes slow when number of features is very
large
3. There is no need to iterate
A) 1 and 2
B) 1 and 3
C) 2 and 3
D) 1,2 and 3
answer: (D)
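A minimal sketch of the Normal Equation for one feature plus an intercept is given below. It solves (XᵀX)θ = Xᵀy directly, with no learning rate and no iteration. The function name normal_equation_1d and the sample data are assumptions made for the illustration. With many features the same approach needs the solution of a large linear system, which is why it becomes slow.

```python
def normal_equation_1d(xs, ys):
    """Solve (X^T X) theta = X^T y for X = [1, x] (intercept plus one feature)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of the 2x2 matrix X^T X
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = normal_equation_1d(xs, ys)
print(intercept, slope)
```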
14) The graphs below show two fitted regression lines (A &
B) on randomly generated data. Now, I want to find
the sum of residuals in both cases A and B.
Note:
Scale is same in both graphs for both axes.
X-axis is the independent variable and Y-axis is the
dependent variable.
Which of the following statements is true about the
sum of residuals of A and B?
A) A has higher sum of residuals than B
B) A has lower sum of residual than B
C) Both have same sum of residuals
D) None of these
answer: (C)
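The answer to question 14 rests on a property of least-squares lines that include an intercept: their residuals sum to zero. The sketch below checks this on made-up data, assuming (since the graphs are not reproduced here) that both A and B are such least-squares fits. The function name fit_line is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical noisy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.5]
a, b = fit_line(xs, ys)

# Signed residuals cancel out for a least-squares fit with an intercept.
residual_sum = sum(y - (a + b * x) for x, y in zip(xs, ys))
print(abs(residual_sum) < 1e-9)
```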
15) Choose the option which best describes bias, where x
denotes the regularization penalty.
A) In case of very large x; bias is low
B) In case of very large x; bias is high
C) We can’t say about bias
D) None of these
answer: (B)
16) What will happen when you apply very large
penalty in case of Ridge?
A) Some of the coefficient will become absolute
zero
B) Some of the coefficient will approach zero but
not absolute zero
C) Both A and B depending on the situation
D) None of these
answer: (B)
17) What will happen when you apply very large
penalty in case of Lasso?
A) Some of the coefficient will become zero
B) Some of the coefficient will be approaching to
zero but not absolute zero
C) Both A and B depending on the situation
D) None of these
answer: (A)
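The contrast between questions 16 and 17 can be sketched in the special case of an orthonormal design (XᵀX = I), where both penalized solutions have closed forms in terms of the ordinary least-squares coefficient b. The objectives assumed here are ½‖y − Xw‖² + (λ/2)‖w‖² for Ridge and ½‖y − Xw‖² + λ‖w‖₁ for Lasso; the function names and numbers are illustrative.

```python
def ridge_shrink(b, lam):
    """Ridge under an orthonormal design: the OLS coefficient is scaled down."""
    return b / (1.0 + lam)


def lasso_shrink(b, lam):
    """Lasso under an orthonormal design: soft-thresholding of the OLS coefficient."""
    magnitude = max(abs(b) - lam, 0.0)
    return magnitude if b >= 0 else -magnitude


# Hypothetical OLS coefficients and penalty strength.
ols_coefficients = [3.0, -0.5, 0.2]
lam = 1.0
ridge = [ridge_shrink(b, lam) for b in ols_coefficients]
lasso = [lasso_shrink(b, lam) for b in ols_coefficients]

print(ridge)  # every coefficient shrinks but stays non-zero
print(lasso)  # small coefficients are set exactly to zero
```

Ridge only rescales, so a non-zero coefficient never becomes exactly zero, while Lasso subtracts a fixed amount and clips at zero, which is why it is used for variable selection.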
18) Which of the following statement is true about
outliers in Linear regression?
A) Linear regression is sensitive to outliers
B) Linear regression is not sensitive to outliers
C) Can’t say
D) None of these
answer: (A)
19) Suppose you plotted a scatter plot between the
residuals and predicted values in linear regression
and you found that there is a relationship between
them. Which of the following conclusion do you
make about this situation?
A) Since there is a relationship, our model
is not good
B) Since there is a relationship, our model
is good
C) Can’t say
D) None of these
answer: (A)
20) What will happen when you fit degree 4
polynomial in linear regression?
A) There are high chances that degree 4 polynomial
will over fit the data
B) There are high chances that degree 4 polynomial
will under fit the data
C) Can’t say
D) None of these
answer: (A)
21) What will happen when you fit degree 2
polynomial in linear regression?
A) There are high chances that degree 2 polynomial will
over fit the data
B) There are high chances that degree 2 polynomial will
under fit the data
C) Can’t say
D) None of these
answer: (B)
22) In terms of bias and variance. Which of the
following is true when you fit degree 2 polynomial?
A) Bias will be high, variance will be high
B) Bias will be low, variance will be high
C) Bias will be high, variance will be low
D) Bias will be low, variance will be low
answer: (C)
23) Suppose l1, l2 and l3 are the three learning rates
for A,B,C respectively. Which of the following is true
about l1,l2 and l3?

A) l2 < l1 < l3
B) l1 > l2 > l3
C) l1 = l2 = l3
D) None of these
answer: (A)
24) Now we increase the training set size gradually.
As the training set size increases, what do you
expect will happen with the mean training error?
A) Increase
B) Decrease
C) Remain constant
D) Can’t Say
answer: (D)
25) What do you expect will happen with bias and
variance as you increase the size of training data?
A) Bias increases and Variance increases
B) Bias decreases and Variance increases
C) Bias decreases and Variance decreases
D) Bias increases and Variance decreases
E) Can’t Say
answer: (D)
26) What would be the root mean square training
error for this data if you run a Linear Regression
model of the form (Y = A0+A1X)?

A) Less than 0
B) Greater than zero
C) Equal to 0
D) None of these
answer: (C)
Question Context 27-28:
Suppose you have been given the following scenario
for training and validation error for Linear
Regression.

27) Which of the following scenarios would give you
the right hyper parameter?
A) 1
B) 2
C) 3
D) 4
answer: (B)
28) Suppose you got the tuned hyper parameters
from the previous question. Now, Imagine you want
to add a variable in variable space such that this
added feature is important. Which of the following
thing would you observe in such case?
A) Training Error will decrease and Validation error
will increase
B) Training Error will increase and Validation error
will increase
C) Training Error will increase and Validation error
will decrease
D) Training Error will decrease and Validation error
will decrease
E) None of the above
answer: (D)
29) In such situation which of the following options
would you consider?
1. Add more variables
2. Start introducing polynomial degree variables
3. Remove some variables
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1, 2 and 3
answer: (A)
30) Now the situation is the same as in the previous
question (under fitting). Which of the following
regularization algorithms would you prefer?
A) L1
B) L2
C) Any
D) None of these
answer: (D)
31) Which of the following evaluation metrics can
not be applied in case of logistic regression output
to compare with target?
A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error
answer: D
32) One of the very good methods to analyze the
performance of Logistic Regression is AIC, which is
similar to R-Squared in Linear Regression. Which of
the following is true about AIC?
A) We prefer a model with minimum AIC value
B) We prefer a model with maximum AIC value
C) Both but depend on the situation
D) None of these
answer: A
33) [True-False] Standardisation of features is
required before training a Logistic Regression.
A) TRUE
B) FALSE
answer: B
34) Which of the following algorithms do we use for
Variable Selection?
A) LASSO
B) Ridge
C) Both
D) None of these
answer: A
Context: 35-36
Consider the following model for logistic regression:
P(y = 1 | x, w) = g(w0 + w1x)
where g(z) is the logistic function.
In the above equation, P(y = 1 | x; w) is viewed as a
function of x whose shape we can change by changing
the parameters w.
35) What would be the range of p in such case?
A) (0, inf)
B) (-inf, 0 )
C) (0, 1)
D) (-inf, inf)
answer: C
36) In above question what do you think which
function would make p between (0,1)?
A) logistic function
B) Log likelihood function
C) Mixture of both
D) None of them
answer: A
37) Suppose you have been given a fair coin and you
want to find out the odds of getting heads. Which of
the following option is true for such a case?
A) odds will be 0
B) odds will be 0.5
C) odds will be 1
D) None of these
answer: C
38) The logit function(given as l(x)) is the log of odds
function. What could be the range of logit function
in the domain x=[0,1]?
A) (– ∞ , ∞)
B) (0,1)
C) (0, ∞)
D) (- ∞, 0)
answer: A
39) Which of the following option is true?
A) Linear Regression error values have to be
normally distributed but in case of Logistic
Regression it is not the case
B) Logistic Regression error values have to be
normally distributed but in case of Linear
Regression it is not the case
C) Both Linear Regression and Logistic Regression
error values have to be normally distributed
D) Neither Linear Regression nor Logistic Regression
error values have to be normally distributed
answer: A
40) Which of the following is true regarding the
logistic function for any value “x”?
Note:
Logistic(x): is a logistic function of any number “x”
Logit(x): is a logit function of any number “x”
Logit_inv(x): is an inverse logit function of any
number “x”
A) Logistic(x) = Logit(x)
B) Logistic(x) = Logit_inv(x)
C) Logit_inv(x) = Logit(x)
D) None of these
answer: B
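Questions 35 to 40 all involve the same three functions, sketched below: the odds p / (1 − p), the logit (log of the odds), and the logistic function, which is the inverse of the logit. The function names are illustrative.

```python
import math


def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)


def logit(p):
    """Log of the odds; maps (0, 1) onto (-inf, inf)."""
    return math.log(p / (1.0 - p))


def logistic(z):
    """Inverse of the logit; maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


fair_coin_odds = odds(0.5)          # question 37: 0.5 / 0.5
round_trip = logistic(logit(0.3))   # question 40: logistic undoes logit
print(fair_coin_odds, round_trip)
print(logistic(-20.0), logistic(20.0))  # both stay strictly inside (0, 1)
```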
41. A _________ is a decision support tool that uses
a tree-like graph or model of decisions and their
possible consequences, including chance event
outcomes, resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
Answer: a
Explanation: Refer the definition of Decision tree.
42. Decision Tree is a display of an algorithm.
a) True
b) False
Answer: a
Explanation: None.
43. What is Decision Tree?
a) Flow-Chart
b) Structure in which internal node represents test
on an attribute, each branch represents outcome of
test and each leaf node represents class label
c) Flow-Chart & Structure in which internal node
represents test on an attribute, each branch
represents outcome of test and each leaf node
represents class label
d) None of the mentioned
Answer: c
Explanation: Refer the definition of Decision tree.
44. Decision Trees can be used for Classification
Tasks.
a) True
b) False
Answer: a
Explanation: None.
45. Choose from the following that are Decision
Tree nodes?
a) Decision Nodes
b) End Nodes
c) Chance Nodes
d) All of the mentioned
Answer: d
Explanation: None.
46. Decision Nodes are represented by
____________
a) Disks
b) Squares
c) Circles
d) Triangles
Answer: b
Explanation: None.
47. Chance Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles
Answer: c
Explanation: None.
48. End Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles
Answer: d
Explanation: None.
49. Which of the following are the advantage/s of
Decision Trees?
a) Possible Scenarios can be added
b) Use a white box model, if a given result is provided
by a model
c) Worst, best and expected values can be
determined for different scenarios
d) All of the mentioned
Answer: d
Explanation: None.

Module 05

1. Instead of representing knowledge in a relatively
declarative, static way (as a bunch of things that are
true), rule-based systems represent knowledge in
terms of ___________ that tell you what you should
do or what you could conclude in different
situations.
a) Raw Text
b) A bunch of rules
c) Summarized Text
d) Collection of various Texts
Answer: b
Explanation: None.
2. A rule-based system consists of a bunch of IF-
THEN rules.
a) True
b) False
Answer: a
Explanation: None.
3. In a backward chaining system you start with the
initial facts, and keep using the rules to draw new
conclusions (or take certain actions) given those
facts.
a) True
b) False
Answer: b
Explanation: The statement describes forward
chaining. Backward chaining starts from a goal and
works back towards the facts.
4. In a backward chaining system, you start with
some hypothesis (or goal) you are trying to prove,
and keep looking for rules that would allow you to
conclude that hypothesis, perhaps setting new sub-
goals to prove as you go.
a) True
b) False
Answer: a
Explanation: None.
5. Forward chaining systems are _____________
where as backward chaining systems are
___________
a) Goal-driven, goal-driven
b) Goal-driven, data-driven
c) Data-driven, goal-driven
d) Data-driven, data-driven
Answer: c
Explanation: None.
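The difference between the two strategies in questions 3 to 5 can be sketched over a toy rule base. The rule format (a tuple of premises and one conclusion), the function names and the facts are all made up for the illustration, and the backward chainer assumes the rules contain no cycles.

```python
def forward_chain(facts, rules):
    """Data-driven: start from the facts and fire rules until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules):
    """Goal-driven: prove the goal by proving the premises of a rule that concludes it."""
    if goal in facts:
        return True
    return any(
        all(backward_chain(p, facts, rules) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )


# Hypothetical IF-THEN rules: (premises, conclusion).
rules = [
    (("has_fur",), "mammal"),
    (("mammal", "eats_meat"), "carnivore"),
]
facts = {"has_fur", "eats_meat"}

derived = forward_chain(facts, rules)
proved = backward_chain("carnivore", facts, rules)
print(sorted(derived), proved)
```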
6. A Horn clause is a clause with _______ positive
literal.
a) At least one
b) At most one
c) None
d) All
Answer: b
Explanation: Refer to the definition of Horn Clauses.
7. ___________ trees can be used to infer in Horn
clause systems.
a) Min/Max Tree
b) And/Or Trees
c) Minimum Spanning Trees
d) Binary Search Trees
Answer: b
Explanation: Take the analogy using min/max trees
in game theory.
8. An expert system is a computer program that
contains some of the subject-specific knowledge of
one or more human experts.
a) True
b) False
Answer: a
Explanation: None.
9. A knowledge engineer has the job of extracting
knowledge from an expert and building the expert
system knowledge base.
a) True
b) False
Answer: a
Explanation: None.
10. What is needed to make probabilistic systems
feasible in the world?
a) Reliability
b) Crucial robustness
c) Feasibility
d) None of the above
Answer: b
Explanation: Model-based knowledge provides
the crucial robustness needed to make probabilistic
systems feasible in the real world.
11. How many terms are required for building a
bayes model?
a) 1
b) 2
c) 3
d) 4
Answer: c
Explanation: The three required terms are one
conditional probability and two unconditional
probabilities.
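The three terms from question 11 are exactly the inputs of Bayes' rule, P(A | B) = P(B | A) · P(A) / P(B). A minimal sketch with made-up numbers follows; the function name bayes and the probabilities are assumptions for the illustration.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Posterior P(A | B) from one conditional and two unconditional probabilities."""
    return p_b_given_a * p_a / p_b


# Hypothetical values: P(B | A) = 0.9, P(A) = 0.01, P(B) = 0.05.
posterior = bayes(p_b_given_a=0.9, p_a=0.01, p_b=0.05)
print(round(posterior, 2))
```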
12. What is needed to make probabilistic systems
feasible in the world?
a) Reliability
b) Crucial robustness
c) Feasibility
d) None of the mentioned
Answer: b
Explanation: Model-based knowledge provides
the crucial robustness needed to make probabilistic
systems feasible in the real world.
13. Where can the Bayes rule be used?
a) Solving queries
b) Increasing complexity
c) Decreasing complexity
d) Answering probabilistic query
Answer: d
Explanation: Bayes rule can be used to answer the
probabilistic queries conditioned on one piece of
evidence.
14. What does the bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
d) None of the mentioned
Answer: a
Explanation: A Bayesian network provides a
complete description of the domain.
15. How can the entries in the full joint probability
distribution be calculated?
a) Using variables
b) Using information
c) Both Using variables & information
d) None of the mentioned
Answer: b
Explanation: Every entry in the full joint probability
distribution can be calculated from the information
in the network.
16. How can the bayesian network be used to
answer any query?
a) Full distribution
b) Joint distribution
c) Partial distribution
d) All of the mentioned
Answer: b
Explanation: If a bayesian network is a
representation of the joint distribution, then it can
solve any query, by summing all the relevant joint
entries.
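Questions 15 and 16 can be illustrated with a two-node network Rain → WetGrass. Each joint entry is a product of the local conditional probabilities, and a query is answered by summing the relevant joint entries and normalizing. The network, its probabilities and the function names are hypothetical.

```python
# Hypothetical network: Rain -> WetGrass.
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {
    True: {True: 0.9, False: 0.1},   # P(Wet | Rain = True)
    False: {True: 0.1, False: 0.9},  # P(Wet | Rain = False)
}


def joint(rain, wet):
    """One entry of the full joint distribution, built from the network's tables."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]


def query_rain_given_wet(wet):
    """P(Rain | Wet = wet), obtained by summing joint entries and normalizing."""
    numerators = {rain: joint(rain, wet) for rain in (True, False)}
    total = sum(numerators.values())
    return {rain: value / total for rain, value in numerators.items()}


posterior = query_rain_given_wet(True)
print(posterior[True])
```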
17. How can the compactness of the bayesian network
be described?
a) Locally structured
b) Fully structured
c) Partial structure
d) All of the mentioned
Answer: a
Explanation: The compactness of the bayesian
network is an example of a very general property of
a locally structured system.
18. With which is the local structure associated?
a) Hybrid
b) Dependant
c) Linear
d) None of the mentioned
Answer: c
Explanation: Local structure is usually associated
with linear rather than exponential growth in
complexity.
19. Which condition is used to influence a variable
directly by all the others?
a) Partially connected
b) Fully connected
c) Local connected
d) None of the mentioned
Answer: b
Explanation: None.
20. What is the consequence between a node and
its predecessors while creating bayesian network?
a) Functionally dependent
b) Dependant
c) Conditionally independent
d) Both Conditionally dependant & Dependant
Answer: c
Explanation: The semantics used to derive a method for
constructing bayesian networks lead to the
consequence that a node can be conditionally
independent of its predecessors, given its parents.
21. Which algorithm is used for solving temporal
probabilistic reasoning?
a) Hill-climbing search
b) Hidden markov model
c) Depth-first search
d) Breadth-first search
Answer: b
Explanation: Hidden Markov model is used for
solving temporal probabilistic reasoning that was
independent of transition and sensor model.
22. How is the state of the process described in
HMM?
a) Literal
b) Single random variable
c) Single discrete random variable
d) None of the mentioned
Answer: c
Explanation: An HMM is a temporal probabilistic
model in which the state of the process is described
by a single discrete random variable.
23. What are the possible values of the variable?
a) Variables
b) Literals
c) Discrete variable
d) Possible states of the world
Answer: d
Explanation: The possible values of the variables are
the possible states of the world.
24. Where are the additional variables added
in HMM?
a) Temporal model
b) Reality model
c) Probability model
d) All of the mentioned
Answer: a
Explanation: Additional state variables can be
added to a temporal model while staying within the
HMM framework.
25. Which allows for a simple matrix
implementation of all the basic algorithms?
a) HMM
b) Restricted structure of HMM
c) Temporary model
d) Reality model
Answer: b
Explanation: Restricted structure of HMM allows for
a very simple and elegant matrix implementation of
all the basic algorithm.
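Because an HMM has a single discrete state variable, the transition model is just a matrix and the forward (filtering) update reduces to a matrix-vector product followed by normalization. The sketch below uses illustrative numbers for a two-state rain/umbrella model; the function name forward_step and all probabilities are assumptions, not values from the questions.

```python
def forward_step(belief, transition, observation_likelihood):
    """One filtering update: predict with the transition matrix,
    weight by the evidence likelihood, then normalize."""
    n = len(belief)
    predicted = [
        sum(transition[i][j] * belief[i] for i in range(n))
        for j in range(n)
    ]
    unnormalized = [observation_likelihood[j] * predicted[j] for j in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# transition[i][j] = P(X_t = j | X_{t-1} = i); state 0 = rain, state 1 = no rain.
transition = [[0.7, 0.3], [0.3, 0.7]]
# P(umbrella observed | state).
umbrella_seen = [0.9, 0.2]

belief = [0.5, 0.5]
for _ in range(2):  # the umbrella is observed on two consecutive days
    belief = forward_step(belief, transition, umbrella_seen)
print([round(b, 3) for b in belief])
```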
26. Where is the Hidden Markov Model used?
a) Speech recognition
b) Understanding of real world
c) Both Speech recognition & Understanding of real
world
d) None of the mentioned
Answer: a
Explanation: None.
27. Which variable can give the concrete form to the
representation of the transition model?
a) Single variable
b) Discrete state variable
c) Random variable
d) Both Single & Discrete state variable
Answer: d
Explanation: With a single, discrete state variable,
we can give concrete form to the representation of
the transition model.
28. Which algorithm works by first running the
standard forward pass to compute?
a) Smoothing
b) Modified smoothing
c) HMM
d) Depth-first search algorithm
Answer: b
Explanation: The modified smoothing algorithm
works by first running the standard forward pass to
compute and then running the backward pass.
29. Which reveals an improvement in online
smoothing?
a) Matrix formulation
b) Revelation
c) HMM
d) None of the mentioned
Answer: a
Explanation: Matrix formulation reveals an
improvement in online smoothing with a fixed lag.
30. Which suggests the existence of an efficient
recursive algorithm for online smoothing?
a) Matrix
b) Constant space
c) Constant time
d) None of the mentioned
Answer: b
Explanation: None.
31. The Expectation Maximization algorithm has
been used to identify conserved domains in
unaligned proteins only.
a) True
b) False
Answer: b
Explanation: This algorithm has been used to
identify both conserved domains in unaligned
proteins and protein-binding sites in unaligned DNA
sequences (Lawrence and Reilly 1990), including
sites that may include gaps (Cardon and Stormo
1992). Given are a set of sequences that are
expected to have a common sequence pattern and
may not be easily recognizable by eye.
32. Which of the following is untrue regarding
Expectation Maximization algorithm?
a) An initial guess is made as to the location and size
of the site of interest in each of the sequences, and
these parts of the sequence are aligned
b) The alignment provides an estimate of the base
or amino acid composition of each column in the
site
c) The column-by-column composition of the site
already available is used to estimate the probability
of finding the site at any position in each of the
sequences
d) The row-by-column composition of the site
already available is used to estimate the probability
Answer: d
Explanation: The EM algorithm then consists of two
steps, which are repeated consecutively. In step 1,
the expectation step, the column-by-column
composition of the site already available is used to
estimate the probability of finding the site at any
position in each of the sequences. These
probabilities are used in turn to provide new
information as to the expected base or amino acid
distribution for each column in the site.
33. Out of the two repeated steps in EM algorithm,
the step 2 is ________
a) the maximization step
b) the minimization step
c) the optimization step
d) the normalization step
Answer: a
Explanation: In step 2, the maximization step, the
new counts of bases or amino acids for each
position in the site found in step 1 are substituted
for the previous set. Step 1 is then repeated using
these new counts. The cycle is repeated until the
algorithm converges on a solution and does not
change with further cycles. At that time, the best
location of the site in each sequence and the best
estimate of the residue composition of each column
in the site will be available.
34. In EM algorithm, as an example, suppose that
there are 10 DNA sequences having very little
similarity with each other, each about 100
nucleotides long and thought to contain a binding
site near the middle 20 residues, based on
biochemical and genetic evidence. The following
steps would be used by the EM algorithm to find the
most probable location of the binding sites in each
of the ______ sequences.
a) 30
b) 10
c) 25
d) 20
Answer: b
Explanation: When examining the EM program
MEME, the size and number of binding sites, the
location in each sequence, and whether or not the
site is present in each sequence do not necessarily
have to be known. For the present example, the
following steps would be used by the EM algorithm
to find the most probable location of the binding
sites in each of the 10 sequences.
35. In the initial step of EM algorithm, the 20-
residue-long binding motif patterns in each
sequence are aligned as an initial guess of the motif.
a) True
b) False
Answer: a
Explanation: The base composition of each column
in the aligned patterns is then determined. The
composition of the flanking sequence on each side
of the site provides the surrounding base or amino
acid composition for comparison. Each sequence is
assumed to be the same length and to be aligned by
the ends.
36. In the intermediate steps of EM algorithm, the
number of each base in each column is determined
and then converted to fractions.
a) True
b) False
Answer: a
Explanation: For example, if there are four Gs in
the first column of the 10 sequences, then the
frequency of G in the first column of the site is
fSG = 4/10 = 0.4. This procedure is repeated for
each base and each column.
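The counting step can be sketched in a few lines of Python. The ten 5-base sites below are made-up illustration data (not a real binding site), chosen so that four of them have G in the first column; `column_frequencies` is a hypothetical helper name.

```python
from collections import Counter

def column_frequencies(aligned_sites):
    """Return, for each column, a dict mapping base -> fraction of sequences."""
    n = len(aligned_sites)
    width = len(aligned_sites[0])
    freqs = []
    for col in range(width):
        counts = Counter(site[col] for site in aligned_sites)
        freqs.append({base: counts.get(base, 0) / n for base in "ACGT"})
    return freqs

# Made-up aligned sites: exactly four of the ten start with G.
sites = [
    "GATCA", "GTTCA", "GATGA", "GACCA", "AATCA",
    "CATCA", "TATCT", "AATCA", "CTTCA", "TATCG",
]
freqs = column_frequencies(sites)
print(freqs[0]["G"])  # 0.4
```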
37. For the example of 10 DNA sequences, each 100
residues long, there are _______ possible starting
sites in each sequence for a 20-residue-long site.
a) 30
b) 21
c) 81
d) 60
Answer: c
Explanation: In the example of 10 DNA sequences of
100 residues each, there are 100 – 20 + 1 = 81
possible starting sites for a 20-residue-long site in
each sequence. The first site starts at position 1
and ends at position 20; the last starts at position
81 and ends at position 100 (there is not enough
sequence for a 20-residue-long site beyond position 81).
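The count of starting sites is a sliding-window calculation. A minimal sketch, where `start_positions` is a hypothetical helper and positions are 1-based as in the explanation:

```python
def start_positions(seq_len, site_len):
    """1-based start positions where a site of site_len fits in seq_len residues."""
    return list(range(1, seq_len - site_len + 2))

starts = start_positions(100, 20)
print(len(starts), starts[0], starts[-1])  # 81 1 81
```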
38. An alternative method is to produce an odds
scoring matrix calculated by dividing each base
frequency by the background frequency of that
base.
a) True
b) False
Answer: a
Explanation: In this method, the probability of each
location is then found by multiplying the odds
scores from each column. An even simpler method
is to use log odds scores in the matrix. The column
scores are then simply added. In this case, the log
odds scores must be converted to odds scores
before position probabilities are calculated.
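The relation between the two scoring schemes can be sketched as follows. The 3-column frequency matrix and the uniform background of 0.25 per base are made-up assumptions for illustration; the point is only that multiplying odds scores and adding log odds scores give the same result once the sum is exponentiated.

```python
import math

background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform
# Made-up per-column base frequencies for a 3-column site.
site_freqs = [
    {"A": 0.1, "C": 0.1, "G": 0.4, "T": 0.4},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

def odds_score(window):
    """Multiply the per-column odds (site frequency / background frequency)."""
    score = 1.0
    for col, base in enumerate(window):
        score *= site_freqs[col][base] / background[base]
    return score

def log_odds_score(window):
    """Add the per-column log odds scores."""
    return sum(
        math.log(site_freqs[col][base] / background[base])
        for col, base in enumerate(window)
    )

window = "GAT"
odds = odds_score(window)
log_odds = log_odds_score(window)
# Converting the summed log odds back to odds recovers the product.
print(odds, math.exp(log_odds))
```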
39. Which of the following about MEME is untrue?
a) It is a Web resource for performing local MSAs
(Multiple Sequence Alignments) by the above
expectation maximization method
b) It stands for Multiple EM for Motif Elicitation
c) It was developed at the University of California
at San Diego Supercomputing Center
d) The Web page has multiple versions for searching
blocks by an EM algorithm
Answer: d
Explanation: The Web page has two versions of
MEME: ParaMEME, a Web program that searches
for blocks by an EM algorithm (described below),
and a similar program, MetaMEME (which searches
for profiles using HMMs, described below). The
Motif Alignment and Search Tool (MAST), for
searching through databases for matches to motifs,
is also provided.
40. Which of the following about the Gibbs sampler
is untrue?
a) It is a statistical method for finding motifs in
sequences
b) It is dissimilar to the principle of the EM method
c) It searches for the statistically most probable
motifs
d) It can find the optimal width and number of given
motifs in each sequence
Answer: b
Explanation: Another statistical method for
finding motifs in sequences is the Gibbs sampler.
The method is similar in principle to the EM method
described above, but the algorithm is different. A
combinatorial approach of the Gibbs sampler and
MOTIF may be used to make blocks at the BLOCKS
Web site.
41. Bayesian Belief Network is also known as ?
A. belief network
B. decision network
C. Bayesian model
D. All of the above
Answer : D
Explanation: A Bayesian belief network is also called
a Bayes network, belief network, decision network,
or Bayesian model.
42. A Bayesian network consists of how many parts?
A. 2
B. 3
C. 4
D. 5
Answer : A
Explanation: A Bayesian network can be used for
building models from data and expert opinions,
and it consists of two parts: a directed acyclic graph
and a table of conditional probabilities.
43. The generalized form of a Bayesian network that
represents and solves decision problems under
uncertain knowledge is known as an?
A. Directed Acyclic Graph
B. Table of conditional probabilities
C. Influence diagram
D. None of the above
Answer : C
Explanation: The generalized form of a Bayesian
network that represents and solves decision
problems under uncertain knowledge is known as
an influence diagram.
44. How many components does a Bayesian network
have?
A. 2
B. 3
C. 4
D. 5
Answer : A
Explanation: A Bayesian network has two main
components: the causal component and the actual
numbers.
45. The Bayesian network graph does not contain
any cyclic graph. Hence, it is known as a
A. DCG
B. DAG
C. CAG
D. SAG
Answer : B
Explanation: The Bayesian network graph does not
contain any cyclic graph. Hence, it is known as
a directed acyclic graph or DAG.
46. In a Bayesian network, a variable is?
A. continuous
B. discrete
C. Both A and B
D. None of the above
Answer : C
Explanation: Each node corresponds to a random
variable, and a variable can
be continuous or discrete.
47. If we have variables x1, x2, x3,….., xn, then the
probabilities of a different combination of x1, x2,
x3.. xn, are known as?
A. Table of conditional probabilities
B. Causal Component
C. Actual numbers
D. Joint probability distribution
Answer : D
Explanation: If we have variables x1, x2, x3,….., xn,
then the probabilities of a different combination of
x1, x2, x3.. xn, are known as Joint probability
distribution.
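In a Bayesian network, the joint distribution factorises into the product of each node's probability given its parents. A minimal sketch for a made-up two-node network (Rain -> WetGrass); the variable names and all probabilities are illustrative assumptions, not taken from the source.

```python
from itertools import product

p_rain = {True: 0.2, False: 0.8}  # made-up prior P(rain)
p_wet_given_rain = {              # made-up conditional table P(wet | rain)
    True: {True: 0.9, False: 0.1},
    False: {True: 0.1, False: 0.9},
}

# Joint probability of every combination: P(rain, wet) = P(rain) * P(wet | rain)
joint = {
    (rain, wet): p_rain[rain] * p_wet_given_rain[rain][wet]
    for rain, wet in product([True, False], repeat=2)
}
total = sum(joint.values())  # a valid joint distribution sums to 1

for combo, prob in joint.items():
    print(combo, round(prob, 3))
```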
48. The nodes and links form the structure of the
Bayesian network, and we call this the ?
A. structural specification
B. multi-variable nodes
C. Conditional Linear Gaussian distributions
D. None of the above
Answer : A
Explanation: The nodes and links form the structure
of the Bayesian network, and we call this
the structural specification.
49. Which of the following are used for modeling
time series and sequences?
A. Decision graphs
B. Dynamic Bayesian networks
C. Value of information
D. Parameter tuning
Answer : B
Explanation: Dynamic Bayesian networks (DBNs) are
used for modeling time series and sequences.
50. How many terms are required for building a
bayes model?
A. 1
B. 2
C. 3
D. 4
Answer : C
Explanation: The three required terms are a
conditional probability and two unconditional
probabilities.
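The three terms are the ones that appear on the right-hand side of Bayes' rule, P(A|B) = P(B|A) * P(A) / P(B). A small sketch with made-up numbers (`bayes_posterior` is a hypothetical helper name):

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' rule: one conditional term and two unconditional terms."""
    return p_b_given_a * p_a / p_b

# Made-up values: P(B|A) = 0.9, P(A) = 0.2, P(B) = 0.26
posterior = bayes_posterior(0.9, 0.2, 0.26)
print(round(posterior, 4))
```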
51) Which of the following algorithms cannot be
used for reducing the dimensionality of data?
A. t-SNE
B. PCA
C. LDA
D. None of these
Answer: (D)
52) [ True or False ] PCA can be used for projecting
and visualizing data in lower dimensions.
A. TRUE
B. FALSE
Answer: (A)
53) The most popularly used dimensionality
reduction algorithm is Principal Component Analysis
(PCA). Which of the following is/are true about
PCA?
1. PCA is an unsupervised method
2. It searches for the directions in which the data
have the largest variance
3. Maximum number of principal components <=
number of features
4. All principal components are orthogonal to each
other
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. 1, 2 and 3
E. 1,2 and 4
F. All of the above
Answer: (F)
54) Suppose we are using dimensionality reduction
as a pre-processing technique, i.e., instead of using
all the features, we reduce the data to k dimensions
with PCA and then use these PCA projections as
our features. Which of the following statements is
correct?
A. Higher ‘k’ means more regularization
B. Higher ‘k’ means less regularization
C. Can’t Say
Answer: (B)
55) In which of the following scenarios is t-SNE
better to use than PCA for dimensionality reduction
while working on a local machine with minimal
computational power?
A. Dataset with 1 Million entries and 300 features
B. Dataset with 100000 entries and 310 features
C. Dataset with 10,000 entries and 8 features
D. Dataset with 10,000 entries and 200 features
Answer: (C)
56) Which of the following statement is true for a t-
SNE cost function?
A. It is asymmetric in nature.
B. It is symmetric in nature.
C. It is same as the cost function for SNE.
Answer: (B)
57) Imagine you are dealing with text data. To
represent the words, you are using word embeddings
(Word2vec), and you end up with 1000 dimensions.
Now you want to reduce the dimensionality of this
high-dimensional data such that similar words have
a similar meaning in nearest-neighbor space. In
such a case, which of the following algorithms are
you most likely to choose?
A. t-SNE
B. PCA
C. LDA
D. None of these
Answer: (A)
58) [True or False] t-SNE learns non-parametric
mapping.
A. TRUE
B. FALSE
Answer: (A)
59) Which of the following statement is correct for
t-SNE and PCA?
A. t-SNE is linear whereas PCA is non-linear
B. t-SNE and PCA both are linear
C. t-SNE and PCA both are nonlinear
D. t-SNE is nonlinear whereas PCA is linear
Answer: (D)
60) In t-SNE algorithm, which of the following hyper
parameters can be tuned?
A. Number of dimensions
B. Smooth measure of effective number of
neighbours
C. Maximum number of iterations
D. All of the above
Answer: (D)
61) The minimum time complexity for training an
SVM is O(n^2). According to this fact, what sizes of
datasets are not best suited for SVMs?
A) Large datasets
B) Small datasets
C) Medium sized datasets
D) Size does not matter
Answer: A
62) The effectiveness of an SVM depends upon:
A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above
Answer: D
63) Support vectors are the data points that lie
closest to the decision surface.
A) TRUE
B) FALSE
Answer: A
64) SVMs are less effective when:
A) The data is linearly separable
B) The data is clean and ready to use
C) The data is noisy and contains overlapping points
Answer: C
65) Suppose you are using RBF kernel in SVM with
high Gamma value. What does this signify?
A) The model would consider even far away points
from hyperplane for modeling
B) The model would consider only the points close
to the hyperplane for modeling
C) The model would not be affected by distance of
points from hyperplane for modeling
D) None of the above
Answer: B
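The effect of gamma can be seen directly from the RBF kernel, K = exp(-gamma * d^2), where d is the distance between two points. The sketch below uses made-up distances and gamma values: with a high gamma the kernel value for a distant point is practically zero, so only nearby points influence the model.

```python
import math

def rbf_kernel(distance, gamma):
    """RBF (Gaussian) kernel value for two points a given distance apart."""
    return math.exp(-gamma * distance ** 2)

for gamma in (0.1, 10.0):
    near = rbf_kernel(0.5, gamma)   # a nearby point
    far = rbf_kernel(3.0, gamma)    # a far-away point
    print(f"gamma={gamma}: near={near:.4f}, far={far:.4g}")
```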
66) The cost parameter in the SVM means:
A) The number of cross-validations to be made
B) The kernel to be used
C) The tradeoff between misclassification and
simplicity of the model
D) None of the above
Answer: C
67) Suppose you are building an SVM model on data
X. The data X can be error-prone, which means that
you should not trust any specific data point too
much. You want to build an SVM model with a
quadratic kernel (polynomial degree 2) that uses the
slack penalty parameter C as one of its
hyperparameters. Based on that, answer the
following question.
What would happen when you use a very large value
of C (C -> infinity)?
Note: Even for small C, the model was classifying
all data points correctly.
A) We can still classify data correctly for given
setting of hyper parameter C
B) We can not classify data correctly for given
setting of hyper parameter C
C) Can’t Say
D) None of these
Answer: A
68) What would happen when you use very small C
(C~0)?
A) Misclassification would happen
B) Data will be correctly classified
C) Can’t say
D) None of these
Answer: A
69) If I am using all features of my dataset and I
achieve 100% accuracy on my training set, but ~70%
on validation set, what should I look out for?
A) Underfitting
B) Nothing, the model is perfect
C) Overfitting
Answer: C
70) Which of the following are real world
applications of the SVM?
A) Text and Hypertext Categorization
B) Image Classification
C) Clustering of News Articles
D) All of the above
Answer: D
Question Context: 71 – 72
Suppose you have trained an SVM with a linear
decision boundary. After training, you correctly
infer that your SVM model is underfitting.
71) Which of the following options would you be most
likely to consider when iterating on the SVM next time?
A) You want to increase your data points
B) You want to decrease your data points
C) You will try to calculate more variables
D) You will try to reduce the features
Answer: C
72) Suppose you gave the correct answer in
previous question. What do you think that is
actually happening?
1. We are lowering the bias
2. We are lowering the variance
3. We are increasing the bias
4. We are increasing the variance
A) 1 and 2
B) 2 and 3
C) 1 and 4
D) 2 and 4
Answer: C
73) In the above question, suppose you want to change
one of the SVM's hyperparameters so that the effect
is the same as in the previous questions, i.e. the
model will not underfit. What would you do?
A) We will increase the parameter C
B) We will decrease the parameter C
C) Changing C has no effect
D) None of these
Answer: A
74) We usually use feature normalization before
using the Gaussian kernel in SVM. What is true
about feature normalization?
1. We do feature normalization so that a new feature
will dominate the others
2. Sometimes, feature normalization is not feasible
in the case of categorical variables
3. Feature normalization always helps when we use
a Gaussian kernel in SVM
A) 1
B) 1 and 2
C) 1 and 3
D) 2 and 3
Answer: B
Question Context: 75
Suppose you are dealing with a 4-class classification
problem and you want to train an SVM model on the
data using the one-vs-all method. Now answer the
question below.
75) How many times do we need to train our SVM
model in such a case?
A) 1
B) 2
C) 3
D) 4
Solution: D

MORE MCQ

1) What is machine learning ?


• A. Machine learning is the science of getting
computers to act without being explicitly
programmed.
• B.Machine Learning is a Form of AI that Enables
a System to Learn from Data.
• C.Both A and B
• D.None of the above
2) Machine learning is an application of
___________.
• A. Blockchain
• B.Artificial Intelligence
• C.Both A and B
• D.None of the above
3) Application of Machine learning is __________.
• A. email filtering
• B.sentimental analysis
• C.face recognition
• D.All of the above
4) The term machine learning was coined in which
year?
• A. 1958
• B.1959
• C.1960
• D.1961
5) Machine learning approaches can be traditionally
categorized into ______ categories.
• A. 3
• B.4
• C.7
• D.9
6) The categories in which Machine learning
approaches can be traditionally categorized are
______ .
• A. Supervised learning
• B.Unsupervised learning
• C.Reinforcement learning
• D.All of the above
7) _________ are the machine learning algorithms
that can be used with labeled data.
• A. Regression algorithms
• B.Clustering algorithms
• C.Association algorithms
• D.All of the above
8) __________ are the machine learning algorithms
that can be used with unlabeled data.
• A. Regression algorithms
• B.Clustering algorithms
• C.Instance-based algorithms
• D.All of the above
9) The Real-world machine learning use cases are
_______.
• A. Digital assistants
• B.Chatbots
• C.Fraud detection
• D.All of the above
10) Which among the following algorithms are used
in Machine learning?
• A. Naive Bayes
• B.Support Vector Machines
• C.K-Nearest Neighbors
• D.All of the above
11) __________ are the techniques of keyword
normalization
• A. Lemmatization
• B.Stemming
• C.Both A and B
• D.None of the above
12) Replacing missing values with the
mean/median/mode helps to handle missing or
corrupted data in a dataset. True/False?
• A. True
• B.False
13) ________ is a disadvantage of decision trees?
• A. Decision trees are robust to outliers
• B.Decision trees are prone to overfitting
• C.Both A and B
• D.None of the above
14) ________ is a part of machine learning that
works with neural networks.
• A. Artificial intelligence
• B.Deep learning
• C.Both A and B
• D.None of the above
15) Overfitting is a type of modelling error which
results in the failure to predict future observations
effectively or fit additional data in the existing
model. Yes/No?
• A. Yes
• B.No
• C.May be
• D.Can't say
16) ________ is used as an input to the machine
learning model for training and prediction purposes.
• A. Feature
• B.Feature Vector
• C.Both A and B
• D.None of the above
17) _______ is the scenario when the model fails to
decipher the underlying trend in the input data.
• A. Overfitting
• B.Underfitting
• C.Both A and B
• D.None of the above
18) Which Language is Best for Machine Learning?
• A. C
• B.Java
• C.Python
• D.HTML
19) The supervised learning problems can be
grouped as _______.
• A. Regression problems
• B.Classification problems
• C.Both A and B
• D.None of the above
20) The unsupervised learning problems can be
grouped as _______.
• A. Clustering
• B.Association
• C.Both A and B
• D.None of the above
21) Automatic Speech Recognition systems find a
wide variety of applications in the _________
domains.
• A. Medical Assistance
• B.Industrial Robotics
• C.Defence & Aviation
• D.All of the above
22) The term machine learning was coined by
__________.
• A. James Gosling
• B.Arthur Samuel
• C.Guido van Rossum
• D.None of the above
23) Machine Learning can automate many tasks,
especially the ones that only humans can perform
with their innate intelligence.
• A. True
• B.False
24) Features of Machine Learning are______.
• A. Automation
• B.Improved customer experience
• C.Business intelligence
• D.All of the above
25) Which machine learning models are trained to
make a series of decisions based on the rewards and
feedback they receive for their actions?
• A. Supervised learning
• B.Unsupervised learning
• C.Reinforcement learning
• D.All of the above

MORE MCQ

Question Context
A feature F1 can take certain value: A, B, C, D, E, & F
and represents grade of students from a college.
1) Which of the following statements is true in
the following case?
A) Feature F1 is an example of nominal variable.
B) Feature F1 is an example of ordinal variable.
C) It doesn’t belong to any of the above category.
D) Both of these
Solution: (B)
Ordinal variables are variables whose categories
have some order. For example, grade A should be
considered a higher grade than grade B.
2) Which of the following is an example of a
deterministic algorithm?
A) PCA
B) K-Means
C) None of the above
Solution: (A)
A deterministic algorithm is one whose output does
not change across runs. PCA gives the same result
if we run it again, but k-means (with random
initialization) does not.

3) [True or False] The Pearson correlation between
two variables is zero, but their values can still
be related to each other.
A) TRUE
B) FALSE
Solution: (A)
Take Y = X^2 with X symmetric about zero. The two
are not only associated, one is a function of the
other, yet the Pearson correlation between them is 0.
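This can be checked numerically. A minimal sketch, assuming a small made-up sample of X values that is symmetric about zero; `pearson` is a hand-written helper, not a library call.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [-3, -2, -1, 0, 1, 2, 3]   # symmetric about zero
ys = [x ** 2 for x in xs]       # Y is fully determined by X
r = pearson(xs, ys)
print(r)  # 0.0
```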
4) Which of the following statement(s) is/are true
for Gradient Descent (GD) and Stochastic Gradient
Descent (SGD)?
1. In GD and SGD, you update a set of
parameters in an iterative manner to minimize
the error function.
2. In SGD, you have to run through all the
samples in your training set for a single update
of a parameter in each iteration.
3. In GD, you either use the entire data or a
subset of training data to update a parameter in
each iteration.
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3
Solution: (A)
In SGD, each iteration uses a batch that generally
contains a random sample of the data, whereas in GD
each iteration uses all of the training observations.
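The difference in what each update sees can be sketched on a toy problem. The example below fits a single weight w in the model y_hat = w * x on made-up, noise-free data generated from y = 3x; the learning rate, step count and seed are arbitrary assumptions.

```python
import random

# Made-up data generated from y = 3x, so the ideal weight is 3.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

def grad(w, batch):
    """Gradient of the mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def gradient_descent(steps=200, lr=0.05):
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w, data)  # every update uses the full training set
    return w

def stochastic_gradient_descent(steps=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w, [rng.choice(data)])  # every update uses one random sample
    return w

w_gd = gradient_descent()
w_sgd = stochastic_gradient_descent()
print(w_gd, w_sgd)  # both should end up close to 3
```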

5) Which of the following hyperparameter(s), when
increased, may cause a random forest to overfit the
data?
1. Number of Trees
2. Depth of Tree
3. Learning Rate
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3
Solution: (B)
Usually, increasing the depth of the trees will cause
overfitting. Learning rate is not a hyperparameter
in random forest. Increasing the number of trees
does not usually cause overfitting; it mainly
reduces the variance of the ensemble.
6) Imagine, you are working with “Analytics Vidhya”
and you want to develop a machine learning
algorithm which predicts the number of views on
the articles.
Your analysis is based on features like author name,
number of articles written by the same author on
Analytics Vidhya in past and a few other features.
Which of the following evaluation metrics would you
choose in that case?
1. Mean Square Error
2. Accuracy
3. F1 Score
A) Only 1
B) Only 2
C) Only 3
D) 1 and 3
E) 2 and 3
F) 1 and 2
Solution:(A)
The number of views of an article is a continuous
target variable, so this is a regression problem.
Mean squared error will therefore be used as the
evaluation metric.

7) Given below are three images (1, 2, 3). Which of
the following options is correct for these images?
[Images 1-3: plots of three activation functions, not reproduced]
A) 1 is tanh, 2 is ReLU and 3 is SIGMOID activation
functions.
B) 1 is SIGMOID, 2 is ReLU and 3 is tanh activation
functions.
C) 1 is ReLU, 2 is tanh and 3 is SIGMOID activation
functions.
D) 1 is tanh, 2 is SIGMOID and 3 is ReLU activation
functions.
Solution: (D)
The range of the sigmoid function is (0, 1).
The range of the tanh function is (-1, 1).
The range of the ReLU function is [0, infinity).
So option D is the right answer.
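The three ranges can be checked numerically. A minimal sketch using hand-written `sigmoid` and `relu` helpers and a few arbitrary sample inputs:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def relu(x):
    return max(0.0, x)

xs = [-5.0, -1.0, 0.0, 1.0, 5.0]   # arbitrary sample inputs
sig = [sigmoid(x) for x in xs]     # always between 0 and 1
tanh = [math.tanh(x) for x in xs]  # always between -1 and 1
rel = [relu(x) for x in xs]        # never negative, unbounded above

for x, s, t, r in zip(xs, sig, tanh, rel):
    print(f"x={x:5.1f}  sigmoid={s:.4f}  tanh={t:+.4f}  relu={r:.1f}")
```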
8) Below are the 8 actual values of target variable in
the train file.
[0,0,0,1,1,1,1,1]
What is the entropy of the target variable?
A) -(5/8 log(5/8) + 3/8 log(3/8))
B) 5/8 log(5/8) + 3/8 log(3/8)
C) 3/8 log(5/8) + 5/8 log(3/8)
D) 5/8 log(3/8) – 3/8 log(5/8)
Solution: (A)
The formula for entropy is H = -Σ_i p_i log(p_i),
summed over the classes. Here p(1) = 5/8 and
p(0) = 3/8, so the answer is A.
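The value of the expression in option A can be computed directly. The sketch below assumes a base-2 logarithm (the question does not state the base), and `entropy` is a hypothetical helper name.

```python
import math
from collections import Counter

def entropy(labels, base=2):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n, base) for c in Counter(labels).values())

target = [0, 0, 0, 1, 1, 1, 1, 1]
h = entropy(target)
print(round(h, 4))
```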

9) Let’s say you are working with categorical
feature(s) and you have not looked at the
distribution of the categorical variable in the test
data.
You want to apply one hot encoding (OHE) on the
categorical feature(s). What challenges may you
face if you have applied OHE on a categorical
variable of the train dataset?
A) All categories of categorical variable are not
present in the test dataset.
B) Frequency distribution of categories is different
in train as compared to the test dataset.
C) Train and Test always have same distribution.
D) Both A and B
E) None of these
Solution: (D)
Both are true. OHE will fail to encode categories
that are present in test but not in train, so this is
one of the main challenges when applying OHE. The
challenge in option B is also real: you need to be
more careful when applying OHE if the frequency
distribution differs between train and test.
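The unseen-category problem can be illustrated with a hand-rolled encoder. This is a sketch only: the colour values are made up, and mapping an unseen category to an all-zero row is just one possible policy (raising an error is another).

```python
def fit_one_hot(train_values):
    """Learn the category -> column index mapping from the training data only."""
    return {cat: i for i, cat in enumerate(sorted(set(train_values)))}

def one_hot(value, mapping):
    """Encode one value; unseen categories become an all-zero row."""
    row = [0] * len(mapping)
    if value in mapping:
        row[mapping[value]] = 1
    return row

train = ["red", "green", "red", "blue"]
test = ["green", "purple"]  # "purple" never appears in train
mapping = fit_one_hot(train)
encoded = [one_hot(v, mapping) for v in test]
print(encoded)  # [[0, 1, 0], [0, 0, 0]]
```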

10) The skip-gram model is one of the best models
used in the Word2vec algorithm for word embeddings.
Which one of the following models depicts the
skip-gram model?
A) A
B) B
C) Both A and B
D) None of these
Solution: (B)
Both models (model 1 and model 2) are used in the
Word2vec algorithm. Model 1 represents a CBOW
model, whereas model 2 represents the skip-gram
model.

11) Let’s say you are using activation function X in
the hidden layers of a neural network. At a particular
neuron, for a given input, you get the output
“-0.0001”. Which of the following activation functions
could X represent?
A) ReLU
B) tanh
C) SIGMOID
D) None of these
Solution: (B)
The function is tanh, because its output range is
(-1, 1); ReLU and sigmoid never produce negative
outputs.

12) [True or False] The LogLoss evaluation metric can
have negative values.
A) TRUE
B) FALSE
Solution: (B)
Log loss cannot have negative values.

13) Which of the following statements is/are true
about “Type-1” and “Type-2” errors?
1. Type1 is known as false positive and Type2 is
known as false negative.
2. Type1 is known as false negative and Type2
is known as false positive.
3. Type1 error occurs when we reject a null
hypothesis when it is actually true.
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 1 and 3
F) 2 and 3
Solution: (E)
In statistical hypothesis testing, a type I error is the
incorrect rejection of a true null hypothesis (a “false
positive”), while a type II error is incorrectly
retaining a false null hypothesis (a “false negative”).

14) Which of the following is/are important step(s)
to pre-process the text in NLP-based projects?
1. Stemming
2. Stop word removal
3. Object Standardization
A) 1 and 2
B) 1 and 3
C) 2 and 3
D) 1,2 and 3
Solution: (D)
Stemming is a rudimentary rule-based process of
stripping the suffixes (“ing”, “ly”, “es”, “s” etc.)
from a word.
Stop words are words that are not relevant to the
context of the data, for example is/am/are.
Object standardization is also a good way to
pre-process the text.

15) Suppose you want to project high-dimensional
data into lower dimensions. The two most famous
dimensionality reduction algorithms used here are
PCA and t-SNE. Let’s say you have applied both
algorithms on data “X” and you got the datasets
“X_projected_PCA” and “X_projected_tSNE”.
Which of the following statements is true for
“X_projected_PCA” & “X_projected_tSNE”?
A) X_projected_PCA will have interpretation in the
nearest neighbour space.
B) X_projected_tSNE will have interpretation in the
nearest neighbour space.
C) Both will have interpretation in the nearest
neighbour space.
D) None of them will have interpretation in the
nearest neighbour space.
Solution: (B)
The t-SNE algorithm considers nearest-neighbour
points to reduce the dimensionality of the data. So,
after using t-SNE, the reduced dimensions will also
have an interpretation in nearest-neighbour space.
This is not the case for PCA.
Context: 16-17
Given below are three scatter plots for two features
(Image 1, 2 & 3 from left to right).
16) In the above images, which of the following
is/are examples of multi-collinear features?
A) Features in Image 1
B) Features in Image 2
C) Features in Image 3
D) Features in Image 1 & 2
E) Features in Image 2 & 3
F) Features in Image 3 & 1
Solution: (D)
In Image 1 the features have a high positive
correlation, whereas in Image 2 they have a high
negative correlation, so in both images the pair of
features is an example of multicollinear features.
17) In previous question, suppose you have
identified multi-collinear features. Which of the
following action(s) would you perform next?
1. Remove both collinear variables.
2. Instead of removing both variables, we can
remove only one variable.
3. Removing correlated variables might lead to
loss of information. In order to retain those
variables, we can use penalized regression
models like ridge or lasso regression.
A) Only 1
B) Only 2
C) Only 3
D) Either 1 or 3
E) Either 2 or 3
Solution: (E)
You cannot remove both features, because after
removing both you will lose all of the information.
You should either remove only one feature or use a
regularization method such as L1 or L2.
18) Adding a non-important feature to a linear
regression model may result in:
1. Increase in R-square
2. Decrease in R-square
A) Only 1 is correct
B) Only 2 is correct
C) Either 1 or 2
D) None of these
Solution: (A)
After adding a feature to the feature space, whether
that feature is important or unimportant, R-squared
never decreases: it increases or, at worst, stays
the same.

19) Suppose you are given three variables X, Y and
Z. The Pearson correlation coefficients for (X, Y), (Y,
Z) and (X, Z) are C1, C2 & C3 respectively.
Now, you have added 2 to all values of X (i.e. new
values become X+2), subtracted 2 from all values of
Y (i.e. new values are Y-2), and Z remains the same.
The new coefficients for (X, Y), (Y, Z) and (X, Z) are
given by D1, D2 & D3 respectively. How do the
values of D1, D2 & D3 relate to C1, C2 & C3?
A) D1= C1, D2 < C2, D3 > C3
B) D1 = C1, D2 > C2, D3 > C3
C) D1 = C1, D2 > C2, D3 < C3
D) D1 = C1, D2 < C2, D3 < C3
E) D1 = C1, D2 = C2, D3 = C3
F) Cannot be determined
Solution: (E)
Correlation between the features won’t change if you
add a constant to, or subtract a constant from, the
features.
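The shift invariance is easy to confirm on a small made-up sample. `pearson` below is a hand-written helper, and the X and Y values are arbitrary.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]   # made-up values of X
ys = [2, 1, 4, 3, 6]   # made-up values of Y

c1 = pearson(xs, ys)
d1 = pearson([x + 2 for x in xs], [y - 2 for y in ys])  # X+2 and Y-2
print(c1, d1)  # the two coefficients agree
```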

20) Imagine you are solving a classification
problem with highly imbalanced classes. The majority
class is observed 99% of the time in the training data.
Your model has 99% accuracy on the test data. Which
of the following is true in such a case?
1. Accuracy metric is not a good idea for
imbalanced class problems.
2. Accuracy metric is a good idea for
imbalanced class problems.
3. Precision and recall metrics are good for
imbalanced class problems.
4. Precision and recall metrics aren’t good for
imbalanced class problems.
A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4
Solution: (A)
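A small made-up example shows why accuracy is misleading here: a "model" that always predicts the majority class reaches 99% accuracy while finding none of the minority cases, which recall exposes immediately.

```python
y_true = [1] + [0] * 99   # 1% minority class (made-up labels)
y_pred = [0] * 100        # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
false_neg = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = true_pos / (true_pos + false_neg)

print(accuracy, recall)  # 0.99 0.0
```

Precision is undefined for this predictor because it never predicts the positive class, which is itself a warning sign.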

21) In ensemble learning, you aggregate the
predictions of weak learners, so that an ensemble
of these models gives a better prediction than
the individual models.
Which of the following statements is/are true for
weak learners used in an ensemble model?
1. They don’t usually overfit.
2. They have high bias, so they cannot solve
complex learning problems
3. They usually overfit.
A) 1 and 2
B) 1 and 3
C) 2 and 3
D) Only 1
E) Only 2
F) None of the above
Solution: (A)
Weak learners are sure about a particular part of a
problem. So they usually don’t overfit, which means
that weak learners have low variance and high bias.

22) Which of the following options is/are true for
K-fold cross-validation?
1. Increase in K will result in higher time
required to cross validate the result.
2. Higher values of K will result in higher
confidence on the cross-validation result as
compared to lower value of K.
3. If K=N, then it is called Leave one out cross
validation, where N is the number of
observations.

A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1,2 and 3
Solution: (D)
A larger k value means less bias towards
overestimating the true expected error (as the
training folds will be closer to the total dataset)
and a higher running time (as you are getting closer
to the limit case: Leave-One-Out CV). We also need
to consider the variance between the k fold
accuracies when selecting k.

Question Context 23-24


Cross-validation is an important step in machine
learning for hyperparameter tuning. Let’s say you
are tuning a hyperparameter “max_depth” for
GBM by selecting it from 10 different depth values
(values are greater than 2) for a tree-based model
using 5-fold cross-validation.
Training the algorithm (with max_depth 2) on 4 folds
takes 10 seconds, and prediction on the remaining
fold takes 2 seconds.
Note: Ignore hardware dependencies from the
equation.
23) Which of the following options is true for the
overall execution time for 5-fold cross-validation
with 10 different values of “max_depth”?
A) Less than 100 seconds
B) 100 – 300 seconds
C) 300 – 600 seconds
D) More than or equal to 600 seconds
E) None of the above
F) Can’t estimate
Solution: (D)
Each iteration for depth “2” in 5-fold
cross-validation takes 10 seconds for training and
2 seconds for testing. So 5 folds take 12*5 = 60
seconds. Since we are searching over 10 depth
values, the algorithm would take 60*10 = 600
seconds. But training and testing a model with depth
greater than 2 takes more time than depth “2”, so the
overall time would be greater than 600 seconds.
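The arithmetic can be written out as a short calculation. The numbers come from the question context; the variable names are illustrative.

```python
folds = 5
train_seconds = 10     # training on 4 folds at max_depth 2
predict_seconds = 2    # prediction on the remaining fold
depth_values = 10      # number of max_depth candidates

per_depth_seconds = folds * (train_seconds + predict_seconds)
# Lower bound only: depths greater than 2 take longer than the depth-2 timings.
lower_bound_seconds = per_depth_seconds * depth_values

print(per_depth_seconds, lower_bound_seconds)  # 60 600
```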

24) In the previous question, suppose you train the
same algorithm while tuning 2 hyperparameters, say
“max_depth” and “learning_rate”.
You want to select the right value of
“max_depth” (from the given 10 depth values) and of
the learning rate (from 5 given learning rates).
In such a case, which of the following represents
the overall time?
A) 1000-1500 seconds
B) 1500-3000 seconds
C) More than or equal to 3000 seconds
D) None of these
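Solution: (C)
By the same reasoning as question 23: 10 * 5 = 50
combinations, each taking at least 60 seconds
(5 folds at 12 seconds each), gives at least 3000
seconds overall.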
25) Given below is a scenario for training error TE
and Validation error VE for a machine learning
algorithm M1. You want to choose a
hyperparameter (H) based on TE and VE.

Which value of H will you choose based on the
above table?
A) 1
B) 2
C) 3
D) 4
E) 5
Solution: (D)
Looking at the table, option D seems the best.
26) What would you do in PCA to get the same
projection as SVD?
A) Transform data to zero mean
B) Transform data to zero median
C) Not possible
D) None of these
Solution: (A)
When the data has a zero mean vector, PCA gives the
same projections as SVD; otherwise you have to
centre the data first before taking the SVD.

Question Context 27-28


Assume there is a black box algorithm, which takes
training data with multiple observations (t1, t2,
t3,…….. tn) and a new observation (q1). The black
box outputs the nearest neighbor of q1 (say ti) and
its corresponding class label ci.
You can also think that this black box algorithm is
same as 1-NN (1-nearest neighbor).
27) It is possible to construct a k-NN classification
algorithm based on this black box alone.
Note: Where n (number of training observations) is
very large compared to k.
A) TRUE
B) FALSE
Solution: (A)
In the first step, you pass an observation (q1) to the
black box algorithm, which returns the nearest
observation and its class.
In the second step, you throw that nearest
observation out of the training data and input the
observation (q1) again. The black box algorithm again
returns the nearest observation and its class.
You need to repeat this procedure k times and then
take the majority class among the k returned labels.
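The procedure can be sketched as follows. The function `one_nn` is a hypothetical stand-in for the black box (it returns an index so the point can be removed), and the 1-D data set is made up for illustration.

```python
from collections import Counter

def one_nn(train, query):
    """Stand-in for the black box: index of the training point nearest to query."""
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, query))
    return min(range(len(train)), key=lambda i: sq_dist(train[i][0]))

def knn_via_black_box(train, query, k):
    """Call the 1-NN black box k times, removing the returned point each time."""
    remaining = list(train)          # copy so the caller's data is untouched
    labels = []
    for _ in range(k):
        i = one_nn(remaining, query)
        labels.append(remaining[i][1])
        remaining.pop(i)             # throw out the neighbour just found
    return Counter(labels).most_common(1)[0][0]   # majority vote

# Hypothetical 1-D data: (features, class label)
train = [((0.0,), "a"), ((1.0,), "a"), ((1.2,), "b"), ((5.0,), "b"), ((6.0,), "b")]
print(knn_via_black_box(train, (0.9,), k=3))
```

This needs k calls to the black box, which is practical because n is very large compared to k.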

28) Instead of using the 1-NN black box, we want to use
a j-NN (j>1) algorithm as the black box. Which of the
following options is correct for finding k-NN using
j-NN?
1. j must be a proper factor of k
2. j>k
3. Not possible
A) 1
B) 2
C) 3
Solution: (A) Same as question number 27: call the
j-NN black box k/j times, removing the j returned
neighbours after each call.

29) Suppose you are given 7 scatter plots 1-7 (left to
right) and you want to compare the Pearson correlation
coefficients between the variables of each scatter plot.
Which of the following is in the right order?

1. 1<2<3<4
2. 1>2>3 > 4
3. 7<6<5<4
4. 7>6>5>4
A) 1 and 3
B) 2 and 3
C) 1 and 4
D) 2 and 4
Solution: (B)
From image 1 to image 4 the correlation is decreasing
(in absolute value). From image 4 to image 7 the
absolute correlation is increasing, but the values are
negative (for example, 0, -0.3, -0.7, -0.99), so the
coefficient itself keeps decreasing.
30) You can evaluate the performance of a binary
class classification problem using different metrics
such as accuracy, log-loss, F-Score. Let’s say, you are
using the log-loss function as evaluation metric.
Which of the following options is/are true for the
interpretation of log-loss as an evaluation metric?

1. If a classifier is confident about an incorrect
classification, then log-loss will penalise it
heavily.
2. If, for a particular observation, the classifier
assigns a very small probability to the correct
class, then the corresponding contribution to the
log-loss will be very large.
3. The lower the log-loss, the better the model.
A) 1 and 3
B) 2 and 3
C) 1 and 2
D) 1,2 and 3
Solution: (D) Options are self-explanatory.
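A small numeric check makes the three statements concrete. This is a minimal sketch of binary log-loss written from its standard definition; the probabilities are hypothetical and the clipping constant `eps` is an implementation choice made here.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred are predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# One positive example scored three ways (hypothetical probabilities).
confident_right = log_loss([1], [0.99])
unsure          = log_loss([1], [0.50])
confident_wrong = log_loss([1], [0.01])
print(confident_right, unsure, confident_wrong)
```

The loss grows without bound as the probability assigned to the correct class approaches zero, which is why a confident wrong prediction dominates the average.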

Context Question 31-32


Below are five samples given in the dataset.

Note: Visual distance between the points in the
image represents the actual distance.

31) Which of the following is the leave-one-out
cross-validation accuracy for 3-NN (3-nearest neighbor)?
A) 0
B) 0.4
C) 0.8
D) 1
Solution: (C)
In leave-one-out cross-validation, we select (n-1)
observations for training and 1 observation for
validation. Treat each point in turn as the validation
point and find the 3 points nearest to it.
If you repeat this procedure for all points, you will
get the correct classification for all positive-class
points in the above figure, but the negative class will
be misclassified. Hence you will get 80% accuracy.

32) Which of the following values of K will have the
least leave-one-out cross-validation accuracy?
A) 1NN
B) 3NN
C) 4NN
D) All have same leave one out error
Solution: (A) Every point is misclassified by 1-NN,
which means that you will get 0% accuracy.

33) Suppose you are given the below data and you
want to apply a logistic regression model for
classifying it in two given classes.
You are using logistic regression with L1
regularization.

Where C is the regularization parameter and w1 & w2
are the coefficients of x1 and x2.
Which of the following options is correct when you
increase the value of C from zero to a very large
value?
A) First w2 becomes zero and then w1 becomes zero
B) First w1 becomes zero and then w2 becomes zero
C) Both become zero at the same time
D) Both cannot be zero, even for a very large value
of C
Solution: (B)
Looking at the image, we see that x2 alone is enough
to perform the classification efficiently, so w1
becomes 0 first. As the regularization parameter
increases further, w2 comes closer and closer to 0.
34) Suppose we have a dataset which can be trained
with 100% accuracy with help of a decision tree of
depth 6. Now consider the points below and choose
the option based on these points.
Note: All other hyper parameters are same and
other factors are not affected.
1. Depth 4 will have high bias and low variance
2. Depth 4 will have low bias and low variance
A) Only 1
B) Only 2
C) Both 1 and 2
D) None of the above
Solution: (A) A decision tree of depth 4 fitted to
such data is more likely to underfit it. In the case
of underfitting you will have high bias and low
variance.
35) Which of the following options can be used to
get global minima in k-Means Algorithm?
1. Try to run algorithm for different centroid
initialization
2. Adjust number of iterations
3. Find out the optimal number of clusters
A) 2 and 3
B) 1 and 3
C) 1 and 2
D) All of above
Solution: (D) All of these options can be tuned to
help find the global minimum.
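The first option, running the algorithm from several centroid initializations and keeping the best run, can be sketched as below. This is a minimal 1-D version of Lloyd's algorithm written for illustration; the data, the number of restarts and the helper name `kmeans_1d` are assumptions, and restarts improve the chance of reaching the global minimum without guaranteeing it.

```python
import random

def kmeans_1d(points, k, iters, rng):
    """Plain Lloyd's algorithm on 1-D points; returns (inertia, centroids)."""
    centroids = rng.sample(points, k)             # random initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    inertia = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return inertia, centroids

points = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]   # hypothetical data
rng = random.Random(0)
runs = [kmeans_1d(points, 3, 10, rng) for _ in range(20)]
best = min(runs, key=lambda r: r[0])   # keep the run with the lowest inertia
print(best)
```

Inertia (the within-cluster sum of squares) is the quantity k-means minimizes, so the run with the lowest inertia is the best local minimum found.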

36) Imagine you are working on a project which is a
binary classification problem. You trained a model
on the training dataset and got the below confusion
matrix on the validation dataset.
Based on the above confusion matrix, which of the
statements below are correct?
1. Accuracy is ~0.91
2. Misclassification rate is ~ 0.91
3. False positive rate is ~0.95
4. True positive rate is ~0.95
A) 1 and 3
B) 2 and 4
C) 1 and 4
D) 2 and 3
Solution: (C)
The accuracy (rate of correct classification) is
(50+100)/165, which is approximately 0.91.
The true positive rate is the fraction of actual
positives that are predicted as positive, so it is
100/105 = 0.95. It is also known as
“Sensitivity” or “Recall”.
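These figures can be reproduced with a few lines of code. The counts TN = 50 and TP = 100, the total of 165 and the 105 actual positives come from the solution; FN = 5 and FP = 10 are inferred from those totals and should be read as assumptions.

```python
def binary_metrics(tp, tn, fp, fn):
    """Common rates derived from the four cells of a binary confusion matrix."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "true_positive_rate": tp / (tp + fn),    # sensitivity / recall
        "false_positive_rate": fp / (fp + tn),
    }

# FN and FP are inferred: 105 - 100 = 5 and 165 - 50 - 100 - 5 = 10.
m = binary_metrics(tp=100, tn=50, fp=10, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Under these counts the misclassification rate is about 0.09 and the false positive rate about 0.17, so statements 2 and 3 do not hold.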
37) For which of the following hyperparameters,
higher value is better for decision tree algorithm?
1. Number of samples used for split
2. Depth of tree
3. Samples for leaf
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1, 2 and 3
E) Can’t say
Solution: (E)
For all three parameters (1, 2 and 3), increasing the
value does not necessarily improve performance. For
example, if we have a very high value for the depth of
the tree, the resulting tree may overfit the data and
would not generalize well. On the other hand, if we
have a very low value, the tree may underfit the data.
So, we can’t say for sure that “higher is better”.
Context 38-39
Imagine you have a 28 * 28 image and you run a 3 * 3
convolutional layer on it with an input depth of 3 and
an output depth of 8.
Note: Stride is 1 and you are using same padding.
38) What are the dimensions of the output feature map
when you are using the given parameters?
A) 28 width, 28 height and 8 depth
B) 13 width, 13 height and 8 depth
C) 28 width, 13 height and 8 depth
D) 13 width, 28 height and 8 depth
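Solution: (A) The formula for calculating the output
size is
output size = (N - F + 2P)/S + 1
where N is the input size, F is the filter size, P is
the padding and S is the stride. With same padding
(P = 1 for a 3 * 3 filter) and S = 1 this gives
(28 - 3 + 2)/1 + 1 = 28. The output depth equals the
number of filters, 8.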

39) What are the dimensions of the output feature map
when you are using the following parameters?
A) 28 width, 28 height and 8 depth
B) 13 width, 13 height and 8 depth
C) 28 width, 13 height and 8 depth
D) 13 width, 28 height and 8 depth
Solution: (B) Same as above.
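The output-size calculation for questions 38 and 39 can be sketched as a small helper. The padded form of the formula and the function name are written here for illustration. The quiz does not list the parameters for question 39, so the stride-2, no-padding setting below is only an assumption that happens to give a 13 * 13 map.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial output size of a convolution: floor((n - f + 2*padding) / stride) + 1."""
    return (n - f + 2 * padding) // stride + 1

# Question 38: 28x28 input, 3x3 filter, stride 1, "same" padding.
# For stride 1 and an odd filter size, "same" padding is (f - 1) // 2 per side.
same_pad = (3 - 1) // 2
print(conv_output_size(28, 3, stride=1, padding=same_pad))   # 28

# Assumed setting for illustration only: stride 2 with no padding.
print(conv_output_size(28, 3, stride=2, padding=0))          # 13
```

In both cases the output depth is set by the number of filters (8), not by the spatial formula.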

40) Suppose we were plotting the visualization for
different values of C (penalty parameter) in the SVM
algorithm. For some reason, we forgot to tag the
C values on the visualizations. In that case, which of
the following options best explains the C values for
the images below (1, 2, 3 left to right, so the C values
are C1 for image 1, C2 for image 2 and C3 for image 3)
in the case of an rbf kernel?

A) C1 = C2 = C3
B) C1 > C2 > C3
C) C1 < C2 < C3
D) None of these
Solution: (C)
C is the penalty parameter of the error term. It
controls the trade-off between a smooth decision
boundary and classifying the training points
correctly. For large values of C, the optimization will
choose a smaller-margin hyperplane.

MCQ

1. What is true about Machine Learning?
• Machine Learning (ML) is a field of
computer science
• ML is a type of artificial intelligence that
extracts patterns out of raw data by using an
algorithm or method
• The main focus of ML is to allow computer
systems to learn from experience without being
explicitly programmed or human intervention.
• All of the above
Correct Answer: All of the above

2. ML is a field of AI consisting of learning
algorithms that?
• Improve their performance
• At executing some task
• Over time with experience
• All of the above

Correct Answer: All of the above

3. The action _______ of a robot arm specifies to
place block A on block B
• STACK(A,B)
• LIST(A,B)
• QUEUE(A,B)
• ARRAY(A,B)

Correct Answer: STACK(A,B)


4. p → 0q is not a?
• hack clause
• horn clause
• structural clause
• system clause

Correct Answer: horn clause

5. A __________ begins by hypothesizing a sentence
(the symbol S) and successively predicting lower
level constituents until individual preterminal
symbols are written.
• bottom-up parser
• top parser
• top-down parser
• bottom parser

Correct Answer: top-down parser


6. Different learning methods do not include?
• Introduction
• Analogy
• Deduction
• Memorization

Correct Answer: Introduction

7. A model of language consists of categories
which do not include ________.
• System Unit
• structural units.
• data units
• empirical units

Correct Answer: structural units.

8. Which of the following is a widely used and
effective machine learning algorithm based on the
idea of bagging?
• Decision Tree
• Regression
• Classification
• Random Forest

Correct Answer: Random Forest

9. Which of the following are ML methods?
• based on human supervision
• supervised Learning
• semi-reinforcement Learning
• All of the above

Correct Answer: based on human supervision

10. To find the minimum or the maximum of a
function, we set the gradient to zero because:
• The value of the gradient at extrema of a
function is always zero
• Depends on the type of problem
• Both A and B
• None of the above

Correct Answer: The value of the gradient at
extrema of a function is always zero

11. Training the model with all the data in one single
batch is known as?
• Batch learning
• Offline learning
• Both A and B
• None of the above

Correct Answer: Both A and B

12. In model-based learning methods, an iterative
process takes place on the ML models that are built
based on various model parameters, called?
• mini-batches
• optimized parameters
• hyperparameters
• superparameters

Correct Answer: hyperparameters

13. Which of the following statements about
regularization is not correct?
• Using too large a value of lambda can cause
your hypothesis to underfit the data.
• Using too large a value of lambda can cause
your hypothesis to overfit the data
• Using a very large value of lambda cannot
hurt the performance of your hypothesis
• None of the above

Correct Answer: None of the above


14. How do you handle missing or corrupted data in
a dataset?
• Drop missing rows or columns
• Replace missing values with
mean/median/mode
• Assign a unique category to missing values
• All of the above

Correct Answer: All of the above
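Each of the three strategies can be shown on a tiny example. This is a minimal sketch using only the standard library; the rows, column names and the "MISSING" category are made up for illustration.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25,   "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 31,   "city": None},
]

# 1. Drop rows that contain any missing value.
dropped = [r for r in rows if None not in r.values()]

# 2. Replace missing numeric values with the mean of the observed ones.
age_mean = mean(r["age"] for r in rows if r["age"] is not None)
imputed = [{**r, "age": r["age"] if r["age"] is not None else age_mean}
           for r in rows]

# 3. Give missing categorical values their own category.
flagged = [{**r, "city": r["city"] or "MISSING"} for r in rows]

print(dropped, imputed, flagged, sep="\n")
```

Which strategy fits depends on how much data is missing and whether the missingness itself carries information.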

15. When performing regression or classification,
which of the following is the correct way to
preprocess the data?
• Normalize the data -> PCA -> training
• PCA -> normalize PCA output -> training
• Normalize the data -> PCA -> normalize PCA
output -> training
• None of the above
Correct Answer: Normalize the data -> PCA ->
training

16. Which of the following is a disadvantage of
decision trees?
• Factor analysis
• Decision trees are robust to outliers
• Decision trees are prone to overfitting
• None of the above

Correct Answer: Decision trees are prone to
overfitting

17. Which of the following is a reasonable way to
select the number of principal components "k"?
• Choose k to be the smallest value so that at
least 99% of the variance is retained
• Choose k to be 99% of m (k = 0.99*m,
rounded to the nearest integer).
• Choose k to be the largest value so that 99%
of the variance is retained.
• Use the elbow method.

Correct Answer: Choose k to be the smallest value
so that at least 99% of the variance is retained

18. High entropy means that the partitions in
classification are
• pure
• not pure
• useful
• useless

Correct Answer: not pure
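The link between entropy and purity is easy to check numerically. The sketch below uses the standard Shannon entropy in bits; the two partitions are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one partition."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

pure  = ["yes"] * 8                  # a single class: perfectly pure
mixed = ["yes"] * 4 + ["no"] * 4     # 50/50 split: maximally impure
print(entropy(pure))   # 0.0
print(entropy(mixed))  # 1.0
```

A decision tree split is chosen to reduce this quantity, which moves the child partitions toward purity.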

19. What is a sentence parser typically used for?
• It is used to parse sentences to check if they
are utf-8 compliant.
• It is used to parse sentences to derive their
most likely syntax tree structures.
• It is used to parse sentences to assign POS
tags to all tokens.
• It is used to check if sentences can be parsed
into meaningful tokens.

Correct Answer: It is used to parse sentences to
derive their most likely syntax tree structures.

20. Which of the following techniques cannot be
used for normalization in text mining?
• Stemming
• Lemmatization
• Stop Word Removal
• None of the above

Correct Answer: Stop Word Removal


21. Which of the following is NOT supervised
learning?
• PCA
• Decision Tree
• Linear Regression
• Naive Bayesian

Correct Answer: PCA

22. Suppose we would like to perform clustering on
spatial data such as the geometrical locations of
houses. We wish to produce clusters of many
different sizes and shapes. Which of the following
methods is the most appropriate?
• Decision Trees
• Density-based clustering
• Model-based clustering
• K-means clustering
Correct Answer: Density-based clustering

23. What is the purpose of performing
cross-validation?
• To assess the predictive performance of the
models
• To judge how the trained model performs
outside the sample on test data
• both 1 and 2

Correct Answer: both 1 and 2

25. How do you handle missing or corrupted data in
a dataset?
• Drop missing rows or columns
• Replace missing values with
mean/median/mode
• Assign a unique category to missing values
• All of the above
Correct Answer: All of the above

1. What is machine learning?
• The selective acquisition of knowledge through
the use of manual programs
• The selective acquisition of knowledge through
the use of computer programs
• The autonomous acquisition of knowledge
through the use of manual programs
• The autonomous acquisition of knowledge
through the use of computer programs
Correct Answer: The autonomous acquisition of
knowledge through the use of computer programs

2. Machine Learning is a field of AI consisting of
learning algorithms that ..............
• At executing some task
• Over time with experience
• Improve their performance
• All of the above
Correct Answer: All of the above

3. .............. is a widely used and effective machine
learning algorithm based on the idea of bagging.
• Regression
• Classification
• Decision Tree
• Random Forest
Correct Answer: Random Forest

4. What is the disadvantage of decision trees?
• Factor analysis
• Decision trees are robust to outliers
• Decision trees are prone to overfitting
• All of the above
Correct Answer: Decision trees are prone to overfitting

5. ................. is a widely used and effective machine
learning algorithm based on the idea of bagging.
• Regression
• Classification
• Random Forest
• Decision Tree
Correct Answer: Random Forest

6. How can you handle missing or corrupted data in
a dataset?
• Drop missing rows or columns
• Assign a unique category to missing values
• Replace missing values with
mean/median/mode
• All of the above
Correct Answer: All of the above

7. Which of the following are the most widely used
metrics and tools to assess a classification model?
• Confusion matrix
• Cost-sensitive accuracy
• Area under the ROC curve
• All of the above
Correct Answer: All of the above

8. Machine learning algorithms build a model based
on sample data, known as .................
• Training Data
• Transfer Data
• Data Training
• None of the above
Correct Answer: Training Data

9. Machine learning is a subset of ................
• Deep Learning
• Artificial Intelligence
• Data Learning
• None of the above
Correct Answer: Artificial Intelligence

10. A Machine Learning technique that helps in
detecting the outliers in data.
• Clustering
• Classification
• Anomaly Detection
• All of the above
Correct Answer: Anomaly Detection

11. Who is the father of Machine Learning?
• Geoffrey Hill
• Geoffrey Chaucer
• Geoffrey Everest Hinton
• None of the above
Correct Answer: Geoffrey Everest Hinton

12. What is the most significant phase in a genetic
algorithm?
• Selection
• Mutation
• Crossover
• Fitness function
Correct Answer: Crossover

13. Which one of the following is not a Machine
Learning discipline?
• Physics
• Information Theory
• Neurostatistics
• Optimization Control
Correct Answer: Neurostatistics

14. Machine Learning has various function
representations. Which of the following is not a
symbolic function representation?
• Decision Trees
• Rules in propositional logic
• Rules in first-order predicate logic
• Hidden-Markov Models (HMM)
Correct Answer: Hidden-Markov Models (HMM)

15. ................... algorithms enable the computers to
learn from data, and even improve themselves,
without being explicitly programmed.
• Deep Learning
• Machine Learning
• Artificial Intelligence
• None of the above
Correct Answer: Machine Learning

16. What are the three types of Machine Learning?
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• All of the above
Correct Answer: All of the above

17. Which of the following is not supervised
learning?
• PCA
• Naive Bayesian
• Linear Regression
• Decision Tree
Correct Answer: PCA

18. Real-Time decisions, Game AI, Learning Tasks,
Skill acquisition, and Robot Navigation are
applications of .............
• Reinforcement Learning
• Supervised Learning: Classification
• Unsupervised Learning: Regression
• None of the above
Correct Answer: Reinforcement Learning

19. Which of the following is not a numerical
function in the various function representations of
Machine Learning?
• Case-based
• Neural Network
• Linear Regression
• Support Vector Machines
Correct Answer: Case-based

20. Common classes of problems in machine
learning are ..............
• Clustering
• Regression
• Classification
• All of the above
Correct Answer: All of the above
