RCS-080 Machine Learning MCQs

This document outlines the units and topics covered in the RCS-080 Machine Learning course. The 5 units cover introduction to machine learning concepts; decision trees and artificial neural networks; evaluating hypotheses and Bayesian learning; computational learning theory and instance-based learning; and genetic algorithms, rule learning, and reinforcement learning. The document also includes 35 multiple choice questions related to machine learning concepts.

Uploaded by Utkarsh Rai

RCS-080 Machine Learning

MCQ Questions

Unit I INTRODUCTION – Well-defined learning problems, Designing a Learning
System, Issues in Machine Learning; THE CONCEPT LEARNING TASK –
General-to-specific ordering of hypotheses, Find-S, List-then-eliminate
algorithm, Candidate elimination algorithm, Inductive bias

Unit II DECISION TREE LEARNING – Decision tree learning algorithm,
Inductive bias, Issues in decision tree learning; ARTIFICIAL NEURAL
NETWORKS – Perceptrons, Gradient descent and the Delta rule, Adaline,
Multilayer networks, Derivation of the backpropagation rule, Backpropagation
algorithm, Convergence, Generalization;

Unit III Evaluating Hypotheses: Estimating hypothesis accuracy, Basics of
sampling theory, Comparing learning algorithms; Bayesian Learning: Bayes
theorem, Concept learning, Bayes optimal classifier, Naïve Bayes classifier,
Bayesian belief networks, EM algorithm;

Unit IV Computational Learning Theory: Sample complexity for finite
hypothesis spaces, Sample complexity for infinite hypothesis spaces, The
mistake bound model of learning; INSTANCE-BASED LEARNING –
k-Nearest Neighbour learning, Locally weighted regression, Radial basis
function networks, Case-based learning

Unit V Genetic Algorithms: An illustrative example, Hypothesis space search,
Genetic programming, Models of evolution and learning; Learning first-order
rules – Sequential covering algorithms, General-to-specific beam search, FOIL;
REINFORCEMENT LEARNING – The learning task, Q-learning.
1. What is Machine learning?
a) The autonomous acquisition of knowledge through the use of
computer programs
b) The autonomous acquisition of knowledge through the use of manual
programs
c) The selective acquisition of knowledge through the use of computer
programs
d) The selective acquisition of knowledge through the use of manual
programs

2. Which of the following is not a factor affecting the performance of a
learner system?
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures

3. Different learning methods do not include?
a) Memorization
b) Analogy
c) Deduction
d) Introduction

4. In language understanding, the levels of knowledge do not include?
a) Phonological
b) Syntactic
c) Empirical
d) Logical

5. A model of language consists of categories that do not include?


a) Language units
b) Role structure of units
c) System constraints
d) Structural units
6. What is a top-down parser?
a) Begins by hypothesizing a sentence (the symbol S) and successively
predicting lower level constituents until individual preterminal symbols are
written
b) Begins by hypothesizing a sentence (the symbol S) and successively
predicting upper level constituents until individual preterminal symbols are
written
c) Begins by hypothesizing lower level constituents and successively predicting
a sentence (the symbol S)
d) Begins by hypothesizing upper level constituents and successively predicting
a sentence (the symbol S)

7. Among the following which is not a horn clause?


a) p
b) ¬p ∨ q
c) p → q
d) p → ¬q

8. The action 'STACK(A, B)' of a robot arm specifies to _______________


a) Place block B on Block A
b) Place blocks A, B on the table in that order
c) Place blocks B, A on the table in that order
d) Place block A on block B

9. The process of forming general concept definitions from examples of


concepts to be learned.
A. Deduction
B. abduction
C. induction
D. conjunction

10. Computers are best at learning


A. facts.
B. concepts.
C. procedures.
D. principles.
11. Data used to build an ML model.
A. validation data
B. training data
C. test data
D. hidden data

12. Supervised learning and unsupervised clustering both require at least one
A. hidden attribute.
B. output attribute.
C. input attribute.
D. categorical attribute.

13. Supervised learning differs from unsupervised clustering in that supervised


learning requires
A. at least one input attribute.
B. input attributes to be categorical.
C. at least one output attribute.
D. output attributes to be categorical.

14. A regression model in which more than one independent variable is used to
predict the dependent variable is called
A. a simple linear regression model
B. a multiple regression models
C. an independent model
D. none of the above

15. A term used to describe the case when the independent variables in a
multiple regression model are correlated is
A. regression
B. correlation
C. multicollinearity
D. none of the above

16. A multiple regression model has the form: y = 2 + 3x1 + 4x2. As x1


increases by 1 unit (holding x2 constant), y will
A. increase by 3 units
B. decrease by 3 units
C. increase by 4 units
D. decrease by 4 units
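The coefficient interpretation in question 16 can be checked numerically with a short Python sketch (the function name and input values are illustrative, not from the source):

```python
def predict(x1, x2):
    # Multiple regression model from question 16: y = 2 + 3*x1 + 4*x2
    return 2 + 3 * x1 + 4 * x2

# Holding x2 constant, a 1-unit increase in x1 changes y by the
# coefficient of x1, i.e. +3.
delta = predict(6, 7) - predict(5, 7)
print(delta)  # 3
```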

17. A multiple regression model has


A. only one independent variable
B. more than one dependent variable
C. more than one independent variable
D. none of the above

18. A measure of goodness of fit for the estimated regression equation is the
A. multiple coefficient of determination
B. mean square due to error
C. mean square due to regression
D. none of the above

19. The adjusted multiple coefficient of determination accounts for


A. the number of dependent variables in the model
B. the number of independent variables in the model
C. unusually large predictors
D. none of the above

20. A nearest neighbor approach is best used


A. with large-sized datasets.
B. when irrelevant attributes have been removed from the data.
C. when a generalized model of the data is desirable.
D. when an explanation of what has been found is of primary importance.

21. Another name for an output attribute.


A. predictive variable
B. independent variable
C. estimated variable
D. dependent variable

22. Classification problems are distinguished from estimation problems in that


A. classification problems require the output attribute to be numeric.
B. classification problems require the output attribute to be categorical.
C. classification problems do not allow an output attribute.
D. classification problems are designed to predict future outcome.

23. Which statement is true about prediction problems?


A. The output attribute must be categorical.
B. The output attribute must be numeric.
C. The resultant model is designed to determine future outcomes.
D. The resultant model is designed to classify current behavior.

24. Which statement about outliers is true?


A. Outliers should be identified and removed from a dataset.
B. Outliers should be part of the training dataset but should not be present in the
test data.
C. Outliers should be part of the test dataset but should not be present in the
training data.
D. The nature of the problem determines how outliers are used.
E. More than one of a,b,c or d is true.

25. Which statement is true about neural network and linear regression models?
A. Both models require input attributes to be numeric.
B. Both models require numeric attributes to range between 0 and 1.
C. The output of both models is a categorical attribute value.
D. Both techniques build models whose output is determined by a linear sum of
weighted input attribute values.
E. More than one of a,b,c or d is true.

26. Which of the following is a common use of unsupervised clustering?


A. detect outliers
B. determine a best set of input attributes for supervised learning
C. evaluate the likely performance of a supervised learner model
D. determine if meaningful relationships can be found in a dataset
E. All of a,b,c, and d are common uses of unsupervised clustering.

27. The average positive difference between computed and desired outcome
values.
A. root mean squared error
B. mean squared error
C. mean absolute error
D. mean positive error

28. Selecting data so as to assure that each class is properly represented in both
the training and test set.
A. cross validation
B. stratification
C. verification
D. bootstrapping
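Stratification, the answer to question 28, can be sketched in a few lines of Python. This is a minimal illustration, not a library routine; the function name and split fraction are my own:

```python
from collections import defaultdict

def stratified_split(instances, labels, test_fraction=0.25):
    """Split data so each class is represented, in proportion,
    in both the training and test sets (illustrative sketch)."""
    by_class = defaultdict(list)
    for inst, lab in zip(instances, labels):
        by_class[lab].append(inst)
    train, test = [], []
    for lab, group in by_class.items():
        # Take a proportional slice of each class for the test set.
        n_test = max(1, int(len(group) * test_fraction))
        test.extend((inst, lab) for inst in group[:n_test])
        train.extend((inst, lab) for inst in group[n_test:])
    return train, test

train, test = stratified_split(["x"] * 8 + ["y"] * 4, ["a"] * 8 + ["b"] * 4)
# Both classes appear in both sets: 9 training and 3 test instances.
print(len(train), len(test))
```

In practice one would also shuffle each class group before slicing; that step is omitted here to keep the sketch deterministic.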

29. The standard error is defined as the square root of this computation.
A. The sample variance divided by the total number of sample instances.
B. The population variance divided by the total number of sample instances.
C. The sample variance divided by the sample mean.
D. The population variance divided by the sample mean.

30. Data used to optimize the parameter settings of a supervised learner model.
A. training
B. test
C. verification
D. validation

31. Bootstrapping allows us to


A. choose the same training instance several times.
B. choose the same test set instance several times.
C. build models with alternative subsets of the training data several times.
D. test a model with alternative subsets of the test data several times.

32. The correlation between the number of years an employee has worked for a
company and the salary of the employee is 0.75. What can be said about
employee salary and years worked?
A. There is no relationship between salary and years worked.
B. Individuals that have worked for the company the longest have higher
salaries.
C. Individuals that have worked for the company the longest have lower
salaries.
D. The majority of employees have been with the company a long time.
E. The majority of employees have been with the company a short period of
time.

33. The correlation coefficient for two real-valued attributes is –0.85. What does
this value tell you?
A. The attributes are not linearly related.
B. As the value of one attribute increases the value of the second attribute also
increases.
C. As the value of one attribute decreases the value of the second attribute
increases.
D. The attributes show a curvilinear relationship.

34. The average squared difference between classifier predicted output and
actual output.
A. mean squared error
B. root mean squared error
C. mean absolute error
D. mean relative error
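Questions 27 and 34 distinguish the common error measures; a short sketch makes the definitions concrete (variable names are illustrative):

```python
def mean_squared_error(predicted, actual):
    # Average squared difference (question 34).
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def root_mean_squared_error(predicted, actual):
    # Square root of the mean squared error.
    return mean_squared_error(predicted, actual) ** 0.5

def mean_absolute_error(predicted, actual):
    # Average absolute difference (question 27).
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

pred, act = [2.0, 4.0, 6.0], [1.0, 4.0, 8.0]
print(mean_squared_error(pred, act))   # (1 + 0 + 4) / 3
print(mean_absolute_error(pred, act))  # (1 + 0 + 2) / 3 = 1.0
```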

35. Simple regression assumes a _____ relationship between the input and
output attribute.
A. linear
B. quadratic
C. reciprocal
D. inverse

36. Regression trees are often used to model _______ data.


A. linear
B. nonlinear
C. categorical
D. symmetrical

37. The leaf nodes of a model tree are


A. averages of numeric output attribute values.
B. nonlinear regression equations.
C. linear regression equations.
D. sums of numeric output attribute values.
38. Logistic regression is a ________ regression technique that is used to model
data having a _____ outcome.
A. linear, numeric
B. linear, binary
C. nonlinear, numeric
D. nonlinear, binary

39. This technique associates a conditional probability value with each data
instance.
A. linear regression
B. logistic regression
C. simple regression
D. multiple linear regression

40. This supervised learning technique can process both numeric and
categorical input attributes.
A. linear regression
B. Bayes classifier
C. logistic regression
D. backpropagation learning

41. With Bayes classifier, missing data items are


A. treated as equal compares.
B. treated as unequal compares.
C. replaced with a default value.
D. ignored.

42. This clustering algorithm merges and splits nodes to help modify
nonoptimal partitions.
A. agglomerative clustering
B. expectation maximization
C. conceptual clustering
D. K-Means clustering

43. This clustering algorithm initially assumes that each data instance represents
a single cluster.
A. agglomerative clustering
B. conceptual clustering
C. K-Means clustering
D. expectation maximization

44. This unsupervised clustering algorithm terminates when mean values


computed for the current iteration of the algorithm are identical to the computed
mean values for the previous iteration.
A. agglomerative clustering
B. conceptual clustering
C. K-Means clustering
D. expectation maximization
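The termination criterion in question 44 (K-Means stops when the cluster means no longer change between iterations) can be sketched for the one-dimensional case. This is a minimal illustration of the idea, not a production implementation:

```python
def k_means_1d(points, means):
    """Minimal 1-D K-Means: terminates when the means computed in the
    current iteration equal those from the previous iteration."""
    while True:
        # Assign each point to its nearest current mean.
        clusters = [[] for _ in means]
        for p in points:
            idx = min(range(len(means)), key=lambda i: abs(p - means[i]))
            clusters[idx].append(p)
        # Recompute each mean from its assigned points.
        new_means = [sum(c) / len(c) if c else m
                     for c, m in zip(clusters, means)]
        if new_means == means:  # the termination criterion from question 44
            return means
        means = new_means

print(k_means_1d([1.0, 2.0, 9.0, 10.0], [0.0, 5.0]))  # [1.5, 9.5]
```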

45. Machine learning techniques differ from statistical techniques in that


machine learning methods
A. typically assume an underlying distribution for the data.
B. are better able to deal with missing and noisy data.
C. are not able to explain their behavior.
D. have trouble with large-sized datasets.

46. A _________ is a decision support tool that uses a tree-like graph or model
of decisions and their possible consequences, including chance event outcomes,
resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks

47. Decision Tree is a display of an algorithm.


a) True
b) False

48. Decision Tree is


a) Flow-Chart
b) Structure in which internal node represents test on an attribute, each branch
represents outcome of test and each leaf node represents class label
c) Flow-Chart & Structure in which internal node represents test on an
attribute, each branch represents outcome of test and each leaf node
represents class label
d) None of the mentioned

49. Decision Trees can be used for Classification Tasks.


a) True
b) False

50. Choose from the following that are Decision Tree nodes
a) Decision Nodes
b) End Nodes
c) Chance Nodes
d) All of the mentioned

51. Decision Nodes are represented by ____________


a) Disks
b) Squares
c) Circles
d) Triangles

52. Chance Nodes are represented by,


a) Disks
b) Squares
c) Circles
d) Triangles

53. End Nodes are represented by __________


a) Disks
b) Squares
c) Circles
d) Triangles

54. Following are the advantage/s of Decision Trees. Choose that apply.
a) Possible Scenarios can be added
b) Use a white box model, if given result is provided by a model
c) Worst, best and expected values can be determined for different scenarios
d) All of the mentioned

55. A perceptron is:


a) a single layer feed-forward neural network with pre-processing
b) an auto-associative neural network
c) a double layer auto-associative neural network
d) a neural network that contains feedback

56. An auto-associative network is:


a) a neural network that contains no loops
b) a neural network that contains feedback
c) a neural network that has only one loop
d) a single layer feed-forward neural network with pre-processing

57. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear
with the constant of proportionality being equal to 2. The inputs are 4, 10, 5 and
20 respectively. The output will be:
a) 238
b) 76
c) 119
d) 123
Explanation: The output is found by multiplying each weight by its respective
input, summing the results, and multiplying the sum by the constant of the
linear transfer function. Therefore:
Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 2 * 119 = 238.
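The arithmetic in the explanation above can be verified with a short Python sketch (the function name is my own):

```python
def linear_neuron(inputs, weights, k=2):
    # Weighted sum of the inputs, scaled by the linear transfer
    # function's constant of proportionality k.
    return k * sum(w * x for w, x in zip(weights, inputs))

print(linear_neuron([4, 10, 5, 20], [1, 2, 3, 4]))  # 238
```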

58. What are the advantages of neural networks over conventional computers?
(i) They have the ability to learn by example
(ii) They are more fault tolerant
(iii) They are more suited for real time operation due to their high
'computational' rates
a) (i) and (ii) are true
b) (i) and (iii) are true
c) Only (i)
d) All of the mentioned
59. What is back propagation?
a) It is another name given to the curvy function in the perceptron
b) It is the transmission of error back through the network to adjust the inputs
c) It is the transmission of error back through the network to allow weights
to be adjusted so that the network can learn
d) None of the mentioned

60. Which of the following is not the promise of artificial neural network?
a) It can explain result
b) It can survive the failure of some nodes
c) It has inherent parallelism
d) It can handle noise

61. Neural Networks are complex ______________ with many parameters.


a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions

62. A perceptron adds up all the weighted inputs it receives, and if it exceeds a
certain value, it outputs a 1, otherwise it just outputs a 0.
a) True
b) False
c) Sometimes – it can also output intermediate values as well
d) Can't say
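The thresholding behaviour described in question 62 can be sketched directly (weights, inputs, and threshold below are illustrative values, not from the source):

```python
def perceptron_output(inputs, weights, threshold):
    """Threshold unit from question 62: output 1 if the weighted sum
    of the inputs exceeds the threshold, otherwise output 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Weighted sum = 0.5 + 0.6 = 1.1 > 1.0, so the unit fires.
print(perceptron_output([1, 0, 1], [0.5, 0.4, 0.6], threshold=1.0))  # 1
```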

63. The network that involves backward links from output to the input and
hidden layers is called as ____
a) Self organizing maps
b) Perceptrons
c) Recurrent neural network
d) Multi layered perceptron

64. Which of the following is an application of NN (Neural Network)?


a) Sales forecasting
b) Data validation
c) Risk management
d) All of the mentioned

65. The process by which you become aware of messages through your sense is
called
a) Organization
b) Sensation
c) Interpretation-Evaluation
d) Perception

66. Mindless processing is


a) careful, critical thinking
b) inaccurate and faulty processing
c) information processing that relies heavily on familiar schemata
d) processing that focuses on unusual or novel events

67. Selective retention occurs when


a) we process, store, and retrieve information that we have already
selected, organized, and interpreted
b) we make choices to experience particular stimuli
c) we make choices to avoid particular stimuli
d) we focus on specific stimuli while ignoring other stimuli

68. Which of the following strategies would NOT be effective at improving


your communication competence?
a) Recognize that people, objects, and situations remain stable over time
b) Recognize that each person's frame of perception is unique
c) Be active in perceiving
d) Distinguish facts from inference

69. A perception check is


a) a cognitive bias that makes us listen only to information we already agree
with
b) a method teachers use to reward good listeners in the classroom
c) any factor that gets in the way of good listening and decreases our ability to
interpret correctly
d) a response that allows you to state your interpretation and ask your
partner whether or not that interpretation is correct
70. The process of forming general concept definitions from examples of
concepts to be learned.
A. Deduction
B. abduction
C. induction
D. conjunction

71. Computers are best at learning


A. facts.
B. concepts.
C. procedures.
D. principles.

72. Supervised learning and unsupervised clustering both require at least one
A. hidden attribute.
B. output attribute.
C. input attribute.
D. categorical attribute

73. Supervised learning differs from unsupervised clustering in that supervised


learning requires
A. at least one input attribute.
B. input attributes to be categorical.
C. at least one output attribute.
D. output attributes to be categorical.

74. Classification problems are distinguished from estimation problems in that


A.classification problems require the output attribute to be numeric.
B.classification problems require the output attribute to be categorical.
C.classification problems do not allow an output attribute.
D.classification problems are designed to predict future outcome.

75. Which statement is true about prediction problems?


A.The output attribute must be categorical.
B.The output attribute must be numeric.
C.The resultant model is designed to determine future outcomes.
D. The resultant model is designed to classify current behavior .

76. Which statement about outliers is true?


A.Outliers should be identified and removed from a dataset.
B.Outliers should be part of the training dataset but should not be present in the
test data.
C.Outliers should be part of the test dataset but should not be present in the
training data.
D.The nature of the problem determines how outliers are used.
E.More than one of a,b,c or d is true

77. The average positive difference between computed and desired outcome
values.
A.root mean squared error
B.mean squared error
C.mean absolute error
D.mean positive error

78. Selecting data so as to assure that each class is properly represented in both
the training and test set.
A.cross validation
B.stratification
C.verification
D. bootstrapping

79. The standard error is defined as the square root of this computation.
A.The sample variance divided by the total number of sample instances.
B.The population variance divided by the total number of sample instances.
C.The sample variance divided by the sample mean.
D.The population variance divided by the sample mean.

80. Data used to optimize the parameter settings of a supervised learner model.
A.training
B.test
C.verification
D.validation

81. Bootstrapping allows us to


A.choose the same training instance several times.
B.choose the same test set instance several times.
C. build models with alternative subsets of the training data several times.
D.test a model with alternative subsets of the test data several times.

82. The correlation coefficient for two real-valued attributes is – 0.85. What
does this value tell you?
A.The attributes are not linearly related.
B.As the value of one attribute increases the value of the second attribute also
increases.
C.As the value of one attribute decreases the value of the second attribute
increases.
D.The attributes show a curvilinear relationship.

83. The average squared difference between classifier predicted output and
actual output.
A.mean squared error
B.root mean squared error
C.mean absolute error
D.mean relative error

84. With Bayes classifier, missing data items are


A.treated as equal compares.
B.treated as unequal compares.
C.replaced with a default value.
D.ignored.

85. A statement about a population developed for the purpose of testing is
called:
(a) Hypothesis
(b) Hypothesis testing
(c) Level of significance
(d) Test-statistic
86. Any hypothesis which is tested for the purpose of rejection under the
assumption that it is true is called:
(a) Null hypothesis
(b) Alternative hypothesis
(c) Statistical hypothesis
(d) Composite hypothesis

87. A statement about the value of a population parameter is called:


(a) Null hypothesis
(b) Alternative hypothesis
(c) Simple hypothesis
(d) Composite hypothesis

88. Any statement whose validity is tested on the basis of a sample is called:
(a) Null hypothesis
(b) Alternative hypothesis
(c) Statistical hypothesis
(d) Simple hypothesis

89. A quantitative statement about a population is called:


(a) Research hypothesis
(b) Composite hypothesis
(c) Simple hypothesis
(d) Statistical hypothesis

90. A statement that is accepted if the sample data provide sufficient evidence
that the null hypothesis is false is called:
(a) Simple hypothesis
(b) Composite hypothesis
(c) Statistical hypothesis
(d) Alternative hypothesis

91. A hypothesis that specifies all the values of parameter is called:


(a) Simple hypothesis
(b) Composite hypothesis
(c) Statistical hypothesis
(d) None of the above
92. A hypothesis may be classified as:
(a) Simple
(b) Composite
(c) Null
(d) All of the above

93. The probability of rejecting the null hypothesis when it is true is called:
(a) Level of confidence
(b) Level of significance
(c) Power of the test
(d) Difficult to tell

94. The dividing point between the region where the null hypothesis is rejected
and the region where it is not rejected is said to be:
(a) Critical region
(b) Critical value
(c) Acceptance region
(d) Significant region

95. If the critical region is located equally in both sides of the sampling
distribution of test-statistic, the test is called:
(a) One tailed
(b) Two tailed
(c) Right tailed
(d) Left tailed

96. A rule or formula that provides a basis for testing a null hypothesis is called:
(a) Test-statistic
(b) Population statistic
(c) Both of these
(d) None of the above

97. Critical region is also called:


(a)Acceptance region
(b) Rejection region
(c) Confidence region
(d) Statistical region
98. The probability of rejecting Ho when it is false is called:
(a) Power of the test
(b) Size of the test
(c) Level of confidence
(d) Confidence coefficient

99. Suppose you are given an EM algorithm that finds maximum likelihood
estimates for a model with latent variables. You are asked to modify the
algorithm so that it finds MAP estimates instead. Which step or steps do you
need to modify:
A. Expectation
B. Maximization
C. No modification necessary
D. Both

100. A regression model in which more than one independent variable is used to
predict the dependent variable is called
A.a simple linear regression model
B.a multiple regression models
C.an independent model
D.none of the above

101. A term used to describe the case when the independent variables in a
multiple regression model are correlated is
A.regression
B.correlation
C.multicollinearity
D.none of the above

102. A multiple regression model has the form: y = 2 + 3x1 + 4x2. As x1
increases by 1 unit (holding x2 constant), y will
A.increase by 3 units
B.decrease by 3 units
C.increase by 4 units
D.decrease by 4 units
103. A multiple regression model has
A.only one independent variable
B.more than one dependent variable
C.more than one independent variable
D.none of the above

104. A measure of goodness of fit for the estimated regression equation is the
A.multiple coefficient of determination
B.mean square due to error
C.mean square due to regression
D.none of the above

105. The adjusted multiple coefficient of determination accounts for
A. the number of dependent variables in the model
B. the number of independent variables in the model
C.unusually large predictors
D.none of the above

106. The multiple coefficient of determination is computed by
A.dividing SSR by SST
B.dividing SST by SSR
C.dividing SST by SSE
D.none

107. Which statement is true about neural network and linear regression
models?
A.Both models require input attributes to be numeric.
B.Both models require numeric attributes to range between 0 and 1.
C.The output of both models is a categorical attribute value.
D.Both techniques build models whose output is determined by a linear sum of
weighted input attribute values.
E.More than one of a,b,c or d is true.
108. Simple regression assumes a __________ relationship between the input
attribute and output attribute.
A.linear
B.quadratic
C.reciprocal
D.inverse

109. Regression trees are often used to model _______ data.


A.linear
B.nonlinear
C.categorical
D.symmetrical

110. The leaf nodes of a model tree are
A.averages of numeric output attribute values.
B.nonlinear regression equations.
C.linear regression equations.
D.sums of numeric output attribute values.

111. Logistic regression is a ________ regression technique that is used to
model data having a _____ outcome.
A.linear, numeric
B.linear, binary
C.nonlinear, numeric
D.nonlinear, binary

112. This technique associates a conditional probability value with each data
instance.
A.linear regression
B.logistic regression
C.simple regression
D.multiple linear regression

113. This supervised learning technique can process both numeric and
categorical input attributes.
A.linear regression
B.Bayes classifier
C.logistic regression
D. backpropagation learning

114. This clustering algorithm merges and splits nodes to help modify non-
optimal partitions.
A.agglomerative clustering
B.expectation maximization
C.conceptual clustering
D.K-Means clustering

115. This clustering algorithm initially assumes that each data instance
represents a single cluster.
A.agglomerative clustering
B.conceptual clustering
C.K-Means clustering
D.expectation maximization

116. This unsupervised clustering algorithm terminates when mean values


computed for the current iteration of the algorithm are identical to the computed
mean values for the previous iteration.
A.agglomerative clustering
B.conceptual clustering
C.K-Means clustering
D.expectation maximization

117. Machine learning techniques differ from statistical techniques in that ML


methods
A.typically assume an underlying distribution for the data.
B.are better able to deal with missing and noisy data.
C.are not able to explain their behavior.
D.have trouble with large-sized datasets

118. We can get multiple local optimum solutions if we solve a linear
regression problem by minimizing the sum of squared errors using gradient
descent.
A. True
B. False
119. When the feature space is larger, overfitting is more likely.
A. True
B. False

120. We can use gradient descent to learn a Gaussian Mixture Model.


A. True
B. False

121. As the number of training examples goes to infinity, your model trained on
that data will have:
A. Lower variance
B. Higher variance
C. Same variance

122. As the number of training examples goes to infinity, your model trained on
that data will have:
A. Lower bias
B. Higher bias
C. Same bias

123. What are the issues on which biological networks prove superior to
AI networks?
a) robustness & fault tolerance
b) flexibility
c) collective computation
d) all of the mentioned

124. The fundamental unit of network is


a) brain
b) nucleus
c) neuron
d) axon

125. What are dendrites?


a) fibers of nerves
b) nuclear projections
c) other name for nucleus
d) none of the mentioned

126. Function of dendrites is?


a) receptors
b) transmitter
c) both receptor & transmitter
d) none of the mentioned

127. What is purpose of Axon?


a) receptors
b) transmitter
c) transmission
d) none of the mentioned

128. Why do we need biological neural networks?


a) to solve tasks like machine vision & natural language processing
b) to apply heuristic search methods to find solutions of problem
c) to make smart human interactive & user friendly system
d) all of the mentioned

129. What is unsupervised learning?


a) features of group explicitly stated
b) number of groups may be known
c) neither feature & nor number of groups is known
d) none of the mentioned

130. Example of an unsupervised feature map?


a) text recognition
b) voice recognition
c) image recognition
d) none of the mentioned

131. What is plasticity in neural networks?


a) input pattern keeps on changing
b) input pattern has become static
c) output pattern keeps on changing
d) output is static
132. Who proposed the first perceptron model in 1958?
a) McCulloch-Pitts
b) Marvin Minsky
c) Hopfield
d) Rosenblatt

133. Which action is faster pattern classification or adjustment of weights in


neural nets?
a) pattern classification
b) adjustment of weights
c) equal
d) either of them can be fast, depending on conditions

134. What is the feature of ANNs due to which they can deal with noisy, fuzzy,
inconsistent data?
a) associative nature of networks
b) distributive nature of networks
c) both associative & distributive
d) none of the mentioned

135. What kind of operations can neural networks perform?
a) serial
b) parallel
c) serial or parallel
d) none of the mentioned

136. What is an activation value?


a) weighted sum of inputs
b) threshold value
c) main input to neuron
d) none of the mentioned

137. Positive sign of weight indicates?


a) excitatory input
b) inhibitory input
c) can be either excitatory or inhibitory as such
d) none of the mentioned
138. Negative sign of weight indicates?
a) excitatory input
b) inhibitory input
c) excitatory output
d) inhibitory output

139. The amount of output of one unit received by another unit depends on
what?
a) output unit
b) input unit
c) activation value
d) weight

140. The process of adjusting the weight is known as?


a) activation
b) synchronisation
c) learning
d) none of the mentioned

141. The procedure to incrementally update each of the weights in a neural net
is referred to as?
a) synchronisation
b) learning law
c) learning algorithm
d) both learning algorithm & law

142. In what ways can output be determined from activation value?


a) deterministically
b) stochastically
c) both deterministically & stochastically
d) none of the mentioned

143. How can output be updated in neural network?


a) synchronously
b) asynchronously
c) both synchronously & asynchronously
d) none of the mentioned
144. What is asynchronous update in neural networks?
a) output units are updated sequentially
b) output units are updated in parallel fashion
c) can be either sequentially or in parallel fashion
d) none of the mentioned

145. Who invented perceptron neural networks?


a) McCulloch-Pitts
b) Widrow
c) Minsky & Papert
d) Rosenblatt

146. Delta learning is of unsupervised type?


a) yes
b) no

147. On what parameters can change in weight vector depend?


a) learning parameters
b) input vector
c) learning signal
d) all of the mentioned

148. Activation models are?


a) dynamic
b) static
c) deterministic
d) none of the mentioned

149. What is supervised learning?


a) weight adjustment based on deviation of desired output from actual
output
b) weight adjustment based on desired output only
c) weight adjustment based on actual output only
d) none of the mentioned
150. Supervised learning may be used for?
a) temporal learning
b) structural learning
c) both temporal & structural learning
d) none of the mentioned

151. What is structural learning?


a) concerned with capturing input-output relationship in patterns
b) concerned with capturing weight relationships
c) both weight & input-output relationships
d) none of the mentioned

152. What is temporal learning?


a) concerned with capturing input-output relationship in patterns
b) concerned with capturing weight relationships
c) both weight & input-output relationships
d) none of the mentioned

153. What is unsupervised learning?


a) weight adjustment based on deviation of desired output from actual output
b) weight adjustment based on desired output only
c) weight adjustment based on local information available to weights
d) none of the mentioned

154. Learning methods can only be online?


a) yes
b) no

155. What is reinforcement learning?


a) learning is based on evaluative signal
b) learning is based on desired output for an input
c) learning is based on both desired output & evaluative signal
d) none of the mentioned

156. Reinforcement learning is also known as learning with critic?


a) yes
b) no

157. How many types of reinforcement learning exist?


a) 2
b) 3
c) 4
d) 5

158. Convergence refers to adjustment in behaviour of weights during learning.


a) yes
b) no

159. Feedforward networks are used for?


a) pattern mapping
b) pattern association
c) pattern classification
d) all of the mentioned

160. Feedback networks are used for?


a) auto association
b) pattern storage
c) both auto association & pattern storage
d) none of the mentioned

161. The simplest combination network is called competitive learning network?


a) yes
b) no

162. Competitive learning net is used for?


a) pattern grouping
b) pattern storage
c) pattern grouping or storage
d) none of the mentioned

163. Feedback connection strengths are usually?


a) fixed
b) variable
c) both fixed or variable type
d) none of the mentioned

164. Feedforward network are used for pattern storage?


a) yes
b) no

165. If some of the output patterns in a pattern association problem are
identical then the problem shifts to?
a) pattern storage problem
b) pattern classification problem
c) pattern mapping problem
d) none of the mentioned

166. The network for pattern mapping is expected to perform?


a) pattern storage
b) pattern classification
c) generalization
d) none of the mentioned

167. In case of autoassociation by feedback nets in pattern recognition task,
what is the behaviour expected?
a) accretive
b) interpolative
c) can be either accretive or interpolative
d) none of the mentioned

168. In case of heteroassociation by feedback nets in pattern recognition task,
what is the behaviour expected?
a) accretive
b) interpolative
c) can be either accretive or interpolative
d) none of the mentioned

169. What are hard problems?


a) classification problems which are not clearly separable
b) classification problems which are not associatively separable
c) classification problems which are not functionally separable
d) none of the mentioned

170. In order to overcome the constraint of linear separability, the concept of
a multilayer feedforward net is proposed?
a) yes
b) no

171. The hard learning problem is ultimately solved by Hoff's algorithm?


a) yes
b) no

172. What is generalization?


a) ability to store a pattern
b) ability to recall a pattern
c) ability to learn a mapping function
d) none of the mentioned

173. Generalization feature of a multilayer feedforward network depends on
which factors?
a) architectural details
b) learning rate parameter
c) training samples
d) all of the mentioned

174. What is accretive behaviour?


a) not a type of pattern clustering task
b) for small noise variations the pattern lying closest to the desired pattern
is recalled.
c) for small noise variations noisy pattern having parameter adjusted according
to noise variation is recalled
d) none of the mentioned

175. What is Interpolative behaviour?


a) not a type of pattern clustering task
b) for small noise variations the pattern lying closest to the desired pattern
is recalled.
c) for small noise variations noisy pattern having parameter adjusted
according to noise variation is recalled
d) none of the mentioned

176. Does pattern association involve non-linear units in a feedforward neural
network?
a) yes
b) no

177. What is the feature that doesn't belong to pattern classification in
feedforward neural networks?
a) recall is direct
b) delta rule learning
c) non linear processing units
d) two layers

178. What is the feature that doesn't belong to pattern mapping in feedforward
neural networks?
a) recall is direct
b) delta rule learning
c) non linear processing units
d) two layers

179. Number of output cases depends on what factor?


a) number of inputs
b) number of distinct classes
c) total number of classes
d) none of the mentioned
180. What is the objective of perceptron learning?
a) class identification
b) weight adjustment
c) adjust weight along with class identification
d) none of the mentioned

181. On what factor the number of outputs depends?


a) distinct inputs
b) distinct classes
c) both on distinct classes & inputs
d) none of the mentioned

182. In perceptron learning, what happens when input vector is correctly
classified?
a) small adjustments in weight is done
b) large adjustments in weight is done
c) no adjustments in weight is done
d) weight adjustments don't depend on classification of input vector

183. When two classes can be separated by a straight line, they are known as?
a) linearly separable
b) linearly inseparable classes
c) may be separable or inseparable, it depends on system
d) none of the mentioned

184. If two classes are linearly inseparable, can perceptron convergence
theorem be applied?
a) yes
b) no

185. Is it necessary to set initial weights in perceptron convergence theorem
to zero?
a) yes
b) no
186. Convergence in perceptron learning takes place if and only if:
a) a minimal error condition is satisfied
b) actual output is close to desired output
c) classes are linearly separable
d) all of the mentioned
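Questions 182-186 concern the perceptron learning rule: weights are adjusted only on misclassified inputs, and convergence is guaranteed when the classes are linearly separable. A minimal sketch on the linearly separable AND function (the data, learning rate and epoch count are illustrative):

```python
# Hedged sketch of perceptron learning on a linearly separable problem
# (the AND function). Parameters are illustrative assumptions.

def train_perceptron(samples, lr=0.1, epochs=50):
    """samples: list of (inputs, target) with target in {0, 1}.
    Returns (weights, bias). Weights change only on misclassified
    inputs, as the perceptron rule prescribes."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in samples:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = t - y                      # zero when correctly classified
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

print([predict(x) for x, _ in AND])   # [0, 0, 0, 1] after convergence
```

On linearly inseparable data (e.g. XOR), this same loop would never converge, which is the point of questions 184 and 186.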

187. To provide generalization capability to a network, what should be done?


a) all units should be linear
b) all units should be non – linear
c) except input layer, all units in other layers should be non – linear
d) none of the mentioned
188. In a three layer network, shape of dividing surface is determined by?
a) number of units in second layer
b) number of units in third layer
c) number of units in second and third layer
d) none of the mentioned

189. In a three layer network, number of classes is determined by?


a) number of units in second layer
b) number of units in third layer
c) number of units in second and third layer
d) none of the mentioned

190. Which 3 of the following are basic types of neural nets that form basic
functional units?
i) feedforward ii) loop iii) recurrent iv) feedback v) combination of
feedforward & feedback
a) i, ii, iii
b) i, ii, iv
c) i, iv, v
d) i, iii, v

191. What is the objective of backpropagation algorithm?


a) to develop learning algorithm for multilayer feedforward neural network
b) to develop learning algorithm for single layer feedforward neural network
c) to develop learning algorithm for multilayer feedforward neural
network, so that network can be trained to capture the mapping implicitly
d) none of the mentioned

192. The backpropagation law is also known as generalized delta rule, is it true?
a) yes
b) no

193. What is true regarding backpropagation rule?


a) it is also called generalized delta rule
b) error in output is propagated backwards only to determine weight updates
c) there is no feedback of signal at any stage
d) all of the mentioned

194. There is feedback in final stage of backpropagation algorithm?


a) yes
b) no

195. What is true regarding backpropagation rule?


a) it is a feedback neural network
b) actual output is determined by computing the outputs of units for each
hidden layer
c) hidden layers output is not all important, they are only meant for supporting
input and output layers
d) none of the mentioned

196. What is meant by "generalized" in the statement "backpropagation is a
generalized delta rule"?
a) because delta rule can be extended to hidden layer units
b) because delta is applied to only input and output layers, thus making it more
simple and generalized
c) it has no significance
d) none of the mentioned
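Backpropagation generalizes the delta rule (questions 192 and 196) by extending it to hidden-layer units. The plain delta rule itself can be sketched on a single linear unit; here it fits y = 2x + 1 by updating each weight in proportion to the output error (the data, learning rate and epoch count are illustrative assumptions):

```python
# Minimal delta-rule (LMS) sketch: a single linear unit fits y = 2x + 1.
# Data, learning rate and epoch count are illustrative assumptions.

data = [(x, 2 * x + 1) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
w, b, lr = 0.0, 0.0, 0.05

for _ in range(2000):
    for x, t in data:
        y = w * x + b          # actual output of the linear unit
        err = t - y            # deviation of desired from actual output
        w += lr * err * x      # delta rule: update proportional to the error
        b += lr * err

print(round(w, 2), round(b, 2))   # close to 2.0 and 1.0
```

Backpropagation applies the same error-proportional update, but propagates the output error backwards through the hidden layers to compute each unit's share of it.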

197. What are general limitations of back propagation rule?


a) local minima problem
b) slow convergence
c) scaling
d) all of the mentioned

198. What are the general tasks that are performed with backpropagation
algorithm?
a) pattern mapping
b) function approximation
c) prediction
d) all of the mentioned

199. Is backpropagation learning based on gradient descent along the error
surface?
a) yes
b) no
c) cannot be said
d) it depends on gradient descent but not error surface

200. How can learning process be stopped in backpropagation rule?


a) there is convergence involved
b) no heuristic criteria exist
c) on basis of average gradient value
d) none of the mentioned

201. How can false minima be reduced in case of error in recall in feedback
neural networks?
a) by providing additional units
b) by using probabilistic update
c) can be either probabilistic update or using additional units
d) none of the mentioned

202. What is a Boltzman machine?


a) A feedback network with hidden units
b) A feedback network with hidden units and probabilistic update
c) A feed forward network with hidden units
d) A feed forward network with hidden units and probabilistic update

203. What is objective of linear autoassociative feedforward networks?


a) to associate a given pattern with itself
b) to associate a given pattern with others
c) to associate output with input
d) none of the mentioned

204. Is there any error in linear autoassociative networks?


a) yes
b) no

205. If input is 'a(l) + e' where 'e' is the noise introduced, then what is the
output in case of autoassociative feedback network?
a) a(l)
b) a(l) + e
c) could be either a(l) or a(l) + e
d) e

206. If input is 'a(l) + e' where 'e' is the noise introduced, then what is the
output if system is accretive in nature?
a) a(l)
b) a(l) + e
c) could be either a(l) or a(l) + e
d) e

207. If input is 'a(l) + e' where 'e' is the noise introduced, then what is the
output if system is interpolative in nature?
a) a(l)
b) a(l) + e
c) could be either a(l) or a(l) + e
d) e

208. What property should a feedback network have, to make it useful for
storing information?
a) accretive behaviour
b) interpolative behaviour
c) both accretive and interpolative behaviour
d) none of the mentioned

209. What is the objective of a pattern storage task in a network?


a) to store a given set of patterns
b) to recall a give set of patterns
c) both to store and recall
d) none of the mentioned

210. Linear neurons can be useful for applications such as interpolation, is it
true?
a) yes
b) no
211. What is the use of MLFFNN?
a) to realize structure of MLP
b) to solve pattern classification problem
c) to solve pattern mapping problem
d) to realize an approximation to a MLP

212. What is the advantage of basis functions over multilayer feedforward
neural networks?
a) training of basis function is faster than MLFFNN
b) training of basis function is slower than MLFFNN
c) storing in basis function is faster than MLFFNN
d) none of the mentioned

213. Why is the training of basis functions faster than MLFFNN?


a) because they are developed specifically for pattern approximation
b) because they are developed specifically for pattern classification
c) because they are developed specifically for pattern approximation or
classification
d) none of the mentioned

214. Which of these robot applications can be made with a single layer
feedforward network?
a) wall climbing
b) rotating arm and legs
c) gesture control
d) wall following

215. Which is the most direct application of neural networks?


a) vector quantization
b) pattern mapping
c) pattern classification
d) control applications

216. What are pros of neural networks over computers?


a) they have the ability to learn by examples
b) they have real time high computational rates
c) they have more tolerance
d) all of the mentioned

217. What is true about single layer associative neural networks?


a) performs pattern recognition
b) can find the parity of a picture
c) can determine whether two or more shapes in a picture are connected or not
d) none of the mentioned

218. Which of the following is false?


a) neural networks are artificial copy of the human brain
b) neural networks have high computational rates than conventional computers
c) neural networks learn by examples
d) none of the mentioned

219. How many terms are required for building a Bayes model?
a) 1
b) 2
c) 3
d) 4

220. What is needed to make probabilistic systems feasible in the world?


a) Reliability
b) Crucial robustness
c) Feasibility
d) None of the mentioned

221. Where can the Bayes rule be used?


a) Solving queries
b) Increasing complexity
c) Decreasing complexity
d) Answering probabilistic query
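Bayes' rule answers probabilistic queries by combining a prior with a likelihood. A minimal sketch on the classic diagnostic-test query (the disease/test numbers are made up for illustration):

```python
# Hedged sketch: answering a probabilistic query with Bayes' rule.
# The disease/test probabilities below are made up for illustration.

p_disease = 0.01                  # prior P(D)
p_pos_given_disease = 0.9         # likelihood P(+|D)
p_pos_given_healthy = 0.05        # false-positive rate P(+|not D)

# Total probability of a positive test: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule: P(D|+) = P(+|D) P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # ≈ 0.154
```

A Bayesian network stores exactly these kinds of conditional probabilities locally at each node, which is why it gives a complete yet compact description of the domain (questions 222 and 225).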

222. What does the Bayesian network provide?


a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
d) None of the mentioned

223. How can the entries in the full joint probability distribution be calculated?
a) Using variables
b) Using information
c) Both Using variables & information
d) None of the mentioned

224. How can the Bayesian network be used to answer any query?
a) Full distribution
b) Joint distribution
c) Partial distribution
d) All of the mentioned

225. How can the compactness of the Bayesian network be described?


a) Locally structured
b) Fully structured
c) Partial structure
d) All of the mentioned

226. With which is the local structure associated?


a) Hybrid
b) Dependant
c) Linear
d) None of the mentioned

227. Which condition is used to influence a variable directly by all the others?
a) Partially connected
b) Fully connected
c) Local connected
d) None of the mentioned

228. What is the relation between a node and its predecessors while creating a
Bayesian network?
a) Functionally dependent
b) Dependent
c) Conditionally independent
d) Both conditionally dependent & dependent

229. k-NN algorithm does more computation at test time rather than train time.
A) TRUE
B) FALSE

230. Which of the following distance metric can not be used in k-NN?
A) Manhattan
B) Minkowski
C) Tanimoto
D) Jaccard
E) Mahalanobis
F) All can be used

231. Which of the following option is true about k-NN algorithm?


A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification and regression

232. Which of the following statement is true about k-NN algorithm?

1. k-NN performs much better if all of the data have the same scale
2. k-NN works well with a small number of input variables (p), but struggles
when the number of inputs is very large
3. k-NN makes no assumptions about the functional form of the problem being
solved
A) 1 and 2
B) 1 and 3
C) Only 1
D) All of the above

233. Which of the following machine learning algorithm can be used for
imputing missing values of both categorical and continuous variables?
A) K-NN
B) Linear Regression
C) Logistic Regression

234. Which of the following is true about Manhattan distance?


A) It can be used for continuous variables
B) It can be used for categorical variables
C) It can be used for categorical as well as continuous
D) None of these

235. Which of the following distance measure do we use in case of categorical


variables in k-NN?

1. Hamming Distance
2. Euclidean Distance
3. Manhattan Distance
A) 1
B) 2
C) 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3

236. Which of the following will be Euclidean Distance between the two data
point A(1,3) and B(2,3)?
A) 1
B) 2
C) 4
D) 8
sqrt( (1-2)^2 + (3-3)^2) = sqrt(1^2 + 0^2) = 1

237. Which of the following will be Manhattan Distance between the two data
point A(1,3) and B(2,3)?
A) 1
B) 2
C) 4
D) 8
|1-2| + |3-3| = 1 + 0 = 1
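The two distance computations in questions 236-237 can be checked directly; Euclidean distance sums squared differences under a square root, while Manhattan distance sums absolute differences with no square root:

```python
# Sketch of the distance computations from questions 236-237,
# for the points A(1, 3) and B(2, 3).
import math

def euclidean(p, q):
    """sqrt of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences (no square root)."""
    return sum(abs(a - b) for a, b in zip(p, q))

A, B = (1, 3), (2, 3)
print(euclidean(A, B))   # 1.0
print(manhattan(A, B))   # 1
```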

238. Which of the following will be true about k in k-NN in terms of Bias?
A) When you increase the k the bias will be increases
B) When you decrease the k the bias will be increases
C) Can't say
D) None of these
239. Which of the following will be true about k in k-NN in terms of variance?
A) When you increase the k the variance will increases
B) When you decrease the k the variance will increases
C) Can't say
D) None of these

240. When you find noise in data which of the following option would you
consider in k-NN?
A) I will increase the value of k
B) I will decrease the value of k
C) Noise can not be dependent on value of k
D) None of these

241. In k-NN it is very likely to overfit due to the curse of dimensionality.
Which of the following options would you consider to handle such a problem?
1. Dimensionality Reduction
2. Feature selection
A) 1
B) 2
C) 1 and 2
D) None of these

242. Below are two statements. Which of the following is/are true?
1. k-NN is a memory-based approach in that the classifier immediately adapts as
we collect new training data.
2. The computational complexity for classifying new samples grows linearly
with the number of samples in the training dataset in the worst-case scenario.
A) 1
B) 2
C) 1 and 2
D) None of these
243. Which of the following values of k in the following graph would give the
least leave-one-out cross validation accuracy?

A) 1
B) 2
C) 3
D) 5

244. A company has built a kNN classifier that gets 100% accuracy on training
data. When they deployed this model on the client side it was found that the
model is not at all accurate. Which of the following things might have gone wrong?
Note: Model has successfully deployed and no technical issues are found at
client side except the model performance
A) It is probably an overfitted model
B) It is probably an underfitted model
C) Can't say
D) None of these

245. You are given the following 2 statements; find which of these options
is/are true in case of k-NN.
1. In case of very large value of k, we may include points from other classes
into the neighborhood.
2. In case of too small value of k the algorithm is very sensitive to noise
A) 1
B) 2
C) 1 and 2
D) None of these
246. Which of the following statements is true for k-NN classifiers?
A) The classification accuracy is better with larger values of k
B) The decision boundary is smoother with smaller values of k
C) The decision boundary is linear
D) k-NN does not require an explicit training step

247. Is it possible to construct a 2-NN classifier by using the 1-NN classifier?


A) TRUE
B) FALSE

248. In k-NN what will happen when you increase/decrease the value of k?
A) The boundary becomes smoother with increasing value of K
B) The boundary becomes smoother with decreasing value of K
C) Smoothness of boundary doesn't depend on value of K
D) None of these

249. Following are two statements given for the k-NN algorithm; which of the
statement(s) is/are true?
1. We can choose optimal value of k with the help of cross validation
2. Euclidean distance treats each feature as equally important
A) 1
B) 2
C) 1 and 2
D) None of these
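Statement 1 of question 249 (choosing k via cross validation) can be sketched with a tiny leave-one-out loop; the 1-D dataset, candidate k values, and function names are all illustrative assumptions:

```python
# Hedged sketch: choosing k for k-NN by leave-one-out cross validation.
# The tiny 1-D dataset and candidate k values are made up for illustration.
from collections import Counter

def knn_predict(train, query, k):
    """train: list of (value, label); majority vote among the k nearest."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out: classify each point using all the others."""
    hits = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == label
    return hits / len(data)

data = [(1, 'a'), (2, 'a'), (3, 'a'), (10, 'b'), (11, 'b'), (12, 'b')]
best_k = max([1, 3, 5], key=lambda k: loo_accuracy(data, k))
print(best_k)   # 1 (k=1 and k=3 both score 1.0; at k=5 each class gets outvoted)
```

The k=5 failure here also illustrates question 245's statement 1: with a very large k, points from the other class dominate the neighbourhood.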

250. The following are methods used to test hypotheses except:


a) traditional (computed value) method
b) p-value method
c) confidence interval method
d) survey method

251. In reality, the null hypothesis may or may not be true, and a decision is
made to reject or not to reject it on the basis of the data obtained from a
sample.
a. True
b. False
252. The level of significance is the maximum probability of committing a type
II error.
a. True
b. False

253. A level of significance of 5% means:


a. There's a 5% chance we're wrong
b. There's a 5% chance we'll be wrong if we fail to reject the null hypothesis
c. There's a 5% chance we'll be wrong if we reject the null hypothesis.
d. There's a 5% chance you'll get an A on the test.

254. Which value separates the critical region from the noncritical region in a
normal curve when testing the hypothesis?
a. computed value
b. t-value
c. z-value
d. critical value

255. Which of the following shows a right-tailed test?


a. H1: µ < 15
b. H0: µ < 15
c. H1: µ > 15
d. H0: µ > 15

256. Which of the following shows a left-tailed test?


a. H1: µ < 15
b. H0: µ >15
c. H1: µ > 15
d. H0: µ > 15

257. A Type I error is when:


a) We obtain the wrong test statistic
b) We reject the null hypothesis when it is actually true
c) We fail to reject the null hypothesis when it's actually false
d) We reject the alternate hypothesis when it's actually true
258. Which of the following is not an option for an alternative hypothesis?
a) Ha = k
b) Ha > k
c) Ha < k
d) Ha ≠ k

259. What does it mean to say a test is two-tailed?


a) There is no predicted direction for the alternative hypothesis
b) There are two alternative hypotheses
c) There are two types of error

260. Failing to reject a null hypothesis that is false can be characterized as


a) a Type I error
b) a Type II error
c) both a Type I and Type II error
d) no error

261. What separates the critical region from the noncritical region?
a) critical value
b) computed value
c) z-score

262. Type I error is also called


a) beta error
b) alpha error
c) critical value error

263. Which of the following yields correct decision?


a) Accept a false null hypothesis
b) Reject a true null hypothesis
c) Accept a true null hypothesis

264. If the test is two-tailed, the critical region, with an area equal to α, will be
on the left side of the mean.
a) True
b) False
265. A P-value indicates:
a) the probability that the null hypothesis is true
b) the probability of obtaining the results (or more extreme ones) if the
null hypothesis is true
c) the probability that the alternative hypothesis is true
d) probability of a Type I error
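The p-value definition in question 265 (the probability, under the null hypothesis, of a result at least as extreme as the one observed) can be illustrated with a two-tailed z test; the standard normal CDF is built here from `math.erf`:

```python
# Hedged sketch of the p-value for a two-tailed z test, using the
# standard normal CDF built from the error function.
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_tailed_p(z):
    """P(result at least as extreme as z, in either tail, under H0)."""
    return 2 * (1 - normal_cdf(abs(z)))

print(round(two_tailed_p(1.96), 3))   # ≈ 0.05, the classic 5% significance level
```

This is also why z = 1.96 is the critical value separating the critical from the noncritical region at α = 0.05 in a two-tailed test (question 254).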

266. Genetic Algorithm are a part of


a) Evolutionary Computing
b) are adaptive heuristic search algorithm based on the evolutionary ideas of
natural selection and genetics
c) inspired by Darwin's theory about evolution - "survival of the fittest"
d) All

267. Evolutionary computation is


A. Combining different types of method or information
B. Approach to the design of learning algorithms that is structured
along the lines of the theory of evolution.
C. Decision support systems that contain an information base filled with the
knowledge of an expert formulated in terms of if-then rules.
D. None of these

268. How the new states are generated in genetic algorithm?


a) Composition
b) Mutation
c) Cross-over
d) Both Mutation & Cross-over

269. A genetic algorithm (or GA) is a variant of stochastic beam search in
which successor states are generated by combining two parent states, rather
than by modifying a single state.
a) True
b) False

270. What are the two main features of Genetic Algorithm?


a) Fitness function & Crossover techniques
b) Crossover techniques & Random mutation
c) Individuals among the population & Random mutation
d) Random mutation & Fitness function

271. Genetic algorithms belong to the family of methods in the


a) artificial intelligence area.
b) optimization area.
c) complete enumeration family of methods
d) Non-computer based (human) solutions area

272. Which approach is most suited to structured problems (with little
uncertainty)?
a) simulation
b) human intuition
c) Optimization
d) genetic algorithms

273. All of the following are suitable problems for genetic algorithms EXCEPT
a) dynamic process control
b) pattern recognition with complex patterns
c) simulation of biological models
d) simple optimization with few variables

274. Which approach is most suited to complex problems with significant
uncertainty, a need for experimentation, and time compression?
a) Simulation
b) Optimization
c) human intuition
d) genetic algorithms

275. Genetic algorithm is commonly used to generate high-quality solutions for


a) Optimization problems
b) Search problems
c) Both a & b
d) None
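Questions 266-275 name the ingredients of a genetic algorithm: a fitness function, crossover of two parent states, and random mutation. A minimal sketch that ties them together on a toy optimization task (maximize the number of 1-bits); the task and every parameter are illustrative choices, not from any specific textbook:

```python
# Minimal genetic-algorithm sketch: fitness function, single-point
# crossover, and random bit-flip mutation. The onemax task and all
# parameters are illustrative assumptions.
import random

random.seed(0)
LENGTH, POP, GENERATIONS = 20, 30, 60

def fitness(bits):                       # "relative importance of a design"
    return sum(bits)

def crossover(a, b):                     # combine two parent states
    point = random.randrange(1, LENGTH)
    return a[:point] + b[point:]

def mutate(bits, rate=0.02):             # random bit-flip mutation
    return [1 - b if random.random() < rate else b for b in bits]

pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]             # selection: fittest half survives
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
print(fitness(best))                     # typically close to the maximum of 20
```

Because the fittest parents survive unchanged each generation, the best fitness never decreases, which is the "survival of the fittest" idea in question 266.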

276. How does the decision tree reach its decision?


a) Single test
b) Two test
c) Sequence of test
d) No test

277. Which of the following is the model used for learning?


a) Decision trees
b) Neural networks
c) Propositional and FOL rules
d) All of the mentioned

278. Automated vehicle is an example of ______


a) Supervised learning
b) Unsupervised learning
c) Active learning
d) Reinforcement learning

279. In which of the following learning the teacher returns reward and
punishment to learner?
a) Active learning
b) Reinforcement learning
c) Supervised learning
d) Unsupervised learning

280. Decision trees are appropriate for the problems where ___________
a) Attributes are both numeric and nominal
b) Target function takes on a discrete number of values.
c) Data may have errors
d) All of the mentioned

281. Which of the following is also called as exploratory learning?


a) Supervised learning
b) Active learning
c) Unsupervised learning
d) Reinforcement learning

282. What takes input as an object described by a set of attributes?


a) Tree
b) Graph
c) Decision graph
d) Decision tree

283. What approach is taken by a decision tree for knowledge learning?


a) Association rules
b) Statistical
c) Substitutive
d) Inductive
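The inductive approach of question 283 is usually realized ID3-style: at each node the tree picks the attribute with the highest information gain. A hedged sketch with a made-up two-attribute toy dataset (the data and names are illustrative, not from the question bank):

```python
# Hedged sketch of ID3-style attribute selection by information gain.
# The toy dataset below is an illustrative assumption.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(rows, attr, labels):
    """Entropy reduction from splitting the rows on one attribute."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        gain -= len(subset) / total * entropy(subset)
    return gain

rows = [{'outlook': 'sunny', 'windy': True},  {'outlook': 'sunny', 'windy': False},
        {'outlook': 'rain',  'windy': True},  {'outlook': 'rain',  'windy': False}]
labels = ['no', 'yes', 'no', 'yes']           # here 'windy' fully decides the label

print(info_gain(rows, 'windy', labels) > info_gain(rows, 'outlook', labels))  # True
```

Repeating this choice at every node yields the sequence of tests by which the tree reaches its decision (question 276).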

284. Which of the following algorithm are not an example of ensemble learning
algorithm?
A) Random Forest
B) Adaboost
C) Extra Trees
D) Gradient Boosting
E) Decision Trees

285. A decision tree does not include:


a) statements
b) condition stubs
c) action stubs
d) rules

286. Components of a decision tree include:


a) states
b) rules
c) decision points
d) stubs

287. To read a decision tree, you begin at the:


a) top root node
b) far-left root node
c) far-right root node
d) bottom root node

288. Which of the following statement(s) is / are true for Gradient Decent (GD)
and Stochastic Gradient Decent (SGD)?
1. In GD and SGD, you update a set of parameters in an iterative manner to
minimize the error function.
2. In SGD, you have to run through all the samples in your training set for a
single update of a parameter in each iteration.
3. In GD, you either use the entire data or a subset of training data to update
a parameter in each iteration.
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3

288. Which of the following hyper parameter(s), when increased may cause
random forest to over fit the data?
1. Number of Trees
2. Depth of Tree
3. Learning Rate
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3

289. Which of the following options is/are true for K-fold cross-validation?
1. Increase in K will result in higher time required to cross validate the
result.
2. Higher values of K will result in higher confidence on the cross-
validation result as compared to lower value of K.
3. If K=N, then it is called Leave one out cross validation, where N is the
number of observations.

A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1,2 and 3

290. In ______ learning we can say that the output depends on the state of the
current input and the next input depends on the output of the previous input.
a) Supervised learning
b) Active learning
c) Unsupervised learning
d) Reinforcement learning
291. In _______ learning decision is dependent, so we give labels to sequences
of dependent decisions.
a) Supervised learning
b) Active learning
c) Unsupervised learning
d) Reinforcement learning

292. Reinforcement learning algorithms include


a) Q-learning
b) SARSA
c) Both a & b
d) None
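Q-learning, named in question 292, updates an action-value table from the evaluative reward signal. A hedged sketch on a made-up four-state corridor where the only reward is at the far end (states, parameters and episode count are all illustrative):

```python
# Hedged sketch of tabular Q-learning on a tiny made-up corridor:
# states 0..3, reward 1 only on reaching state 3. All parameters
# are illustrative assumptions.
import random

random.seed(1)
N_STATES = 4                              # state 3 is the goal
ACTIONS = (-1, +1)                        # move left or right
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):                      # episodes
    s = 0
    while s != N_STATES - 1:
        if random.random() < eps:         # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-learning update; Q at the terminal state is never updated, so
        # its value stays 0 and contributes nothing to the target.
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)   # expected greedy policy: [1, 1, 1], always toward the reward
```

SARSA differs only in the target: it uses the value of the action actually taken next rather than the max over actions.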

293. _________ defines the relative importance of a design.


a) Fitness function
b) Mutation
c) Simulation
d) None

294. The goal of ______ is to find the hypothesis that best fits the training
examples.
a) concept learning
b) genetic algorithm
c) mutation
d) none of the above

295. The ________ only considers the positive examples and eliminates
negative examples.
a) List-Then-Eliminate algorithm
b) Find-S algorithm
c) Candidate Elimination algorithm
d) None of the above
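Find-S (question 295) starts with the most specific hypothesis and generalizes it only on positive examples, ignoring negatives. A minimal sketch on a tiny made-up enjoy-sport-style dataset ('0' marks "matches nothing" and '?' marks "any value"):

```python
# Hedged sketch of the Find-S algorithm. The tiny dataset is an
# illustrative stand-in for the usual enjoy-sport examples.

def find_s(examples):
    """examples: list of (attribute_tuple, is_positive)."""
    n = len(examples[0][0])
    h = ['0'] * n                      # most specific hypothesis: matches nothing
    for attrs, positive in examples:
        if not positive:
            continue                   # Find-S ignores negative examples
        for i, value in enumerate(attrs):
            if h[i] == '0':
                h[i] = value           # first positive: copy its attribute values
            elif h[i] != value:
                h[i] = '?'             # conflict: generalize to "any value"
    return h

examples = [
    (('sunny', 'warm', 'normal', 'strong'), True),
    (('sunny', 'warm', 'high',   'strong'), True),
    (('rainy', 'cold', 'high',   'strong'), False),   # ignored
]
print(find_s(examples))   # ['sunny', 'warm', '?', 'strong']
```

Candidate Elimination (question 297) extends this idea by also maintaining a general boundary, so the negative examples are used rather than discarded.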

296. The _________ initializes the version space to contain all hypotheses in H,
then eliminates the hypotheses that are inconsistent with the training examples.
a) List-Then-Eliminate algorithm
b) Find-S algorithm
c) Candidate Elimination algorithm
d) None of the above
297. The ________ incrementally builds the version space given a hypothesis
space H and a set E of examples.
a) List-Then-Eliminate algorithm
b) Find-S algorithm
c) Candidate Elimination algorithm
d) None of the above

298. The _______ of a learning algorithm is the set of assumptions that the
learner uses to predict outputs of given inputs that it has not encountered.
a) inductive bias
b) learning bias
c) deductive bias
d) Both a & b

299. Disadvantage of Machine Learning is/are:


a) Data Acquisition
b) Highly error-prone
c) Time-consuming
d) All the above

300. The Expectation Maximization algorithm has been used to identify
conserved domains in unaligned proteins only.
a) True
b) False

301. Which of the following is untrue regarding the Expectation Maximization
algorithm?
a) An initial guess is made as to the location and size of the site of interest in
each of the sequences, and these parts of the sequence are aligned
b) The alignment provides an estimate of the base or amino acid composition of
each column in the site
c) The column-by-column composition of the site already available is used to
estimate the probability of finding the site at any position in each of the
sequences
d) The row-by-column composition of the site already available is used to
estimate the probability
302. Out of the two repeated steps in EM algorithm, the step 2 is ________
a) the maximization step
b) the minimization step
c) the optimization step
d) the normalization step

303. In EM algorithm, as an example, suppose that there are 10 DNA sequences


having very little similarity with each other, each about 100 nucleotides long
and thought to contain a binding site near the middle 20 residues, based on
biochemical and genetic evidence. The following steps would be used by the EM
algorithm to find the most probable location of the binding sites in each of the
______ sequences.
a) 30
b) 10
c) 25
d) 20

304. In the initial step of EM algorithm, the 20-residue-long binding motif


patterns in each sequence are aligned as an initial guess of the motif.
a) True
b) False

305. In the intermediate steps of EM algorithm, the number of each base in each
column is determined and then converted to fractions.
a) True
b) False
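The two repeated EM steps named in question 302 (expectation, then maximization as step 2) can be illustrated outside the motif-finding setting with the classic two-biased-coins toy problem: the E-step assigns each trial a responsibility for each coin, and the M-step re-estimates each coin's bias from the resulting expected counts (the trial data and starting guesses are the standard illustrative values, not from this text):

```python
# Hedged sketch of EM on the classic two-biased-coins toy problem,
# not the protein-motif setting of questions 300-305. Data and the
# initial guesses are illustrative.

def em_step(theta_a, theta_b, trials):
    heads_a = tails_a = heads_b = tails_b = 0.0
    for heads, tails in trials:
        # E-step: responsibility of coin A for this trial
        like_a = theta_a ** heads * (1 - theta_a) ** tails
        like_b = theta_b ** heads * (1 - theta_b) ** tails
        w = like_a / (like_a + like_b)
        heads_a += w * heads; tails_a += w * tails
        heads_b += (1 - w) * heads; tails_b += (1 - w) * tails
    # M-step (step 2): maximize by re-estimating each bias from expected counts
    return heads_a / (heads_a + tails_a), heads_b / (heads_b + tails_b)

trials = [(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]   # (heads, tails) per 10 flips
theta_a, theta_b = 0.6, 0.5                         # initial guess
for _ in range(20):
    theta_a, theta_b = em_step(theta_a, theta_b, trials)
print(round(theta_a, 2), round(theta_b, 2))         # converges near 0.8 and 0.52
```

The motif-finding EM of questions 303-305 has the same shape: the initial alignment is the guess, column compositions play the role of the coin biases, and the site probabilities per sequence are the responsibilities.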

306. Computational learning theory analyzes the sample complexity and
computational complexity of __________
a) Unsupervised Learning
b) Inductive learning
c) Forced based learning
d) Weak learning

307. Inductive learning involves finding a __________


a) Consistent Hypothesis
b) Inconsistent Hypothesis
c) Regular Hypothesis
d) Irregular Hypothesis
