
Deep Learning MCQA

This document contains multiple choice questions about machine learning concepts such as supervised vs unsupervised learning, classification vs prediction problems, neural networks, and regression techniques. It tests understanding of key differences between different machine learning algorithms and how they can be applied to problems.

Uploaded by

39- Aarti Omane

Subject- Deep Learning

MCQA
1. Supervised learning and unsupervised clustering both require at least one
a. hidden attribute.
b. output attribute.
c. input attribute.
d. categorical attribute.

2. Supervised learning differs from unsupervised clustering in that supervised
learning requires
a. at least one input attribute.
b. input attributes to be categorical.
c. at least one output attribute.
d. output attributes to be categorical.

3. Classification problems are distinguished from estimation problems in that
a. classification problems require the output attribute to be numeric.
b. classification problems require the output attribute to be categorical.
c. classification problems do not allow an output attribute.
d. classification problems are designed to predict future outcome.

4. Which statement is true about prediction problems?


a. The output attribute must be categorical.
b. The output attribute must be numeric.
c. The resultant model is designed to determine future outcomes.
d. The resultant model is designed to classify current behavior.

5. Which statement about outliers is true?


a. Outliers should be identified and removed from a dataset.
b. Outliers should be part of the training dataset but should not be present in the test
data.
c. Outliers should be part of the test dataset but should not be present in the training
data.
d. The nature of the problem determines how outliers are used.
e. More than one of a,b,c or d is true.

6. Assume that we have a dataset containing information about 200 individuals.
One hundred of these individuals have purchased life insurance. A supervised
data mining session has discovered the following rule:

IF age < 30 & credit card insurance = yes
THEN life insurance = yes
Rule Accuracy: 70%
Rule Coverage: 63%

7. How many individuals in the class life insurance = no have credit card
insurance and are less than 30 years old?
a. 63
b. 70
c. 30
d. 27
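
A worked check of the arithmetic for this rule may help. The sketch below assumes the usual definitions, which the quiz does not restate: rule coverage is the share of the life insurance = yes class matched by the rule's antecedent, and rule accuracy is the share of all matched individuals who really are life insurance = yes. The variable names are illustrative only.

```python
# Assumed definitions (not restated in the quiz):
#   coverage = matched individuals in class "yes" / all individuals in class "yes"
#   accuracy = matched individuals in class "yes" / all matched individuals
class_yes_total = 100          # individuals who purchased life insurance
rule_accuracy = 0.70
rule_coverage = 0.63

matched_yes = round(rule_coverage * class_yes_total)   # 63
matched_total = round(matched_yes / rule_accuracy)     # 90
matched_no = matched_total - matched_yes               # 27

print(matched_yes, matched_total, matched_no)
```

Under these assumptions, 27 of the individuals who match the antecedent belong to the class life insurance = no.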

8. Which statement is true about neural network and linear regression models?
a. Both models require input attributes to be numeric.
b. Both models require numeric attributes to range between 0 and 1.
c. The output of both models is a categorical attribute value.
d. Both techniques build models whose output is determined by a linear sum of
weighted input attribute values.
e. More than one of a,b,c or d is true.
11. Unlike traditional production rules, association rules
a. allow the same variable to be an input attribute in one rule and an output
attribute in another rule.
b. allow more than one input attribute in a single rule.
c. require input attributes to take on numeric values.
d. require each rule to have exactly one categorical output attribute.

12. Which of the following is a common use of unsupervised clustering?


a. detect outliers
b. determine a best set of input attributes for supervised learning
c. evaluate the likely performance of a supervised learner model
d. All of a, b, and c are common uses of unsupervised clustering.

13. The average positive difference between computed and desired outcome values.
a. root mean squared error
b. mean squared error
c. mean absolute error
d. mean positive error
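
The three error measures named in the options differ only in how the differences are aggregated. The following is a minimal sketch; the function name and sample values are hypothetical and chosen only for illustration.

```python
import math

def error_metrics(computed, desired):
    """Return (mean absolute error, mean squared error, root mean squared error)."""
    diffs = [c - d for c, d in zip(computed, desired)]
    mae = sum(abs(x) for x in diffs) / len(diffs)    # average absolute difference
    mse = sum(x * x for x in diffs) / len(diffs)     # average squared difference
    return mae, mse, math.sqrt(mse)

# Hypothetical computed and desired outputs.
mae, mse, rmse = error_metrics([2.0, 4.0, 7.0], [1.0, 6.0, 7.0])
print(mae, mse, rmse)
```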

14. Given desired class C and population P, lift is defined as


a. the probability of class C given population P divided by the probability of C given
a sample taken from the population.
b. the probability of population P given a sample taken from P.
c. the probability of class C given a sample taken from population P.
d. the probability of class C given a sample taken from population P divided by
the probability of C within the entire population P.
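
Lift, read as the ratio P(C | sample) / P(C | population), can be computed directly from counts. This is a small sketch under that reading; the function name and the counts are made up for illustration.

```python
def lift(class_in_sample, sample_size, class_in_population, population_size):
    """Lift = P(C | sample) / P(C | population)."""
    p_sample = class_in_sample / sample_size
    p_population = class_in_population / population_size
    return p_sample / p_population

# Hypothetical counts: 1,000 of 10,000 people respond overall,
# while 300 of the 1,000 people selected by a model respond.
example_lift = lift(300, 1000, 1000, 10000)
print(example_lift)
```

A lift above 1 means the sample contains class C at a higher rate than the population as a whole.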
15. The K-Means algorithm terminates when
a. a user-defined minimum value for the summation of squared error differences
between instances and their corresponding cluster center is seen.
b. the cluster centers for the current iteration are identical to the cluster centers
for the previous iteration.
c. the number of instances in each cluster for the current iteration is identical to the
number of instances in each cluster of the previous iteration.
d. the number of clusters formed for the current iteration is identical to the number
of clusters formed in the previous iteration.
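
The stopping test based on unchanged cluster centers can be sketched as follows. This is a simplified one-dimensional version with hypothetical data and starting centers, not a production implementation; real libraries usually stop on a tolerance rather than exact equality.

```python
def kmeans_1d(points, centers, max_iters=100):
    """Basic K-Means on 1-D data; stops when the centers stop changing."""
    for iteration in range(1, max_iters + 1):
        # Assign each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster (keep it if empty).
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:      # termination test
            return centers, iteration
        centers = new_centers
    return centers, max_iters

final_centers, iterations = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [1.0, 2.0])
print(final_centers, iterations)
```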

16. A feed-forward neural network is said to be fully connected when


a. all nodes are connected to each other.
b. all nodes at the same layer are connected to each other.
c. all nodes at one layer are connected to all nodes in the next higher layer.
d. all hidden layer nodes are connected to all output layer nodes.

17. The values input into a feed-forward neural network


a. may be categorical or numeric.
b. must be either all categorical or all numeric but not both.
c. must be numeric.
d. must be categorical.

18. Neural network training is accomplished by repeatedly passing the training data
through the network while
a. individual network weights are modified.
b. training instance attribute values are modified.
c. the ordering of the training instances is modified.
d. individual network nodes have the coefficients on their corresponding functional
parameters modified.
19. Genetic learning can be used to train a feed-forward network. This is
accomplished by having each population element represent one possible
a. network configuration of nodes and links.
b. set of training data to be fed through the network.
c. set of network output values.
d. set of network connection weights.

20. With a Kohonen network, the output layer node that wins an input instance is
rewarded by having
a. a higher probability of winning the next training instance to be presented.
b. its connection weights modified to more closely match those of the input
instance.
c. its connection weights modified to more closely match those of its neighbors.
d. neighboring connection weights modified to become less similar to its own
connection weights.
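
A single competitive-learning step of a Kohonen-style network can be sketched as below. This is a simplification: it updates only the winning node and omits the neighborhood update that full self-organizing maps also apply. The function name, learning rate, and weights are hypothetical.

```python
def kohonen_step(weights, instance, learning_rate=0.5):
    """Find the winning output node and pull its weights toward the instance."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, instance))

    # The winner is the output node whose weight vector is closest to the input.
    winner = min(range(len(weights)), key=lambda j: sq_dist(weights[j]))
    weights[winner] = [
        wi + learning_rate * (xi - wi)
        for wi, xi in zip(weights[winner], instance)
    ]
    return winner

# Hypothetical two-node output layer and one input instance.
node_weights = [[0.0, 0.0], [1.0, 1.0]]
winner = kohonen_step(node_weights, [0.8, 0.6])
print(winner, node_weights)
```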

21. A two-layered neural network used for unsupervised clustering.
a. backpropagation network
b. Kohonen network
c. perceptron network
d. agglomerative network

22. This neural network explanation technique is used to determine the relative
importance of individual input attributes.
a. sensitivity analysis
b. average member technique
c. mean squared error analysis
d. absolute average technique
23. Which one of the following is not a major strength of the neural network
approach?
a. Neural networks work well with datasets containing noisy data.
b. Neural networks can be used for both supervised learning and unsupervised
clustering.
c. Neural network learning algorithms are guaranteed to converge to an
optimal solution.
d. Neural networks can be used for applications that require a time element to be
included in the data.

24. During backpropagation training, the purpose of the delta rule is to make weight
adjustments so as to
a. minimize the number of times the training data must pass through the network.
b. minimize the number of times the test data must pass through the network.
c. minimize the sum of absolute differences between computed and actual outputs.
d. minimize the sum of squared error differences between computed and actual
output.
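
The delta rule can be illustrated on a single linear unit, where each update moves the weights in the direction that reduces the squared error. This is a minimal sketch with a hypothetical training instance and learning rate; it is not a full backpropagation implementation.

```python
def delta_rule_step(weights, inputs, target, learning_rate=0.1):
    """One delta-rule update for a single linear unit.

    Returns the new weights and the squared error measured before the update.
    """
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, error ** 2

weights = [0.0, 0.0]
squared_errors = []
for _ in range(5):
    # Hypothetical single training instance: inputs (1, 2), target 1.
    weights, sq_err = delta_rule_step(weights, inputs=[1.0, 2.0], target=1.0)
    squared_errors.append(sq_err)

print(squared_errors)
```

With these values the squared error shrinks on every pass, which is the behavior the rule is designed to produce.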

25. Epochs represent the total number of


a. input layer nodes.
b. passes of the training data through the network.
c. network nodes.
d. passes of the test data through the network.

26. Two classes each of which is represented by the same pair of numeric attributes
are linearly separable if
a. at least one of the pairs of attributes shows a curvilinear relationship between the
classes.
b. at least one of the pairs of attributes shows a high positive correlation between the
classes.
c. at least one of the pairs of attributes shows a high positive correlation between the
classes.
d. a straight line partitions the instances of the two classes.
27. The test set accuracy of a backpropagation neural network can often be improved
by
a. increasing the number of epochs used to train the network.
b. decreasing the number of hidden layer nodes.
c. increasing the learning rate.
d. decreasing the number of hidden layers.

28. This type of supervised network architecture does not contain a hidden layer.
a. backpropagation
b. perceptron
c. self-organizing map
d. genetic

29. The total delta measures the total absolute change in network connection weights
for each pass of the training data through a neural network. This value is most
often used to determine the convergence of a
a. perceptron network.
b. feed-forward network.
c. backpropagation network.
d. self-organizing network.

30. Simple regression assumes a __________ relationship between the input attribute
and output attribute.
a. linear
b. quadratic
c. reciprocal
d. inverse

31. Regression trees are often used to model _______ data.


a. linear
b. nonlinear
c. categorical
d. symmetrical

32. The leaf nodes of a model tree are


a. averages of numeric output attribute values.
b. nonlinear regression equations.
c. linear regression equations.
d. sums of numeric output attribute values.

33. Logistic regression is a ________ regression technique that is used to model data
having a _____ outcome.
a. linear, numeric
b. linear, binary
c. nonlinear, numeric
d. nonlinear, binary

34. This technique associates a conditional probability value with each data instance.
a. linear regression
b. logistic regression
c. simple regression
d. multiple linear regression

35. The probability of a hypothesis before the presentation of evidence.


a. a priori
b. subjective
c. posterior
d. conditional
36. This supervised learning technique can process both numeric and categorical
input attributes.
a. linear regression
b. Bayes classifier
c. logistic regression
d. backpropagation learning

37. With a Bayes classifier, missing data items are


a. treated as equal compares.
b. treated as unequal compares.
c. replaced with a default value.
d. ignored.

38. Choose the options that are incorrect regarding machine learning (ML) and
artificial intelligence (AI):
(A) ML is an alternate way of programming intelligent machines.
(B) ML and AI have very different goals.
(C) ML is a set of techniques that turns a dataset into a software.
(D) AI is a software that can emulate the human mind.

39. Which of the following sentences is FALSE regarding regression?


(A) It relates inputs to outputs.
(B) It is used for prediction.
(C) It may be used for interpretation.
(D) It discovers causal relationships.

40. The gradient of a continuous and differentiable function


(A) is zero at a minimum
(B) is non-zero at a maximum
(C) is zero at a saddle point
(D) decreases as you get closer to the minimum
41. The computational complexity of gradient descent is
(A) linear in D
(B) linear in N
(C) polynomial in D
(D) dependent on the number of iterations

42. The computational cost of K-fold cross-validation is
(A) linear in K
(B) quadratic in K
(C) cubic in K
(D) exponential in K
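
One way to see how the cost of K-fold cross-validation grows with K is to generate the splits: each fold leads to exactly one train/validate round, so the number of model fits equals K. The helper below is a hypothetical sketch, not taken from any particular library.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

splits = list(k_fold_splits(10, 5))
print(len(splits))   # one model fit per split
```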

43. You observe the following while fitting a linear regression to the data: As
you increase the amount of training data, the test error decreases and the
training error increases. The train error is quite low (almost what you expect
it to be), while the test error is much higher than the train error.
What do you think is the main reason behind this behavior? Choose the
most probable option.
(A) High variance
(B) High model bias
(C) High estimation bias
(D) None of the above

44. Adding more basis functions in a linear model... (pick the most probable
option)
(A) Decreases model bias
(B) Decreases estimation bias
(C) Decreases variance
(D) Doesn't affect bias and variance
45. The number of nodes in the input layer is 10 and in the hidden layer is 5. The maximum number
of connections from the input layer to the hidden layer is

A) 50
B) Less than 50
C) More than 50
D) It is an arbitrary value
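
The count for a fully connected pair of layers follows from every input node linking to every hidden node. A short check, assuming no bias connections are counted:

```python
input_nodes = 10
hidden_nodes = 5

# In a fully connected layer every input node links to every hidden node.
max_connections = input_nodes * hidden_nodes   # 50

print(max_connections)
```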

46. Which of the following functions can be used as an activation function in the output layer
if we wish to predict the probabilities of n classes (p1, p2, ..., pn) such that the sum of p
over all n classes equals 1?

A) Softmax
B) ReLu
C) Sigmoid
D) Tanh
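
The requirement that the outputs form a probability distribution can be checked numerically. Below is a minimal softmax sketch in plain Python; the input scores are hypothetical.

```python
import math

def softmax(scores):
    """Map raw scores to positive values that sum to 1."""
    shift = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # hypothetical output-layer scores
print(probs, sum(probs))
```

Sigmoid, tanh, and ReLU are applied to each unit independently, so their outputs are not constrained to sum to 1.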

47. Assume a simple MLP model with 3 neurons and inputs = 1, 2, 3. The weights to the input
neurons are 4, 5 and 6 respectively. Assume the activation function is a linear constant value of 3.
What will be the output?

A) 32

B) 643

C) 96

D) 48
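
The calculation can be traced step by step. The sketch assumes that "a linear constant value of 3" means a linear activation f(z) = 3z and that there is no bias term; both are interpretations, since the question does not spell them out.

```python
inputs = [1, 2, 3]
weights = [4, 5, 6]

def activation(z, k=3):
    # Assumed reading of the question: a linear activation f(z) = k * z.
    return k * z

weighted_sum = sum(x * w for x, w in zip(inputs, weights))   # 1*4 + 2*5 + 3*6 = 32
output = activation(weighted_sum)                            # 3 * 32 = 96

print(weighted_sum, output)
```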

48. In a neural network, each parameter can have its own learning rate.

A) TRUE

B) FALSE

50. The red curve above denotes training accuracy with respect to each epoch in a deep
learning algorithm. Both the green and blue curves denote validation accuracy.
Which of these indicates overfitting?

A) Green Curve
B) Blue Curve

51. Which of the following statements are true regarding dropout?

1: Dropout gives a way to approximate by combining many different architectures

2: Dropout demands high learning rates

3: Dropout can help prevent overfitting

A) Both 1 and 2

B) Both 1 and 3

C) Both 2 and 3

D) All 1, 2 and 3
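
The idea that each training pass uses a different thinned network can be sketched with a random mask. This is the common "inverted dropout" variant, chosen here as an assumption; the function name, drop probability, and activations are hypothetical.

```python
import random

def dropout(activations, drop_prob, rng):
    """Inverted dropout: zero each unit with probability drop_prob, rescale the rest."""
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)                 # fixed seed so the run is repeatable
hidden = [1.0] * 8                     # hypothetical hidden-layer activations
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
print(dropped)
```

Each call samples a new mask, so training effectively averages over many sub-networks that share weights.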

54. In a neural network, the input from the input layer is then fed into the ______.
A. Input layer

B. Output layer

C. Hidden layer

D. None of these

55. ____________ computes the output volume by computing the dot product between all filters and
the image patch.
A. Input Layer
B. Convolution Layer
C. Activation Function Layer
D. Pool Layer
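
The dot product between a filter and each image patch can be written out directly. This is a minimal single-channel sketch with stride 1 and no padding; as in most deep learning libraries, the kernel is not flipped, so it is strictly a cross-correlation. The image and kernel values are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Dot product between the filter and the image patch at (r, c).
            row.append(sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw)
            ))
        output.append(row)
    return output

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```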

56. ________ is a pooling operation that selects the maximum element from the region of the
feature map covered by the filter.
A. Max Pooling
B. Average Pooling
C. Global pooling
D. None of these
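
Selecting the maximum element in each window of the feature map can be sketched as follows. The window size, stride, and input values are hypothetical choices for illustration.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window of the feature map."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            row.append(max(
                feature_map[r + i][c + j]
                for i in range(size) for j in range(size)
            ))
        pooled.append(row)
    return pooled

fm = [[1, 3, 2, 4],
      [5, 6, 1, 2],
      [7, 2, 9, 0],
      [1, 8, 3, 4]]
pooled = max_pool2d(fm)
print(pooled)
```

The 4 x 4 input is reduced to 2 x 2, which is why pooling is often described as a dimension-reduction step.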
57. Recurrent Neural Networks are best suited for Text Processing.
True
False
58. What does LSTM stand for?
Long Short Term Memory
Least Squares Term Memory
Least Square Time Mean
Long Short Threshold Memory
Answer:-Long Short Term Memory

59. What is the difference between the actual output and generated output known as?
Output Modulus
Accuracy
Cost
Output Difference
Answer:-Cost

60. Recurrent Neural Networks are best suited for Text Processing.
True
False
Answer:-True

61. Prediction Accuracy of a Neural Network depends on _______________ and ______________.


Input and Output
Weight and Bias
Linear and Logistic Function
Activation and Threshold
Answer:-Weight and Bias

62. Recurrent Networks work best for Speech Recognition.


True
False
Answer:-True

63. GPU stands for __________.


Graphics Processing Unit
Gradient Processing Unit
General Processing Unit
Good Processing Unit.
Answer:- Graphics Processing Unit
64. Gradient at a given layer is the product of all gradients at the previous layers.
False
True
Answer:- True

65. _____________________ is a Neural Net's way of classifying inputs.


Learning
Forward Propagation
Activation
Classification
Answer:- Forward Propagation

66. Name the component of a Neural Network where the true value of the input is not observed.
Hidden Layer
Gradient Descent
Activation Function
Output Layer
Answer:- Hidden Layer

67. ________________ works best for Image Data.


AutoEncoders
Single Layer Perceptrons
Convolution Networks
Random Forest
Answer:- Convolution Networks

68. Neural Network algorithms are inspired by the structure and functioning of the human
biological neuron.
False
True
Answer:- True

69. In a Neural Network, all the edges and nodes have the same Weight and Bias values.
True
False
Answer:- False

70. _______________ is a recommended Model for Pattern Recognition in Unlabeled Data.


CNN
Shallow Neural Networks
Autoencoders
RNN
Answer:- Autoencoders

71. The process of improving the accuracy of a Neural Network is called _______________.


Forward Propagation
Cross Validation
Random Walk
Training
Answer:- Training

72. Data Collected from Survey results is an example of ___________________.


Data
Information
Structured Data
Unstructured Data
Answer:- Structured Data

73. A Shallow Neural Network has only one hidden layer between Input and Output layers.
False
True
Answer:- True

74. Support Vector Machines, Naive Bayes and Logistic Regression are used for solving
___________________ problems.
Clustering
Classification
Regression
Time Series
Answer:- Classification

75. The rate at which cost changes with respect to weight or bias is called __________________.
Derivative
Gradient
Rate of Change
Loss

77. All the Visible Layers in a Restricted Boltzmann Machine are connected to each other.
True
False
Answer:- False

78. All the neurons in a convolution layer have different Weights and Biases.
True
False
Answer:- False

79. What is the method to overcome the Decay of Information through time in RNN known as?
Back Propagation
Gradient Descent
Activation
Gating
Answer:- Gating

80. Recurrent Network can input Sequence of Data Points and Produce a Sequence of Output.
False
True
Answer:- True

81. A Deep Belief Network is a stack of Restricted Boltzmann Machines.


False
True
Answer:-True

82. Restricted Boltzmann Machine expects the data to be labeled for Training.
False
True
Answer:- False

83. What is the best Neural Network Model for Temporal Data?


Recurrent Neural Network
Convolution Neural Networks
Temporal Neural Networks
Multi Layer Perceptrons
Answer:- Recurrent Neural Network

84. RELU stands for ______________________________.


Rectified Linear Unit
Rectified Lagrangian Unit
Regressive Linear Unit
Regressive Lagrangian Unit
Answer:- Rectified Linear Unit

85. Why is the Pooling Layer used in a Convolution Neural Network?


They are of no use in CNN.
Dimension Reduction
Object Recognition
Image Sensing
Answer:- Dimension Reduction
86. What are the two layers of a Restricted Boltzmann Machine called?
Input and Output Layers
Recurrent and Convolution Layers
Activation and Threshold Layers
Hidden and Visible Layers
Answer:- Hidden and Visible Layers

87. The measure of difference between two probability distributions is known as
________________________.
Probability Difference
Cost
KL Divergence
Error
Answer:- KL Divergence
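
For two discrete distributions, KL divergence can be computed in a few lines. The sketch uses natural logarithms and assumes q is nonzero wherever p is nonzero; the example distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.

    Terms with p_i == 0 contribute 0. Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q), kl_divergence(q, p))
```

The two printed values differ, which shows that KL divergence is not symmetric and so is not a distance in the strict sense.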

88. A _________________ matches or surpasses the output of an individual neuron to a visual
stimulus.
Max Pooling
Gradient
Cost
Convolution
Answer:- Convolution

89. The rate at which cost changes with respect to weight or bias is called __________________.
Derivative
Gradient
Rate of Change
Loss
Answer:- Gradient

90. Autoencoders are trained using _____________________.


Feed Forward
Reconstruction
Back Propagation
They do not require Training
Answer:- Back Propagation

91. How do RNTNs interpret words?


One Hot Encoding
Lower Case Versions
Word Frequencies
Vector Representations
Answer:-Vector Representations

92. De-noising and Contractive are examples of __________________.


Shallow Neural Networks
Autoencoders
Convolution Neural Networks
Recurrent Neural Networks
Answer:-Autoencoders

93. Autoencoders cannot be used for Dimensionality Reduction.


False
True
Answer:-False
