Neural Networks Question Bank
11. With a supervised learning algorithm, we can specify target output values, but we may never
get close to those targets at the end of learning. Give two reasons why this might happen.
Answer:
(i) data may be valid, and inconsistency results from a stochastic aspect of the task (or some aspect
of
(ii) the data may contain errors - e.g. measurement errors or typographical errors
12. Describe the architecture and the computational task of the NetTalk neural network.
Answer:
Each group of 29 input units represents a letter, so the inputs together represent seven letters. The
computational task is to output the phoneme representation corresponding to the middle letter of the seven.
13. Why does a time-delay neural network (TDNN) have the same set of incoming weights for each
Answer:
To provide temporal translation invariance. Or So that the TDNN will be able to identify the input
sound,
Answer:
that the weight matrix for connections from the input layer to the hidden layer is Wih, and that the
weight matrix for connections from the hidden layer to the output layer is Who.
Answer:
16. In a Jordan network with i input neurons, h hidden layer neurons, and o output neurons:
(a) how many neurons will there be in the state vector, and
(b) if i = 4, h = 3, and o = 2, draw a diagram showing the connectivity of the network. Do not forget
Answer:
(a) o neurons in the state vector (same as the output vector; that is the letter o, not zero)
(b)
17. Draw a diagram illustrating the architecture of Elman’s simple recurrent network that performs
a temporal version of the XOR task. How are the two inputs to XOR provided to this network?
Answer:
The inputs are passed sequentially to the single input unit (0) of the temporal XOR net.
18. Briefly describe the use of cluster analysis in Elman’s lexical class discovery experiments, and
Answer:
Elman clustered hidden unit activation patterns corresponding to different input vectors and different
sequences of input units. He found that the clusters corresponded well to the grammatical contexts in
which the inputs (or input sequences) occurred, and thus concluded that the network had in effect learned
the grammar.
19. Draw an architectural diagram of a rank 2 tensor product network where the dimensions of the
input/output vectors are 3 and 4. You do not need to show the detailed internal structure of the
binding units.
Answer:
20. Draw a diagram of a single binding unit in a rank 2 tensor product network illustrating the
21. Define the concepts of dense and sparse random representations. How do their properties
Answer:
In a dense random representation, each vector component is chosen at random from a uniform
distribution over, say, [–1, +1]. In a sparse random representation, the non-zero components are chosen in
this way, but most components are chosen (at random) to be zero. In both cases, the vectors are
Members of orthonormal sets of vectors have length one and are orthogonal to one another. Vectors in
dense and sparse random representations are “orthogonal on average”: their inner products have a
mean of zero.
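The "orthogonal on average" property can be checked numerically. The following is a minimal sketch, not part of the original answer; the vector dimension, the number of sampled pairs, and the fraction of non-zero components are assumptions chosen for illustration.

```python
import random

def dense_random_vector(n, rng):
    """Each component is drawn uniformly from [-1, +1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def sparse_random_vector(n, rng, p_nonzero=0.1):
    """Most components are zero; the rest are drawn uniformly from [-1, +1].
    p_nonzero is an assumed value, not taken from the text."""
    return [rng.uniform(-1.0, 1.0) if rng.random() < p_nonzero else 0.0
            for _ in range(n)]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_inner_product(make_vector, n=50, pairs=2000, seed=0):
    """Average inner product over many independently drawn vector pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        total += inner(make_vector(n, rng), make_vector(n, rng))
    return total / pairs

dense_mean = mean_inner_product(dense_random_vector)
sparse_mean = mean_inner_product(sparse_random_vector)
print(dense_mean, sparse_mean)  # both should be close to zero
```

Individual pairs are generally not orthogonal; only the mean of the inner products is close to zero.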
22. What is a Hadamard matrix? Describe how a Hadamard matrix can be used to produce suitable
distributed concept representation vectors for a tensor product network. What are the properties
Answer:
A Hadamard matrix H is a square matrix of size n, all of whose entries are ±1, which satisfies H H^T = n I_n,
where I_n is the identity matrix of size n. The rows of a Hadamard matrix, once normalized, can be used as
distributed representation vectors in a tensor product network. This is because the rows are orthogonal to each
other.
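As an illustration, the sketch below builds a Hadamard matrix and normalizes its rows. It uses Sylvester's doubling construction, which is one standard way to obtain Hadamard matrices of size 2^k; the question does not say which construction is intended, so this choice is an assumption.

```python
def sylvester_hadamard(k):
    """Return the 2**k x 2**k Hadamard matrix from Sylvester's doubling step
    H -> [[H, H], [H, -H]], starting from the 1 x 1 matrix [1]."""
    h = [[1]]
    for _ in range(k):
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def gram(h):
    """Compute H H^T as a list of lists."""
    n = len(h)
    return [[sum(h[i][t] * h[j][t] for t in range(n)) for j in range(n)]
            for i in range(n)]

H = sylvester_hadamard(2)          # 4 x 4 matrix of +1 / -1 entries
G = gram(H)                        # n on the diagonal, 0 elsewhere
n = len(H)
# Dividing each row by sqrt(n) gives an orthonormal set of vectors.
unit_rows = [[v / n ** 0.5 for v in row] for row in H]
```

Each normalized row has length one and is orthogonal to every other row, which is the property wanted for concept vectors.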
23. In a 2-D self-organizing map with input vectors of dimension m, and k neurons in the map,
Answer:
mk
24. Describe the competitive process of the Self-Organising Map algorithm.
Answer:
The weight vector for each of the neurons in the SOM also has dimension m. So for neuron j, the
weight
For an input pattern x, compute the inner product wj•x for each neuron, and choose the largest inner
product. Let i(x) denote the index of the winning neuron (and also the output of a trained SOM).
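The competitive step can be sketched as follows. The weight vectors and the input are made-up values; the weight vectors are chosen with unit length, since comparing inner products is only a fair similarity measure under that assumption.

```python
def winner(weights, x):
    """Return i(x): the index of the neuron whose weight vector has the
    largest inner product with the input x."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(weights)), key=lambda j: scores[j])

# Hypothetical map with k = 3 neurons and input dimension m = 2.
weights = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
x = [0.5, 0.9]
i_x = winner(weights, x)
print(i_x)
```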
Answer:
Given a set of vectors X, the Voronoi cells about those vectors are the ones that partition the space
they
lie in, according to the nearest-neighbour rule. That is, the Voronoi cell that a vector lies in is that
26. Briefly explain the term code book in the context of learning vector quantisation.
Answer:
When compressing data by representing vectors by the labels of a relatively small set of
reconstruction
27. Describe the relationship between the Self-Organising Map algorithm and the Learning Vector
Quantisation algorithm.
Answer:
In order to use Learning Vector Quantisation (LVQ), a set of approximate reconstruction vectors is
first
found using the unsupervised SOM algorithm. The supervised LVQ algorithm is then used to fine-
tune the
Answer:
An attractor is a bounded subset of space to which non-trivial regions of initial conditions converge
at time
• chaotic attractor: stays within a bounded region of space, but no predictable cyclic path
29. Write down the energy function of a BSB network with weight matrix W, feedback constant β,
Answer:
30. Compute the weight matrix for a 4-neuron Hopfield net with the single fundamental memory
ξ1
Answer:
Answer:
● Classification
● Noise Reduction
● Prediction
● Ability to learn
● Ability to generalize
► i.e. produce reasonable outputs for inputs it has not been taught how to deal with
– sends one output signal to many other neurons, possibly including its input neurons
(recurrent network)
• In biological systems, one neuron can be connected to as many as 10,000 other neurons.
• Usually, a neuron receives its information from other neurons in a confined area, its so-called
receptive field.
• NNs are able to learn by adapting their connectivity patterns so that the organism improves its
The output of a neuron is a function of the weighted sum of the inputs plus a bias
● The function of the entire neural network is simply the computation of the outputs of all the
neurons
Hebbian Learning (1949)
“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in
firing it, some growth process or metabolic change takes place in one or both cells such that A’s
efficiency, as one of the cells firing B, is increased.”
Δwi,j = c⋅xi⋅xj
Eventually, the connection strength will reflect the correlation between the neurons’ outputs.
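A minimal sketch of the Hebbian rule Δwi,j = c⋅xi⋅xj follows. The bipolar activity patterns and the constant c = 0.1 are assumptions for illustration only.

```python
def hebbian_update(w, x, c=0.1):
    """Apply delta w[i][j] = c * x[i] * x[j] for every pair of neurons."""
    n = len(x)
    return [[w[i][j] + c * x[i] * x[j] for j in range(n)] for i in range(n)]

# Three neurons; neurons 0 and 1 are always active together,
# neuron 2 mostly takes the opposite value of neuron 0.
patterns = ([1, 1, -1], [1, 1, -1], [1, 1, 1])
w = [[0.0] * 3 for _ in range(3)]
for x in patterns:
    w = hebbian_update(w, x)

print(w[0][1], w[0][2])  # positive for correlated, negative for anti-correlated
```

After several presentations the weight between neurons 0 and 1 is positive while the weight between neurons 0 and 2 is negative, mirroring the correlation of their outputs.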
• Winner neuron adapts its tuning (pattern of weights) even further towards the current input
Obviously, the fact that threshold units can only output the values 0 and 1 restricts their applicability to
certain problems.
We can overcome this limitation by eliminating the threshold and simply turning fi into the identity
function
With this kind of neuron, we can build feedforward networks with m input neurons and n output
neurons
Linear neurons are quite popular and useful for applications such as interpolation.
However, they have a serious limitation: Each neuron computes a linear function, and therefore the
This means that if an input vector x results in an output vector y, then for any factor φ the input φ⋅x
will
It is especially useful for high-dimensional functions. We will use it to iteratively minimize the
network’s
(or neurons) error by finding the gradient of the error surface in weight space and adjusting the
Error is defined as the MSE between the neuron’s net input net_j and its desired output d_j (= class(i_j)).
The idea is to pick samples in random order and perform (slow) gradient descent in their individual error
functions.
This technique allows incremental learning, i.e., refining of the weights as more training samples are added.
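The per-sample gradient descent described above can be sketched for a single linear neuron. The training data, the learning rate, and the number of epochs are assumptions; the update w ← w + η (d − net) x is the gradient step for the squared error of one sample.

```python
import random

def train_online(samples, n_inputs, eta=0.05, epochs=200, seed=0):
    """Online gradient descent for one linear neuron (output = net input)."""
    rng = random.Random(seed)
    w = [0.0] * n_inputs
    samples = list(samples)
    for _ in range(epochs):
        rng.shuffle(samples)              # pick samples in random order
        for x, d in samples:
            net = sum(wi * xi for wi, xi in zip(w, x))
            # gradient step on this sample's squared error 0.5 * (d - net)**2
            w = [wi + eta * (d - net) * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical target d = 2 * x1 - 1; the second input is a constant 1 (offset).
samples = [([x1, 1.0], 2.0 * x1 - 1.0) for x1 in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w = train_online(samples, n_inputs=2)
print(w)  # approaches [2, -1]
```

Because each update uses only one sample, new samples can be appended to the training list later and training simply continues from the current weights.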
43. Explain the difference between Internal Representation Issues and External Interpretation
Issues?
As we said before, in all network types, the amplitude of input signals and internal signals is limited:
Without this limitation, patterns with large amplitudes would dominate the network’s behavior.
A disproportionately large input signal can activate a neuron even if the relevant connection weight
is very
small.
From the perspective of the embedding application, we are concerned with the interpretation of
input and
output signals.
These signals constitute the interface between the embedding application and its NN component.
Often, these signals only become meaningful when we define an external interpretation for them.
This is analogous to biological neural systems: The same signal takes on a completely different meaning
when it is interpreted by different brain areas (motor cortex, visual cortex etc.).
Without any interpretation, we can only use standard methods to define the difference (or similarity)
between signals.
In production mode, the network decides that its current input is in the k-th class if and only if ok = 1,
and
For units with real-valued output, the neuron with maximal output can be picked to indicate the class of
the input.
This maximum should be significantly greater than all other outputs, otherwise the input is misclassified.
• Supervised learning:
An archaeologist determines the gender of a human skeleton based on many past examples of
• Unsupervised learning:
The archaeologist determines whether a large number of dinosaur skeleton fragments belong to
the same species or multiple species. There are no previous data to guide the archaeologist, and
47. Explain different ways of representing the data in the neural network system? 10
48. Explain temporal data representations. Give example. 10
As you know, no equation would tell you the ideal number of neurons in a multi-layer network.
Ideally, we would like to use the smallest number of neurons that allows the network to do its task
• faster training,
So far, we have determined the number of hidden-layer units in BPNs by “trial and error.”
However, there are algorithmic approaches for adapting the size of a network to a given task.
Some techniques start with a large network and then iteratively prune connections and nodes that
Other methods start with a minimal network and then add connections and nodes until the network
Finally, there are algorithms that combine these “pruning” and “growing” approaches.
None of these algorithms are guaranteed to produce “ideal” networks. (It is not even clear how to define
an “ideal” network.)
However, numerous algorithms exist that have been shown to yield good results for most applications.
It is of the “network growing” type and can be used to build multi-layer networks of adequate size.
This learning algorithm is much faster than backpropagation learning, because only one neuron is trained
at a time.
On the other hand, its inability to retrain neurons may prevent the cascade correlation network from
Covariance tells us something about the strength and direction (directly vs. inversely proportional) of
the
For many applications, it is useful to normalize this variable so that it ranges from -1 to 1.
The result is the correlation coefficient r, which for a dataset (xi, yi) with i = 1, …, n is given by:
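Covariance:
\[ \mathrm{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]
Correlation coefficient:
\[ r = \mathrm{corr}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \]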
In the case of high (close to 1) or low (close to -1) correlation coefficients, we can use one variable as
a
To quantify the linear relationship between the two variables, we can use linear regression:
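A small sketch of covariance, the correlation coefficient r, and a least-squares regression line. The data set is invented, and the covariance divides by n (population form), which is an assumption; r does not depend on that choice.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def covariance(x, y):
    """Population covariance (divides by n; an assumed convention)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Correlation coefficient r, always in the range [-1, 1]."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def linear_regression(x, y):
    """Least-squares line y = slope * x + intercept."""
    slope = covariance(x, y) / covariance(x, x)
    intercept = mean(y) - slope * mean(x)
    return slope, intercept

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]          # exactly y = 2x + 1
r = correlation(x, y)
slope, intercept = linear_regression(x, y)
print(covariance(x, y), r, slope, intercept)
```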
52. What are the benefits to have smallest number of neurons in the network? 4
53. Develop a cascade correlation algorithm. What is it used for? What are its advantages?
We start with a minimal network consisting of only the input neurons (one of them should be a
constant
The output neurons (and later the hidden neurons) typically use output functions that can also produce
negative outputs; e.g., we can subtract 0.5 from our sigmoid function for a (-0.5, 0.5) output range.
Then we successively add hidden-layer neurons and train them to reduce the network error step by step:
Weights to each new hidden node are trained to maximize the covariance of the node’s output with
the
Covariance:
E_{k,p}: error of k-th output node for p-th input sample before the new node is added
Since we want to maximize S (as opposed to minimizing some error), we use gradient ascent:
S_k: sign of the correlation between the node’s output and the k-th network output
η: learning rate
f'_p: derivative of the node’s activation function with respect to its net input, evaluated at p-th pattern
If we can find weights so that the new node’s output perfectly covaries with the error in each output node,
we can set the new output node weights and offsets so that the new error is zero.
More realistically, there will be no perfect covariance, which means that we will set each output
node
To do this, we can use gradient descent or linear regression for each individual output node weight.
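Covariance between the new node’s output and the output errors:
\[ S(x_{new}) = \sum_{k=1}^{K} \left| \sum_{p=1}^{P} (x_{new,p} - \bar{x}_{new})(E_{k,p} - \bar{E}_k) \right| \]
x_{new,p}: output of the new node for the p-th input sample
E_{k,p}: error of the k-th output node for the p-th input sample before the new node is added
\bar{x}_{new} and \bar{E}_k: averages over the training set
Gradient ascent weight update:
\[ \Delta w_i = \eta \frac{\partial S}{\partial w_i} = \eta \sum_{k=1}^{K} \sum_{p=1}^{P} S_k (E_{k,p} - \bar{E}_k) f'_p I_{i,p} \]
I_{i,p}: i-th input of the new node for the p-th pattern
S_k: sign of the correlation between the node’s output and the k-th network output
f'_p: derivative of the node’s activation function with respect to its net input, evaluated at the p-th pattern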
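The covariance score S that a candidate hidden node is trained to maximize can be computed as in the sketch below. It assumes the usual cascade correlation form, a sum over output nodes of the absolute covariance between the candidate's output and that node's error; the numbers are invented.

```python
def candidate_score(node_out, errors):
    """S = sum over output nodes k of
    | sum over patterns p of (x_p - mean(x)) * (E_kp - mean(E_k)) |.

    node_out: candidate node output for each training pattern
    errors:   one list per output node with its error for each pattern
    """
    p = len(node_out)
    x_mean = sum(node_out) / p
    score = 0.0
    for e_k in errors:
        e_mean = sum(e_k) / p
        score += abs(sum((x - x_mean) * (e - e_mean)
                         for x, e in zip(node_out, e_k)))
    return score

# Hypothetical residual errors of a single output node on four patterns.
errors = [[0.4, -0.2, 0.1, -0.3]]
good = candidate_score([0.4, -0.2, 0.1, -0.3], errors)   # tracks the error
poor = candidate_score([0.1, 0.1, -0.1, -0.1], errors)   # barely related
print(good, poor)
```

A candidate whose output follows the residual error gets the higher score, so its output weight can then be set to cancel a large part of that error.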
The next added hidden node will further reduce the remaining network error, and so on, until we
reach a
This learning algorithm is much faster than backpropagation learning, because only one neuron is trained
at a time.
On the other hand, its inability to retrain neurons may prevent the cascade correlation network from
54. What are input space clusters and radial basis functions (RBFs)? 6
To achieve such local “receptive fields,” we can use radial basis functions, i.e., functions whose output
only depends on the Euclidean distance μ between the input vector and another (“weight”) vector.
55. Explain linear interpolation for one dimensional and multidimensional case? 5
For function approximation, the desired output for new (untrained) inputs could be estimated by linear
interpolation.
In the multi-dimensional case, hyperplane segments connect neighboring points so that the desired
output for a new input x0 is determined by the P0 known samples that surround it,
where Dp is the Euclidean distance between x0 and xp and f(xp) is the desired output value for input xp.
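Inverse-distance weighting of the surrounding samples can be sketched as follows: each sample contributes f(xp) weighted by 1/Dp, and the weights are normalized to sum to one. The sample positions are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
from math import dist

def interpolate(x0, samples):
    """Estimate f(x0) from (xp, f(xp)) pairs using weights 1 / Dp,
    where Dp is the Euclidean distance between x0 and xp."""
    weighted = []
    for xp, fp in samples:
        d = dist(x0, xp)
        if d == 0.0:
            return fp                 # x0 coincides with a known sample
        weighted.append((1.0 / d, fp))
    return sum(w * f for w, f in weighted) / sum(w for w, _ in weighted)

# Four hypothetical samples at the corners of a square.
samples = [((0.0, 0.0), 5.0), ((2.0, 0.0), 4.0),
           ((2.0, 2.0), 7.0), ((0.0, 2.0), 6.0)]
center = interpolate((1.0, 1.0), samples)   # equidistant: plain average
print(center)
```

At the centre of the square all four distances are equal, so the estimate reduces to the plain average of the four desired outputs.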
56. Explain different types of learning methods? What are counter propagation networks?
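Gaussian radial basis function:
\[ \rho_g(\mu) \propto e^{-(\mu / c)^2} \]
Linear interpolation in the one-dimensional case:
\[ f(x_0) = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1} (x_0 - x_1) \]
\[ f(x_0) = \frac{D_1^{-1} f(x_1) + D_2^{-1} f(x_2)}{D_1^{-1} + D_2^{-1}} \]
Linear interpolation in the multi-dimensional case:
\[ f(\mathbf{x}_0) = \frac{D_1^{-1} f(\mathbf{x}_1) + D_2^{-1} f(\mathbf{x}_2) + \dots + D_{P_0}^{-1} f(\mathbf{x}_{P_0})}{D_1^{-1} + D_2^{-1} + \dots + D_{P_0}^{-1}} \]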
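Example with four surrounding samples:
\[ f(\mathbf{x}_0) = \frac{5 D_1^{-1} + 4 D_2^{-1} + 7 D_3^{-1} + 6 D_4^{-1}}{D_1^{-1} + D_2^{-1} + D_3^{-1} + D_4^{-1}} \approx 5.5 \]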
Unsupervised/Supervised Learning ….
learning.
Although this network uses linear neurons, it can learn nonlinear functions by means of a hidden layer of
competitive units.
Moreover, the network is able to learn a function and its inverse at the same time.
However, to simplify things, we will only consider the feedforward mechanism of the CPN.
If we are using such linear interpolation, then our radial basis function (RBF) ρ0 that weights an input
vector based on its distance to a neuron’s reference (weight) vector is ρ0(D) = D^{-1}.
For the training samples xp, p = 1, …, P0, surrounding the new input x, we find for the network’s
output o:
(In the following, to keep things simple, we will assume that the network has only one output neuron.)
Since it is difficult to define what “surrounding” should mean, it is common to consider all P training
This, however, implies a network that has as many hidden nodes as there are training samples. This is
unacceptable because of its computational complexity and likely poor generalization ability – the network
It is more useful to have fewer neurons and accept that the training set cannot be learned 100% accurately:
Here, ideally, each reference vector μi of these N neurons should be placed in the center of an input-space
cluster of training samples with identical (or at least similar) desired output ϕi.
To learn near-optimal values for the reference vectors and the output weights, we can – as usual –
employ gradient descent.
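The output of such a network with N hidden RBF neurons and one output neuron can be sketched as a weighted sum of RBF responses. The Gaussian RBF, the width c, and the two reference vectors below are assumptions for illustration, and the sketch covers only the forward computation, not the gradient descent training.

```python
from math import dist, exp

def gaussian_rbf(d, c=1.0):
    """Response that depends only on the distance d (width c is assumed)."""
    return exp(-(d / c) ** 2)

def rbf_output(x, centers, phis, c=1.0):
    """o = sum over i of phi_i * rho(||x - mu_i||) for one output neuron."""
    return sum(phi * gaussian_rbf(dist(x, mu), c)
               for mu, phi in zip(centers, phis))

# Two hypothetical reference vectors, each at the centre of one cluster,
# with desired outputs +1 and -1 for their clusters.
centers = [(0.0, 0.0), (3.0, 3.0)]
phis = [1.0, -1.0]
near_first = rbf_output((0.1, 0.0), centers, phis)
near_second = rbf_output((3.0, 2.9), centers, phis)
print(near_first, near_second)
```

Inputs close to a reference vector produce an output close to that cluster's desired value, because the other neuron's response has decayed to almost zero.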
58. Write a note on distance and similarity functions with respect to counterpropagation network?
In the hidden layer, the neuron whose weight vector is most similar to the current input vector is the “winner.”
There are different ways of defining such maximal similarity, for example:
A simple CPN with two input neurons, three hidden neurons, and two output neurons can be
described as
follows:
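Similarity (dot product):
\[ s(\mathbf{w}, \mathbf{x}) = \mathbf{w} \cdot \mathbf{x} \]
Distance:
\[ d(\mathbf{w}, \mathbf{x}) = \sum_i (w_i - x_i)^2 \]
Network output for linear interpolation over the P0 surrounding samples:
\[ o \propto \sum_{p=1}^{P_0} d_p \, \rho_0(\|\mathbf{x} - \mathbf{x}_p\|), \quad \text{where } d_p = f(\mathbf{x}_p) \]
Network output using all P training samples:
\[ o = \sum_{p=1}^{P} d_p \, \rho(\|\mathbf{x} - \mathbf{x}_p\|) \]
Network output using N reference vectors:
\[ o = \sum_{i=1}^{N} \phi_i \, \rho(\|\mathbf{x} - \boldsymbol{\mu}_i\|) \]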
The CPN learning process (general form for n input units and m output units):
2. If you use the cosine similarity function, normalize (shrink/expand to “length” 1) the input vector x
3. Initialize the input neurons with the resulting vector and compute the activation of the hidden-layer
4. In the hidden (competitive) layer, determine the unit W with the largest activation (the winner).
5. Adjust the connection weights between W and all N input-layer units according to the formula:
6. Repeat steps 1 to 5 until all training patterns have been processed once.
7. Repeat step 6 until each input pattern is consistently associated with the same competitive unit.
8. Select the first vector pair in the training set (the current pattern).
10. Adjust the connection weights between the winning hidden-layer unit and all M output layer units
11. Repeat steps 9 and 10 for each vector pair in the training set.
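The two training phases can be sketched as below. Several steps of the list are missing from the text, so the details here are assumptions: the winner is chosen by the largest dot product with the normalized input, both phases move weights a fraction of the way toward their target, the learning rates α and β are arbitrary, and a fixed number of epochs stands in for the "until consistent" stopping test.

```python
from math import sqrt

def normalize(v):
    """Shrink or expand v to length 1."""
    length = sqrt(sum(a * a for a in v))
    return [a / length for a in v]

def winner(hidden_w, x):
    """Index of the hidden unit with the largest activation (dot product)."""
    acts = [sum(w * a for w, a in zip(row, x)) for row in hidden_w]
    return max(range(len(acts)), key=acts.__getitem__)

def train_cpn(pairs, hidden_w, output_w, alpha=0.3, beta=0.3, epochs=50):
    # Phase 1 (unsupervised): move the winner's input weights toward x.
    for _ in range(epochs):
        for x, _y in pairs:
            x = normalize(x)
            k = winner(hidden_w, x)
            hidden_w[k] = [w + alpha * (a - w) for w, a in zip(hidden_w[k], x)]
    # Phase 2 (supervised): move the winner's output weights toward y.
    for _ in range(epochs):
        for x, y in pairs:
            k = winner(hidden_w, normalize(x))
            output_w[k] = [w + beta * (t - w) for w, t in zip(output_w[k], y)]

def predict(x, hidden_w, output_w):
    """Feedforward pass: the output is the winner's output weight vector."""
    return output_w[winner(hidden_w, normalize(x))]

# Hypothetical data: two input neurons, two hidden units, two output neurons.
pairs = [([1.0, 0.1], [1.0, 0.0]), ([0.1, 1.0], [0.0, 1.0])]
hidden_w = [normalize([1.0, 0.5]), normalize([0.5, 1.0])]
output_w = [[0.0, 0.0], [0.0, 0.0]]
train_cpn(pairs, hidden_w, output_w)
prediction = predict([0.9, 0.2], hidden_w, output_w)
print(prediction)
```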
The assumption underlying Quickprop is that the network error as a function of each individual weight can
be approximated by a paraboloid.
Based on this assumption, whenever we find that the gradient for a given weight switched its sign
between successive epochs, we should fit a paraboloid through these data points and use its minimum as
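Length of the input vector:
\[ \|\mathbf{x}\| = \sqrt{\sum_{j=1}^{N} x_j^2} \]
Weight update between the input layer and the winner W:
\[ w^{H}_{Wn}(t+1) = w^{H}_{Wn}(t) + \alpha \, (x_n - w^{H}_{Wn}(t)) \]
Weight update between the winner W and the output layer:
\[ w^{O}_{mW}(t+1) = w^{O}_{mW}(t) + \beta \, (y_m - w^{O}_{mW}(t)) \]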
Newton’s method:
Notice that this method cannot be applied if the error gradient has not decreased in magnitude and
has
In that case, we would ascend in the error function or make an infinitely large weight modification.
In most cases, Quickprop converges several times faster than standard backpropagation learning.
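A single Quickprop step for one weight can be sketched as follows: fit a parabola through the last two gradient values and jump to its minimum, w(t+1) = w(t) + E'(t) Δw(t−1) / (E'(t−1) − E'(t)). The error function below is an invented exact parabola, and the sketch only guards against a zero denominator; a practical implementation needs further safeguards.

```python
def quickprop_step(w, w_prev, grad, grad_prev):
    """Return w(t+1) from the current and previous weight and gradient."""
    dw_prev = w - w_prev                     # delta w(t-1)
    denom = grad_prev - grad
    if denom == 0.0:
        raise ValueError("gradient unchanged: the parabola has no minimum")
    return w + grad * dw_prev / denom

def grad_e(w):
    """Gradient of the hypothetical error E(w) = (w - 3)**2 + 1."""
    return 2.0 * (w - 3.0)

w_prev, w = 0.0, 1.0
w_next = quickprop_step(w, w_prev, grad_e(w), grad_e(w_prev))
print(w_next)
```

Because this error function is exactly quadratic, one step lands on the minimum at w = 3; on a real error surface the parabola is only an approximation and several steps are needed.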
The Rprop algorithm takes a very different approach to improving backpropagation as compared to Quickprop.
Instead of making more use of gradient information for better weight updates, Rprop only uses the sign of
the gradient, because its size can be a poor and noisy estimator of required weight updates.
Furthermore, Rprop assumes that different weights need different step sizes for updates, which vary
The basic idea is that if the error gradient for a given weight wij had the same sign in two consecutive
epochs, we increase its step size Δij, because the weight’s optimal value may be far away.
If, on the other hand, the sign switched, we decrease the step size.
Weights are always changed by adding or subtracting the current step size, regardless of the absolute
This way we do not “get stuck” with extreme weights that are hard to change because of the shallow
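The Rprop idea can be sketched for a single weight as below. The increase and decrease factors (1.2 and 0.5), the step-size limits, and the test error function are assumptions; they are commonly used values, not taken from the text.

```python
def sign(v):
    return (v > 0) - (v < 0)

def rprop_step(w, step, grad, grad_prev,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One Rprop update: adapt the step size from the gradient signs,
    then move against the sign of the current gradient."""
    s = grad * grad_prev
    if s > 0:                        # same sign twice: take larger steps
        step = min(step * eta_plus, step_max)
    elif s < 0:                      # sign switched: we overshot, shrink
        step = max(step * eta_minus, step_min)
    w = w - sign(grad) * step        # gradient magnitude is never used
    return w, step

def grad_e(w):
    """Gradient of the hypothetical error E(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

w, step, g_prev = 0.0, 0.1, 0.0
for _ in range(100):
    g = grad_e(w)
    w, step = rprop_step(w, step, g, g_prev)
    g_prev = g
print(w)
```

Only the sign of the gradient enters the weight change, so a very flat or very steep error surface does not by itself make the steps tiny or huge.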
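\[ E = a w^2 + b w + c \]
\[ \frac{\partial E}{\partial w}(t) = E'(t) = 2 a w(t) + b \]
\[ \frac{\partial E}{\partial w}(t-1) = E'(t-1) = 2 a w(t-1) + b \]
\[ \Rightarrow 2a = \frac{E'(t) - E'(t-1)}{w(t) - w(t-1)} = \frac{E'(t) - E'(t-1)}{\Delta w(t-1)} \]
\[ \Rightarrow b = E'(t) - \frac{(E'(t) - E'(t-1)) \, w(t)}{\Delta w(t-1)} \]
At the minimum of the parabola:
\[ \frac{\partial E}{\partial w}(t+1) = 2 a w(t+1) + b = 0 \]
\[ \Rightarrow w(t+1) = -\frac{b}{2a} \]
\[ \Rightarrow w(t+1) = -\left( E'(t) - \frac{(E'(t) - E'(t-1)) \, w(t)}{\Delta w(t-1)} \right) \frac{\Delta w(t-1)}{E'(t) - E'(t-1)} \]
\[ \Rightarrow w(t+1) = w(t) + \frac{E'(t) \, \Delta w(t-1)}{E'(t-1) - E'(t)} \]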
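Rprop step size update, with 0 < η^- < 1 < η^+:
\[ \Delta_{ij}^{(t)} = \begin{cases} \eta^{+} \cdot \Delta_{ij}^{(t-1)}, & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t)} > 0 \\ \eta^{-} \cdot \Delta_{ij}^{(t-1)}, & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t)} < 0 \end{cases} \]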
learning. Compared to both the standard backpropagation algorithm and Quickprop, Rprop has one
advantage:
Rprop does not require the user to estimate or empirically determine a step size parameter and its
change over time. Rprop will determine appropriate step size values by itself and can thus be applied
“as
A maxnet is a recurrent, one-layer network that uses competition to determine which of its nodes
has the
All pairs of nodes have inhibitory connections with the same weight -ε, where typically ε ≤ 1/(# nodes).
In addition, each node has a self-excitatory connection to itself, whose weight θ is typically 1.
The nodes update their net input and their output by the following equations:
With each iteration, the neurons’ activations will decrease until only one neuron remains active.
This is the “winner” neuron that had the greatest initial input.
We can add maxnet connections to the hidden layer of a CPN to find the winner neuron.
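A maxnet iteration can be sketched as follows. The update equations themselves are not legible in the text, so the form used here is an assumption: each node's net input is θ times its own output minus ε times the sum of all other outputs, and the output is the net input clipped at zero. The initial activations are invented.

```python
def maxnet(initial, epsilon, theta=1.0, max_iters=1000):
    """Iterate mutual inhibition until at most one node stays active."""
    o = [max(0.0, v) for v in initial]
    for _ in range(max_iters):
        if sum(1 for v in o if v > 0.0) <= 1:
            break
        total = sum(o)
        # self-excitation (theta) minus inhibition from every other node
        o = [max(0.0, theta * v - epsilon * (total - v)) for v in o]
    return o

initial = [0.5, 0.9, 0.3, 0.8]
final = maxnet(initial, epsilon=1.0 / len(initial))
winner_index = max(range(len(final)), key=final.__getitem__)
print(final, winner_index)
```

If two nodes start with exactly the same maximal input they inhibit each other symmetrically, which is why the sketch caps the number of iterations.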
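Rprop weight update:
\[ \Delta w_{ij}^{(t)} = \begin{cases} -\Delta_{ij}^{(t)}, & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t)} > 0 \\ +\Delta_{ij}^{(t)}, & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t)} < 0 \\ 0, & \text{otherwise} \end{cases} \]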
As you may remember, the counterpropagation network employs a combination of supervised and
unsupervised learning. We will now study Self-Organizing Maps (SOMs) as examples for completely
unsupervised learning (Kohonen, 1980). This type of artificial neural network is particularly similar to
In the human cortex, multi-dimensional sensory input spaces (e.g., visual input, tactile input) are
The projection from sensory inputs onto such maps is topology conserving.
This means that neighboring areas in these maps represent neighboring areas in the sensory input
space.
For example, neighboring areas in the sensory cortex are responsible for the arm and hand regions.
neighborhood.
Network structure:
A neighborhood function ϕ(i, k) indicates how closely neurons i and k in the output layer are
connected to each other. Usually, a Gaussian function on the distance between the two neurons
Their competitive learning algorithm is similar to the first (unsupervised) phase of CPN learning.
However, ART networks are able to grow additional neurons if a new input cannot be categorized
A greater value of ρ leads to more, smaller clusters (= input samples associated with the same winner
neuron).
We will only discuss ART-1 networks, which receive binary input vectors.
Bottom-up weights are used to determine output-layer candidates that may best match the current
input.
Top-down weights represent the “prototype” for the cluster defined by each output neuron.
A close match between input and prototype is necessary for categorizing the input.
Finding this match can require multiple signal exchanges between the two layers in both directions
until
Plasticity: They can always adapt to unknown inputs (by creating a new cluster with a new weight
vector)
Stability: Existing clusters are not deleted by the introduction of new inputs (new clusters will just be
3. Repeat
a) Let j* be a node in A with largest yj, with ties being broken arbitrarily;
b) Compute s* = (s*_1, …, s*_n), where s*_l = t_{l,j*} x_l;
4. If A is empty, then create new node whose weight vector coincides with current input
pattern x;
end-while
Answer: B
Answer: B
67. For a minimum distance classifier with one input variable, what is the decision boundary
A. A line.
B. A curve.
C. A plane.
D. A hyperplane.
E. A discriminant.
Answer: E
68. For a Bayes classifier with two input variables, what is the decision boundary between two
classes?
A. A line.
B. A curve.
C. A plane.
D. A hypercurve.
E. A discriminant.
Answer: B
69. Design a minimum distance classifier with three classes using the following training data:
Then classify the test vector [0.5,−1]T with the trained classifier. Which class does this vector
belong to?
A. Class 1.
B. Class 2.
C. Class 3.
Answer: B
70. The decision function for a minimum distance classifier is d_j(x) = x^T m_j – (1/2) m_j^T m_j, where m_j is
the prototype vector for class ω_j. What is the value of the decision function for each of the three
Answer: A
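A minimum distance classifier using the decision function d_j(x) = x^T m_j – (1/2) m_j^T m_j can be sketched as below. The training data of question 69 is not included in the text, so the training vectors here are hypothetical and the result should not be read as the worked solution of that question.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def class_mean(vectors):
    """Prototype m_j: the component-wise mean of a class's training vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def decision(x, m):
    """d_j(x) = x^T m_j - 0.5 * m_j^T m_j; the largest value wins."""
    return dot(x, m) - 0.5 * dot(m, m)

def classify(x, prototypes):
    return max(prototypes, key=lambda label: decision(x, prototypes[label]))

# Hypothetical training data for three classes.
training = {
    1: [(2.0, 2.0), (4.0, 2.0)],
    2: [(0.0, -2.0), (2.0, -2.0)],
    3: [(-3.0, 0.0), (-3.0, 2.0)],
}
prototypes = {label: class_mean(vs) for label, vs in training.items()}
x = (0.5, -1.0)
values = {label: decision(x, m) for label, m in prototypes.items()}
label = classify(x, prototypes)
print(values, label)
```

Maximizing d_j(x) is equivalent to choosing the prototype with the smallest Euclidean distance to x, because the term x^T x is the same for every class.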
71. Is the following statement true or false? “An outlier is an input pattern that is very different
B. FALSE.
Answer: A
A. The ability of a pattern recognition system to approximate the desired output values for pattern
vectors
B. The ability of a pattern recognition system to approximate the desired output values for pattern
vectors
C. The ability of a pattern recognition system to extrapolate on pattern vectors which are not in the
training set.
D. The ability of a pattern recognition system to interpolate on pattern vectors which are not in the
test
set.
Answer: B
73. Is the following statement true or false? “In the human brain, roughly 70% of the neurons are
used for input and output. The remaining 30% are used for information processing.”
A. TRUE.
B. FALSE.
Answer: B
74. Which of the following statements is the best description of supervised learning?
A. “If a particular input stimulus is always active when a neuron fires then its weight should be
increased.”
B. “If a stimulus acts repeatedly at the same time as a response then a connection will form between
the
neurons involved. Later, the stimulus alone is sufficient to activate the response.”
C. “The connection strengths of the neurons involved are modified to reduce the error between the
Answer: C
75. Is the following statement true or false? “Artificial neural networks are parallel computing
A. TRUE.
B. FALSE.
Answer: A
76. Is the following statement true or false? “Knowledge is acquired by a neural network from its
environment through a learning process, and this knowledge is stored in the connections
A. TRUE.
B. FALSE
Answer: A
77. A neuron with 4 inputs has the weight vector w = [1, 2, 3, 4]^T and a bias θ = 0 (zero). The
activation function is linear, where the constant of proportionality equals 2; that is, the
activation function is given by f(net) = 2 × net. If the input vector is x = [4, 8, 5, 6]^T then the output
A. 1.
B. 56.
C. 59.
D. 112.
E. 118.
Answer: E
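The arithmetic behind this answer can be checked with a short sketch: the net input is the weighted sum plus the bias, and the output is twice the net input.

```python
def neuron_output(w, x, bias=0.0, gain=2.0):
    """Linear neuron: f(net) = gain * net, with net = w . x + bias."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return gain * net

out = neuron_output([1, 2, 3, 4], [4, 8, 5, 6])
print(out)  # net = 4 + 16 + 15 + 24 = 59, so the output is 118
```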
78. Which of the following types of learning can be used for training artificial neural networks?
A. Supervised learning.
B. Unsupervised learning.
C. Reinforcement learning.
Answer: D
C. Hopfield network.
80. Which of the following algorithms can be used to train a single-layer feedforward network?
C. A genetic algorithm.
Answer: D
81. What is the biggest difference between Widrow & Hoff’s Delta Rule and the Perceptron
A. There is no difference.
B. The Delta Rule is defined for step activation functions, but the Perceptron Learning Rule is defined
for
C. The Delta Rule is defined for sigmoid activation functions, but the Perceptron Learning Rule is
defined
D. The Delta Rule is defined for linear activation functions, but the Perceptron Learning Rule is
defined for
E. The Delta Rule is defined for sigmoid activation functions, but the Perceptron Learning Rule is
defined
Answer: D
82. Why are linearly separable problems interesting to neural network researchers?
A. Because they are the only problems that a neural network can solve successfully.
B. Because they are the only mathematical functions that are continuous.
C. Because they are the only mathematical functions that you can draw.
D. Because they are the only problems that a perceptron can solve successfully.
Answer: D
83. A perceptron with a unipolar step function has two inputs with weights w1 = 0.5 and w2 = −0.2,
and a threshold θ = 0.3 (θ can therefore be considered as a weight for an extra input which is
always set to -1).
For a given training example x = [0, 1]T , the desired output is 1. Does the perceptron give the
correct answer (that is, is the actual output the same as the desired output)?
A. Yes.
B. No.
Answer: B
84. The perceptron in question 83 is trained using the learning rule Δw = η (d − y) x, where x is the
input vector, η is the learning rate, w is the weight vector, d is the desired output, and y is the
actual output.
What are the new values of the weights and threshold after one step of training with the input
vector x = [0, 1]^T and desired output 1, using a learning rate η = 0.5?
Answer: C
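The forward pass and one training step for this perceptron can be traced with the sketch below. It assumes the unipolar step function outputs 1 when the net input is at least zero, and it treats the threshold as a weight on a constant input of -1, as the question describes.

```python
def step(net):
    """Unipolar step function (the tie at net == 0 is an assumed convention)."""
    return 1 if net >= 0 else 0

def perceptron_output(w, theta, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) - theta)

def train_step(w, theta, x, d, eta):
    """One application of delta w = eta * (d - y) * x, threshold included."""
    y = perceptron_output(w, theta, x)
    new_w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
    new_theta = theta + eta * (d - y) * (-1.0)   # extra input fixed at -1
    return new_w, new_theta, y

w, theta, y = train_step([0.5, -0.2], 0.3, [0.0, 1.0], d=1, eta=0.5)
print(y, w, theta)
```

The net input before training is −0.2 − 0.3 = −0.5, so the output is 0 instead of the desired 1; the step then raises w2 by 0.5 and lowers the threshold by 0.5, while w1 is unchanged because its input is 0.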
85. The Perceptron Learning Rule states that “for any data set which is linearly separable, the
A. TRUE.
B. FALSE.
Answer: B
86. Is the following statement true or false? “The XOR problem can be solved by a multi-layer
perceptron but a multi-layer perceptron with bipolar step activation functions cannot learn to do
this.”
A. TRUE.
B. FALSE.
Answer: A
87. The Adaline neural network can be used as an adaptive filter for echo cancellation in
telephone circuits. For the telephone circuit given in the above figure, which one of the following
signals carries the corrected message sent from the human speaker on the left to the human
listener on the right? (Assume that the person on the left transmits an outgoing voice signal and
Answer: E
88. What is the credit assignment problem in the training of multi-layer feedforward networks?
Answer: E
89. Is the following statement true or false? “The generalized Delta rule solves the credit
A. TRUE.
B. FALSE.
Answer: A
90. A common technique for training MLFF networks is to calculate the generalization error on a
separate data set after each epoch of training. Training is stopped when the generalization error
A. Boosting.
B. Momentum.
C. Hold-one-out.
D. Early stopping.
Answer: E
91. Which of the following statements is NOT true for an autoassociative feedforward network with
a single hidden layer of neurons?
A. During training, the target output vector is the same as the input vector.
C. The network could be trained using the backpropagation algorithm, but care must be taken to deal
with
D. After training, the hidden units give a representation that is equivalent to the principal
components of
E. The trained network can be split into two machines: the first layer of weights compresses the input
pattern (encoder), and the second layer of weights reconstructs the full pattern (decoder).
Answer: D
92. Which of the following statements is NOT true for a simple recurrent network (SRN)?
A. The training examples must be presented to the network in the correct order.
B. The test examples must be presented to the network in the correct order.
C. This type of network can predict the next chunk of data in the series from the past history of data.
D. The hidden units encode an internal representation of the data in the series that precedes the
current
input.
E. The number of context units should be the same as the number of input units.
Answer: E
93. How many hidden layers are there in an autoassociative Hopfield network?
A. None (0).
B. One (1).
C. Two (2).
Answer: A
94. A Hopfield network has 20 units. How many adjustable parameters does this network contain?
A. 95
B. 190
C. 200
D. 380
E. 400
Answer: B
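The count behind this answer can be reproduced under the usual Hopfield assumptions, which the question does not state explicitly: weights are symmetric and there are no self-connections, so each unordered pair of units contributes one adjustable weight.

```python
def hopfield_parameters(n_units):
    """Number of distinct weights: one per unordered pair of units."""
    return n_units * (n_units - 1) // 2

count = hopfield_parameters(20)
print(count)  # 20 * 19 / 2
```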
95. Is the following statement true or false? “Patterns within a cluster should be similar in some
way.”
A. TRUE.
B. FALSE.
Answer: A
96. Is the following statement true or false? “Clusters that are similar in some way should be far
apart.”
A. TRUE.
B. FALSE.
Answer: B
97. Which of the following statements is NOT true for hard competitive learning (HCL)?
C. The input vectors are often normalized to have unit length, that is, ‖x‖ = 1.
D. The weights of the winning unit k are adapted by Δw_k = η (x − w_k), where x is the input vector.
E. The weights of the neighbours j of the winning unit are adapted by Δw_j = η_j (x − w_j), where
η_j < η and j ≠ k.
Answer: E
98. Which of the following statements is NOT true for a self-organizing feature map (SOFM)?
C. The network can grow during training by adding new cluster units when required.
D. The cluster units are arranged in a regular geometric pattern such as a square or ring.
E. The learning rate is a function of the distance of the adapted units from the winning unit.
Answer: C
Answer: A
C. copying the fittest member of each population into the mating pool.
D. preventing too many similar individuals from surviving to the next generation.
Answer: B
102. Is the following statement true or false? “A genetic algorithm could be used to search the
space of possible weights for training a recurrent artificial neural network, without requiring any
gradient information.”
A. TRUE.
B. FALSE.
Answer: A
103. Is the following statement true or false? “Learning produces changes within an agent that
A. TRUE.
B. FALSE.
Answer: A
104. Which application in intelligent mobile robots made use of a single-layer feedforward
network?
A. Goal finding.
B. Path planning.
C. Wall following.
D. Route following.
E. Gesture recognition.
Answer: C
105. Which application in intelligent mobile robots made use of a self-organizing feature map?
A. Goal finding.
B. Path planning.
C. Wall following.
D. Route following.
E. Gesture recognition.
Answer: D
106. Which application in intelligent mobile robots made use of a genetic algorithm?
A. Goal finding.
B. Path planning.
C. Wall following.
D. Route following.
E. Gesture recognition.
Answer: B