ANN - CAE-II Important Questions
The threshold is one of the key components of the perceptron. It determines, based
on the inputs, whether the perceptron fires or not. Basically, the perceptron
takes all of the weighted input values and adds them together. If the sum is
greater than or equal to some value (called the threshold), then the perceptron fires.
What is the threshold value in a neural network?
→ These firing conditions, which differ from neuron to neuron, are called the
threshold. For example, suppose the threshold is 100, the input X1 into the first
neuron is 30, and X2 is 0: this neuron will not fire, since the sum 30 + 0 = 30
does not reach the threshold of 100.
● The decision boundaries, that is, the threshold boundaries, are only
allowed to be hyperplanes.
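The firing rule above can be sketched in a few lines of Python. The inputs (30, 0) and the threshold of 100 come from the example; the unit weights are an illustrative assumption:

```python
def perceptron_fires(inputs, weights, threshold):
    """Return True if the weighted sum of the inputs reaches the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum >= threshold

# The example above: X1 = 30, X2 = 0, threshold = 100 (unit weights assumed).
print(perceptron_fires([30, 0], [1.0, 1.0], 100))  # → False: 30 < 100
```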
→ An adaptive filter is a system with a linear filter that has a transfer function controlled by variable
parameters and a means to adjust those parameters according to an optimization algorithm.
Because of the complexity of the optimization algorithms, almost all adaptive filters are digital filters.
Adaptive filters are required for some applications because some parameters of the desired
processing operation (for instance, the locations of reflective surfaces in a reverberant space) are
not known in advance or are changing. The closed loop adaptive filter uses feedback in the form of
an error signal to refine its transfer function.
▪ System identification
▪ Inverse modeling
❖ Filtering process
❖ Adaptive process
Filtering process:
Here two signals are computed: an output signal and an error signal.
Adaptive process:
Here the synaptic weights of the neuron are automatically adjusted in
accordance with the error signal.
The above two processes constitute a feedback loop acting around the neuron. The
manner in which the error signal is used to control the adjustment of the synaptic
weights is determined by the cost function used to derive the adaptive filtering
algorithm.
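The two processes can be sketched as a single LMS iteration. The 2-tap unknown system, step size, and training loop below are illustrative assumptions, used here for the system-identification application mentioned above:

```python
import random

def lms_step(w, x, d, mu):
    """One LMS iteration: the filtering process computes the output and the
    error signal; the adaptive process adjusts the synaptic weights."""
    y = sum(wi * xi for wi, xi in zip(w, x))        # filtering: output signal
    e = d - y                                       # filtering: error signal
    w = [wi + mu * e * xi for wi, xi in zip(w, x)]  # adaptive: weight update
    return w, e

# System identification: learn an unknown 2-tap system (an assumed example).
random.seed(0)
unknown_system = [0.5, -0.3]
w = [0.0, 0.0]
for _ in range(2000):
    x = [random.uniform(-1, 1) for _ in range(2)]
    d = sum(h * xi for h, xi in zip(unknown_system, x))  # desired response
    w, e = lms_step(w, x, d, mu=0.1)
# w now approximates unknown_system
```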
Limitations of single layer perceptron:
• A single layer perceptron can only learn linearly separable patterns; it cannot
represent problems such as XOR, whose classes no single hyperplane can separate.
• Newton’s Method
• Gauss-Newton Method
→ Least squares filters are best used mainly for slowly changing variables,
because they can give quirky results for signals with higher frequencies. (A
step input can be thought of as containing all frequencies.) Higher-order
polynomial filters should probably be avoided for filtering because their
response to higher frequencies gets even more quirky. This is less of an
issue for smoothing.
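A minimal sketch of the idea, assuming equally spaced samples and a first-order (straight-line) model, which is the well-behaved low-order case the passage recommends:

```python
def least_squares_line(ys):
    """Fit y = a*t + b to equally spaced samples (t = 0 .. n-1) by least squares."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    a = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
         / sum((t - t_mean) ** 2 for t in ts))
    b = y_mean - a * t_mean
    return a, b

# A slowly changing (roughly linear) signal is recovered well:
a, b = least_squares_line([1.0, 2.1, 2.9, 4.2, 5.0])
```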
11) What is the ADALINE networks algorithm?
→ Widrow and his graduate student Hoff introduced the ADALINE network and its
learning rule, which they called the LMS (Least Mean Square) algorithm.
The linear networks (ADALINE) are similar to the perceptron, but their transfer
function is linear rather than hard-limiting.
This allows their outputs to take on any value, whereas the perceptron output is
limited to either 0 or 1.
Linear networks, like the perceptron, can only solve linearly separable problems.
→ Widrow and Hoff had the insight that they could estimate the mean square error
by using the squared error at each iteration. The LMS algorithm, or Widrow-Hoff
learning algorithm, is based on an approximate steepest descent procedure.
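A minimal sketch of ADALINE training with the Widrow-Hoff (LMS) rule. The AND-gate data, the -1/+1 target encoding, the learning rate, and the epoch count are assumptions for illustration:

```python
def adaline_train(samples, lr=0.05, epochs=200):
    """Widrow-Hoff (LMS) training of a two-input ADALINE with a bias.

    Unlike the perceptron rule, the error is taken on the *linear*
    output, so weights move even for correctly classified points.
    """
    w0 = w1 = w2 = 0.0                  # bias and two weights
    for _ in range(epochs):
        for (x1, x2), target in samples:
            y = w0 + w1 * x1 + w2 * x2  # linear transfer function
            e = target - y              # error on the raw (linear) output
            w0 += lr * e                # LMS weight updates
            w1 += lr * e * x1
            w2 += lr * e * x2
    return w0, w1, w2

# AND gate with -1/+1 targets (an assumed encoding for this sketch).
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w0, w1, w2 = adaline_train(data)

def predict(x1, x2):
    """Threshold the linear output only when a binary label is needed."""
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else -1
```

Note that the linear output itself can take any value; the hard limit is applied only afterwards, which is exactly the contrast with the perceptron described above.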
→ The perceptron is the fundamental unit of a neural network; it is linear in nature and
capable of binary classification. A perceptron can have multiple inputs but outputs only
a binary label.
The logit is computed as y = w · x + b, where ‘w’ is the weight applied to each input
and ‘b’ is the bias term. You might have guessed that, in the single-input case, this
is a straight line with slope ‘w’ and y-intercept ‘b’. The bias is beneficial in moving
the decision boundary in either direction.
● Step activation function: indicates, given the value of the logit, whether or
not a neuron should be fired by this perceptron. The step activation function
can be described as follows:
f(z) = 1 if z ≥ 0, and f(z) = 0 otherwise
This means that a neuron will only be fired if the value of the logit is greater than or
equal to 0.
In the case of a single-input perceptron, the decision boundary is a straight line. In the
case of multi-input perceptrons, the decision boundary expands to a hyperplane, which has
one dimension less than the space it resides in.
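The step activation and the resulting linear decision boundary can be sketched as follows; the particular weights and bias (which happen to make the perceptron compute AND) are assumed for illustration:

```python
def step(z):
    """Step activation: fire (1) iff the logit is >= 0."""
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    """Single perceptron: step applied to the logit w . x + b."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

# With two inputs, the decision boundary is the line x1 + x2 = 1.5,
# so this perceptron fires only for (1, 1), i.e. it computes AND.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, (1, 1), -1.5))
```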
→ Changing the learning rate of your stochastic gradient descent optimization over
the course of training can improve performance while also cutting down on training
time. This is also known as adaptive learning rates or learning rate annealing. The
method is referred to as a learning rate schedule; by contrast, the default schedule
updates network weights at a constant rate for every training epoch.
Techniques that reduce the learning rate over time are the simplest and arguably most
commonly used modification of the learning rate during training. They have the
advantage of making big adjustments at the start of the training procedure, when
larger learning rate values are employed, and of decreasing the learning rate later
in the procedure, when a smaller rate, and hence smaller updates, are applied to the
weights.
15) What is the Learning Rate Schedule for Training Models?
→ A learning rate schedule is a predefined framework that adjusts the learning rate between
epochs or iterations as training progresses. Two of the most common techniques for learning
rate scheduling are:
● Constant learning rate: as the name suggests, we initialize a learning rate and don’t
change it during training;
● Learning rate decay: we select an initial learning rate, then gradually reduce it in
accordance with a scheduler.
Knowing what learning rate schedules are, you must be wondering why we need to decrease the
learning rate in the first place. Well, in a neural network, our model weights are updated as
w := w − η · ∂L/∂w,
where η (eta) is the learning rate and ∂L/∂w is the partial derivative, i.e. the gradient of
the loss with respect to the weight.
This is good for the training process. Early in training, the learning rate is set large in
order to quickly reach a set of weights that are good enough. Over time, these weights are
fine-tuned to reach higher accuracy by leveraging a small learning rate.
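A minimal sketch of a decaying schedule, assuming the common time-based decay formula lr_t = lr_0 / (1 + decay · t) (the initial rate and decay constant are illustrative):

```python
def decayed_lr(initial_lr, epoch, decay=0.1):
    """Time-based decay: lr_t = lr_0 / (1 + decay * epoch)."""
    return initial_lr / (1 + decay * epoch)

# The rate shrinks as training progresses, unlike a constant schedule.
print([round(decayed_lr(0.1, e), 4) for e in (0, 10, 50)])  # → [0.1, 0.05, 0.0167]
```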
Proof. It is immediate from the code that should the algorithm terminate and return
a weight vector, then the weight vector must separate the + points from the − points.
Thus, it suffices to show that the algorithm terminates after at most R²/γ² updates.
Note that w₁ = 0, and for k ≥ 1, note that if x_j is the misclassified point during
iteration k, we have
w_{k+1} · w* = w_k · w* + y_j (x_j · w*) > w_k · w* + γ,
and, since the point was misclassified, y_j (w_k · x_j) ≤ 0, so
‖w_{k+1}‖² = ‖w_k‖² + 2 y_j (w_k · x_j) + ‖x_j‖² ≤ ‖w_k‖² + R².
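Combining these two inequalities gives the claimed bound; a sketch of the remaining step, assuming ‖x_j‖ ≤ R, margin γ > 0, and ‖w*‖ = 1:

```latex
% Iterating both bounds from w_1 = 0, after K updates:
w_{K+1} \cdot w^{*} > K\gamma, \qquad \|w_{K+1}\|^{2} \le K R^{2}.
% Cauchy-Schwarz with \|w^{*}\| = 1 then forces
K\gamma < w_{K+1} \cdot w^{*} \le \|w_{K+1}\| \le \sqrt{K}\, R
\quad\Longrightarrow\quad K < \frac{R^{2}}{\gamma^{2}}.
```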