ANN - CAE-II Important Questions


IMPORTANT QUESTIONS for CAE-II

1) What is a single layer perceptron in a neural network?

→ A single-layer perceptron (SLP) is the simplest form of feed-forward neural network. Its
working is based on a threshold transfer function between the nodes. Because of this
simplicity, it is generally used for linearly separable machine learning problems.

2) What is a threshold in a perceptron?

→ Threshold in a perceptron network:

The threshold is one of the key components of the perceptron. It determines, based
on the inputs, whether the perceptron fires or not. The perceptron takes all of the
weighted input values and adds them together; if the sum is greater than or equal to
a certain value (called the threshold), the perceptron fires.

This firing condition, which differs from neuron to neuron, is called the threshold. For
example, if the threshold is 100 and the inputs are X1 = 30 and X2 = 0, the neuron
will not fire, since the sum 30 + 0 = 30 is not greater than the threshold of 100.
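
A minimal sketch of this firing rule (the weights of 1 and the values 30, 0, and 100 are just the illustrative numbers from the example above):

```python
# Minimal sketch of the threshold firing rule described above.
# Inputs, weights, and threshold are the illustrative values from the example.

def fires(inputs, weights, threshold):
    """Return True if the weighted sum of the inputs reaches the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return weighted_sum >= threshold

print(fires(inputs=[30, 0], weights=[1, 1], threshold=100))   # False: 30 < 100
print(fires(inputs=[70, 40], weights=[1, 1], threshold=100))  # True: 110 >= 100
```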

3) What are the Limitations of a single layer perceptron?

→ ● This neural network can represent only a limited set of functions.

● The decision boundaries (the threshold boundaries) are only allowed to be hyperplanes.

● This model only works for linearly separable data.

● A "single-layer" perceptron cannot implement XOR, because the classes in XOR are not
linearly separable: you cannot draw a straight line to separate the points (0,0), (1,1)
from the points (0,1), (1,0). This limitation led to the invention of multi-layer
networks (see the sketch below).
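
The XOR limitation can be checked directly with a small sketch: a perceptron trained with the standard update rule never reaches zero errors on the four XOR points, no matter how many epochs are run (the learning rate and epoch cap below are arbitrary illustrative choices):

```python
import numpy as np

# XOR truth table: no straight line separates the 0-labelled from the 1-labelled points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate (arbitrary)

for epoch in range(100):                      # arbitrary epoch cap
    errors = 0
    for xi, target in zip(X, y):
        pred = int(np.dot(w, xi) + b >= 0)    # step activation
        update = lr * (target - pred)         # perceptron learning rule
        w += update * xi
        b += update
        errors += int(update != 0)
    if errors == 0:
        break

print("misclassified points in last epoch:", errors)  # never reaches 0 for XOR
```
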
4) What is adaptive filtering in neural networks?

→ Adaptive filtering in neural network:

An adaptive filter automatically adjusts its own impulse response. In neural-network
based designs, adaptive noise canceller and adaptive signal enhancer systems are
implemented using feedforward and recurrent neural networks, trained with the
back-propagation algorithm and the real-time recurrent learning algorithm
respectively.

5) What is adaptive filtration?

→ An adaptive filter is a system with a linear filter that has a transfer function controlled by variable
parameters and a means to adjust those parameters according to an optimization algorithm.
Because of the complexity of the optimization algorithms, almost all adaptive filters are digital filters.
Adaptive filters are required for some applications because some parameters of the desired
processing operation (for instance, the locations of reflective surfaces in a reverberant space) are
not known in advance or are changing. The closed loop adaptive filter uses feedback in the form of
an error signal to refine its transfer function.

6) What are the Properties of adaptive filters?

→ Properties of adaptive filter:

The principal property of an adaptive filter is its time-varying,
self-adjusting characteristic. An adaptive filter usually takes the form
of an FIR filter structure, with an adaptive algorithm that continually updates
the filter coefficients so that an error signal is minimized according to
some criterion.

7) What are Types of adaptive filters?


→ Types of adaptive filters:

The classical configurations of adaptive filtering are:

▪ System identification

▪ Prediction

▪ Noise cancellation

▪ Inverse modeling

8) What is the adaptive filtering problem? Explain with an example.

→ Adaptive filtering problem:

• Adaptive filtering is the automatic adjustment of a model so as to minimize its error.
The problem is how to design a multiple-input, single-output model of an unknown
dynamical system by building it around a single linear neuron. Adaptive filter
operation consists of two continuous processes:

❖ Filtering process

❖ Adaptive process

Filtering process:
Here two signals are computed: an output and an error signal.

Adaptive process:
Here the synaptic weights of the neuron are automatically adjusted in
accordance with the error signal.

The above two processes constitute a feedback loop acting around the neuron. The
manner in which the error signal is used to control the adjustment of the synaptic
weights is determined by the cost function used to derive the adaptive filtering
algorithm.
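
A minimal sketch of these two processes around a single linear neuron, using the LMS update as the adaptive rule (the filter length, step size, and the sinusoid-plus-noise signal are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative signals: a desired sinusoid and a noisy measurement of it.
n = np.arange(500)
d = np.sin(0.05 * np.pi * n)                 # desired response
x = d + 0.3 * rng.standard_normal(len(n))    # noisy input to the filter

M = 8          # number of taps (arbitrary)
mu = 0.01      # step size (arbitrary)
w = np.zeros(M)

for k in range(M, len(x)):
    x_tap = x[k - M:k][::-1]        # current tap-input vector
    y_k = w @ x_tap                 # filtering process: compute the output ...
    e_k = d[k] - y_k                # ... and the error signal
    w += mu * e_k * x_tap           # adaptive process: LMS weight adjustment

print("final error magnitude:", abs(e_k))
```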

9) What are the names of unconstrained optimization methods or techniques?

→ Unconstrained optimization is concerned with how to choose the weight vector of an
adaptive filtering algorithm so that it behaves in an optimum manner. The
unconstrained optimization problem is stated as: "Minimize the cost function with
respect to the weight vector."

There are three methods (a short sketch of the first two follows the list):

• Method of Steepest Descent

• Newton’s Method

• Gauss-Newton Method
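
A minimal sketch contrasting the first two methods on a simple quadratic cost J(w) = ½ wᵀAw − bᵀw (the matrix, vector, step size, and iteration count are arbitrary illustrative choices):

```python
import numpy as np

# Quadratic cost J(w) = 0.5 * w^T A w - b^T w, with gradient A w - b
# and Hessian A (values chosen arbitrarily for illustration).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

grad = lambda w: A @ w - b

# Method of Steepest Descent: small step against the gradient, many iterations.
w = np.zeros(2)
eta = 0.1
for _ in range(200):
    w -= eta * grad(w)
print("steepest descent:", w)

# Newton's Method: scale the step by the inverse Hessian;
# for a quadratic cost it reaches the minimum in a single step.
w = np.zeros(2)
w -= np.linalg.solve(A, grad(w))
print("newton:", w)
```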

10) What is a linear least squares filter?

→ Least squares filters are best used mainly for slowly changing variables,
because they can give quirky results for signals with higher frequencies (a
step input can be thought of as containing all frequencies). Higher-order
polynomial filters should probably be avoided for filtering because the
response to higher frequencies gets even more quirky; this is less of an
issue for smoothing.
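
A minimal sketch of a linear least-squares (polynomial) filter used to smooth a slowly changing variable (the data window, polynomial order, and noise level are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# A slowly changing variable observed with noise (illustrative data).
t = np.linspace(0.0, 1.0, 50)
noisy = 2.0 * t + 0.5 + 0.1 * rng.standard_normal(t.size)

# Linear least-squares filter: fit a low-order polynomial to the window
# and use the fitted values as the smoothed estimate.
coeffs = np.polyfit(t, noisy, deg=1)        # first-order (linear) fit
filtered = np.polyval(coeffs, t)

print("fitted slope and intercept:", coeffs)          # close to [2.0, 0.5]
print("last raw vs filtered sample:", noisy[-1], filtered[-1])
```
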
11) What is the ADALINE networks algorithm?

→ Widrow and his graduate student Hoff introduced the ADALINE network and its
learning rule, which they called the LMS (Least Mean Square) algorithm.

The linear networks (ADALINE) are similar to the perceptron, but their transfer
function is linear rather than hard-limiting.
This allows their outputs to take on any value, whereas the perceptron output is
limited to either 0 or 1.

Linear networks, like the perceptron, can only solve linearly separable problems.
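
A minimal sketch of an ADALINE trained with the LMS (Widrow-Hoff) rule on a linearly separable problem; the linear output can take any value, and the class is read off its sign (the data, learning rate, and epoch count are arbitrary illustrative choices):

```python
import numpy as np

# Linearly separable toy data: logical AND with targets -1 / +1 (illustrative).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1], dtype=float)

w = np.zeros(2)
b = 0.0
lr = 0.2                                   # learning rate (arbitrary)

for epoch in range(100):                   # arbitrary number of passes
    for xi, target in zip(X, t):
        y = w @ xi + b                     # linear transfer function (any real value)
        e = target - y                     # error before any thresholding
        w += lr * e * xi                   # LMS (Widrow-Hoff) update
        b += lr * e

print("linear outputs:", X @ w + b)        # real-valued outputs
print("classes:", np.sign(X @ w + b))      # read the class from the sign
```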

12) What is the limitation of the LMS algorithm?

→ Widrow and Hoff had the insight that they could estimate the mean square error
by using the squared error at each iteration. The LMS algorithm, or Widrow-Hoff
learning algorithm, is therefore based only on an approximate steepest-descent
procedure: the gradient is estimated from a single sample at each step, so the
estimate is noisy, and stable convergence requires a sufficiently small learning
rate. In addition, like the perceptron, a linear (ADALINE) network can only solve
linearly separable problems.

13) What is learning rate annealing in perceptrons?

→ The perceptron is the fundamental unit of a neural network; it is linear in nature and
capable of performing binary classification. A perceptron can have multiple inputs but
outputs only a binary label.

A perceptron consists of:


● Logit: The equation of the logit resembles the equation of a straight line, i.e. y =
mx + c. This equation represents a straight line with a slope of ‘m’ and a
y-intercept of ‘c’.

In a similar way, the logit function in a perceptron is represented as:

logit = w·x + b

where ‘w’ is the weight applied to each input and ‘b’ is the bias term. You might have guessed
that this is again a straight line, with a slope of ‘w’ and a y-intercept of ‘b’. The bias is
useful for moving the decision boundary in either direction.

● Step activation function: Indicates, given the value of the logit, whether or
not the neuron should fire. The step activation function can be described as:

output = 1 if logit ≥ 0, else 0

This means that the neuron will fire only if the value of the logit function is greater than or
equal to 0.

For a perceptron with two inputs, the decision boundary is a straight line. With more inputs,
the decision boundary expands to a hyperplane, which has one dimension less than the
dimension of the input space it resides in.

Hence summing up,

Perceptron = Logit + Step function
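
A minimal sketch of this decomposition (the weights, bias, and inputs are arbitrary illustrative values):

```python
import numpy as np

def logit(x, w, b):
    """Logit: the straight-line part, w.x + b."""
    return np.dot(w, x) + b

def step(z):
    """Step activation: fire (1) only if the logit is >= 0."""
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    """Perceptron = logit + step function."""
    return step(logit(x, w, b))

# Arbitrary illustrative values.
w = np.array([0.5, -0.4])
b = 0.1
print(perceptron(np.array([1.0, 1.0]), w, b))  # logit = 0.2  -> fires (1)
print(perceptron(np.array([0.0, 1.0]), w, b))  # logit = -0.3 -> does not fire (0)
```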

14) What is Learning Rate Annealing? 

→ Adapting the learning rate of your stochastic gradient descent optimization
technique during training can improve performance while also cutting down on training
time. This is known as an adaptive learning rate or learning rate annealing. The rule
that changes the rate over training is referred to as a learning rate schedule; the
default schedule simply keeps the learning rate constant for every training epoch.

Techniques that reduce the learning rate over time are the simplest and arguably most
commonly used modification of the learning rate during training. They have the
advantage of making big modifications at the start of the training procedure, when
larger learning rate values are employed, and making smaller training updates to the
weights later in the procedure, when the learning rate has decreased.
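
A minimal sketch of one common annealing technique, step decay, where the learning rate is dropped by a fixed factor every few epochs (the initial rate, drop factor, and drop interval are arbitrary illustrative choices):

```python
# Step-decay learning rate annealing: start large, shrink the rate as training proceeds.
# initial_lr, drop, and epochs_per_drop are arbitrary illustrative choices.

def step_decay(epoch, initial_lr=0.1, drop=0.5, epochs_per_drop=10):
    """Halve the learning rate every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)

for epoch in (0, 9, 10, 20, 30):
    print(epoch, step_decay(epoch))
# 0 -> 0.1, 9 -> 0.1, 10 -> 0.05, 20 -> 0.025, 30 -> 0.0125
```
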
15) What is the Learning Rate Schedule for Training Models?

→ A learning rate schedule is a predefined framework that adjusts the learning rate between
epochs or iterations as the training progresses. Two of the most common techniques for learning
rate schedules are:

● Constant learning rate: as the name suggests, we initialize a learning rate and don’t
change it during training;
● Learning rate decay: we select an initial learning rate, then gradually reduce it in
accordance with a scheduler.

Knowing what learning rate schedules are, you must be wondering why we need to decrease the
learning rate in the first place. Well, in a neural network, the model weights are updated as:

w = w − η · ∂L/∂w

where η (eta) is the learning rate and ∂L/∂w is the gradient of the loss with respect to the weight.

For the training process, this is good. Early in the training, the learning rate is set to be large in
order to reach a set of weights that are good enough. Over time, these weights are fine-tuned to
reach higher accuracy by leveraging a small learning rate.
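
A minimal sketch of this update rule applied with a decaying learning rate, on a one-dimensional quadratic loss L(w) = (w − 3)² (the target value, initial rate, and decay constant are arbitrary illustrative choices):

```python
# Gradient-descent weight update w <- w - eta * dL/dw with a decaying learning rate.
# The loss L(w) = (w - 3)**2 and all constants are arbitrary illustrative choices.

w = 0.0
initial_lr = 0.3
decay = 0.05          # learning rate decay constant

for epoch in range(50):
    grad = 2.0 * (w - 3.0)                    # dL/dw for L(w) = (w - 3)^2
    eta = initial_lr / (1.0 + decay * epoch)  # time-based learning rate decay
    w -= eta * grad                           # weight update rule from the text

print("final weight:", w)   # approaches the minimum at w = 3
```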

16) What is the Perceptron Convergence Theorem?

→ Perceptron Convergence Theorem: The Perceptron Learning Algorithm makes at most R²/γ²
updates (after which it returns a separating hyperplane). (Here every training point
satisfies ‖x_j‖ ≤ R, and w* is a unit-norm separating weight vector with margin γ, i.e.
y_j (x_j · w*) ≥ γ for all j.)

Proof. It is immediate from the code that, should the algorithm terminate and return a
weight vector, then the weight vector must separate the + points from the − points. Thus,
it suffices to show that the algorithm terminates after at most R²/γ² updates. In other
words, we need to show that k is upper-bounded by R²/γ². Our strategy to do so is to
derive both lower and upper bounds on the length of w_(k+1) in terms of k, and to relate
them.

Note that w_1 = 0, and for k ≥ 1, note that if x_j is the misclassified point during
iteration k, we have

w_(k+1) · w* = (w_k + y_j x_j) · w*
             = w_k · w* + y_j (x_j · w*)
             > w_k · w* + γ.

It follows by induction that w_(k+1) · w* > kγ. Since w_(k+1) · w* ≤ ‖w_(k+1)‖ ‖w*‖ = ‖w_(k+1)‖,
we get

‖w_(k+1)‖ > kγ.        (1)

To obtain an upper bound, we argue that

‖w_(k+1)‖² = ‖w_k + y_j x_j‖²
           = ‖w_k‖² + ‖y_j x_j‖² + 2 (w_k · x_j) y_j
           = ‖w_k‖² + ‖x_j‖² + 2 (w_k · x_j) y_j
           ≤ ‖w_k‖² + ‖x_j‖²
           ≤ ‖w_k‖² + R²,

from which it follows by induction that

‖w_(k+1)‖² ≤ kR².        (2)

Together, (1) and (2) yield

k²γ² < ‖w_(k+1)‖² ≤ kR²,

which implies k < R²/γ². Our proof is done.
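
A minimal sketch that checks the bound empirically: train a perceptron on a small linearly separable set, count the updates k, and compare with R²/γ² computed from a known unit-norm separator (the data points and the separator direction are arbitrary illustrative choices):

```python
import numpy as np

# Linearly separable toy data with labels +1 / -1 (arbitrary illustrative points).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Perceptron Learning Algorithm, counting updates k.
w = np.zeros(2)
k = 0
while True:
    mistakes = 0
    for xj, yj in zip(X, y):
        if yj * (w @ xj) <= 0:      # misclassified point
            w += yj * xj            # perceptron update rule
            k += 1
            mistakes += 1
    if mistakes == 0:
        break

# Bound R^2 / gamma^2, using a unit-norm separator w* (here taken along [1, 1]).
w_star = np.array([1.0, 1.0]) / np.sqrt(2.0)
R = np.max(np.linalg.norm(X, axis=1))
gamma = np.min(y * (X @ w_star))
print("updates k =", k, " bound R^2/gamma^2 =", (R / gamma) ** 2)
```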
