UNIT-1 Notes

Machine learning is a subset of AI that enables systems to learn from data and improve automatically without explicit programming. Its advantages include identifying patterns and trends and operating without human intervention, and it underlies neural network architectures such as perceptrons and multilayer perceptrons. Training methods include supervised and unsupervised approaches, with loss functions playing a critical role in evaluating model performance.


UNIT-1

Que 1.1. Define the term Machine Learning.


Answer
1. Machine learning is an application of Artificial Intelligence (AI) that
provides systems the ability to automatically learn and improve from
experience without being explicitly programmed.
2. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
3. The primary aim is to allow computers to learn automatically, without human intervention or assistance, and adjust their actions accordingly.
4. Machine learning enables analysis of massive quantities of data.
5. It generally delivers faster and more accurate results when identifying profitable opportunities or dangerous risks.
6. Combining machine learning with AI and cognitive technologies can
make it even more effective in processing large volumes of information.
Que 1.2. What are the advantages and disadvantages of machine
learning ?
Answer
Advantages of machine learning are :
1. Easily identifies trends and patterns :
a. Machine learning can review large volumes of data and discover
specific trends and patterns that would not be apparent to humans.
b. For an e-commerce website like Flipkart, it serves to understand
the browsing behaviours and purchase histories of its users to help
cater to the right products, deals, and reminders relevant to them.
c. It uses the results to reveal relevant advertisements to them.
2. No human intervention needed (automation) : Machine learning does not require human intervention; the system learns from data and adjusts its actions on its own.
3. In a support vector machine (SVM), the training points closest to the dividing line are called support vectors, all having the same distance to the dividing line.
4. To find the support vectors, there is an efficient optimization algorithm. The optimal dividing hyperplane is determined by only a few parameters, namely the support vectors.
5. Support vector machines apply this algorithm to non-linearly separable problems in a two-step process (see the sketch at the end of this answer) :
a. In the first step, a non-linear transformation is applied to the data, with the property that the transformed data is linearly separable.
b. In the second step, the support vectors are determined in the transformed space.
6. It is always possible to make the classes linearly separable by
transforming the vector space, as long as the data contains no
contradictions.
7. Such a separation can be reached, for example, by introducing a new (n + 1)-th dimension and the definition
xn+1 = 1 if x belongs to class 1, and xn+1 = 0 if x belongs to class 2.
8. It can be shown that there are such generic transformations even for
arbitrarily shaped class division boundaries in the original vector space.
In the transformed space, the data are then linearly separable.
9. However, the number of dimensions of the new vector space grows
exponentially with the number of dimensions of the original vector
space.
10. However, the large number of new dimensions is not so problematic
because, when using support vectors, the dividing plane, as mentioned
above, is determined by only a few parameters.
11. The central non-linear transformation of the vector space is called the
kernel, because of which support vector machines are also known as
kernel methods.
12. The original SVM theory developed for classification has been extended
and can now be used on regression problems also.
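As an illustration of the two-step process in point 5, the following Python sketch trains a kernel SVM on a toy problem that is not linearly separable in the original space. The use of scikit-learn's SVC, the RBF kernel, and the circular toy data are assumptions chosen for demonstration, not part of the notes.

import numpy as np
from sklearn.svm import SVC

# Toy problem that is NOT linearly separable in the original space :
# class 1 = points inside a circle, class 0 = points outside it.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)

# Steps one and two happen implicitly : the RBF kernel corresponds to a
# non-linear transformation into a space where the data become linearly
# separable, and the support vectors are then determined in that space.
clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print("support vectors :", clf.support_vectors_.shape[0], "of", len(X))
print("training accuracy :", clf.score(X, y))

Note how few of the 200 training points end up as support vectors : the dividing hyperplane in the transformed space is determined by these few points alone, exactly as point 10 states.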
Que 1.8. What is the perceptron model ? Explain its working.
Answer
1. The perceptron is the simplest form of a neural network used for
classification of patterns said to be linearly separable.
2. It consists of a single neuron with adjustable synaptic weights and bias.
3. The perceptron built around a single neuron is limited to performing pattern classification with only two classes.
4. By expanding the output layer of the perceptron to include more than one neuron, more than two classes can be classified.
5. Suppose a perceptron has synaptic weights denoted by w1, w2, w3, ..., wm.
6. The inputs applied to the perceptron are denoted by x1, x2, ..., xm.
7. The externally applied bias is denoted by b.

8. From the model, we find the hard limiter input, or induced local field, of the neuron as
v = w1x1 + w2x2 + ... + wmxm + b
9. The goal of the perceptron is to correctly classify the set of externally applied inputs x1, x2, ..., xm into one of two classes G1 and G2.
10. The decision rule for classification is : if the output y is +1, assign the point represented by inputs x1, x2, ..., xm to class G1; if y is –1, assign it to class G2.
11. In Fig. 1.8.2, a point (x1, x2) lying below the boundary line is assigned to class G2 and a point above the line to class G1. The decision boundary is calculated as :
w1x1 + w2x2 + b = 0
12. There are two decision regions separated by a hyperplane defined as
w1x1 + w2x2 + ... + wmxm + b = 0
The synaptic weights w1, w2, ..., wm of the perceptron can be adapted on an iteration-by-iteration basis.
13. For the adaptation, an error-correction rule known as the perceptron convergence algorithm is used.
14. For a perceptron to function properly, the two classes G1 and G2 must be
linearly separable.
15. Linearly separable means that the patterns or sets of inputs to be classified must be separable by a straight line.
16. Generalizing, a set of points in n-dimensional space is linearly separable if there is a hyperplane of (n – 1) dimensions that separates the sets.
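The perceptron convergence algorithm above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the notes' own code; it assumes bipolar targets (+1 for class G1, –1 for class G2) and folds the bias b into the weight vector as w0 with a fixed input of +1.

import numpy as np

def train_perceptron(X, d, eta=0.1, epochs=100):
    X = np.hstack([np.ones((len(X), 1)), X])  # prepend the fixed +1 bias input
    w = np.zeros(X.shape[1])                  # weights w0 (bias), w1, ..., wm
    for _ in range(epochs):
        errors = 0
        for x, target in zip(X, d):
            y = 1 if w @ x > 0 else -1        # hard limiter output
            if y != target:                   # misclassified : apply the
                w += eta * target * x         # error-correction rule
                errors += 1
        if errors == 0:                       # converged : classes separated
            break
    return w

# AND-like linearly separable toy data (targets +1 for G1, -1 for G2).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
d = np.array([-1, -1, -1, 1])
print("learned weights (b, w1, w2) :", train_perceptron(X, d))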

Que 1.9. Write short note on logistic regression.


Answer
1. Logistic regression is a supervised classification algorithm. It is based on
maximum likelihood estimation.
2. In a classification problem, the target variable (output) y can take only discrete values for a given set of features (or inputs) x.
3. Logistic regression assumes a binomial distribution of the dependent variable; the predicted value is either 1 or 0.
4. Logistic regression builds a regression model to predict the probability that a given data entry belongs to the category numbered 1. Where linear regression models the data using a linear function, logistic regression models it using the sigmoid function :
g(z) = 1 / (1 + e^(-z))
5. This activation function is used to convert a linear regression equation into the logistic regression equation.
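A short sketch of the sigmoid computation, assuming NumPy and hypothetical fitted parameters theta (made up for illustration) :

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # maps any real z into (0, 1)

theta = np.array([-1.0, 2.0, 0.5])       # hypothetical [bias, w1, w2]
x = np.array([1.0, 0.8, 0.3])            # [1 (bias input), x1, x2]

p = sigmoid(theta @ x)                   # P(y = 1 | x)
print("probability of class 1 :", p)
print("predicted class :", 1 if p >= 0.5 else 0)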
Que 1.10. Explain different types of neuron connections with architecture.
Answer
Different types of neuron connections are :
1. Single-layer feed forward network :

a. In this type of network, we have only two layers, i.e., the input layer and the output layer, but the input layer does not count because no computation is performed in this layer.
b. The output layer is formed when different weights are applied on the input nodes and the cumulative effect per node is taken.
c. After this, the neurons of the output layer collectively compute the output signals.

2. Multilayer feed forward network :


a. This network has a hidden layer, which is internal to the network and has no direct contact with the external layer.
b. Existence of one or more hidden layers enables the network to be computationally stronger.
c. There are no feedback connections in which outputs of the model are fed back into itself.

3. Single node with its own feedback :


a. When outputs can be directed back as inputs to the same layer or preceding layer nodes,
then it results in feedback networks.
b. Recurrent networks are feedback networks with closed loop. Fig. 1.10.1 shows a single
recurrent network having single neuron with feedback to itself.

4. Single-layer recurrent network :

a. This network is a single-layer network with feedback connections, in which a processing element's output can be directed back to itself, to other processing elements, or to both.
b. Recurrent neural network is a class of artificial neural network where connections between
nodes form a directed graph along a sequence.
c. This allows it to exhibit dynamic temporal behaviour for a time sequence. Unlike feed
forward neural networks, RNNs can use their internal state (memory) to process sequences of
inputs.

5. Multilayer recurrent network :


a. In this type of network, a processing element's output can be directed to processing elements in the same layer and in the preceding layer, forming a multilayer recurrent network.
b. They perform the same task for every element of a sequence, with the output depending on the previous computations. Inputs are not needed at each time step.
c. The main feature of a multilayer recurrent neural network is its hidden state, which captures information about a sequence.
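The following NumPy sketch shows the recurrent idea in miniature : the hidden state h is the memory that carries information about the sequence seen so far, and the same weights are reused at every time step. The sizes and random weights are illustrative assumptions, not a trained model.

import numpy as np

rng = np.random.default_rng(1)
W_xh = 0.1 * rng.normal(size=(4, 3))   # input-to-hidden weights
W_hh = 0.1 * rng.normal(size=(4, 4))   # hidden-to-hidden (feedback) weights
b_h = np.zeros(4)

h = np.zeros(4)                        # hidden state : the network's memory
sequence = rng.normal(size=(5, 3))     # 5 time steps, 3 features each

for x_t in sequence:
    # The same weights are reused at every time step; the result at step t
    # depends on the previous computations through the hidden state h.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print("final hidden state :", h)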
Que 1.11. Explain single-layer neural network.
Answer
1. A single-layer neural network represents the simplest form of neural network, in which there is only one layer of input nodes that send weighted inputs to a subsequent layer of receiving nodes, or in some cases, one receiving node.
2. This single-layer design was part of the foundation for systems which
have now become much more complex.
3. Single-layer neural networks can also be thought of as part of a class of
feedforward neural networks, where information only travels in one
direction, through the inputs, to the output.
4. Adaline network is an example of single layer neural network.
Adaline network : Refer Q. 1.6, Page 1–5M, Unit-1.
Que 1.12. Explain multilayer perceptron with its architecture and
characteristics.
Answer
Multilayer perceptron :
1. Perceptrons arranged in layers are called a multilayer perceptron. This model has three layers : an input layer, a hidden layer, and an output layer.
2. For the perceptrons in the input layer, a linear transfer function is used, and for the perceptrons in the hidden layer and output layer, the sigmoidal or squashed-S function is used.
3. The input signal propagates through the network in a forward direction, on a layer-by-layer basis.
4. In the multilayer perceptron, the bias b(n) is treated as a synaptic weight driven by a fixed input equal to +1 :
x(n) = [+1, x1(n), x2(n), ………. xm(n)]T
where n denotes the iteration step in applying the algorithm.
Correspondingly, we define the weight vector as :
w(n) = [b(n), w1(n), w2(n)……….., wm(n)]T
5. Accordingly, the linear combiner output is written in the compact form :
v(n) = wT(n) x(n)
The algorithm for adapting the weight vector is stated as :
1. If the nth member of the input set, x(n), is correctly classified into the linearly separable classes by the weight vector w(n) (that is, the output is correct), then no adjustment of the weights is done.
2. Otherwise, the weight vector of the perceptron is updated in accordance with the rule
w(n + 1) = w(n) - η(n) x(n) if wT(n) x(n) > 0 and x(n) belongs to class G2
w(n + 1) = w(n) + η(n) x(n) if wT(n) x(n) ≤ 0 and x(n) belongs to class G1
where η(n) is the learning-rate parameter.
Architecture of multilayer perceptron :

1. Fig. 1.12.1 shows the architectural graph of a multilayer perceptron with two hidden layers and an output layer.
2. Signal flow through the network progresses in a forward direction, from left to right and on a layer-by-layer basis.
3. Two kinds of signals are identified in this network :
a. Functional signals : Functional signal is an input signal and propagates forward and
emerges at the output end of the network as an output signal.
b. Error signals : Error signal originates at an output neuron and
propagates backward through the network.
4. Multilayer perceptrons have been applied successfully to solve some difficult and diverse problems by training them in a supervised manner with the highly popular algorithm known as the error backpropagation algorithm.
Characteristics of multilayer perceptron :
1. In this model, each neuron in the network includes a non-linear activation function (the non-linearity is smooth). The most commonly used non-linear function is the sigmoid, defined by
yj = 1 / (1 + e^(-vj))
where vj is the induced local field (i.e., the weighted sum of all inputs plus the bias) and yj is the output of neuron j.
2. The network contains hidden neurons that are not part of the input or output of the network. The hidden layer of neurons enables the network to learn complex tasks.
3. The network exhibits a high degree of connectivity.
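A small forward-pass sketch matching these characteristics follows : each neuron applies the smooth sigmoid non-linearity to its induced local field, layer by layer. The layer sizes and random weights are assumptions made for illustration.

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(2)
sizes = [3, 5, 4, 2]   # input layer, two hidden layers, output layer
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

a = rng.normal(size=3)      # the functional signal enters at the input
for W, b in zip(weights, biases):
    v = W @ a + b           # induced local field of each neuron in the layer
    a = sigmoid(v)          # smooth non-linear activation

print("network output :", a)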
Que 1.13. What does a shallow network compute ?
Answer
1. Shallow networks are neural networks with less depth, i.e., fewer hidden layers.
2. These neural networks have one hidden layer and an output layer.
3. Shallow neural network is a term used to describe a neural network that usually has only one hidden layer, as opposed to a deep neural network, which has several hidden layers.
4. Fig. 1.13.1 below shows a shallow neural network with a single input layer, a single hidden layer, and a single output layer :

5. The neurons present in the hidden layer of our shallow neural network compute the following :
zj[1] = wj[1]T x + bj[1],  aj[1] = σ(zj[1])
The superscript number [i] denotes the layer number and the subscript number j denotes the neuron number in a particular layer.
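A minimal sketch of this computation, assuming NumPy and arbitrary layer sizes, makes the notation concrete :

import numpy as np

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
x = rng.normal(size=2)                          # input layer : x1, x2
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden layer [1], 3 neurons
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # output layer [2]

z1 = W1 @ x + b1         # zj[1] = wj[1]T x + bj[1], for each hidden neuron j
a1 = sigma(z1)           # aj[1] = sigma(zj[1])
y = sigma(W2 @ a1 + b2)  # the output layer repeats the same pattern
print("output :", y)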

Que 1.14. Briefly explain training a network.


Answer
1. Once a network has been structured for a particular application, that network is ready to be trained.
2. To start this process, the initial weights are chosen randomly. Then the training begins.
3. There are two approaches to training :
a. Supervised training :
i. In supervised training, both the inputs and the outputs are provided.
ii. The network then processes the inputs and compares its resulting outputs against the
desired outputs.
iii. Errors are then propagated back through the system, causing it to adjust the weights that control the network.
iv. This process occurs over and over as the weights are continually tweaked.
v. The set of data which enables the training is called the "training set". During the training of a network, the same set of data is processed many times as the connection weights are continually refined.
b. Unsupervised (adaptive) training :
i. In unsupervised training, the network is provided with inputs but not with desired outputs.
ii. The system itself must then decide what features it will use to group the input data. This is
often referred to as self-organization or adaption.
iii. This adaption to the environment is the promise which would enable science fiction types
of robots to continually learn on their own as they encounter new situations and new
environments.
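A compact sketch of the supervised loop described above follows, with a single linear neuron and the delta rule standing in for the general case (an illustrative assumption, not the general backpropagation procedure) :

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))                # training set : inputs...
d = X @ np.array([1.0, -2.0, 0.5])          # ...and their desired outputs
w = 0.01 * rng.normal(size=3)               # initial weights chosen randomly

eta = 0.05
for epoch in range(200):                    # same data processed many times
    for x, target in zip(X, d):
        y = w @ x                           # process the input
        error = target - y                  # compare with the desired output
        w += eta * error * x                # adjust the controlling weights

print("recovered weights :", w.round(3))    # close to [1, -2, 0.5]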
Que 1.15. Write short note on loss function.
Answer
A loss function estimates how well a particular algorithm models the provided data. Loss functions are classified into two classes based on the type of learning task :
1. Regression losses :
a. Mean squared error (Quadratic Loss or L2 Loss) : It is the average of the squared difference between predictions and actual observations :
MSE = (1/n) Σ (yi − ŷi)^2, where the sum runs over the n observations.
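A quick NumPy check of this definition, using made-up values :

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])    # actual observations
y_pred = np.array([2.5, 0.0, 2.0, 8.0])     # predictions

mse = np.mean((y_true - y_pred) ** 2)       # average of squared differences
print("MSE :", mse)                         # 0.375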
Backpropagation generalizes the gradient computation in the delta rule
and is in turn generalized by automatic differentiation, where
backpropagation is a special case of reverse accumulation (reverse mode).

Que 1.17. How do tuning parameters affect the backpropagation neural network ?
Answer
Effect of tuning parameters of the backpropagation neural network :
1. Momentum factor :
a. The momentum factor has a significant role in deciding the values
of learning rate that will produce rapid learning.
b. It determines the size of change in weights or biases.
c. If momentum factor is zero, the smoothening is minimum and the
entire weight adjustment comes from the newly calculated change.
d. If momentum factor is one, new adjustment is ignored and previous
one is repeated.
e. For values between 0 and 1, the weight adjustment is smoothed by an amount proportional to the momentum factor.
f. The momentum factor effectively increases the speed of learning without leading to
oscillations and filters out high frequency variations of the error surface in the weight space.
2. Learning coefficient :
a. A formula to select the learning coefficient has been :
η = 1.5 / √(N1^2 + N2^2 + ... + Nm^2)
where N1 is the number of patterns of type 1 and m is the number of different pattern types.
b. A small value of the learning coefficient (less than 0.2) produces slower but stable training.
c. For large values of the learning coefficient (greater than 0.5), the weights change drastically, but this may cause the optimum combination of weights to be overshot, resulting in oscillations about the optimum.
d. An optimum value of the learning rate is around 0.6, which produces fast learning without leading to oscillations.
3. Sigmoidal gain :
a. If the sigmoidal function is selected, the input-output relationship of the neuron can be set as
O = 1 / (1 + e^(-λ(I - θ)))    ...(1.17.1)
where λ is a scaling factor known as the sigmoidal gain.
b. As the scaling factor increases, the input-output characteristic of the analog neuron approaches that of the two-state neuron, i.e., the activation function approaches a step function.
c. It also affects the backpropagation. To get graded output, as the
sigmoidal gain factor is increased, learning rate and momentum
factor have to be decreased in order to prevent oscillations.
4. Threshold value :
a. θ in equation (1.17.1) is called the threshold value, the bias, or the noise factor.
b. A neuron fires or generates an output if the weighted sum of the
input exceeds the threshold value.
c. One method is to simply assign a small value to it and not to change
it during training.
d. The other method is to initially choose some random values and
change them during training.
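A toy sketch of the momentum behaviour described in point 1, using plain scalars and an interpolation form of the update consistent with points c-e (the constants are illustrative assumptions) :

def update(delta_prev, gradient, eta=0.5, alpha=0.9):
    # alpha = 0 : the adjustment comes entirely from the newly computed change
    # alpha = 1 : the new change is ignored and the previous one is repeated
    # 0 < alpha < 1 : the adjustment is smoothed in proportion to alpha
    return alpha * delta_prev + (1 - alpha) * (-eta * gradient)

delta = 0.0
for g in [1.0, 0.8, 0.9, 1.1]:      # a noisy sequence of gradients
    delta = update(delta, g)
    print(round(delta, 4))          # smoothed weight adjustments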
Que 1.18. Write short note on gradient descent.
Answer
1. Gradient descent is an optimization technique in machine learning and
deep learning and it can be used with all the learning algorithms.
2. A gradient is the slope of a function, the degree of change of a parameter
with the amount of change in another parameter.
3. Mathematically, it can be described as the partial derivatives of a function with respect to its parameters. The greater the gradient, the steeper the slope.
4. Gradient descent works best on a convex cost function, for which any local minimum is also the global minimum.
5. Gradient descent can be described as an iterative method which is used
to find the values of the parameters of a function that minimizes the
cost function as much as possible.
6. The parameters are initially assigned particular values, and from there gradient descent is run in an iterative fashion to find the optimal values of the parameters, using calculus to find the minimum possible value of the given cost function.
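A minimal sketch of this iterative method, assuming a simple convex cost J(w) = (w - 3)^2 whose minimum w = 3 is known in advance :

def grad(w):
    return 2.0 * (w - 3.0)     # dJ/dw, the slope at the current w

w = 0.0                        # parameter initially set to some value
lr = 0.1                       # learning rate (step size)
for step in range(100):        # run the iterations
    w -= lr * grad(w)          # move against the gradient

print("w after descent :", round(w, 4))   # close to the optimum 3.0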
Que 1.19. Discuss selection of various parameters in Backpropagation Neural Network
(BPN).
Answer
Selection of various parameters in BPN :
1. Number of hidden nodes :
a. The guiding criterion is to select the minimum number of nodes in the first and third layers, so that the memory demand for storing the weights can be kept to a minimum.
b. The number of separable regions M in the input space is a function of the number of hidden nodes H in the BPN : H = M – 1.
c. When the number of hidden nodes is equal to the number of training
patterns, the learning could be fastest.
d. In such cases, BPN simply remembers training patterns losing all
generalization capabilities.
e. Hence, as far as generalization is concerned, the number of hidden nodes should be small compared to the number of training patterns; this can be analysed with the help of the VCdim (Vapnik-Chervonenkis dimension) from probability theory.
f. We can estimate the number of hidden nodes for a given number of training patterns from the number of weights, which is equal to I1 * I2 + I2 * I3, where I1 and I3 denote the input and output nodes and I2 denotes the hidden nodes.
g. Assume the number of training samples T to be greater than VCdim. If we accept a ratio of 10 : 1 between training samples and weights, i.e., T = 10 (I1 * I2 + I2 * I3), the number of hidden nodes can be estimated as I2 = T / (10 (I1 + I3)).
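A quick worked instance of this estimate, under the assumed 10 : 1 ratio (the node counts and sample size are made-up values) :

I1, I3 = 8, 2                           # input and output nodes (made up)
T = 1000                                # available training patterns
I2 = T / (10 * (I1 + I3))               # from T = 10 * (I1 * I2 + I2 * I3)
print("estimated hidden nodes :", I2)   # 10.0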
Que 1.20. Explain different types of gradient descent.
Answer
Different types of gradient descent are :
1. Batch gradient descent :
a. This is a type of gradient descent which processes all the training examples for each iteration of gradient descent.
b. When the number of training examples is large, batch gradient descent is computationally very expensive, so it is not preferred.
c. Instead, we prefer to use stochastic gradient descent or
mini-batch gradient descent.
2. Stochastic gradient descent :
a. This is a type of gradient descent which processes single training
example per iteration.
b. Hence, the parameters are being updated even after one iteration
in which only a single example has been processed.
c. Hence, this is faster than batch gradient descent. However, when the number of training examples is large, processing only one example at a time adds overhead for the system, as the number of iterations will be large.
3. Mini-batch gradient descent :
a. This is a mixture of both stochastic and batch gradient descent.
b. The training set is divided into multiple groups called batches.
c. Each batch has a number of training samples in it.
d. At a time, a single batch is passed through the network which
computes the loss of every sample in the batch and uses their
average to update the parameters of the neural network.
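A sketch of the mini-batch scheme, assuming a toy linear-regression model and NumPy (all values illustrative) :

import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0])
w = np.zeros(2)

batch_size, lr = 10, 0.1
for epoch in range(50):
    idx = rng.permutation(len(X))               # shuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]   # one group of samples
        err = X[batch] @ w - y[batch]           # per-sample errors
        g = X[batch].T @ err / batch_size       # average gradient of the loss
        w -= lr * g                             # one update per batch

print("learned weights :", w.round(3))          # close to [2, -1]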
Que 1.21. What are the advantages and disadvantages of
stochastic gradient descent ?
Answer
Advantages of stochastic gradient descent :
1. It is easier to fit into memory due to a single training sample being
processed by the network.
2. It is computationally fast as only one sample is processed at a time.
3. For larger datasets it can converge faster as it causes updates to the
parameters more frequently.
4. Due to frequent updates, the steps taken towards the minima of the loss function have oscillations, which can help in getting out of a local minimum of the loss function (in case the computed position turns out to be a local minimum).
Disadvantages of stochastic gradient descent :
1. Due to frequent updates the steps taken towards the minima are very
noisy. This can often lead the gradient descent into other directions.
2. Also, due to noisy steps it may take longer to achieve convergence to the
minima of the loss function.
3. Frequent updates are computationally expensive due to using all
resources for processing one training sample at a time.
4. It loses the advantage of vectorized operations as it deals with only a
single example at a time.
Que 1.22. Explain neural networks as universal function approximators.
Answer
1. Feedforward networks with hidden layers provide a universal approximation framework.
2. The universal approximation theorem states that a feedforward network with a linear output layer and at least one hidden layer with any "squashing" activation function can approximate any Borel measurable function from one finite-dimensional space to another with any desired non-zero amount of error, provided that the network is given enough hidden units.
3. The derivatives of the feedforward network can also approximate the derivatives of the function.
4. The Borel condition means that any continuous function on a closed and bounded subset of R^n is Borel measurable and therefore may be approximated by a neural network.
5. The universal approximation theorem means that a large MLP will be able to represent the function.
6. Even if the MLP is able to represent the function, learning can fail for
two different reasons.
a. Optimization algorithm used for training may not be able to find
the value of the parameters that corresponds to the desired function.
b. Training algorithm might choose the wrong function due to
overfitting.
7. Feedforward networks provide a universal system for representing
functions, in the sense that, given a function, there exists a feedforward
network that approximates the function.
8. There is no universal procedure for examining a training set of specific
examples and choosing a function that will generalize to points not in
the training set.
9. The universal approximation theorem says that there exists a network
large enough to achieve any degree of accuracy we desire, but the
theorem does not say how large this network will be.
10. Scientists have provided some bounds on the size of a single-layer network needed to approximate a broad class of functions; in the worst case, an exponential number of hidden units may be required.
11. This is easiest to see in the binary case : the number of possible binary functions on vectors v ∈ {0, 1}^n is 2^(2^n), and selecting one such function requires 2^n bits, which will in general require O(2^n) degrees of freedom.
12. A feedforward network with a single layer is sufficient to represent any
function, but the layer may be infeasibly large and may fail to learn and
generalize correctly.
