
EXCEL ENGINEERING COLLEGE, KOMARAPALAYAM

(AUTONOMOUS)
B.E / B.Tech INTERNAL ASSESSMENT EXAMINATION – II, SEP 2024
V- Semester
20AI502 MACHINE LEARNING TECHNIQUES

PART - A

1) Define multilayer perceptron.


 A multilayer perceptron (MLP) is a class of feedforward artificial neural network.
 In an MLP the nodes are arranged into a set of layers, and each layer contains some number of identical units.

2) Identify the parameters in a Perceptron network.


A Perceptron consists of four main components:
 Input values (input nodes)
 Weights and bias
 Net sum
 Activation function
3) Infer about activation function.

The activation function decides whether a neuron should be activated or not by calculating the weighted sum and
further adding bias to it. The purpose of the activation function is to introduce non-linearity into the output of a
neuron.
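
A minimal sketch of this computation (assuming a sigmoid activation; the input values, weights, and bias below are arbitrary illustrative numbers):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary inputs, weights, and bias for a single neuron
x = np.array([0.5, -1.2, 3.0])   # input values
w = np.array([0.4, 0.6, -0.1])   # weights
b = 0.2                          # bias

z = np.dot(w, x) + b             # weighted sum plus bias
y = sigmoid(z)                   # non-linearity applied to the output
print(y)
```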

4) Indicate how to estimate error in the output layer.

The following equation is used to calculate the error function, E, over all patterns p and output nodes k:

$E = \frac{1}{2}\sum_{p}\sum_{k}\left(t_{pk} - O_{pk}\right)^{2}$

where $t_{pk}$ is the target output and $O_{pk}$ is the actual output of node k for pattern p. This error term is called the sum squared error.

5) List out the differences between MLP and RBFN.

MLP (Multi-Layer Perceptron)

1. Activation Function: MLP uses non-linear activation functions like sigmoid, ReLU, or tanh in
its hidden layers.

2. Architecture: The network learns weights for connections between layers and processes data
through fully connected layers.

RBFN (Radial Basis Function Network)


1. Activation Function: RBFN uses radial basis functions (typically Gaussian) as activation
functions in the hidden layer.

2. Architecture: RBFN has a two-layer architecture where the first layer uses radial basis
functions, and the output layer is typically linear.

6) Relate entropy and information gain.


Information gain is the measurement of changes in entropy after the segmentation of a dataset based on an
attribute.
$\text{Information Gain} = \text{Entropy}(S) - \sum_{v} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$

i.e., the entropy of the set minus the weighted average entropy of each subset $S_v$ produced by splitting on the attribute.
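
A small worked sketch in Python (the labels and feature values are hypothetical):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    """Entropy(S) minus the weighted average entropy of each subset S_v
    produced by grouping the examples on one feature's values."""
    total = len(labels)
    weighted = 0.0
    for v in set(feature_values):
        subset = [y for y, f in zip(labels, feature_values) if f == v]
        weighted += (len(subset) / total) * entropy(subset)
    return entropy(labels) - weighted

# Hypothetical example: how much does "outlook" reduce uncertainty in "play"?
play    = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sunny", "sunny", "rain", "rain", "sunny", "sunny"]
print(information_gain(play, outlook))  # ~0.459 bits
```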

7) Infer probability and statistics in machine learning.


Probability deals with predicting the likelihood of future events, while statistics involves the analysis of the
frequency of past events.

8) List the stages of a self-organizing map.


1. Initialization
2. Competition
3. Cooperation
4. Adaptation

9) State the nearest neighbor method.


The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new
case into the category that is most similar to the available categories.
The K-NN algorithm stores all the available data and classifies a new data point based on similarity.

PART – B

10 a) Explain the multi-layer perceptron model with going forward and going backward.

The simplest kind of feed-forward network is a multilayer perceptron (MLP), as shown in Figure 1.
The units are arranged into a set of layers, and each layer contains some number of identical units.
Every unit in one layer is connected to every unit in the next layer; we say that the network is fully
connected. The first layer is the input layer, and its units take the values of the input features.
The last layer is the output layer, and it has one unit for each value the network outputs.
All the layers in between these are known as hidden layers.
The units in these layers are known as input units, output units, and hidden units, respectively.
The number of layers is known as the depth, and the number of units in a layer is known as the width.
Training of MLP


The training of an MLP consists of two passes:

1. Forward pass
2. Backward pass
These are shown in the figure.
Going forward
In the forward pass, the signal flow moves from the input layer through the hidden layers to the output
layer, and the decision of the output layer is measured against the desired output.
Notation
Denote the activations of the input units by $x_j$ and the activation of the output unit by $y$.
The units in the $l$-th hidden layer are denoted $h_i^{(l)}$.
Mathematical Representation
For a network with a single hidden layer, the forward-pass computations can be written as:

$h_i = \sigma\Big(\sum_j w^{(1)}_{ij} x_j + b^{(1)}_i\Big), \qquad y = \sigma\Big(\sum_i w^{(2)}_i h_i + b^{(2)}\Big)$

where $\sigma$ is the activation function (the Sigmoid in what follows) and $w$, $b$ are the weights and biases.
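
A minimal numeric sketch of the forward pass (a hypothetical one-hidden-layer sigmoid network; the layer sizes and random weights are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical network: 3 input units -> 4 hidden units -> 2 output units
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input -> hidden weights and biases
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden -> output weights and biases

x = np.array([0.5, -1.0, 2.0])                  # input feature values

h = sigmoid(W1 @ x + b1)    # hidden activations
y = sigmoid(W2 @ h + b2)    # output activations (the network's decision)
print(y)
```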

Going Backward
Finally, backpropagation is derived by assuming that it is desirable to minimize the error on the output
nodes over all the patterns presented to the neural network. The following equation is used to calculate
the error function, E, for all patterns p:

$E = \frac{1}{2}\sum_{p}\sum_{k}\left(t_{pk} - O_{pk}\right)^{2}$

Output Layer
If the actual activation value of the output node k is $O_k$, and the expected target output for node k is $t_k$,
the difference between the actual output and the expected output is given by:

$e_k = t_k - O_k$

The error signal for node k in the output layer can be calculated as

$\delta_k = O_k(1 - O_k)(t_k - O_k)$

where the $O_k(1 - O_k)$ term is the derivative of the Sigmoid function.


With the delta rule, the change in the weight connecting input node j and output node k is proportional to
the error at node k multiplied by the activation of node j.
The formula used to modify the weight $w_{j,k}$ between the output node k and the node j is:

$\Delta w_{j,k} = lr \cdot \delta_k \cdot O_j$

where $\Delta w_{j,k}$ is the change in the weight between nodes j and k, and $lr$ is the learning rate.
Hidden Layer
The error signal for node j in the hidden layer can be calculated as

$\delta_j = O_j(1 - O_j)\sum_{k}\delta_k\, w_{j,k}$

where the sum term adds the weighted error signal for all nodes k in the output layer.
As before, the formula to adjust the weight $w_{i,j}$ between the input node i and the node j is:

$\Delta w_{i,j} = lr \cdot \delta_j \cdot O_i$
10 b) Illustrate the radial basis function network with a diagram.


Radial basis function (RBF) networks are feed-forward networks trained using a supervised training
algorithm.
They are typically configured with a single hidden layer of units whose activation function is selected
from a class of functions called basis functions.
The structure of an RBF network in its most basic form involves three entirely different layers:

The input layer is made up of source nodes (sensory units) whose number is equal to the dimension p
of the input vector u.
Hidden layer
The second layer is the hidden layer which is composed of nonlinear units that are connected directly
to all of the nodes in the input layer.
Each hidden unit takes its input from all the nodes of the input layer. As mentioned above, each hidden
unit contains a basis function, which has the parameters center and width. The center of the basis
function for a node i in the hidden layer is a vector $c_i$ whose size is the same as the input vector u, and
there is normally a different center for each unit in the network.
First, the radial distance $d_i$ between the input vector u and the center of the basis function $c_i$ is
computed for each unit i in the hidden layer as

$d_i = \lVert u - c_i \rVert$

using the Euclidean distance.


The output $h_i$ of each hidden unit i is then computed by applying the basis function G to this distance:

$h_i = G(d_i, \sigma_i)$

where $\sigma_i$ is the width of unit i; for the typical Gaussian basis function, $h_i = \exp\big(-d_i^2 / 2\sigma_i^2\big)$.

Output layer
The transformation from the input space to the hidden-unit space is nonlinear, whereas the
transformation from the hidden-unit space to the output space is linear.
The jth output is computed as

$y_j = \sum_{i=1}^{L} w_{ij} h_i, \qquad j = 1, 2, \ldots, M$

In summary, the mathematical model of the RBF network can be expressed as:

$y_j = \sum_{i=1}^{L} w_{ij}\, G\big(\lVert u - c_i \rVert\big), \qquad j = 1, 2, \ldots, M$

where $\lVert u - c_i \rVert$ is the Euclidean distance between u and $c_i$.


Training RBF Networks
The training of an RBF network can be formulated as the nonlinear unconstrained optimization problem
given below:
Given input-output training patterns $(u_k, y_k)$, k = 1, 2, ..., K, choose $w_{i,j}$ and $c_i$, i = 1, 2, ..., L, j = 1, 2, ..., M, so as to
minimize

$J = \sum_{k=1}^{K}\sum_{j=1}^{M}\Big(y_{jk} - \sum_{i=1}^{L} w_{ij}\, G\big(\lVert u_k - c_i \rVert\big)\Big)^{2}$
Adjusting the widths


In its simplest form, all hidden units in the RBF network have the same width or degree of sensitivity
to inputs. However, in portions of the input space where there are few patterns, it is sometimes
desirable to have hidden units with a wide area of reception. Likewise, in portions of the input space,
which are crowded, it might be desirable to have very highly tuned processors with narrow reception
fields. Computing these individual widths increases the performance of the RBF network at the
expense of a more complicated training process.
Adjusting the centers
In radial basis function networks, however, the weights into the hidden layer basis units are usually
set before the second layer of weights is adjusted. As the input moves away from the connection
weights, the activation value falls off. This behavior leads to the use of the term “center” for the first-
layer weights. These center weights can be computed using Kohonen feature maps, statistical
methods such as K-Means clustering, or some other means.
Adjusting the weights
Once the hidden layer weights are set, a second phase of training is used to adjust the output weights.
This process typically uses the standard steepest descent algorithm.
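
A compact training sketch under stated assumptions: Gaussian basis functions, centers computed with scikit-learn's K-Means (one of the means mentioned above), a single shared width, and output weights fit by linear least squares instead of steepest descent (the output layer is linear, so least squares reaches the same optimum directly). The regression data is hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical 1-D regression problem: y = sin(u) + noise
U = rng.uniform(-3, 3, size=(80, 1))              # K=80 input patterns, p=1
y = np.sin(U[:, 0]) + 0.1 * rng.normal(size=80)   # target outputs

L = 10                                            # number of hidden units
centers = KMeans(n_clusters=L, n_init=10, random_state=0).fit(U).cluster_centers_
width = 1.0                                       # shared width (a design choice)

def hidden(Umat):
    """Gaussian basis outputs h_i = exp(-d_i^2 / (2 * width^2))."""
    d2 = ((Umat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # squared distances
    return np.exp(-d2 / (2 * width ** 2))

# Second phase: the hidden-to-output map is linear, so fit w by least squares
H = hidden(U)                                     # shape (K, L)
w, *_ = np.linalg.lstsq(H, y, rcond=None)

y_hat = H @ w
print("training MSE:", np.mean((y - y_hat) ** 2))
```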

11 a) Elaborate the concept of decision tree in detail.

Decision Trees:
A tree has nodes and branches. A rooted tree has a root, children nodes, and leaves without children.
A decision tree is a tree that is also a classifier.
The tree has two types of nodes:
1. Decision nodes
2. Leaf nodes
1. Decision Node:
A decision node specifies a choice or a test; based on this, we can decide which direction to go.
The test is usually done on the value of a feature (or attribute) of the instance.
The test is on some attribute, and there is a branch for each outcome.
2. Leaf Node:
A leaf node indicates the classification of an example or the value of the example.
Example:
Whether a loan will be approved.

We have three decision nodes and four leaf nodes.


Given some training examples, we have to generate a decision tree.
Issues in constructing a decision tree:
There may be many decision trees which fit the training examples. Which decision tree should we prefer? Prefer smaller trees.
What are smaller trees? Trees with a small number of nodes or low depth.
Finding the smallest decision tree that fits the data is a computationally hard problem.
Constructing Decision Trees:
At first, we have all the training examples. Then we have to choose a test here.
Suppose the test has two outcomes, Yes and No, and assume the test is on attribute A5.

If all the examples in D1 have the same output Y, then there is no need to expand node D1. But if there
are different values, then the node should be split further.

Decision trees are built recursively.


At every step we have to decide whether to stop growing the tree at that node or whether to split
further.
If we split, we must decide which attribute to split on.
Top-Down Induction of Decision Trees / ID3, proposed by Quinlan:
1. A ← the best decision attribute for the next node.
2. Assign A as the decision attribute for the node.
3. For each value of A, create a new descendant.
4. Sort training examples to the leaf nodes according to the attribute value of the branch.
5. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the leaf
nodes.
Two decisions have to be made:
Which attribute to choose
When to stop
1. When to stop:
No more input features.
All examples are classified the same.
Too few examples to make an informative split.

2. Which test to split on?

Choose the split that gives the smallest error.
With multi-valued features, either:
1) Split on all values, or
2) Split the values into two halves.

To choose an attribute there are multiple methods.

The popular method is based on entropy and information gain, as in the sketch below.
Entropy is a measure of disorder in a system.
If at a particular node all the examples are positive or all the examples are negative, then all the examples
belong to the same class; the set is homogeneous and its entropy is 0.
If half the examples belong to one class and half belong to another class, then the entropy is at its highest.
A leaf node has all examples belonging to the same class.
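
A minimal recursive ID3-style sketch (the loan-approval rows are hypothetical; the entropy helper matches the definition from Part A, question 6):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(rows, labels, a):
    """Entropy(S) minus the weighted entropy of each subset split on attribute a."""
    total, remainder = len(labels), 0.0
    for v in set(r[a] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[a] == v]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:                 # stop: all examples classified the same
        return labels[0]
    if not attrs:                             # stop: no more input features
        return Counter(labels).most_common(1)[0][0]
    a = max(attrs, key=lambda attr: information_gain(rows, labels, attr))
    tree = {}                                 # decision node: one branch per value of a
    for v in set(r[a] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[a] == v]
        tree[(a, v)] = id3([rows[i] for i in idx],
                           [labels[i] for i in idx],
                           [x for x in attrs if x != a])
    return tree

# Hypothetical loan-approval examples: each row maps attribute -> value
rows = [{"income": "high", "credit": "good"}, {"income": "low", "credit": "good"},
        {"income": "high", "credit": "bad"},  {"income": "low", "credit": "bad"}]
labels = ["approve", "approve", "approve", "reject"]
print(id3(rows, labels, ["income", "credit"]))
```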

11 b) 1. Describe the K-nearest neighbor algorithm. (8)

Given training data {(x1, y1), . . . , (xN, yN)}


N input/output pairs; xi is the input, yi the output/label.
xi is a vector consisting of D features, also called attributes or dimensions.
Features can be discrete or continuous; xim denotes the m-th feature of xi.
Forms of the output:
yi ∈ {1, . . . , C} for classification; a discrete variable
yi ∈ R for regression; a continuous (real-valued) variable
Goal: predict the output y for an unseen test example x
Prediction Rule: Look at the K most similar training examples.
For classification: assign the majority class label (majority voting)
For regression: assign the average response
The algorithm requires:
Parameter K: number of nearest neighbors to look for
Distance function: To compute the similarities between examples
Compute the test point’s distance from each training point.
Sort the distances in ascending order.
Use the sorted distances to select the K nearest neighbors
Use majority rule (for classification) or averaging (for regression)
K-Nearest Neighbors is called a non-parametric method. Unlike other supervised learning algorithms, K-
Nearest Neighbors doesn’t learn an explicit mapping f from the training data.
It simply uses the training data at the test time to make predictions
The K-NN algorithm requires computing distances of the test example from each of the training examples
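
A minimal sketch of the prediction rule for classification, assuming Euclidean distance (the 2-D data is hypothetical):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, K=3):
    """Classify x_test by majority vote among its K nearest training points."""
    dists = np.linalg.norm(X_train - x_test, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:K]                   # indices of the K smallest distances
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority class label

# Hypothetical 2-D training data with two classes
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))  # -> "A"
print(knn_predict(X_train, y_train, np.array([5.1, 5.0])))  # -> "B"
```

For regression, the last line of knn_predict would return the average of the K neighbors' responses instead of the majority label.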

2. Interpret the K-means clustering algorithm in detail. (8)

 K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled dataset into
different clusters.
 Here K defines the number of pre-defined clusters that need to be created in the process; for example, if K=2,
there will be two clusters, and for K=3, there will be three clusters, and so on.
 It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that
each data point belongs to only one group that has similar properties.
 It allows us to cluster the data into different groups and a convenient way to discover the categories of
groups in the unlabeled dataset on its own without the need for any training.
 It is a centroid-based algorithm, where each cluster is associated with a centroid.
 The main aim of this algorithm is to minimize the sum of distances between the data point and their
corresponding clusters.
 The k-means clustering algorithm mainly performs two tasks:
 Determines the best value for K center points or centroids by an iterative process.
 Assigns each data point to its closest k-center. The data points which are near to a particular k-center
form a cluster.
The working of the K-Means algorithm is explained in the below steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids.
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of its cluster.
Step-6: If any reassignment occurs, then go to Step-4; else FINISH.
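
A compact sketch following these steps (Euclidean distance and random initial centroids are assumed; the 2-D data is hypothetical):

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Iterate assignment (Step 3/5) and centroid update (Step 4) until stable (Step 6)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]  # Step 2: K random points
    assign = None
    for _ in range(max_iter):
        # Step 3/5: assign each data point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break                                             # Step 6: no reassignment
        assign = new_assign
        # Step 4: place each centroid at the mean of its cluster
        for k in range(K):
            if np.any(assign == k):
                centroids[k] = X[assign == k].mean(axis=0)
    return centroids, assign

# Hypothetical 2-D data with two obvious groups
X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])
centroids, labels = kmeans(X, K=2)
print(centroids)
print(labels)
```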
