UNIT – II
Chapter I:
DECISION TREE
Decision tree learning is a method for approximating discrete-valued target functions, in which the learned
function is represented by a decision tree.
Decision trees classify instances by sorting them down the tree from the root to some leaf node, which
provides the classification of the instance.
Each node in the tree specifies a test of some attribute of the instance, and each branch descending from
that node corresponds to one of the possible values for this attribute.
An instance is classified by starting at the root node of the tree, testing the attribute specified by this node,
then moving down the tree branch corresponding to the value of the attribute in the given example. This
process is then repeated for the subtree rooted at the new node.
An example is classified by sorting it through the tree to the appropriate leaf node, then returning the
classification associated with this leaf.
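This sorting procedure can be sketched in a few lines of Python. The nested-dictionary representation and the classify helper below are illustrative assumptions, not structures prescribed by the text; the example tree is the well-known PlayTennis tree produced by ID3 for the training data discussed below.

```python
# A minimal sketch of classifying an instance by sorting it down a tree.
# Representation (an assumption for illustration): an internal node is a dict
# {attribute: {value: subtree, ...}}, and a leaf is a plain class label.

def classify(tree, instance):
    # A leaf node directly provides the classification.
    if not isinstance(tree, dict):
        return tree
    # An internal node tests one attribute of the instance...
    attribute = next(iter(tree))
    value = instance[attribute]
    # ...and we follow the branch matching the attribute's value, repeating
    # the process for the subtree rooted at the new node.
    return classify(tree[attribute][value], instance)

# The PlayTennis decision tree (root test on Outlook):
tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Overcast": "Yes",
                    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}}}}

print(classify(tree, {"Outlook": "Sunny", "Humidity": "High"}))  # -> No
```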
The ID3 algorithm, summarized here as specialized to learning Boolean-valued functions, is a greedy algorithm that grows the tree top-down, at each node selecting the attribute that best classifies the local training examples. This process continues until the tree perfectly classifies the training examples, or until all attributes have been used.
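This top-down growth can be sketched in Python as follows. The sketch assumes examples are represented as (attribute-dictionary, label) pairs; the entropy and information gain measures that drive the attribute selection, introduced in the next sections, are implemented inline.

```python
import math
from collections import Counter

def entropy(examples):
    # Entropy of a collection of (attributes, label) pairs.
    counts = Counter(label for _, label in examples)
    n = len(examples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(examples, attr):
    # Expected reduction in entropy from partitioning on `attr`.
    n = len(examples)
    remainder = 0.0
    for value in {a[attr] for a, _ in examples}:
        subset = [ex for ex in examples if ex[0][attr] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(examples) - remainder

def id3(examples, attributes):
    labels = {label for _, label in examples}
    if len(labels) == 1:         # the node perfectly classifies its examples
        return labels.pop()
    if not attributes:           # all attributes have been used: majority vote
        return Counter(l for _, l in examples).most_common(1)[0][0]
    # Greedy step: choose the attribute that best classifies the local examples.
    best = max(attributes, key=lambda a: information_gain(examples, a))
    subtree = {best: {}}
    for value in {a[best] for a, _ in examples}:
        subset = [ex for ex in examples if ex[0][best] == value]
        subtree[best][value] = id3(subset, [a for a in attributes if a != best])
    return subtree
```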
Example:
To illustrate the operation of ID3, consider the learning task represented by the training examples in the table below. Here the target attribute is PlayTennis, which can have the values Yes or No for different days.

Day   Outlook    Temperature  Humidity  Wind    PlayTennis
D1    Sunny      Hot          High      Weak    No
D2    Sunny      Hot          High      Strong  No
D3    Overcast   Hot          High      Weak    Yes
D4    Rain       Mild         High      Weak    Yes
D5    Rain       Cool         Normal    Weak    Yes
D6    Rain       Cool         Normal    Strong  No
D7    Overcast   Cool         Normal    Strong  Yes
D8    Sunny      Mild         High      Weak    No
D9    Sunny      Cool         Normal    Weak    Yes
D10   Rain       Mild         Normal    Weak    Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
D13   Overcast   Hot          Normal    Weak    Yes
D14   Rain       Mild         High      Strong  No
Consider the first step through the algorithm, in which the topmost node of the decision tree is created.
Suppose S is a collection of 14 examples of some boolean concept, including 9 positive and 5 negative examples (written [9+, 5−]). The entropy of a collection S is defined as

Entropy(S) = -p+ log2(p+) - p- log2(p-)

where p+ is the proportion of positive examples in S and p- is the proportion of negative examples. Then the entropy of S relative to this boolean classification is:

Entropy([9+, 5−]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
The entropy is 0 if all members of the collection belong to the same class, and 1 when the collection contains an equal number of positive and negative examples. If the collection contains unequal numbers of positive and negative examples, the entropy is between 0 and 1.
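A quick worked check of these numbers in Python (a sketch; the values follow directly from the entropy definition above):

```python
import math

# Entropy of S = [9+, 5-]:
p_pos, p_neg = 9 / 14, 5 / 14
print(round(-p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg), 3))  # 0.94

# A 50/50 split gives the maximum entropy of 1:
print(-0.5 * math.log2(0.5) - 0.5 * math.log2(0.5))  # 1.0
```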
Information gain measures the expected reduction in entropy caused by partitioning the examples according to an attribute A:

Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|S_v| / |S|) Entropy(S_v)

For example, the information gain of the attribute Wind for the full collection S = [9+, 5−] is computed as follows:

Values(Wind) = Weak, Strong
S_Weak = [6+, 2−]
S_Strong = [3+, 3−]

Gain(S, Wind) = Entropy(S) - (8/14) Entropy(S_Weak) - (6/14) Entropy(S_Strong)
              = 0.940 - (8/14)(0.811) - (6/14)(1.000)
              = 0.048
ID3 determines the information gain for each candidate attribute (i.e., Outlook, Temperature, Humidity, and Wind), then selects the one with the highest information gain. For the training examples above, Gain(S, Outlook) = 0.246, Gain(S, Humidity) = 0.151, Gain(S, Wind) = 0.048, and Gain(S, Temperature) = 0.029.
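The Gain(S, Wind) = 0.048 computation above can be verified with a short Python sketch (the entropy helper is an illustrative assumption):

```python
import math

def entropy(pos, neg):
    # Entropy of a collection with `pos` positive and `neg` negative examples.
    total = pos + neg
    return -sum((n / total) * math.log2(n / total) for n in (pos, neg) if n)

# S = [9+, 5-], S_Weak = [6+, 2-], S_Strong = [3+, 3-]:
gain_wind = entropy(9, 5) - (8 / 14) * entropy(6, 2) - (6 / 14) * entropy(3, 3)
print(round(gain_wind, 3))  # 0.048
```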
According to the information gain measure, the Outlook attribute provides the best prediction of the target attribute, PlayTennis, over the training examples. Therefore, Outlook is selected as the decision attribute for the root node, and branches are created below the root for each of its possible values, i.e., Sunny, Overcast, and Rain.
Chapter II
INTRODUCTION
Artificial neural networks (ANNs) provide a general, practical method for learning real-valued,
discrete-valued, and vector-valued target functions.
Biological Motivation
The study of artificial neural networks (ANNs) has been inspired by the observation that biological learning systems are built of very complex webs of interconnected neurons. The human information processing system consists of the brain, whose basic building block is the neuron: a cell that communicates information to and from various parts of the body.
ANN learning is well-suited to problems in which the training data corresponds to noisy,
complex sensor data, such as inputs from cameras and microphones.
PERCEPTRON
A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, and outputs 1 if the result is greater than some threshold and -1 otherwise.
Figure: A perceptron
The learning problem is to determine a weight vector that causes the perceptron to produce the correct +1 or -1 output for each of the given training examples. One way to learn such a weight vector is the perceptron training rule, which revises the weight wi associated with input xi according to:

wi <- wi + eta * (t - o) * xi

where t is the target output, o is the output generated by the perceptron, and eta is the learning rate. The role of the learning rate is to moderate the degree to which weights are changed at each step. It is usually set to some small value (e.g., 0.1) and is sometimes made to decay as the number of weight-tuning iterations increases.
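The training rule can be sketched in Python as follows; the list-based representation, the boolean AND demonstration, and the epoch count are illustrative assumptions rather than the text's own code.

```python
def perceptron_output(weights, x):
    # Threshold the weighted sum: output +1 if positive, -1 otherwise.
    # weights[0] is the bias weight w0 (its input x0 is fixed at 1).
    s = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if s > 0 else -1

def train_perceptron(examples, eta=0.1, epochs=100):
    weights = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(epochs):
        for x, t in examples:
            o = perceptron_output(weights, x)
            # Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i
            weights[0] += eta * (t - o)
            for i, xi in enumerate(x):
                weights[i + 1] += eta * (t - o) * xi
    return weights

# The boolean AND function is linearly separable, so the rule converges:
examples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w = train_perceptron(examples)
print([perceptron_output(w, x) for x, _ in examples])  # [-1, -1, -1, 1]
```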
Drawback:
The perceptron rule finds a successful weight vector when the training examples are linearly separable, but it can fail to converge if the examples are not linearly separable.
The BACKPROPAGATION Algorithm learns the weights for a multilayer network, given a
network with a fixed set of units and interconnections. It employs gradient descent to attempt to
minimize the squared error between the network output values and the target values for these
outputs.
E(w) = (1/2) * sum over d in D, k in outputs of (t_kd - o_kd)^2

where,
D - the set of training examples
outputs - the set of output units in the network
t_kd and o_kd - the target and output values associated with the kth output unit and training example d
Algorithm: the stochastic gradient descent version of BACKPROPAGATION for a two-layer feedforward network propagates each input forward through the network, computes an error term for every output and hidden unit, and then updates each network weight in proportion to its error term and its input.
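A compact Python sketch of this procedure for a two-layer network of sigmoid units follows; the weight representation, learning rate, and epoch count are illustrative assumptions rather than prescribed values.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop(examples, n_in, n_hidden, n_out, eta=0.5, epochs=5000):
    # Initialize all weights to small random values; index [0] is the bias.
    rnd = lambda: random.uniform(-0.05, 0.05)
    w_hid = [[rnd() for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [[rnd() for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(epochs):
        for x, t in examples:
            # 1. Propagate the input forward through the network.
            h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
                 for w in w_hid]
            o = [sigmoid(w[0] + sum(wi * hi for wi, hi in zip(w[1:], h)))
                 for w in w_out]
            # 2. Error term for each output unit k: o_k (1 - o_k) (t_k - o_k).
            d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
            # 3. Error term for each hidden unit: h (1 - h) times the sum of
            #    downstream output error terms weighted by connecting weights.
            d_hid = [hj * (1 - hj) *
                     sum(d_out[k] * w_out[k][j + 1] for k in range(n_out))
                     for j, hj in enumerate(h)]
            # 4. Update each weight: w <- w + eta * error term * its input.
            for k in range(n_out):
                w_out[k][0] += eta * d_out[k]
                for j in range(n_hidden):
                    w_out[k][j + 1] += eta * d_out[k] * h[j]
            for j in range(n_hidden):
                w_hid[j][0] += eta * d_hid[j]
                for i in range(n_in):
                    w_hid[j][i + 1] += eta * d_hid[j] * x[i]
    return w_hid, w_out

# Example: the XOR function, which a single perceptron cannot represent
# (training may occasionally need a re-run due to the random initial weights).
xor = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
w_hid, w_out = backprop(xor, n_in=2, n_hidden=3, n_out=1)
```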
Chapter III
INTRODUCTION
Support Vector Machine (SVM) is one of the most popular supervised learning algorithms, used for both classification and regression problems. However, it is primarily used for classification problems in machine learning. The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put a new data point in the correct category in the future. This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. Consider the below diagram, in which two different categories are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used in the KNN classifier. Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train our model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. The SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of each class. On the basis of the support vectors, it will classify the new creature as a cat. Consider the below diagram:
The SVM algorithm can be used for face detection, image classification, text categorization, etc.
SVM can be of two types:
Linear SVM: Linear SVM is used for linearly separable data, which means if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
Non-linear SVM: Non-Linear SVM is used for non-linearly separable data, which means if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
2.4.1. Linear SVM: Since the data lie in a 2-D space, we can easily separate the two classes by using a straight line. But there can be multiple lines that separate these classes. Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
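This maximum-margin behavior can be sketched with scikit-learn (assuming it is installed; the toy data points are made up for illustration):

```python
# A short sketch of a maximum-margin linear SVM in scikit-learn.
from sklearn import svm

# Two linearly separable classes in 2-D.
X = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The extreme points chosen to define the hyperplane:
print(clf.support_vectors_)
# Classify a new data point:
print(clf.predict([[2, 2]]))  # expected: class 0
```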
2.4.2. Non-Linear SVM: If data are linearly arranged, we can separate them by using a straight line, but for non-linear data we cannot draw a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear data we have used two dimensions, x and y, so for non-linear data we will add a third dimension z, calculated as z = x^2 + y^2. By adding the third dimension, the sample space becomes as in the below image:
So now SVM will divide the datasets into classes in the following way. Consider the below image:
Since we are in 3-D space, the decision boundary looks like a plane parallel to the x-axis. If we convert it back to 2-D space with z = 1, it becomes a circle of radius 1 (since x^2 + y^2 = 1).
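A quick sketch of this dimension-adding idea, using scikit-learn; the data points and the lift helper are illustrative assumptions:

```python
# Points near the origin (class 0) surrounded by points on a larger circle
# (class 1) are not separable by a line in 2-D, but become linearly
# separable in 3-D after adding z = x^2 + y^2.
from sklearn import svm

inner = [(0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5)]  # class 0
outer = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)]  # class 1

def lift(points):
    # Add the third dimension z = x^2 + y^2 to each 2-D point.
    return [(x, y, x**2 + y**2) for x, y in points]

X = lift(inner) + lift(outer)
y = [0] * len(inner) + [1] * len(outer)

# A *linear* SVM now separates the lifted data with a plane (roughly
# z = constant, i.e., a circle back in the original 2-D space).
clf = svm.SVC(kernel="linear").fit(X, y)
print(clf.predict(lift([(0.2, 0.2), (1.5, 1.5)])))  # expected: [0, 1]
```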
Advantages of SVM:
Versatility: SVMs can be used for both classification and regression tasks, and they can be applied to a wide range of applications such as natural language processing, computer vision, and bioinformatics.
Sparse solution: SVMs have sparse solutions, which means that they use only a subset of the training data (the support vectors) to make predictions. This makes the algorithm more efficient and less prone to overfitting.
Regularization: SVMs can be regularized, which means that the algorithm can be tuned to avoid overfitting (see the sketch below).
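A small sketch of the regularization point, assuming scikit-learn, where the C parameter controls the trade-off between margin width and training error; the dataset is illustrative:

```python
from sklearn import svm

# Mostly separated data, with one class-0 point pushed toward class 1.
X = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6], [3.5, 3.5]]
y = [0, 0, 0, 1, 1, 1, 0]

soft = svm.SVC(kernel="linear", C=0.01).fit(X, y)    # strong regularization
hard = svm.SVC(kernel="linear", C=1000.0).fit(X, y)  # nearly hard margin

# With small C the margin stays wide and the awkward point may be treated
# as a margin violation; with large C the boundary shifts to fit it.
print(soft.predict([[3.5, 3.5]]), hard.predict([[3.5, 3.5]]))
```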