DWDM Unit IV
UNIT –IV
CLASSIFICATION ANALYSIS
Classification, which is the task of assigning objects to one of several predefined categories, is a pervasive
problem that encompasses many diverse applications.
Examples include detecting spam email messages based upon the message header and content, categorizing
cells as malignant or benign based upon the results of MRI scans, and classifying galaxies based upon their shapes.
[Figure: Classification as the task of mapping an input attribute set (x) through a classification model to its output class label (y).]
4.1 Preliminaries
(Classification). Classification is the task of learning a target function f that maps each attribute set x to one of the predefined class labels y.
The target function is also known informally as a classification model. A classification model
is useful for the following purposes.
Predictive Modeling A classification model can also be used to predict the class label of unknown
records. Suppose we are given the following characteristics of a creature known as a gila monster:
We can use a classification model built from the data set shown in Table to determine the class to
which the creature belongs.
Classification techniques are most suited for predicting or describing data sets with binary or nominal categories.
Evaluation of the performance of a classification model is based on the counts of test records correctly and incorrectly predicted by the model. These counts are tabulated in a table known as a confusion matrix.
Each entry fij in this table denotes the number of records from class i predicted to be of class j.
For instance, f01 is the number of records from class 0 incorrectly predicted as class 1.
Based on the entries in the confusion matrix, the total number of correct predictions made by the
model is (f11 + f00) and the total number of incorrect predictions is (f10 + f01).
This can be done using a performance metric such as accuracy, which is defined as follows:
Accuracy = (f11 + f00) / (f11 + f10 + f01 + f00)
Equivalently, the performance of a model can be expressed in terms of its error rate, which is given by the following equation:
Error rate = (f10 + f01) / (f11 + f10 + f01 + f00)
Most classification algorithms seek models that attain the highest accuracy, or equivalently, the lowest error rate when applied to the test set.
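As a small illustration (not part of the original notes), the Python sketch below computes accuracy and error rate from the four confusion-matrix entries defined above; the counts used are hypothetical.

# Minimal sketch: accuracy and error rate from a 2x2 confusion matrix.
def accuracy(f11, f10, f01, f00):
    # Fraction of correct predictions: (f11 + f00) / total
    total = f11 + f10 + f01 + f00
    return (f11 + f00) / total

def error_rate(f11, f10, f01, f00):
    # Fraction of wrong predictions: (f10 + f01) / total
    total = f11 + f10 + f01 + f00
    return (f10 + f01) / total

# Hypothetical counts: 40 class-1 records predicted correctly, 5 as class 0,
# 10 class-0 records predicted as class 1, 45 class-0 records predicted correctly.
print(accuracy(40, 5, 10, 45))    # 0.85
print(error_rate(40, 5, 10, 45))  # 0.15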
Decision Tree Induction
To illustrate how a decision tree works, consider the problem of classifying a vertebrate as a mammal or a non-mammal, where the root node tests the Body Temperature attribute to separate cold-blooded from warm-blooded vertebrates.
Since all cold-blooded vertebrates are non-mammals, a leaf node labeled Non-mammals is created as the right child of the root node.
If the vertebrate is warm-blooded, a subsequent attribute, Gives Birth, is used to distinguish mammals from other warm-blooded creatures, which are mostly birds.
Classifying a test record is straightforward once a decision tree has been constructed.
Starting from the root node, we apply the test condition to the record and follow the appropriate branch
based on the outcome of the test.
This will lead us either to another internal node, for which a new test condition is applied, or to a leaf node.
The class label associated with the leaf node is then assigned to the record.
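The following Python sketch illustrates this traversal; the nested-dictionary tree structure is an assumption made purely for illustration, mirroring the vertebrate example above (root tests Body Temperature, then Gives Birth).

# Illustrative decision tree as nested dictionaries; leaves carry a "label".
tree = {
    "attribute": "Body Temperature",
    "branches": {
        "cold-blooded": {"label": "Non-mammal"},
        "warm-blooded": {
            "attribute": "Gives Birth",
            "branches": {
                "yes": {"label": "Mammal"},
                "no":  {"label": "Non-mammal"},
            },
        },
    },
}

def classify(record, node):
    # Leaf node: assign its class label to the record.
    if "label" in node:
        return node["label"]
    # Internal node: apply the test condition and follow the matching branch.
    value = record[node["attribute"]]
    return classify(record, node["branches"][value])

flamingo = {"Body Temperature": "warm-blooded", "Gives Birth": "no"}
print(classify(flamingo, tree))  # Non-mammal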
In principle, there are exponentially many decision trees that can be constructed from a given set of
attributes.
While some of the trees are more accurate than others, finding the optimal tree is computationally infeasible
because of the exponential size of the search space.
Nevertheless, efficient algorithms have been developed to induce a reasonably accurate, albeit suboptimal,
decision tree in a reasonable amount of time.
These algorithms usually employ a greedy strategy that grows a decision tree by making a series of locally
optimum decisions about which attribute to use for partitioning the data.
Classifying an unlabeled vertebrate. The dashed lines represent the outcomes of applying various attribute test conditions on the unlabeled vertebrate. The vertebrate is eventually assigned to the Non-mammal class.
Hunt’s Algorithm
In Hunt’s algorithm, a decision tree is grown in a recursive fashion by partitioning the training records into successively purer subsets.
Let Dt be the set of training records that are associated with node t and y = {y1, y2, ..., yc} be the class labels. The following is a recursive definition of Hunt’s algorithm.
Step 1: If all the records in Dt belong to the same class yt, then t is a leaf node labeled as yt.
Step 2: If Dt contains records that belong to more than one class, an attribute test condition is selected to partition the records into smaller subsets. A child node is created for each outcome of the test condition and the records in Dt are distributed to the children based on the outcomes. The algorithm is then recursively applied to each child node.
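The Python sketch below illustrates this recursion under simplifying assumptions: all attributes are categorical, and the attribute test condition is chosen naively (the first unused attribute) rather than with an impurity measure such as entropy or the Gini index, which real decision tree inducers use.

from collections import Counter

def hunt(records, attributes):
    """records: list of (attribute-dict, class-label) pairs."""
    labels = [y for _, y in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Step 1 (plus the identical-attribute-values case): create a leaf node.
    if len(set(labels)) == 1 or not attributes:
        return {"label": majority}
    # Step 2: select an attribute test condition and partition the records.
    attr = attributes[0]
    node = {"attribute": attr, "branches": {}, "default": majority}
    for value in {x[attr] for x, _ in records}:
        subset = [(x, y) for x, y in records if x[attr] == value]
        node["branches"][value] = hunt(subset, attributes[1:])
    return node

Storing the parent's majority class in the "default" field reflects the empty-child case described below: a test record with an attribute value unseen in training can fall back to the majority class of the parent node.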
Training set for predicting borrowers who will default on loan payments.
The initial tree for the classification problem contains a single node with class label Defaulted = No (see
Figure 4.7(a)), which means that most of the borrowers successfully repaid their loans.
The tree, however, needs to be refined since the root node contains records from both classes.
The records are subsequently divided into smaller subsets based on the outcomes of the Home Owner test
condition, as shown in Figure 4.7(b).
Hunt’s algorithm will work if every combination of attribute values is present in the training data
and each combination has a unique class label. These assumptions are too stringent for use in most
practical situations. Additional conditions are needed to handle the following cases:
1. It is possible for some of the child nodes created in Step 2 to be empty; i.e., there are no records
associated with these nodes. This can happen if none of the training records have the combination
of attribute values associated with such nodes. In this case the node is declared a leaf node with the
same class label as the majority class of training records associated with its parent node.
2. In Step 2, if all the records associated with Dt have identical attribute values (except for the class
label), then it is not possible to split these records any further. In this case, the node is declared a
leaf node with the same class label as the majority class of training records associated with this
node.
A learning algorithm for inducing decision trees must address the following two issues.
1. How should the training records be split? Each recursive step of the tree-growing process must
select an attribute test condition to divide the records into smaller subsets. To implement this step,
the algorithm must provide a method for specifying the test condition for different attribute types
as well as an objective measure for evaluating the goodness of each test condition.
2. How should the splitting procedure stop? A stopping condition is needed to terminate the tree-
growing process. A possible strategy is to continue expanding a node until either all the records
belong to the same class or all the records have identical attribute values. Although both
conditions are sufficient to stop any decision tree induction algorithm, other criteria can be
imposed to allow the tree-growing procedure to terminate earlier.
Ordinal Attribute
Ordinal attributes can also produce binary or multiway splits.
Ordinal attribute values can be grouped as long as the grouping does not violate the order property of the attribute values.
For example, an attribute Shirt Size with values {Small, Medium, Large, Extra Large} can be split into the groups {Small, Medium} and {Large, Extra Large}, but a grouping such as {Small, Large} and {Medium, Extra Large} violates the order property.
IF-THEN Rules
A rule-based classifier makes use of a set of IF-THEN rules for classification. We can
express a rule in the following form −
IF condition THEN conclusion
Points to remember −
The IF part of the rule is called the rule antecedent or precondition.
The THEN part of the rule is called the rule consequent.
Rule Extraction
Here we will learn how to build a rule-based classifier by extracting IF-THEN rules from
a decision tree.
Points to remember −
One rule is created for each path from the root to a leaf node.
To form a rule antecedent, each splitting criterion is logically ANDed.
The leaf node holds the class prediction, forming the rule consequent.
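A minimal sketch of this extraction, assuming the same nested-dictionary tree structure used in the earlier traversal example, is given below; each root-to-leaf path yields one rule whose antecedent is the AND of the splitting criteria along the path.

def extract_rules(node, conditions=()):
    # Leaf node: the accumulated conditions form the antecedent, the label the consequent.
    if "label" in node:
        antecedent = " AND ".join(conditions) or "TRUE"
        return ["IF " + antecedent + " THEN class = " + node["label"]]
    rules = []
    for value, child in node["branches"].items():
        cond = node["attribute"] + " = " + value
        rules.extend(extract_rules(child, conditions + (cond,)))
    return rules

# With the vertebrate tree sketched earlier, this yields rules such as
# IF Body Temperature = cold-blooded THEN class = Non-mammal
# IF Body Temperature = warm-blooded AND Gives Birth = yes THEN class = Mammal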
Rule Pruning
A rule is pruned for the following reason −
The assessment of quality is made on the original set of training data. The rule
may perform well on the training data but less well on subsequent data. That is why
rule pruning is required.
FOIL is one of the simple and effective methods for rule pruning. For a given rule R,
FOIL_Prune(R) = (pos − neg) / (pos + neg)
where pos and neg are the number of positive and negative tuples covered by R, respectively. This value increases with the accuracy of R on a pruning set, so if the FOIL_Prune value is higher for the pruned version of R, then R is pruned.
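The tiny Python function below simply evaluates the FOIL_Prune measure defined above; the counts passed to it are hypothetical.

def foil_prune(pos, neg):
    # pos / neg: positive and negative tuples covered by rule R
    return (pos - neg) / (pos + neg)

print(foil_prune(90, 10))  # 0.8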
Bayesian Classification
Bayes' Theorem
Bayes' theorem is named after Thomas Bayes. There are two types of probabilities −
Posterior probability, P(H|X): the probability that hypothesis H holds given the observed data tuple X.
Prior probability, P(H): the initial probability of H, before any evidence is observed.
Solved Example
Question 1: Calculate P(H|X) if P(X|H) = 0.25, P(X) = 0.4 and P(H) = 0.5 using Bayes' theorem.
Solution:
Given,
P(X|H) = 0.25
P(X) = 0.4
P(H) = 0.5
Using Bayes Theorem Formula
P(H|X) = P(X|H)P(H)/P(X)
P(H|X) = (0.25 × 0.5)/0.4
Answer = 0.3125
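A quick Python check of the same arithmetic:

# Verify the Bayes' theorem calculation above.
p_x_given_h, p_x, p_h = 0.25, 0.4, 0.5
p_h_given_x = p_x_given_h * p_h / p_x
print(p_h_given_x)  # 0.3125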
Bayesian Belief Networks
The arcs in a belief network allow the representation of causal knowledge. For example, lung
cancer is influenced by a person's family history of lung cancer, as well as whether or
not the person is a smoker. It is worth noting that the variable PositiveXray is
independent of whether the patient has a family history of lung cancer or that the
patient is a smoker, given that we know the patient has lung cancer.
The Naive Bayes Classifier technique is based on the so-called Bayesian theorem and is particularly
suited when the dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often
outperform more sophisticated classification methods.
To demonstrate the concept of Naïve Bayes Classification, consider the example displayed in the
illustration above. As indicated, the objects can be classified as either GREEN or RED. Our task is to
classify new cases as they arrive, i.e., decide to which class label they belong, based on the currently
existing objects.
Since there are twice as many GREEN objects as RED, it is reasonable to believe that a new case
(which hasn't been observed yet) is twice as likely to have membership GREEN rather than RED. In the
Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based on previous
experience, in this case the percentage of GREEN and RED objects, and are often used to predict
outcomes before they actually happen.
Since there is a total of 60 objects, 40 of which are GREEN and 20 RED, our prior probabilities for class
membership are:
Prior probability of GREEN = 40/60
Prior probability of RED = 20/60
Having formulated our prior probability, we are now ready to classify a new object (WHITE circle).
Since the objects are well clustered, it is reasonable to assume that the more GREEN (or RED) objects
in the vicinity of X, the more likely that the new cases belong to that particular color. To measure this
likelihood, we draw a circle around X which encompasses a number (to be chosen a priori) of points
irrespective of their class labels. Then we calculate the number of points in the circle belonging to
each class label, which gives the likelihood of X given each class.
From the illustration above, it is clear that the likelihood of X given GREEN is smaller than the likelihood of X
given RED, since the circle encompasses 1 GREEN object and 3 RED ones. Thus:
Likelihood of X given GREEN = 1/40
Likelihood of X given RED = 3/20
Although the prior probabilities indicate that X may belong to GREEN (given that there are twice as
many GREEN compared to RED) the likelihood indicates otherwise; that the class membership of X is
RED (given that there are more RED objects in the vicinity of X than GREEN). In the Bayesian analysis,
the final classification is produced by combining both sources of information, i.e., the prior and the
likelihood, to form a posterior probability using the so-called Bayes' rule (named after Rev. Thomas
Bayes 1702-1761).
Posterior probability of X being GREEN ∝ prior probability of GREEN × likelihood of X given GREEN = 4/6 × 1/40 = 1/60
Posterior probability of X being RED ∝ prior probability of RED × likelihood of X given RED = 2/6 × 3/20 = 1/20
Finally, we classify X as RED since its class membership achieves the largest posterior probability.
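The following Python sketch reproduces the GREEN/RED example numerically; the counts (40 GREEN, 20 RED, with 1 GREEN and 3 RED objects inside the circle drawn around X) are taken from the text above.

# Prior x likelihood for the toy GREEN/RED example.
n_green, n_red = 40, 20
in_circle_green, in_circle_red = 1, 3

prior_green = n_green / (n_green + n_red)   # 2/3
prior_red   = n_red / (n_green + n_red)     # 1/3
like_green  = in_circle_green / n_green     # 1/40
like_red    = in_circle_red / n_red         # 3/20

post_green = prior_green * like_green       # proportional to 1/60
post_red   = prior_red * like_red           # proportional to 1/20

print("RED" if post_red > post_green else "GREEN")  # RED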
Classification by Backpropagation
Backpropagation is the essence of neural network training. It is the
method of fine-tuning the weights of a neural network based on the error
rate obtained in the previous epoch (i.e., iteration). Proper tuning of the
weights allows you to reduce error rates and make the model reliable by
increasing its generalization.
Backpropagation is a neural network learning algorithm, started by psychologists and neurobiologists to develop and test computational analogues of neurons.
A neural network is a set of connected input/output units where each connection has a weight associated with it.
During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples.
This approach is also referred to as connectionist learning due to the connections between units.
Strength: neural networks have a high tolerance to noisy data and can classify patterns on which they have not been trained; they are well suited for continuous-valued inputs and outputs, and their algorithms are inherently parallel.
How Backpropagation Works
1. Inputs arrive through the preconnected paths.
2. The input is modeled using real weights, which are usually initialized at random.
3. Calculate the output of every neuron from the input layer, through the hidden layers, to the output layer.
4. Calculate the error in the outputs, i.e., the difference between the actual and the desired output.
5. Travel back from the output layer to the hidden layer to adjust the weights such that the error is decreased.
6. Keep repeating the process until the desired output is achieved.
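The minimal NumPy sketch below (not the notes' own code; the XOR problem is used purely as an illustration) shows one training loop: a forward pass, the error at the output layer, and the backward weight adjustments.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
lr = 1.0

for epoch in range(10000):
    # Forward pass through the network
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    # Error at the output layer
    error = y - output
    # Backward pass: propagate the error and adjust the weights
    d_out = error * output * (1 - output)
    d_hid = (d_out @ W2.T) * hidden * (1 - hidden)
    W2 += lr * hidden.T @ d_out
    b2 += lr * d_out.sum(axis=0)
    W1 += lr * X.T @ d_hid
    b1 += lr * d_hid.sum(axis=0)

# Outputs should approach [[0], [1], [1], [0]]; more epochs may be needed
# depending on the random initialization.
print(np.round(output, 2))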
SVM
Support vector machines (SVMs) are powerful yet flexible supervised machine learning algorithms used for both classification and regression, although they are generally applied to classification problems. SVMs were first introduced in the 1960s and later refined in the 1990s. They have a unique way of implementation compared with other machine learning algorithms, and they have lately become extremely popular because of their ability to handle multiple continuous and categorical variables.
Working of SVM
An SVM model represents the classes as points in a multidimensional space separated by a hyperplane. The hyperplane is generated iteratively so that the classification error is minimized; the goal is to find the maximum marginal hyperplane, i.e., the hyperplane with the largest separation (margin) from the nearest training points of the two classes, which are called support vectors.
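As a hedged illustration, the sketch below uses scikit-learn (an assumed library choice, not mentioned in the notes) to fit a linear support vector classifier on synthetic two-class data.

from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two well-separated clusters stand in for a binary classification data set.
X, y = make_blobs(n_samples=200, centers=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = SVC(kernel="linear", C=1.0)   # linear kernel: a separating hyperplane
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("support vectors per class:", clf.n_support_)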
Bagging
Bagging (bootstrap aggregation) builds several base models in parallel, each trained on a bootstrap sample drawn from the training data, and combines their predictions, for example by majority voting. Various base learners such as a decision tree, logistic regression, etc., can be used to predict the output for the same input.
Boosting
Boosting builds a strong classifier sequentially from a number of weak learners. A weak learner is a classifier that is correct only slightly more often than random guessing; each new learner concentrates on the records that the previous learners misclassified, and the final prediction is a weighted combination of all the learners (e.g., AdaBoost and Gradient Boosting).
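The short sketch below, again assuming scikit-learn as the library, contrasts a bagging ensemble and an AdaBoost ensemble built from decision trees on a synthetic data set.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: full trees trained in parallel on bootstrap samples.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
# Boosting: shallow "weak" trees trained sequentially, reweighting mistakes.
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0)

print("bagging accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
print("boosting accuracy:", cross_val_score(boosting, X, y, cv=5).mean())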