

UNIT –IV

CLASSIFICATION ANALYSIS

 Classification, which is the task of assigning objects to one of several predefined categories, is a pervasive
problem that encompasses many diverse applications.
 Examples include detecting spam email messages based upon the message header and content, categorizing
cells as malignant or benign based upon the results of MRI scans, and classifying galaxies based upon their
shapes.

Classification of galaxies: (a) a spiral galaxy; (b) an elliptical galaxy. The images are from the NASA website.

Classification as the task of mapping an input attribute set x into its class label y: the classification model takes an attribute set (x) as input and produces a class label (y) as output.
4.1 Preliminaries

 The input data for a classification task is a collection of records.


 Each record, also known as an instance or example, is characterized by a tuple (x, y), where x is
the attribute set and y is a special attribute, designated as the class label (also known as category or
target attribute).
 The table below shows a sample data set used for classifying vertebrates into one of the following categories:
mammal, bird, fish, reptile, or amphibian.
 The attribute set includes properties of a vertebrate such as its body temperature, skin cover,
method of reproduction, ability to fly, and ability to live in water.
 Although the attributes presented in the table are mostly discrete, the attribute set can also contain
continuous features.
 The class label, on the other hand, must be a discrete attribute.
 This is a key characteristic that distinguishes classification from regression, a predictive modeling
task in which y is a continuous attribute.


The vertebrate data set.

Name          | Body Temperature | Skin Cover | Gives Birth | Aquatic Creature | Aerial Creature | Has Legs | Hibernates | Class Label
human         | warm-blooded     | hair       | yes         | no               | no              | yes      | no         | mammal
python        | cold-blooded     | scales     | no          | no               | no              | no       | yes        | reptile
salmon        | cold-blooded     | scales     | no          | yes              | no              | no       | no         | fish
whale         | warm-blooded     | hair       | yes         | yes              | no              | no       | no         | mammal
frog          | cold-blooded     | none       | no          | semi             | no              | yes      | yes        | amphibian
komodo dragon | cold-blooded     | scales     | no          | no               | no              | yes      | no         | reptile
bat           | warm-blooded     | hair       | yes         | no               | yes             | yes      | yes        | mammal
pigeon        | warm-blooded     | feathers   | no          | no               | yes             | yes      | no         | bird
cat           | warm-blooded     | fur        | yes         | no               | no              | yes      | no         | mammal
leopard shark | cold-blooded     | scales     | yes         | yes              | no              | no       | no         | fish
turtle        | cold-blooded     | scales     | no          | semi             | no              | yes      | no         | reptile
penguin       | warm-blooded     | feathers   | no          | semi             | no              | yes      | no         | bird
porcupine     | warm-blooded     | quills     | yes         | no               | no              | yes      | yes        | mammal
eel           | cold-blooded     | scales     | no          | yes              | no              | no       | no         | fish
salamander    | cold-blooded     | none       | no          | semi             | no              | yes      | yes        | amphibian

Definition (Classification). Classification is the task of learning a target function f that maps each attribute
set x to one of the predefined class labels y.

The target function is also known informally as a classification model. A classification model
is useful for the following purposes.

Descriptive Modeling A classification model can serve as an explanatory tool to distinguish


between objects of different classes. For example, it would be useful, for both biologists and
others, to have a descriptive model that summarizes the vertebrate data and explains what features
define a vertebrate as a mammal, reptile, bird, fish, or amphibian.

Predictive Modeling A classification model can also be used to predict the class label of unknown
records. Suppose we are given the following characteristics of a creature known as a gila monster:

Name         | Body Temperature | Skin Cover | Gives Birth | Aquatic Creature | Aerial Creature | Has Legs | Hibernates | Class Label
gila monster | cold-blooded     | scales     | no          | no               | no              | yes      | yes        | ?

 We can use a classification model built from the vertebrate data set to determine the class to
which the creature belongs.
 Classification techniques are most suited for predicting or describing data sets with binary or

nominal categories.
 They are less effective for ordinal categories (e.g., to classify a person as a member of high-,
medium-, or low-income group) because they do not consider the implicit order among the
categories.

4.2 General approach to solving a classification problem


 A classification technique (or classifier) is a systematic approach to building classification
models from an input data set.
 Examples include decision tree classifiers, rule-based classifiers, neural networks, support vector
machines, and naive Bayes classifiers.
 Each technique employs a learning algorithm to identify a model that best fits the relationship
between the attribute set and class label of the input data.
 The model generated by a learning algorithm should both fit the input data well and correctly
predict the class labels of records it has never seen before.

General approach for building a classification model.


 First, a training set consisting of records whose class labels are known must be provided.
 The training set is used to build a classification model, which is subsequently applied to the test
set, which consists of records with unknown class labels.
 Evaluation of the performance of a classification model is based on the counts of test records
correctly and incorrectly predicted by the model.
 These counts are tabulated in a table known as a confusion matrix.
 Table depicts the confusion matrix for a binary classification problem.


Confusion matrix for a 2-class problem:

                 | Predicted Class = 1 | Predicted Class = 0
Actual Class = 1 | f11                 | f10
Actual Class = 0 | f01                 | f00

 Each entry fij in this table denotes the number of records from class i predicted to be of class j.
 For instance, f01 is the number of records from class 0 incorrectly predicted as class 1.
 Based on the entries in the confusion matrix, the total number of correct predictions made by the
model is (f11 + f00) and the total number of incorrect predictions is (f10 + f01).
 Summarizing this information with a single number makes it easier to compare the performance of different models. This can be done using a performance metric such as accuracy, which is defined as follows:

Accuracy = (Number of correct predictions) / (Total number of predictions) = (f11 + f00) / (f11 + f10 + f01 + f00)

 Equivalently, the performance of a model can be expressed in terms of its error rate, which is given
by the following equation:

Error rate = (Number of wrong predictions) / (Total number of predictions) = (f01 + f10) / (f11 + f10 + f01 + f00)

 Most classification algorithms seek models that attain the highest accuracy, or equivalently, the
lowest error rate when applied to the test set.
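As a quick illustration, here is a minimal Python sketch that computes both metrics from the four confusion-matrix entries; the counts are made up for the example.

```python
# A minimal sketch: accuracy and error rate from the 2-class
# confusion-matrix entries f_ij (the counts are illustrative).
f00, f01 = 50, 10   # class-0 records predicted as class 0 / as class 1
f10, f11 = 5, 35    # class-1 records predicted as class 0 / as class 1

total = f00 + f01 + f10 + f11
accuracy = (f11 + f00) / total      # fraction predicted correctly
error_rate = (f01 + f10) / total    # fraction predicted incorrectly

print(f"accuracy   = {accuracy:.3f}")    # 0.850
print(f"error rate = {error_rate:.3f}")  # 0.150
```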

4.3 Decision Tree Induction


 This section introduces a decision tree classifier, which is a simple yet widely used classification
technique.

4.3.1 How a Decision Tree Works


 To illustrate how a decision tree works, suppose that instead of classifying the vertebrates into five distinct
groups of species, we assign them to two categories: mammals and non-mammals.
 Suppose a new species is discovered by scientists. How can we tell whether it is a mammal or a
non-mammal? One approach is to pose a series of questions about the characteristics of the
species.
 The first question we may ask is whether the species is cold- or warm-blooded. If it is cold-
blooded, then it is definitely not a mammal.
 Otherwise, it is either a bird or a mammal. In the latter case, we need to ask a follow-up question:
do the females of the species give birth to their young? Those that do give birth are definitely
mammals.
 The series of questions and their answers can be organized into a decision tree, a hierarchical structure with three types of nodes:
 A root node, which has no incoming edges and zero or more outgoing edges.
 Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges.
 Leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges.
 In a decision tree, each leaf node is assigned a class label.
 The non-terminal nodes, which include the root and other internal nodes, contain attribute test
conditions to separate records that have different characteristics.
 For example, the root node shown in the figure uses the attribute Body Temperature to separate warm-
blooded from cold-blooded vertebrates.

 Since all cold-blooded vertebrates are non-mammals, a leaf node labeled Non-mammals is created as the
right child of the root node.
 If the vertebrate is warm-blooded, a subsequent attribute, Gives Birth, is used to distinguish mammals from
other warm-blooded creatures, which are mostly birds.
 Classifying a test record is straightforward once a decision tree has been constructed.
 Starting from the root node, we apply the test condition to the record and follow the appropriate branch
based on the outcome of the test.
 This will lead us either to another internal node, for which a new test condition is applied, or to a leaf node.
The class label associated with the leaf node is then assigned to the record.
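This traversal is easy to express in code. Below is a hedged Python sketch: the nested-dict node layout and the attribute names are illustrative choices, mirroring the mammal versus non-mammal tree described above.

```python
# A minimal sketch of classifying a record by walking a decision tree.
# The node layout and attribute names are illustrative assumptions.
tree = {
    "attr": "body_temperature",
    "branches": {
        "cold-blooded": {"label": "Non-mammal"},   # leaf node
        "warm-blooded": {
            "attr": "gives_birth",
            "branches": {
                "yes": {"label": "Mammal"},        # leaf node
                "no":  {"label": "Non-mammal"},    # leaf node
            },
        },
    },
}

def classify(node, record):
    """Follow attribute test outcomes from the root down to a leaf."""
    while "label" not in node:                 # internal node: apply the test
        node = node["branches"][record[node["attr"]]]
    return node["label"]                       # leaf: assign its class label

flamingo = {"body_temperature": "warm-blooded", "gives_birth": "no"}
print(classify(tree, flamingo))                # -> Non-mammal
```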

4.3.2 How to Build a Decision Tree

 In principle, there are exponentially many decision trees that can be constructed from a given set of
attributes.
 While some of the trees are more accurate than others, finding the optimal tree is computationally infeasible
because of the exponential size of the search space.
 Nevertheless, efficient algorithms have been developed to induce a reasonably accurate, albeit suboptimal,
decision tree in a reasonable amount of time.
 These algorithms usually employ a greedy strategy that grows a decision tree by making a series of locally
optimum decisions about which attribute to use for partitioning the data.


Classifying an unlabeled vertebrate. The dashed lines represent the outcomes of applying various
attribute test conditions on the unlabeled vertebrate. The vertebrate is eventually assigned to the
Non-mammal class.
Hunt’s Algorithm
 In Hunt’s algorithm, a decision tree is grown in a recursive fashion by partitioning the training
records into successively purer subsets.
 Let Dt be the set of training records that are associated with node t and y = {y1, y2, ..., yc} be the
class labels. The following is a recursive definition of Hunt’s algorithm.

Step 1: If all the records in Dt belong to the same class yt, then t is a leaf node labeled as yt.

Step 2: If Dt contains records that belong to more than one class, an attribute test condition is selected
to partition the records into smaller subsets. A child node is created for each outcome of the test
condition and the records in Dt are distributed to the children based on the outcomes. The
algorithm is then recursively applied to each child node.

Training set for predicting borrowers who will default on loan payments.

 The initial tree for the classification problem contains a single node with class label Defaulted = No (see
Figure 4.7(a)), which means that most of the borrowers successfully repaid their loans.
 The tree, however, needs to be refined since the root node contains records from both classes.
 The records are subsequently divided into smaller subsets based on the outcomes of the Home Owner test
condition, as shown in Figure 4.7(b).

Hunt’s algorithm for inducing decision trees.

 Hunt’s algorithm will work if every combination of attribute values is present in the training data
and each combination has a unique class label. These assumptions are too stringent for use in most
practical situations. Additional conditions are needed to handle the following cases:

1. It is possible for some of the child nodes created in Step 2 to be empty; i.e., there are no records
associated with these nodes. This can happen if none of the training records have the combination
of attribute values associated with such nodes. In this case the node is declared a leaf node with the
same class label as the majority class of training records associated with its parent node.

2. In Step 2, if all the records associated with Dt have identical attribute values (except for the class
label), then it is not possible to split these records any further. In this case, the node is declared a
leaf node with the same class label as the majority class of training records associated with this
node.
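A compact sketch of Hunt's algorithm, including the two special cases above, might look as follows in Python. The splitting choice here is deliberately naive (take the next unused attribute); real induction algorithms select the attribute with an impurity measure, and the loan data is only illustrative, in the spirit of the defaulted-borrower example.

```python
# A hedged sketch of Hunt's algorithm: an empty child inherits the
# parent's majority class (case 1), and a node whose records have
# identical attributes becomes a leaf with its own majority class (case 2).
from collections import Counter

def hunt(records, attrs, parent_majority=None):
    """records: list of (attribute-dict, class-label) pairs."""
    if not records:                                   # case 1: empty node
        return {"label": parent_majority}
    labels = [y for _, y in records]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:                         # Step 1: pure node -> leaf
        return {"label": labels[0]}
    if not attrs or all(x == records[0][0] for x, _ in records):
        return {"label": majority}                    # case 2: cannot split further
    attr, rest = attrs[0], attrs[1:]                  # Step 2: pick a test attribute
    node = {"attr": attr, "branches": {}}
    for v in {x[attr] for x, _ in records}:           # one child per outcome
        subset = [(x, y) for x, y in records if x[attr] == v]
        node["branches"][v] = hunt(subset, rest, majority)
    return node

data = [({"home_owner": "yes", "married": "no"},  "no"),
        ({"home_owner": "no",  "married": "yes"}, "no"),
        ({"home_owner": "no",  "married": "no"},  "yes")]
print(hunt(data, ["home_owner", "married"]))
```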

Design Issues of Decision Tree Induction

A learning algorithm for inducing decision trees must address the following two issues.
1. How should the training records be split? Each recursive step of the tree-growing process must
select an attribute test condition to divide the records into smaller subsets. To implement this step,
the algorithm must provide a method for specifying the test condition for different attribute types
as well as an objective measure for evaluating the goodness of each test condition.
2. How should the splitting procedure stop? A stopping condition is needed to terminate the tree-
growing process. A possible strategy is to continue expanding a node until either all the records
belong to the same class or all the records have identical attribute values. Although both
conditions are sufficient to stop any decision tree induction algorithm, other criteria can be
imposed to allow the tree-growing procedure to terminate earlier.

4.3.3 Methods for Expressing Attribute Test Conditions


 Decision tree induction algorithms must provide a method for expressing an attribute test
condition and its corresponding outcomes for different attribute types.
Binary Attributes
 The test condition for a binary attribute generates two potential outcomes.

Test condition for binary attribute


Nominal Attributes
 Since a nominal attribute can have many values, its test condition can be
expressed in two ways (a multiway split or a binary split), as shown below:

Test conditions for nominal attributes
 For a multiway split, the number of outcomes depends on the number of distinct values of the
corresponding attribute.
 For a binary split, the number of outcomes is two.

Ordinal Attributes
 Ordinal attributes can also produce binary or multiway splits.
 Ordinal attribute values can be grouped as long as the grouping does not violate the order property
of the attribute values.

Different ways of grouping ordinal attribute values


Continuous Attributes
 For continuous attributes, the test condition can be expressed as a
comparison test (A < v) or (A ≥ v) with binary outcomes, or a range query with outcomes of the
form vi ≤ A < vi+1, for i = 1, ..., k.

Test condition for continuous attributes.

Multiway versus binary splits.
Rule-Based Classification

IF-THEN Rules
A rule-based classifier makes use of a set of IF-THEN rules for classification. We can
express a rule in the following form −
IF condition THEN conclusion

Let us consider a rule R1,

R1: IF age = youth AND student = yes

THEN buy_computer = yes

Points to remember −

 The IF part of the rule is called rule antecedent or precondition.


 The THEN part of the rule is called rule consequent.
 The antecedent part (the condition) consists of one or more attribute tests, and
these tests are logically ANDed.
 The consequent part consists of the class prediction.

Note − We can also write rule R1 as follows −

R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes)


If the condition holds true for a given tuple, then the antecedent is satisfied.
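A rule of this form is straightforward to evaluate in code. The following minimal Python sketch checks whether a tuple satisfies R1's antecedent; the attribute names and the rule itself are the illustrative ones from above.

```python
# A minimal sketch: a rule fires when every attribute test in its
# antecedent (logically ANDed) holds for the tuple.
R1 = {"antecedent": {"age": "youth", "student": "yes"},
      "consequent": ("buys_computer", "yes")}

def antecedent_satisfied(rule, tuple_):
    """True if all attribute tests in the rule's antecedent hold."""
    return all(tuple_.get(a) == v for a, v in rule["antecedent"].items())

t = {"age": "youth", "student": "yes", "income": "medium"}
if antecedent_satisfied(R1, t):
    print("predict:", R1["consequent"])    # ('buys_computer', 'yes')
```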

Rule Extraction
Here we will learn how to build a rule-based classifier by extracting IF-THEN rules from
a decision tree.

Points to remember −

To extract a rule from a decision tree −

 One rule is created for each path from the root to the leaf node.
 To form a rule antecedent, each splitting criterion is logically ANDed.
 The leaf node holds the class prediction, forming the rule consequent.
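These steps can be sketched directly over a nested-dict tree representation; the helper below is illustrative, emitting one IF-THEN rule per root-to-leaf path, and reuses the same layout as the earlier decision tree sketch.

```python
# A sketch of rule extraction: one rule per root-to-leaf path, with the
# splitting criteria along the path ANDed into the antecedent.
tree = {
    "attr": "body_temperature",
    "branches": {
        "cold-blooded": {"label": "Non-mammal"},
        "warm-blooded": {
            "attr": "gives_birth",
            "branches": {"yes": {"label": "Mammal"},
                         "no":  {"label": "Non-mammal"}},
        },
    },
}

def extract_rules(node, antecedent=()):
    if "label" in node:                      # leaf: the rule is complete
        return [(antecedent, node["label"])]
    rules = []
    for value, child in node["branches"].items():
        rules += extract_rules(child, antecedent + ((node["attr"], value),))
    return rules

for conds, label in extract_rules(tree):
    tests = " AND ".join(f"{a} = {v}" for a, v in conds)
    print(f"IF {tests} THEN class = {label}")
```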

Rule Pruning
Rules are pruned for the following reasons −

 The assessment of quality is made on the original set of training data. The rule
may perform well on the training data but less well on subsequent data. That is why
rule pruning is required.

 A rule is pruned by removing a conjunct. A rule R is pruned if the pruned version
of R has greater quality than the original, as assessed on an independent set of
tuples.

FOIL is a simple and effective method for rule pruning. For a given rule R,

FOIL_Prune(R) = (pos − neg) / (pos + neg)

where pos and neg are the numbers of positive and negative tuples covered by R, respectively.
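The measure is a one-liner in code; the counts below are illustrative.

```python
# FOIL_Prune as defined above; pos/neg are the counts of positive and
# negative tuples covered by rule R (illustrative values).
def foil_prune(pos, neg):
    return (pos - neg) / (pos + neg)

print(foil_prune(pos=45, neg=5))   # 0.8 -- higher values indicate better rules
```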

Bayesian Classification

Bayesian classification is based on Bayes' theorem. Bayesian classifiers are statistical
classifiers: they can predict class membership probabilities, such as the probability
that a given tuple belongs to a particular class.

Bayes' Theorem
Bayes' theorem is named after Thomas Bayes. There are two types of probabilities −

 Posterior Probability, P(H|X)

 Prior Probability, P(H)

where X is a data tuple and H is some hypothesis.

According to Bayes' theorem,

P(H|X) = P(X|H) P(H) / P(X)

Solved Example
Question 1: Calculate P(H/X) if P(X/H) = 0.25, P(X) = 0.4 and P(H) = 0.5 using Bayes theorem.
Solution:
Given,
P(X/H) = 0.25
P(X) = 0.4
P(H) = 0.5
Using Bayes Theorem Formula
P(H|X) = P(X|H)P(H)/P(X)
P(H|X) = (0.25 × 0.5)/0.4
Answer = 0.3125
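The same computation in Python, useful for checking such exercises:

```python
# The solved example above, computed directly from Bayes' theorem.
def posterior(p_x_given_h, p_h, p_x):
    """P(H|X) = P(X|H) * P(H) / P(X)."""
    return p_x_given_h * p_h / p_x

print(posterior(p_x_given_h=0.25, p_h=0.5, p_x=0.4))   # 0.3125
```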

Bayesian Belief Network


Bayesian Belief Networks specify joint conditional probability distributions. They are
also known as Belief Networks, Bayesian Networks, or Probabilistic Networks.

 A Belief Network allows class conditional independencies to be defined between


subsets of variables.

 It provides a graphical model of causal relationships on which learning can be
performed.
 We can use a trained Bayesian Network for classification.

There are two components that define a Bayesian Belief Network −

 Directed acyclic graph


 A set of conditional probability tables

Directed Acyclic Graph

 Each node in a directed acyclic graph represents a random variable.


 These variables may be discrete or continuous-valued.
 These variables may correspond to the actual attributes given in the data.

Directed Acyclic Graph Representation


The following diagram shows a directed acyclic graph for six Boolean variables.

The arcs in the diagram allow representation of causal knowledge. For example, lung
cancer is influenced by a person's family history of lung cancer, as well as whether or
not the person is a smoker. It is worth noting that the variable PositiveXray is
independent of whether the patient has a family history of lung cancer or that the
patient is a smoker, given that we know the patient has lung cancer.

Conditional Probability Table


The conditional probability table for the values of the variable LungCancer (LC)
showing each possible combination of the values of its parent nodes, FamilyHistory
(FH), and Smoker (S) is as follows −
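The original table is not reproduced here, so the following sketch only shows the general shape of such a CPT as a Python mapping; the probability values are illustrative placeholders, not the values from the missing table.

```python
# A hedged sketch of a conditional probability table for LungCancer (LC)
# given its parents FamilyHistory (FH) and Smoker (S).
cpt_lc = {
    # (FH, S) -> P(LC = yes | FamilyHistory = FH, Smoker = S)
    (True,  True):  0.8,   # illustrative value
    (True,  False): 0.5,   # illustrative value
    (False, True):  0.7,   # illustrative value
    (False, False): 0.1,   # illustrative value
}

p_yes = cpt_lc[(True, False)]    # P(LC = yes | FH = yes, S = no)
print(p_yes, 1.0 - p_yes)        # P(LC = no | ...) is the complement
```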


Naive Bayes Classifier

The Naive Bayes classifier technique is based on Bayes' theorem and is particularly suited when the dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods.

To demonstrate the concept of Naive Bayes classification, consider the following example: the objects can be classified as either GREEN or RED. Our task is to classify new cases as they arrive, i.e., to decide to which class label they belong, based on the currently existing objects.

Since there are twice as many GREEN objects as RED, it is reasonable to believe that a new case (which has not been observed yet) is twice as likely to have membership GREEN rather than RED. In Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based on previous experience, in this case the percentage of GREEN and RED objects, and are often used to predict outcomes before they actually happen.

Thus, we can write:

Prior probability of GREEN ∝ number of GREEN objects / total number of objects
Prior probability of RED ∝ number of RED objects / total number of objects

Since there is a total of 60 objects, 40 of which are GREEN and 20 RED, our prior probabilities for class membership are:

Prior probability of GREEN = 40/60
Prior probability of RED = 20/60


Having formulated our prior probability, we are now ready to classify a new object X (the WHITE circle). Since the objects are well clustered, it is reasonable to assume that the more GREEN (or RED) objects in the vicinity of X, the more likely that the new case belongs to that particular color. To measure this likelihood, we draw a circle around X which encompasses a number (to be chosen a priori) of points irrespective of their class labels. Then we calculate the number of points in the circle belonging to each class label. From this we calculate the likelihood:

Likelihood of X given GREEN ∝ number of GREEN in the vicinity of X / total number of GREEN objects
Likelihood of X given RED ∝ number of RED in the vicinity of X / total number of RED objects

From these counts, it is clear that the likelihood of X given GREEN is smaller than the likelihood of X given RED, since the circle encompasses 1 GREEN object and 3 RED ones. Thus:

Likelihood of X given GREEN = 1/40
Likelihood of X given RED = 3/20

Although the prior probabilities indicate that X may belong to GREEN (given that there are twice as many GREEN compared to RED), the likelihood indicates otherwise: that the class membership of X is RED (given that there are more RED objects in the vicinity of X than GREEN). In Bayesian analysis, the final classification is produced by combining both sources of information, i.e., the prior and the likelihood, to form a posterior probability using Bayes' rule (named after Rev. Thomas Bayes, 1702-1761).


Combining the prior and the likelihood, the posterior of GREEN is proportional to 2/3 × 1/40 = 1/60, while the posterior of RED is proportional to 1/3 × 3/20 = 1/20. Finally, we classify X as RED since its class membership achieves the larger posterior probability.
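The whole example can be verified numerically with the counts given above; this sketch simply multiplies prior by likelihood for each class and picks the larger product.

```python
# The GREEN/RED example worked numerically: prior x likelihood gives a
# quantity proportional to the posterior (counts as described above).
n_green, n_red = 40, 20
in_circle_green, in_circle_red = 1, 3

prior_green, prior_red = n_green / 60, n_red / 60     # 2/3 and 1/3
like_green = in_circle_green / n_green                # 1/40
like_red   = in_circle_red / n_red                    # 3/20

post_green = prior_green * like_green                 # 1/60 ~ 0.0167
post_red   = prior_red * like_red                     # 1/20 = 0.05
print("RED" if post_red > post_green else "GREEN")    # -> RED
```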

Classification by Backpropagation
Backpropagation is the essence of neural network training. It is the
method of fine-tuning the weights of a neural network based on the error
rate obtained in the previous epoch (i.e., iteration). Proper tuning of the
weights allows you to reduce error rates and make the model reliable by
increasing its generalization.

 Backpropagation: a neural network learning algorithm.

 Started by psychologists and neurobiologists to develop and test computational analogues of neurons.

 A neural network: a set of connected input/output units where each connection has a weight associated with it.

 During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples.

 Also referred to as connectionist learning due to the connections between units.

Neural Network as a Classifier

Weakness

1. Long training time.

2. Requires a number of parameters typically best determined empirically, e.g., the network topology or "structure".

3. Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of "hidden units" in the network.

Strength

1. High tolerance to noisy data.

2. Ability to classify untrained patterns.

3. Well-suited for continuous-valued inputs and outputs.

4. Successful on a wide array of real-world data.

5. Algorithms are inherently parallel.

6. Techniques have recently been developed for the extraction of rules from trained neural networks.

How Backpropagation Algorithm Works


The backpropagation algorithm in a neural network computes the gradient of the loss function for a single weight by the chain rule. It efficiently computes one layer at a time, unlike a naive direct computation. It computes the gradient, but it does not define how the gradient is used. It generalizes the computation in the delta rule.

Consider the following backpropagation example to understand the steps:


1. Inputs X arrive through the preconnected path.

2. The input is modeled using real weights W. The weights are usually selected randomly.

3. Calculate the output for every neuron from the input layer, through the hidden layers, to the output layer.

4. Calculate the error in the outputs:

Error = Actual Output − Desired Output

5. Travel back from the output layer to the hidden layer to adjust the
weights such that the error is decreased.

Keep repeating the process until the desired output is achieved.
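A minimal numpy sketch of this loop (forward pass, output error, backward pass, weight update) is shown below. The 2-4-1 network size, learning rate, epoch count, and XOR toy data are all illustrative choices, not a prescribed setup.

```python
# A hedged numpy sketch of the backpropagation steps described above.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

W1 = rng.normal(size=(2, 4))                         # input -> hidden weights
W2 = rng.normal(size=(4, 1))                         # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0                                             # learning rate

for epoch in range(5000):
    h = sigmoid(X @ W1)                              # forward: hidden activations
    out = sigmoid(h @ W2)                            # forward: network output
    err = y - out                                    # step 4: output error
    d_out = err * out * (1 - out)                    # output delta (chain rule)
    d_h = (d_out @ W2.T) * h * (1 - h)               # step 5: propagate backward
    W2 += lr * h.T @ d_out                           # adjust weights so that
    W1 += lr * X.T @ d_h                             # the error decreases

print(np.round(out.ravel(), 2))                      # should approach [0 1 1 0]
```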

SVM
Support vector machines (SVMs) are powerful yet flexible supervised machine learning algorithms which are used both for classification and regression, but generally they are used in classification problems. SVMs were first introduced in the 1960s and later refined in the 1990s. SVMs have their own unique way of implementation as compared to other machine learning algorithms. Lately, they have become extremely popular because of their ability to handle multiple continuous and categorical variables.

Working of SVM

An SVM model is basically a representation of different classes in a hyperplane in
multidimensional space. The hyperplane will be generated in an iterative manner by SVM
so that the error can be minimized. The goal of SVM is to divide the datasets into classes
to find a maximum marginal hyperplane (MMH).

The followings are important concepts in SVM −


 Support Vectors − Data points that are closest to the hyperplane are called support
vectors. The separating line is defined with the help of these data points.
 Hyperplane − A decision plane or space that divides a set of objects having
different classes.
 Margin − The gap between the two lines on the closest data points
of different classes. It can be calculated as the perpendicular distance from the
line to the support vectors. A large margin is considered a good margin and a
small margin is considered a bad margin.
The main goal of SVM is to divide the datasets into classes to find a maximum marginal
hyperplane (MMH), which can be done in the following two steps −
 First, SVM generates hyperplanes iteratively that segregate the classes in the best
way.
 Then, it chooses the hyperplane that separates the classes correctly.
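As a hedged illustration, the scikit-learn sketch below fits a linear SVC on synthetic two-class data; the dataset and parameters are arbitrary choices, not a recommended configuration.

```python
# A minimal SVM classification sketch with scikit-learn's SVC.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=42)  # toy data
clf = SVC(kernel="linear", C=1.0).fit(X, y)   # fit a maximum-margin separator

print("support vectors per class:", clf.n_support_)  # points defining the margin
print("prediction for (0, 0):", clf.predict([[0.0, 0.0]]))
```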


Lazy Learners (or Learning from Your Neighbors)

o K-Nearest Neighbour is one of the simplest Machine Learning algorithms


based on Supervised Learning technique.
o K-NN algorithm assumes the similarity between the new case/data and
available cases and put the new case into the category that is most
similar to the available categories.
o K-NN algorithm stores all the available data and classifies a new data
point based on similarity. This means that when new data appears, it
can be easily classified into a well-suited category by using the K-NN
algorithm.
o K-NN algorithm can be used for Regression as well as for Classification
but mostly it is used for the Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make
any assumption on underlying data.
o It is also called a lazy learner algorithm because it does not learn
from the training set immediately instead it stores the dataset and at
the time of classification, it performs an action on the dataset.
o The KNN algorithm at the training phase just stores the dataset, and when it
gets new data, it classifies that data into the category that is most
similar to the new data.

Why do we need a K-NN Algorithm?


Suppose there are two categories, i.e., Category A and Category B, and
we have a new data point x1. In which of these categories will this data
point lie? To solve this type of problem, we need a K-NN algorithm.
With the help of K-NN, we can easily identify the category or class of a
particular data point.


How does K-NN work?


The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of neighbors.


o Step-2: Calculate the Euclidean distance from the new data point to the
training points.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean
distance.
o Step-4: Among these K neighbors, count the number of data points
in each category.
o Step-5: Assign the new data point to the category for which the
number of neighbors is maximum.
o Step-6: Our model is ready.
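These steps translate almost line for line into numpy; the sketch below is illustrative, with made-up training points and K = 3.

```python
# A minimal numpy sketch of the K-NN steps above: choose K, compute
# Euclidean distances, take the K nearest, and vote by majority.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Step 2: distances
    nearest = np.argsort(dists)[:k]                   # Step 3: K nearest
    votes = Counter(y_train[i] for i in nearest)      # Step 4: count per class
    return votes.most_common(1)[0][0]                 # Step 5: majority class

X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
print(knn_predict(X_train, y_train, np.array([2, 2]), k=3))   # -> 'A'
```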

Techniques to improve Classification Accuracy

Decision making with data mining is a complex task. Ensemble techniques are one of the common strategies to improve the accuracy of a classifier. In general, ensemble learning is an effective technique that combines the predictions from
multiple base classifiers. The most commonly used ensemble
techniques are bagging and boosting.

Bagging

Bagging, also known as Bootstrap Aggregating, is used to improve accuracy and make the model generalize better by reducing the variance, i.e., avoiding overfitting. In this, we take multiple subsets of the training dataset. For each subset, we train a model with the same learning algorithm, like a decision tree or logistic regression, to predict the output for the same set of test data. Once each model has made its prediction, we use a model averaging technique to get the final prediction output. One of the famous techniques used in bagging is Random Forest. In a random forest, we use multiple decision trees.
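A hedged scikit-learn sketch of both ideas follows: many trees are trained on bootstrap samples and their predictions combined by voting. The synthetic data and parameters are illustrative.

```python
# Bagging of decision trees, and Random Forest as its best-known variant
# (bagging plus random feature subsets at each split).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("random forest", forest)]:
    score = cross_val_score(model, X, y, cv=5).mean()   # average accuracy
    print(name, round(float(score), 3))
```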

Boosting

Boosting is primarily used to reduce the bias and variance in a supervised learning technique. It refers to a family of algorithms that convert weak learners (base learners) into strong learners. Weak learners are classifiers that agree with the actual classification only to a small extent, while
strong learners are classifiers that are well correlated with the actual classification. A few famous boosting techniques are AdaBoost, Gradient Boosting, and XGBoost (Extreme Gradient Boosting).
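An illustrative AdaBoost sketch with scikit-learn closes the unit; the synthetic data and parameters are arbitrary choices.

```python
# AdaBoost: weak learners (decision stumps by default) are combined,
# with later learners focusing on the tuples earlier ones misclassified.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

boost = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", round(float(boost.score(X_te, y_te)), 3))
```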

