Classification & Decision Trees
Classification
Classification is the task of assigning objects to one of several predefined categories. It is a pervasive
problem that encompasses many diverse applications. Examples of classification are given below.
1. Detecting spam e-mail messages based on the header and content
Descriptive Modeling
A classification model can serve as an explanatory tool to distinguish between objects of different classes. For example,
it would be useful for both biologists and others to have a descriptive model that can summarize
the data given below.
Predictive Modeling
A classification model can also be used to predict the class label of unknown records. It can be
treated as a black box that automatically assigns a class label when presented with the attribute set of an
unknown record.
Classification techniques are most suited for predicting or describing datasets with binary or nominal
categories. They are less effective for ordinal categories because they do not consider the implicit
ordering among the categories.
3. Nearest Neighbor Classifier
5. Bayesian Classifier
Each of the above techniques applies a learning algorithm to identify the model that best fits the rela-
tionship between the attribute set and the class label of the input data. A key objective of the learning algorithm is to
build models with good generalization capability, i.e. models that accurately predict the class labels of
previously unseen records.
Figure: a learning algorithm builds a model from the training set (induction); the model is then applied to the test set (deduction).
A training set consists of records whose class labels are known; it is used to build the classification
model. The model is then applied to a test set, which consists of records with unknown class labels.
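This train/apply workflow can be sketched in a few lines; the example below assumes scikit-learn is available and uses a built-in dataset purely as a stand-in for the labeled records described above.

# A minimal sketch of the train/apply workflow (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                 # stand-in for a labeled dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # build the model from the training set
y_pred = model.predict(X_test)                            # apply the model to the test set
print("test accuracy:", (y_pred == y_test).mean())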
Table 2: Confusion matrix

                              Predicted Class
                              Class 1    Class 0
Actual Class      Class 1     f_{11}     f_{10}
                  Class 0     f_{01}     f_{00}
In the above table, f_{ij} indicates the number of records from class i predicted to be of class j. Based on
the entries of the confusion matrix, the total number of correct predictions made by the model is
f_{11} + f_{00}, and the total number of wrong predictions is f_{10} + f_{01}.
The confusion matrix provides the information needed to determine how well a classification model performs.
Based on this information, we can define performance measures to compare
the performance of different classification models.
\[
\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} = \frac{f_{11} + f_{00}}{f_{11} + f_{10} + f_{01} + f_{00}}
\]
\[
\text{Error rate} = \frac{\text{Number of wrong predictions}}{\text{Total number of predictions}} = \frac{f_{10} + f_{01}}{f_{11} + f_{10} + f_{01} + f_{00}}
\]
\[
\text{Sensitivity (true positive rate)} = \frac{\text{Number of true positives}}{\text{Total number of actual positives}} = \frac{f_{11}}{f_{11} + f_{10}}
\]
\[
\text{Specificity (true negative rate)} = \frac{\text{Number of true negatives}}{\text{Total number of actual negatives}} = \frac{f_{00}}{f_{01} + f_{00}}
\]
Most classification algorithms seek models that attain the highest accuracy or, equivalently, the lowest error
rate when applied to the test set.
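As a concrete illustration, the short sketch below computes these measures from the four counts of a 2 x 2 confusion matrix; the counts themselves are hypothetical and serve only to show the arithmetic.

# Hypothetical counts of a 2x2 confusion matrix:
# f11, f10 = records of actual class 1 predicted as 1 / as 0
# f01, f00 = records of actual class 0 predicted as 1 / as 0
f11, f10 = 40, 10
f01, f00 = 5, 45

total = f11 + f10 + f01 + f00
accuracy = (f11 + f00) / total
error_rate = (f10 + f01) / total
sensitivity = f11 / (f11 + f10)   # true positive rate
specificity = f00 / (f01 + f00)   # true negative rate

print(f"accuracy = {accuracy:.3f}, error rate = {error_rate:.3f}")
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")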
1. Is the species cold-blooded or warm-blooded?
2. Do the females of the species give birth?
A series of questions and their possible answers can be organized in the form of a hierarchical structure
consisting of nodes and directed edges. This hierarchical structure is known as a decision tree.
Figure: a decision tree whose root node tests Body Temperature (Cold / Warm); the Warm branch leads to a further test with outcomes No / Yes. The nodes are numbered (1) to (5) and referred to below.
1. Root Node: This node has no incoming edges and zero or more outgoing edges. In the
above tree, node (1) is the root node.
2. Internal Node: This node has exactly one incoming edge and two or more outgoing edges. In
the above tree, node (2) is an internal node.
3. Leaf/Terminal Node: This node has exactly one incoming edge and no outgoing edges. In the
above tree, nodes (3), (4) and (5) are leaf nodes. Each leaf node is assigned a class label.
The non-terminal nodes, which include the root and other internal nodes, contain attribute test conditions
to separate records that have different characteristics.
Hunt's Algorithm
In Hunt's algorithm, a decision tree is grown in a recursive fashion by partitioning the training records into
successively purer subsets. Let Dt denote the set of training records associated with node t and let
y = {y1 , . . . , yc } be the class labels. The algorithm consists of the two steps given below.
1. If all the records in Dt belong to the same class yt, then t is a leaf node labeled as yt.
2. If Dt contains records that belong to more than one class, an attribute test condition is selected
to partition the records into smaller subsets. A child node is created for each outcome of the test
condition, and the records in Dt are distributed to the children based on the outcomes. The algorithm
is then recursively applied to each child node.
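A minimal sketch of this two-step recursion on categorical attributes is given below; the attribute-selection rule here is deliberately naive (it simply takes the next unused attribute), since the criteria for choosing good test conditions are discussed later.

from collections import Counter

def hunt(records, labels, attributes):
    """Grow a decision tree following Hunt's two steps.
    records: list of dicts mapping attribute name -> categorical value."""
    # Step 1: all records belong to one class -> leaf node with that label.
    if len(set(labels)) == 1:
        return labels[0]
    # No attribute left to test -> leaf node with the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: select a test attribute (naively, the next one in the list),
    # create a child for each outcome and recurse on each subset.
    attr = attributes[0]
    tree = {attr: {}}
    for value in set(r[attr] for r in records):
        pairs = [(r, l) for r, l in zip(records, labels) if r[attr] == value]
        tree[attr][value] = hunt([r for r, _ in pairs],
                                 [l for _, l in pairs],
                                 attributes[1:])
    return tree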
Example 1. Let us consider the following loan defaulter dataset. Based on this training dataset, construct
a decision tree for predicting borrowers who will default on loan payments.
Solution. Based on the above dataset, we can grow a decision tree model in the following way.
Figure 2 (Step-I): a single leaf node labeled Defaulted = No.
Figure 3 (Step-II): a split on Home Owner (Yes / No); both child nodes are labeled Defaulted = No.
Figure 4 (Step-III): the Home Owner = No branch is further split on Marital Status (Married vs. Single/Divorced), with leaves Defaulted = No and Defaulted = Yes respectively.
Figure 5 (Step-IV): the Single/Divorced branch is further split on Annual Income (< 80K / > 80K), with leaves Defaulted = No and Defaulted = Yes respectively.
2. It is possible for some of the child nodes created in the second step to be empty, i.e., no records are
associated with these nodes. This can happen if none of the training records has the combination
of attribute values associated with such a node. In this case, the node is declared a leaf node with
the same class label as the majority class of training records associated with its parent node.
3. In the second step, if all records associated with Dt have identical attribute values (except for the class label), then it is not possible to
split these records any further. In this case, the node is declared a leaf node with the same class
label as the majority class of training records associated with this node.
Two design issues arise in decision tree induction: how the training records should be split, and when the splitting procedure should stop. To deal with the first issue, we need to specify a test condition for each attribute type, while to deal
with the second issue, we need a condition to stop the tree-growing process. A possible strategy is to continue the
tree-growing process until either all records belong to the same class or all records have identical attribute
values.
Binary Attribute
Figure: the test condition for a binary attribute has two outcomes (Yes / No), each leading to its own decision.
Nominal Attribute
Nominal attribute can have more than one split. An example of nominal attribute split is given in the
figure. Nominal attribute can also be converted to binary attribute by two way split. In example given
below, we can keep single and divorcee in one category and Married in other category.
Figure: a split on Marital Status involving the values Single, Divorced, and Married.
Ordinal Attribute
Like nominal attribute, ordinal attribute can also have more than one split. An example of ordinal
attribute is given in the figure. Ordinal attributes have inherent ordering between categories.
Figure: a multiway split on Shirt Size with outcomes Small, Medium, Large, X Large, and XX Large.
Ordinal attributes can also produce binary splits: the attribute values can be grouped into two subsets as long as the grouping does not violate
the order property of the attribute values. An example of a binary split of the above attribute is given below.
Figure: a binary split on Shirt Size obtained by grouping the ordered sizes into two subsets.
Continuous Attribute
A continuous attribute can have binary or multi-way split. For continuous attributes, the test condition
can be expressed by comparison test A < υ or A > υ with binary outcome. It can also be splitted by
comparing a range query with outcome υi ≤ A < υi + l for i = 1, . . . , k. For multi-way split algorithm,
one must consider all possible ranges of continuous variable.
Figure: a multiway split on Annual Income into ranges such as 10K-20K, 20K-50K, and more than 80K.
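The two styles of test condition for a continuous attribute can be written down directly; in the sketch below the 80K threshold and the range boundaries are illustrative choices rather than values fixed by the text.

# Binary split: compare against a single threshold v.
def binary_outcome(income_k, v=80):
    return "A < v" if income_k < v else "A >= v"

# Multiway split: range queries of the form v_i <= A < v_{i+1}.
def range_outcome(income_k, boundaries=(10, 20, 50, 80)):
    if income_k < boundaries[0]:
        return f"< {boundaries[0]}K"
    for low, high in zip(boundaries, boundaries[1:]):
        if low <= income_k < high:
            return f"{low}K-{high}K"
    return f">= {boundaries[-1]}K"

print(binary_outcome(95))   # A >= v
print(range_outcome(35))    # 20K-50K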
The attribute test condition at each node is usually chosen using a measure of the impurity of the resulting child nodes. Widely used impurity measures for a node t include
\[
\text{Entropy}(t) = -\sum_{i=0}^{c-1} p(i \mid t)\, \log_2 p(i \mid t),
\]
\[
\text{Gini}(t) = 1 - \sum_{i=0}^{c-1} \bigl[p(i \mid t)\bigr]^2,
\]
\[
\text{Classification Error}(t) = 1 - \max_i \bigl[p(i \mid t)\bigr],
\]
where c is the number of classes and p(i | t) denotes the fraction of records belonging to class i at node t.
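All three measures can be computed directly from the class proportions at a node; the sketch below does so for an illustrative class distribution.

import math

def impurity(class_counts):
    """Return (entropy, gini, classification_error) for one node,
    where class_counts[i] is the number of records of class i at the node."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    gini = 1 - sum(p * p for p in probs)
    error = 1 - max(probs)
    return entropy, gini, error

# Illustrative node with 3 records of class 0 and 7 records of class 1.
print(impurity([3, 7]))   # approximately (0.881, 0.420, 0.300)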
1. The createNode() function extends the decision tree by creating a new node. A node in the
decision tree has either a test condition, denoted as node.test_cond, or a class label, denoted as
node.label.
Algorithm 1 Algorithm for decision tree induction
TreeGrowth(E, F)   {E: set of training records, F: set of attributes}
1: if stopping_cond(E, F) = true then
2:   leaf = createNode()
3:   leaf.label = classify(E)
4:   return leaf
5: else
6:   root = createNode()
7:   root.test_cond = find_best_split(E, F)
8:   let V = {v | v is a possible outcome of root.test_cond}
9:   for each v ∈ V do
10:    E_v = {e | root.test_cond(e) = v and e ∈ E}
11:    child = TreeGrowth(E_v, F)
12:    add child as descendant of root and label the edge (root → child) as v
13:  end for
14: end if
15: return root
2. The find_best_split() function determines which attribute should be selected as the test condition
for splitting the training records. As previously noted, the choice of test condition depends on which
impurity measure is used to determine the goodness of a split; some widely used measures are entropy,
the Gini index and the χ2 statistic. (A small sketch of such a selection is given after this list.)
3. The classify() function determines the class label to be assigned to a leaf node. For each leaf node t,
let p(i|t) denote the fraction of training records from class i associated with node t. In most
cases, the leaf node is assigned to the class that has the majority of training records:
\[
\text{leaf.label} = \operatorname*{argmax}_{i}\; p(i \mid t),
\]
where the argmax operator returns the argument i that maximizes p(i|t).
4. The stopping_cond() function is used to terminate the tree-growing process by testing whether
all the records have either the same class label or the same attribute values.
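To make the role of find_best_split() concrete, the sketch below selects the attribute whose multiway split yields the largest decrease in Gini impurity; the helper names and the tiny dataset are illustrative and not part of the pseudocode above.

from collections import Counter

def gini(labels):
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())

def find_best_split(records, labels, attributes):
    """Pick the attribute whose split gives the lowest weighted child impurity."""
    parent = gini(labels)
    best_attr, best_gain = None, -1.0
    for attr in attributes:
        children = {}
        for r, l in zip(records, labels):
            children.setdefault(r[attr], []).append(l)
        weighted = sum(len(ls) / len(labels) * gini(ls) for ls in children.values())
        if parent - weighted > best_gain:
            best_attr, best_gain = attr, parent - weighted
    return best_attr, best_gain

# Tiny illustrative dataset with a single candidate attribute.
records = [{"HomeOwner": "Yes"}, {"HomeOwner": "No"}, {"HomeOwner": "No"}]
labels = ["No", "Yes", "No"]
print(find_best_split(records, labels, ["HomeOwner"]))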
After building the decision tree, a tree-pruning step can be performed to reduce the size of the decision
tree. Decision trees that are too large are susceptible to a phenomenon known as overfitting.
Example 2. Suppose we are given the following probabilities:
\[
P(Y = 0) = 0.65, \qquad P(Y = 1) = 0.35,
\]
\[
P(X = 1 \mid Y = 1) = 0.75, \qquad P(X = 1 \mid Y = 0) = 0.30.
\]
By Bayes' theorem, the posterior probability of Y = 1 given X = 1 is
\[
P(Y = 1 \mid X = 1) = \frac{P(X = 1 \mid Y = 1)\, P(Y = 1)}{P(X = 1)}.
\]
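Substituting these values, with the denominator P(X = 1) expanded by the law of total probability, gives
\[
P(X = 1) = P(X = 1 \mid Y = 1)P(Y = 1) + P(X = 1 \mid Y = 0)P(Y = 0) = 0.75 \times 0.35 + 0.30 \times 0.65 = 0.4575,
\]
\[
P(Y = 1 \mid X = 1) = \frac{0.2625}{0.4575} \approx 0.574.
\]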
Using Bayes Theorem for Classification
Let us consider X as the attribute set and Y as the class variable. If the class variable has a non-deterministic
relationship with the attributes, then we can treat X and Y as random variables and capture their relation-
ship probabilistically using Bayes' theorem. The conditional probability P(Y | X) is known as the posterior
probability of Y given X, as opposed to its prior probability P(Y).
During the training phase, we need to learn the posterior probability P(Y | X) for every combination of
X and Y based on the information gathered from the training data.
Knowing these probabilities, a test record X can be classified by finding the class Y' that maximizes
the posterior probability P(Y' | X).
Now let us consider the loan default data of Example 1 and let
X = (Home Owner = No, Marital Status = Married, Annual Income = 120K).
To classify the record, we need to compute the posterior probabilities P(Yes | X) and P(No | X) based
on the information available in the training data. If P(Yes | X) > P(No | X), then the record is classified
as Yes; otherwise it is classified as No.
Estimating the posterior probabilities accurately for every combination of class label and attribute
values is a difficult task because it requires a very large training set even for a moderate number of attributes.
Bayes' theorem is useful because it allows us to express the posterior probability in terms of the prior
probability P(Y) and the class-conditional probability P(X | Y). It can be written as
\[
P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}.
\]
When comparing the posterior probabilities for different values of Y, the denominator P(X) is constant
and can therefore be ignored. P(Y) can be easily estimated from the training set by computing the fraction
of training records that belong to each class. Further, P(X | Y) can be calculated using the two methods
given below.
The vector X consists of d attributes X1, . . . , Xd. Let X, Y and Z be three random variables. X
is said to be conditionally independent of Y given Z if the following condition holds:
\[
P(X \mid Y, Z) = P(X \mid Z).
\]
This condition also implies that
\[
P(X, Y \mid Z) = \frac{P(X, Y, Z)}{P(Z)}
= \frac{P(X, Y, Z)}{P(Y, Z)} \times \frac{P(Y, Z)}{P(Z)}
= P(X \mid Y, Z)\, P(Y \mid Z)
= P(X \mid Z)\, P(Y \mid Z).
\]
Assuming the attributes X1, . . . , Xd are conditionally independent given the class label Y, the posterior probability for each class can be written as
\[
P(Y \mid X) = \frac{P(Y) \prod_{i=1}^{d} P(X_i \mid Y)}{P(X)}.
\]
Note that the denominator P(X) is fixed for every class Y. Hence, we only need to calculate the numerator for
each class label.
Example 3. In Example 1, assume that Y = Defaulted Borrower and the remaining variables are the features
X. Based on this information, find P(Y = Yes | X) and P(Y = No | X) if
X = (Home Owner = No, Marital Status = Married, Annual Income = 120K).
Solution. Using the table and the assumptions [AI | Yes] ∼ N(90, 25) and [AI | No] ∼ N(110, 2975), i.e. normal
distributions with the stated mean and variance of Annual Income within each class, we can obtain the following probabilities:
P(Y = Yes) = 3/10
P(Y = No) = 7/10
P(HO = No | Y = No) = 4/7
P(HO = No | Y = Yes) = 1
P(MS = Married | Y = Yes) = 0
P(MS = Married | Y = No) = 4/7
P(AI = 120K | Y = No) = 0.0072
P(AI = 120K | Y = Yes) = 1.2 × 10^-9
Based on the above information, we can write P(X | Y = Yes) and P(X | Y = No) as products of the individual conditional probabilities and compare the resulting posteriors.
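As a rough sketch of that arithmetic (using exactly the probabilities listed above, so the result is only as reliable as those estimates), the snippet below computes the unnormalized posterior for each class:

# Probabilities taken from the solution above.
prior = {"Yes": 3 / 10, "No": 7 / 10}
likelihood = {
    "Yes": {"HO=No": 1.0, "MS=Married": 0.0, "AI=120K": 1.2e-9},
    "No":  {"HO=No": 4 / 7, "MS=Married": 4 / 7, "AI=120K": 0.0072},
}

score = {}
for y in prior:
    p_x_given_y = 1.0
    for value in likelihood[y].values():
        p_x_given_y *= value              # naive Bayes: P(X|Y) is a product of per-attribute terms
    score[y] = prior[y] * p_x_given_y     # numerator of P(Y|X); P(X) is the same for both classes

print(score)                                            # {'Yes': 0.0, 'No': ~0.0016}
print("predicted class:", max(score, key=score.get))    # 'No'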