Unit 3
Definition:
Classification is the task of learning a target function ‘f’ that maps each attribute set ‘x’ to one of
the predefined class labels ‘y’.
The attribute set ‘x’ can contain any number of attributes, and the attributes can be binary,
categorical or continuous. The class label ‘y’ must be a discrete attribute; i.e., either binary or
categorical (nominal or ordinal).
Classification models:
--A classification model can be used as a predictive model to predict the class label of unknown
records.
Applications:
i) Detecting spam email messages based upon the message header and content.
ii) Classifying galaxies based upon their shapes.
iii) Classifying students based on their grades.
iv) Classifying patients according to their medical records.
v) Classification can be used in credit approval.
--Examples are decision tree classifiers, rule-based classifiers, neural networks, support vector
machines and naïve Bayes classifier.
--Each technique employs a learning algorithm to identify a model that best fits the relationship
between the attribute set and the class label of the input data.
--A training set, consisting of records whose class labels are known, must be provided. The training
set is used to build a classification model, which is then applied to the test set. The test set consists of
records whose class labels are unknown.
--Evaluation of the performance of a classification model is based on the counts of test records
correctly and incorrectly predicted by the model.
--These counts are tabulated in a table known as a confusion matrix. Each entry fij in the matrix
denotes the number of records from class ‘i’ predicted to be of class ‘j’.
--For example, f01 refers to the number of records from class 0 incorrectly predicted as class 1.
--Based on the entries in the confusion matrix, the total number of correct predictions made by
the model is (f11+f00) and the total number of incorrect predictions is (f01+f10).
--Although a confusion matrix provides the information needed to determine how well a
classification model performs, summarizing this information with a single number would make it
more convenient to compare the performance of different models.
Accuracy = (f11 + f00) / (f11 + f10 + f01 + f00)
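As a quick illustration, the short Python sketch below computes accuracy from the four confusion-matrix counts defined above; the counts themselves are hypothetical numbers chosen only for the example.

```python
# Minimal sketch: accuracy of a two-class model from its confusion matrix.
# The counts below are hypothetical, chosen only to illustrate the formula.
f11, f00 = 40, 45   # correctly predicted class 1 and class 0 records
f10, f01 = 5, 10    # class 1 predicted as 0, and class 0 predicted as 1

correct = f11 + f00
total = f11 + f10 + f01 + f00
accuracy = correct / total
print(f"Accuracy = {correct}/{total} = {accuracy:.2f}")   # 85/100 = 0.85
```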
Decision Tree Induction: Decision tree induction is a technique for building a classification model
that can assign class labels to unknown records.
In this example, we classify whether a vertebrate is a mammal or a non-mammal. From the
decision tree, we can identify a new vertebrate as a mammal or a non-mammal. If the vertebrate is
cold-blooded, then it is a non-mammal. If the vertebrate is warm-blooded, then check the next
node, which tests whether it gives birth. If it gives birth, then it is a mammal; otherwise, it is a non-mammal.
--There are various algorithms devised for constructing a decision tree. They are:
i) Hunt’s algorithm
ii) ID3 (Iterative Dichotomiser 3)
iii) C4.5
iv) CART (Classification and Regression Trees)
--These algorithms usually employ a greedy strategy that grows a decision tree by making a
series of locally optimum decisions about which attribute to use for partitioning the data. One
such algorithm is Hunt’s algorithm.
Hunt’s algorithm
--In Hunt’s algorithm, a decision tree is grown in a recursive fashion by partitioning the training
records into subsets.
--Let Dt be the set of training records associated with node t and y = {y1, y2, …, yc} be the set of
class labels.
STEP 1
If all the records in Dt belong to the same class yt, then t is a leaf node labeled as yt.
STEP 2
If Dt contains records that belong to more than one class, an attribute test condition is selected to
partition the records into smaller subsets. A child node is created for each outcome and the
records in Dt are distributed based on the outcomes. The algorithm is then recursively applied for
each node.
Fig: Training set for predicting borrowers who will default on loan payments
--In the above data set, the class labels of the 10 records are not all the same, so step 1 is not
satisfied. We need to construct the decision tree using step 2.
--The majority of the records have the class label “no”, so the root node is initially labeled “no”.
--Select one of the attributes as the root node, say home owner, since the records with home owner = yes
do not require any further splitting. There are 3 records with home owner = yes and 7 records
with home owner = no.
--The records with home owner = yes are classified, and we now need to classify the other 7 records,
i.e., those with home owner = no. The attribute test condition can be applied either on marital status or
on annual income.
--Let us select marital status, to which we apply a binary split. Here the records with marital status =
married do not require further splitting.
--The records with marital status = married are classified, and we now need to classify the other 4
records, i.e., those with home owner = no and marital status = single or divorced.
--The remaining attribute is annual income. Since it is a continuous attribute, we select a split range
(threshold) for it.
--Now the other 4 records are also classified.
i) It is possible for some of the child nodes created in step 2 to be empty; i.e., there are no
records associated with these nodes. In such cases, assign the node the same class label as the
majority class of the training records associated with its parent node; i.e., in our example the
majority class is “no”, so assign ‘no’ to the new record.
ii) If all the records in Dt have identical attribute values but different class labels, the records
cannot be split any further; in such cases, assign the majority class label. (A Python sketch of
the overall procedure is given after this list.)
The following are the methods for expressing attribute test conditions. They are:
i) Binary attributes: The test condition for a binary attribute generates two possible outcomes, as
shown below:
ii) Nominal attributes: Since a nominal attribute can have many values, its test condition
can be expressed in two ways, as shown below:
For a multi way split, the number of outcomes depends on the number of distinct
values for the corresponding attribute.
Some algorithms, such as CART, support only binary splits. In such cases the k attribute
values can be partitioned into 2^(k-1) - 1 distinct binary groupings (see the enumeration
sketch after this list).
For example, marital status has 3 values, so it can be split in 2^(3-1) - 1 = 3 ways.
iii) Ordinal attribute: It can also produce binary or multi way splits. Ordinal attribute
values can be grouped as long as the grouping does not violate the order property of
the attribute values.
In the above example, conditions (a) and (b) preserve the order property, but condition (c)
violates it.
iv) Continuous attributes: The test condition can be expressed as a comparison test (A < v)
or (A >= v) with binary outcomes, or as a range query with outcomes of the form
vi <= A < vi+1, for i = 1, 2, …, k.
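To make the 2^(k-1) - 1 count concrete, the small sketch below (a hypothetical helper written only for this illustration) enumerates the distinct binary groupings of a nominal attribute such as marital status.

```python
from itertools import combinations

def binary_groupings(values):
    """Enumerate the 2^(k-1) - 1 distinct two-way partitions of a value set."""
    values = list(values)
    groupings = []
    for size in range(1, len(values)):
        for left in combinations(values, size):
            # Fixing the first value on the left side avoids counting each
            # partition twice (left/right order does not matter).
            if values[0] in left:
                right = tuple(v for v in values if v not in left)
                groupings.append((left, right))
    return groupings

for left, right in binary_groupings(["single", "married", "divorced"]):
    print(left, "vs", right)
# Prints 3 groupings, matching 2^(3-1) - 1 = 3 for marital status.
```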
Measures for selecting the best split:
There are many measures that can be used to determine the best way to split the records.
Let P(i|t) denote the fraction of records belonging to class i at node t. The measures for selecting
the best split are often based on the degree of impurity of the child nodes. The smaller the
degree of impurity, the more skewed the class distribution. For example, a node with class
distribution (0,1) has zero impurity, whereas a node with a uniform class distribution (0.5,0.5) has
the highest impurity.
Compare the degree of impurity of the parent node with the degree of impurity of the child nodes.
The larger their difference, the better the test condition. The gain, ∆, is a criterion that can be
used to determine the goodness of a split:
∆ = I(parent) - Σ (j = 1 to k) [ N(vj) / N ] × I(vj)
where I(.) is the impurity measure of a given node, N is the total number of records at the parent
node, k is the number of attribute values (outcomes of the split), and N(vj) is the number of records
associated with child node vj. When entropy is used as the impurity measure, the difference in
entropy is known as the information gain, ∆info.
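The sketch below shows how the gain can be computed for a candidate split, using either the Gini index or entropy as the impurity measure I(.). The parent and child class counts at the end are hypothetical numbers chosen only to demonstrate the calculation.

```python
import math

def gini(counts):
    """Gini impurity of a node, given its per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy of a node, given its per-class record counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def gain(parent_counts, child_counts_list, impurity=gini):
    """Gain = I(parent) - sum over children of (N(vj)/N) * I(vj)."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * impurity(child) for child in child_counts_list)
    return impurity(parent_counts) - weighted

# Hypothetical binary split: parent (6, 6) split into children (1, 4) and (5, 2).
parent, children = [6, 6], [[1, 4], [5, 2]]
print(round(gain(parent, children, gini), 3))     # gain using the Gini index
print(round(gain(parent, children, entropy), 3))  # information gain (entropy)
```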
Suppose there are two ways to split the data into smaller subsets, say by attribute A or by attribute B.
Before splitting, the Gini index is 0.5 since there are equal numbers of records from both classes.
For attribute B, the average weighted Gini index is 0.375. Since the subsets for attribute B have a
smaller Gini index than those for attribute A, attribute B is preferable.
For nominal attributes, the computation of the Gini index is the same as for binary attributes. The
split with the smaller average Gini index is the better split. In our example, the multi-way split has
the lowest Gini index, so it is the best split.
For a continuous attribute, the distinct values are first sorted in ascending order and the candidate
split positions are taken between successive values.
Calculate the Gini index for every split position; the split position with the smallest Gini index is
chosen as the threshold (range) for the continuous attribute.
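A sketch of this search is shown below: the distinct values are sorted, a candidate threshold is placed midway between successive values, and the threshold with the smallest weighted Gini index is kept. The (annual income, class) pairs are invented for the illustration and do not come from the data set above.

```python
def gini(counts):
    """Gini impurity from per-class counts (0.0 for an empty node)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

def best_threshold(records):
    """Choose the split threshold for a continuous attribute by weighted Gini.
    records: list of (value, class_label) pairs; class labels are 'yes'/'no'."""
    n = len(records)
    values = sorted({v for v, _ in records})
    best = (float("inf"), None)
    # Candidate split positions lie midway between successive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [c for v, c in records if v <= t]
        right = [c for v, c in records if v > t]
        weighted = (len(left) / n) * gini([left.count("yes"), left.count("no")]) \
                 + (len(right) / n) * gini([right.count("yes"), right.count("no")])
        best = min(best, (weighted, t))
    return best  # (smallest weighted Gini index, chosen threshold)

# Hypothetical annual income values (in thousands) with default = yes/no labels.
data = [(60, "no"), (70, "no"), (75, "no"), (85, "yes"),
        (95, "yes"), (100, "no"), (120, "no"), (220, "no")]
print(best_threshold(data))
```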
1. Bayes’ Theorem:
Naive Bayes model is easy to build and particularly useful for very large data sets. Along with
simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods.
Bayes’ theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c):
P(c|x) = P(x|c) × P(c) / P(x)
Above,
P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes),
P(c) is the prior probability of the class,
P(x|c) is the likelihood, i.e., the probability of the predictor given the class, and
P(x) is the prior probability of the predictor.
Let’s understand it using an example. Below I have a training data set of weather and the
corresponding target variable ‘Play’ (suggesting the possibility of playing). Now, we need to classify
whether players will play or not based on the weather conditions. Let’s follow the steps below to
perform it.
Step 1: Convert the data set into a frequency table.
Step 2: Create a Likelihood table by finding the probabilities; for example, the probability of
Overcast is 0.29 and the probability of playing is 0.64.
Step 3: Now, use Naive Bayesian equation to calculate the posterior probability for each
class. The class with the highest posterior probability is the outcome of prediction.
Problem: Players will play if the weather is sunny. Is this statement correct? We can solve it using
the posterior probability P(Yes | Sunny) = P(Sunny | Yes) × P(Yes) / P(Sunny).
Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36 and P(Yes) = 9/14 = 0.64. Now,
P(Yes | Sunny) = 0.33 × 0.64 / 0.36 = 0.60, which is the higher probability, so the prediction is that
players will play when the weather is sunny.
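The same three probabilities can be plugged into Bayes’ theorem in a couple of lines of Python; the numbers are taken directly from the worked example above.

```python
# Bayes' theorem for "will players play if the weather is sunny?"
p_sunny_given_yes = 3 / 9   # P(Sunny | Yes), read from the likelihood table
p_yes = 9 / 14              # P(Yes), prior probability of playing
p_sunny = 5 / 14            # P(Sunny), probability of the evidence

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))   # ~0.6, so "play" is the more likely class
```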
Naive Bayes uses a similar method to predict the probability of different classes based on various
attributes. This algorithm is mostly used in text classification and in problems having multiple
classes.
Real time Prediction: Naive Bayes is an eager learning classifier and it is fast. Thus, it can be
used for making predictions in real time.
Multi class Prediction: This algorithm is also well known for its multi-class prediction feature. Here we
can predict the probability of multiple classes of the target variable.
Text classification/ Spam Filtering/ Sentiment Analysis: Naive Bayes classifiers are mostly used in text
classification (due to better results in multi-class problems and the independence assumption) and have a
higher success rate compared to other algorithms. As a result, they are widely used in spam filtering
(identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and
negative customer sentiments).
Recommendation System: A Naive Bayes classifier and collaborative filtering together build a
recommendation system that uses machine learning and data mining techniques to filter unseen
information and predict whether a user would like a given resource or not.
Consider a fictional dataset that describes the weather conditions for playing a game of Cricket.
Given the weather conditions, each tuple classifies the conditions as fit(“Yes”) or unfit(“No”) for
playing Cricket.
The dataset is divided into two parts, namely, feature matrix and the response vector.
Feature matrix contains all the vectors (rows) of the dataset, in which each vector
consists of the values of the predictor features. In the above dataset, the features are ‘Outlook’,
‘Temperature’, ‘Humidity’ and ‘Windy’.
Response vector contains the value of class variable (prediction or output) for each
row of feature matrix. In above dataset, the class variable name is ‘Play Cricket’.
Bayes’ Theorem: Bayes’ theorem finds the probability of an event occurring given the probability of
another event that has already occurred:
P(A | B) = P(B | A) × P(A) / P(B)
where A and B are events and P(B) ≠ 0.
Basically, we are trying to find the probability of event A, given that event B is true. Event B is also
termed the evidence.
P(A) is the prior probability of A (i.e., the probability of the event before the evidence is seen). The
evidence is an attribute value of an unknown instance (here, it is event B). P(A | B) is the posterior
probability of A, i.e., the probability of the event after the evidence is seen.
Applying Bayes’ theorem to the class variable y and a feature vector (x1, …, xn), and assuming the
features are independent given the class, we get:
P(y | x1, …, xn) = P(y) × Π (i = 1 to n) P(xi | y) / [ P(x1) × P(x2) × … × P(xn) ]
Now, as the denominator remains constant for a given input, we can remove that term:
P(y | x1, …, xn) ∝ P(y) × Π (i = 1 to n) P(xi | y)
Now, we need to create a classifier model. For this, we find the probability of a given set of
inputs for all possible values of the class variable y and pick the output with the maximum
probability:
y = argmax_y P(y) × Π (i = 1 to n) P(xi | y)
So, finally, we are left with the task of calculating P(y) and P(xi | y).
Please note that
P(y) is also called class probability and
P(xi | y) is called conditional probability.
The different naive Bayes classifiers differ mainly by the assumptions they make regarding the
distribution of P(xi | y).
Let us try to apply the above formula manually on our weather dataset. For this, we need to do
some pre-computations on our dataset.
We need to find P(xi | yj) for each xi in X and yj in y. All these calculations have been
demonstrated in the tables below:
So, in the tables above, we have calculated P(xi | yj) for each xi in X and yj in y manually.
For example, the probability of playing cricket given that the temperature is cool is
P(temp. = cool | play Cricket = Yes) = 3/9.
Also, we need to find the class probabilities P(y), which have been calculated in table 5.
For example, P(play Cricket = Yes) = 9/14.
So now, we are done with our pre-computations and the classifier is ready!
Let us test it on a new set of features (let us call it today):
Today = ( Sunny, Hot, Normal, False )
So, the probability of playing cricket is given by:
P(Yes | today) ∝ P(Sunny | Yes) × P(Hot | Yes) × P(Normal | Yes) × P(False | Yes) × P(Yes)
= (2/9) × (2/9) × (6/9) × (6/9) × (9/14) ≈ 0.0141
and the probability of not playing cricket is given by:
P(No | today) ∝ P(Sunny | No) × P(Hot | No) × P(Normal | No) × P(False | No) × P(No)
= (3/5) × (2/5) × (1/5) × (2/5) × (5/14) ≈ 0.0068
These numbers can be converted into probabilities by making the sum equal to 1 (normalization),
since P(Yes | today) + P(No | today) = 1:
P(Yes | today) = 0.0141 / (0.0141 + 0.0068) ≈ 0.67
P(No | today) = 0.0068 / (0.0141 + 0.0068) ≈ 0.33
Since P(Yes | today) > P(No | today), the prediction is that cricket will be played today.
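The whole prediction for ‘today’ can also be written as a short Naive Bayes routine, sketched below. The prior and conditional probabilities are assumed to match the likelihood tables discussed above (e.g., P(Sunny | No) = 3/5, P(Normal | No) = 1/5); with a different training table the numbers would change.

```python
# Naive Bayes prediction for today = (Sunny, Hot, Normal, False).
# The probabilities below are assumptions matching the worked example above.
priors = {"Yes": 9 / 14, "No": 5 / 14}                      # P(y)
likelihoods = {                                             # P(xi | y)
    "Yes": {"Sunny": 2 / 9, "Hot": 2 / 9, "Normal": 6 / 9, "False": 6 / 9},
    "No":  {"Sunny": 3 / 5, "Hot": 2 / 5, "Normal": 1 / 5, "False": 2 / 5},
}
today = ["Sunny", "Hot", "Normal", "False"]

scores = {}
for label, prior in priors.items():
    score = prior
    for feature_value in today:
        score *= likelihoods[label][feature_value]
    scores[label] = score                    # proportional to P(y | today)

total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}   # normalize to 1
print(posteriors)                            # roughly {'Yes': 0.67, 'No': 0.33}
print("Prediction:", max(posteriors, key=posteriors.get))        # 'Yes'
```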
A Bayesian Belief Network (BBN) is a special type of diagram (called a directed graph) together with
an associated set of probability tables. They are also known as Belief Networks, Bayesian Networks, or
Probabilistic Networks.
The graph consists of nodes and arcs. The nodes represent variables, which can be discrete or
continuous. The arcs represent causal relationships between variables.
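As a tiny illustration of the idea (a hypothetical Rain -> WetGrass network, not an example from the text), the sketch below encodes a two-node network as probability tables and uses them to compute a joint probability and a simple posterior.

```python
# Hypothetical two-node Bayesian belief network: Rain -> WetGrass.
# Each node stores its probability table, conditioned on its parent's value.
p_rain = {True: 0.2, False: 0.8}                      # P(Rain)
p_wet_given_rain = {True:  {True: 0.9, False: 0.1},   # P(WetGrass | Rain=True)
                    False: {True: 0.2, False: 0.8}}   # P(WetGrass | Rain=False)

def joint(rain, wet):
    """P(Rain=rain, WetGrass=wet) = P(Rain) * P(WetGrass | Rain)."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# P(Rain=True | WetGrass=True), obtained by summing out the other case.
evidence = sum(joint(r, True) for r in (True, False))
print(round(joint(True, True) / evidence, 2))   # ~0.53
```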