UNIT 3
CLASSIFICATION

Classification:
• Decision-Tree Based Approach
• Rule-based Approach
• Instance-based Classifiers
• Support Vector Machines
• Ensemble Learning
DT
• Decision Trees (DTs) are a non-parametric
supervised learning method used for
classification and regression. The goal is to
create a model that predicts the value of a
target variable by learning simple decision
rules inferred from the data features. A tree
can be seen as a piecewise constant
approximation.
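As a minimal sketch of this idea (assuming scikit-learn and the classic Iris dataset, neither of which is part of these slides):

# A hedged illustration, assuming scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Learn simple decision rules from the data features.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# The inferred rules can be printed as text.
print(export_text(clf, feature_names=load_iris().feature_names))

# Predict the target for a new, unlabeled instance.
print(clf.predict([[5.1, 3.5, 1.4, 0.2]]))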
Some advantages of decision trees are:
• Simple to understand and to interpret. Trees can be visualized.
• Requires little data preparation. Other techniques often require data normalization, creation of dummy variables, and removal of blank values. Some tree and algorithm combinations support missing values.
Some advantages of decision trees
• Able to handle both numerical and categorical data.
• Possible to validate a model using statistical tests. That makes it possible to account for the reliability of the model.
• Performs well even if its assumptions are somewhat violated by the true model from which the data were generated.
The disadvantages of decision
trees include:
• Decision-tree learners can create over-complex
trees that do not generalize the data well. This
is called overfitting. Mechanisms such as
pruning, setting the minimum number of
samples required at a leaf node or setting the
maximum depth of the tree are necessary to
avoid this problem.
• Decision trees can be unstable because small
variations in the data might result in a
completely different tree being generated. This
problem is mitigated by using decision trees
within an ensemble.
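A brief sketch of those mechanisms, assuming scikit-learn (the parameter values are illustrative, not prescriptive):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Constrain growth up front: limit the depth and require a minimum
# number of samples at each leaf node.
constrained = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)
constrained.fit(X, y)

# Or grow the tree fully and prune it back with
# cost-complexity pruning (larger ccp_alpha prunes more).
pruned = DecisionTreeClassifier(ccp_alpha=0.01)
pruned.fit(X, y)

print(constrained.get_depth(), pruned.get_depth())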
Classification:
• Classification is the most widely used data science task in business. The objective of a classification model is to predict a target variable that is binary (e.g., a loan decision) or categorical (e.g., a customer type) when a set of input variables is given. The model does this by learning the generalized relationship between the predicted target variable and all other input attributes from a known dataset. Each algorithm differs in how this relationship is extracted from the known training dataset.
• Decision trees approach the classification problem by partitioning the data into purer subsets based on the values of the input attributes. The attributes that help achieve the cleanest levels of such separation are considered significant in their influence on the target variable and end up at the root and closer-to-root levels of the tree. The output model is a tree framework that can be used for the prediction of new unlabeled data.
DT
• A decision tree uses a treelike graph that
represents a flow-chart-like structure in
which the starting point is the “root.” Each
internal node of the tree represents a test on
an attribute or subset of attributes. Each
branch from the node represents the outcome
of the test, and the final node is a “leaf” that
represents a class label. One can construct a
simple decision tree by hand.
Size of a decision tree
• The size (length and width) of the tree built mainly depends on the number of features and the number of instances in the dataset. A tree that is too small or too large may not be favorable in terms of accuracy and the speed at which it reaches a class label. Also, the tree built may have a high accuracy on the training data but a much lower accuracy on the test data. This scenario is known as overfitting.
Decision Tree
• A decision tree is a type of supervised learning approach in which we make predictions using a known dataset called the training dataset. Decision trees are also called classification and regression trees.
Binary classification
• You have a set of attributes describing an instance, e.g. a bank customer.
• The bank wants to put this customer into one of two categories: the customer will repay the loan, or the customer will not repay the loan. In other words, a fraudulent customer versus a genuine customer.
• This is an example of a binary classification problem.
Binary classification
• Suppose the object is an email, and there are two categories or classes: spam and non-spam. This is a very common problem: all of us receive a lot of spam emails and want each email to be placed automatically into one of these two categories. This kind of problem is known as a classification problem.
Patient example - multiclass
• There may be other instances. For example, a patient comes to a doctor, and the patient has one of, say, five possible diseases. The symptoms of the patient are the attributes. An automated diagnosis system would use these attribute values to determine which of the five diseases the patient has. This is a five-class classification problem.
• In general, you can have k categories, and you would call it a k-class problem.
Types of Decision Tree
1. Classification Tree
A classification tree is used when the dependent variable is categorical. The value obtained by a leaf node in the training data is the mode response (the most frequent value of a given set of values) of the observations falling in that region. It follows a top-down greedy approach.
Types of Decision Tree
2. Regression Tree
A regression tree is used when the dependent variable is continuous. The value obtained by a leaf node in the training data is the mean response (sum of observations / number of observations) of the observations falling in that region. Thus, if an unseen data observation falls in that region, its prediction is made with the mean value. This means that even if the dependent variable in the training data was continuous, the predictions will only take discrete values on the test set. A regression tree also follows a top-down greedy approach.
Terminologies associated with decision trees
• Parent node: In any two connected nodes, the one which is higher in the hierarchy is the parent node.
• Child node: In any two connected nodes, the one which is lower in the hierarchy is the child node.
• Root node: The starting node from which the tree starts. It has only child nodes; the root node does not have a parent node.
• Leaf node/leaf: Nodes at the end of the tree, which do not have any children, are leaf nodes, or simply leaves.
• Internal nodes/nodes: All nodes in between the root node and the leaf nodes are internal nodes, or simply nodes. Internal nodes have both a parent and at least one child.
• Splitting: Dividing a node into two or more sub-nodes, or adding two or more children to a node.
• Decision node: When a parent splits into two or more child nodes, that node is called a decision node.
• Pruning: Removing the sub-nodes of a decision node is called pruning. It can be understood as the opposite of splitting.
• Branch/sub-tree: A subsection of the entire tree is called a branch or sub-tree.
To build your first decision tree in R - we will use the R
in-built data set named readingSkills to create a decision tree.
It records the variables "age", "shoeSize" and "score" for each person, together with whether the person is a native speaker or not.
Example
We will use the ctree() function to create the decision tree and see its graph.

# Load the party package. It will automatically load other
# dependent packages.
library(party)

# Create the input data frame.
input.dat <- readingSkills[c(1:105),]

# Give the chart file a name.
png(file = "decision_tree.png")

# Create the tree.
output.tree <- ctree(
  nativeSpeaker ~ age + shoeSize + score,
  data = input.dat)

# Plot the tree.
plot(output.tree)

# Save the file.
dev.off()
When we execute the above code, it produces a plot of the fitted decision tree and saves it to decision_tree.png.
Building a Decision Tree in Python
1. Import the libraries required to build a decision tree in Python.
2. Load the data set using the read_csv() function in pandas.
Building a Decision Tree in Python
https://scikit-learn.org/stable/modules/tree.html
3. Display the top five rows from the data set using the head() function.
4. Separate the independent and dependent variables using the slicing method.
5. Split the data into training and testing sets.
6. Train the model using the decision tree classifier.
7. Predict the test data set values using the model above.
8. Calculate the accuracy of the model using the accuracy score function.
A sketch of these steps in code follows.
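Putting steps 1-8 together as a hedged sketch (assuming pandas and scikit-learn; the file name data.csv, and the assumption that its last column is the target, are hypothetical):

# Steps 1-8 as a sketch; "data.csv" and its column layout are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 2. Load the data set.
df = pd.read_csv("data.csv")

# 3. Display the top five rows.
print(df.head())

# 4. Separate independent and dependent variables by slicing:
#    all columns except the last are features, the last is the target.
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# 5. Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 6. Train the decision tree classifier.
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# 7. Predict the test data set values.
y_pred = clf.predict(X_test)

# 8. Calculate the accuracy of the model.
print(accuracy_score(y_test, y_pred))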
Advantages of a decision tree
• Easy to visualize and interpret: Its graphical representation is very intuitive to understand, and it does not require any knowledge of statistics to interpret it.
• Useful in data exploration: We can easily identify the most significant variable and the relation between variables with a decision tree. It can help us create new variables or put some features in one bucket.
• Less data cleaning required: It is fairly immune to outliers and missing data, hence less data cleaning is needed.
• The data type is not a constraint: It can handle both categorical and numerical data.
DISADVANTAGES of DT
• Overfitting: A single decision tree tends to overfit the data. This is addressed by setting constraints on model parameters (e.g., the height of the tree) and by pruning (a technique in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree).
• Not an exact fit for continuous data: It loses some of the information associated with numerical variables when it classifies them into different categories.
CONCLUSION
• Formally, a decision tree is a graphical representation of all possible solutions to a decision. These days, tree-based algorithms are the most commonly used algorithms in supervised learning scenarios. They are easy to interpret and visualize, with great adaptability. We can use tree-based algorithms for both regression and classification problems; however, most of the time they are used for classification problems.
Do It Yourself
• A high-profile placement agency wants to categorize MBA students into classes. The data collected contains their average percentage in the MBA and the result of the aptitude test conducted by the agency. The decision tree is:

MBA Avg. % < 60:
    Aptt. % > 70 → G
    Aptt. % < 70 → A
MBA Avg. % 60 to 80:
    Aptt. % > 85 → E
    Aptt. % 85 to 70 → G
    Aptt. % < 70 → A
MBA Avg. % > 80:
    Aptt. +ve → E

E – Excellent
G – Good
A – Average
Tree algorithms: ID3, C4.5, C5.0 and CART
ID3 (Iterative Dichotomiser 3) was developed in 1986 by Ross Quinlan. The algorithm creates a multiway tree, finding for each node (i.e. in a greedy manner) the categorical feature that will yield the largest information gain for categorical targets. Trees are grown to their maximum size, and then a pruning step is usually applied to improve the ability of the tree to generalize to unseen data.
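As a small illustration of the quantity ID3 greedily maximizes, here is a sketch of entropy and information gain (the example labels are made up; the formulas are the standard ones, not taken from these slides):

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, partitions):
    """Entropy of the parent minus the weighted entropy of its partitions."""
    n = len(labels)
    weighted = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(labels) - weighted

# Hypothetical example: 14 labels split into two groups by a feature.
parent = ["yes"] * 9 + ["no"] * 5
split = [["yes"] * 6 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3]
print(information_gain(parent, split))  # ID3 picks the feature with the largest gain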
Rule-based Approach
• Rule-based classifiers are another type of classifier, which make the class decision by using various "if...else" rules. These rules are easily interpretable, and thus these classifiers are generally used to generate descriptive models. The condition used with "if" is called the antecedent, and the predicted class of each rule is called the consequent. For example, the following spreadsheet formula encodes a chain of grading rules:
=IF(F2>79,"O",IF(F2>=70,"A+",IF(F2>=60,"A",IF(F
2>=55,"B+",IF(F2>=50,"B",IF(F2>=45,"C",IF(F2>=
40,"P",IF(F2>=0,"F"))))))))

Roll No | UoP Seat Nos/PRN Nos | Name of the Student          | Marks out of 25 | Percentage | GRADE
A01     | 12579                | ABDULLAH AZHAR QURESHI       | 19              | 0          | F
A10     | 12719                | ADITYA KUMAR                 | 17              | 68         | A
A11     | 12587                | ADITYA KUMAR PRASAD          | 19              | 76         | A+
A12     | 12589                | AISHWARY PATEL               | 19              | 76         | A+
A13     | 12754                | AISHWARYA DILIPRAO MURTADAK  | 17              | 68         | A
A14     | 12756                | AISHWARYA RAJENDRA NAIK      | 20              | 80         | O
A15     | 12642                | AISHWARYA RAVINDRA DESHPANDE | 20              | 80         | O
A16     | 12647                | AKASH KHANDUJI DHANDE        | 21              | 84         | O
A18     | 12665                | AKSHAY BABASAHEB GHORPADE    | 17              | 68         | A
A19     | 12592                | AKSHAY JAGDISH MOHITE        | 12              | 48         | C
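The same rule chain can be written as a rule-based classifier in Python (a hedged translation of the spreadsheet formula above; the antecedents are the percentage cutoffs, the consequents are the grades):

def grade(percentage):
    """Assign a grade using the same if/else rule chain as the spreadsheet."""
    if percentage > 79:
        return "O"
    elif percentage >= 70:
        return "A+"
    elif percentage >= 60:
        return "A"
    elif percentage >= 55:
        return "B+"
    elif percentage >= 50:
        return "B"
    elif percentage >= 45:
        return "C"
    elif percentage >= 40:
        return "P"
    else:
        return "F"

print(grade(68))  # A, matching the table above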
Instance-based classifiers
• Instance: A single row of data is called
an instance. It is an observation from the
domain.
• Feature: A single column of data is called
a feature. It is a component of an
observation and is also called an attribute
of a data instance
• The machine learning systems which are categorized as instance-based learning are the systems that learn the training examples by heart and then generalize to new instances based on some similarity measure. It is called instance-based because it builds the hypotheses from the training instances. It is also known as memory-based learning or lazy learning.
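k-nearest neighbours is the classic instance-based (lazy) learner. A minimal sketch, assuming scikit-learn and the Iris dataset (both are assumptions, not named in these slides):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# "Learning by heart": fit() essentially stores the training instances.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

# Generalization happens at prediction time, via a similarity
# (distance) measure to the stored instances.
print(knn.predict([[5.1, 3.5, 1.4, 0.2]]))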
Support Vector Machines - What are they?
• A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are more commonly used in classification problems, and as such, this is what we will focus on here.
• SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes.
Hyperplanes in 2D and 3D feature space
How to implement SVM:
https://youtu.be/1qzrTvHGPow
Summary
• Support Vectors
• Support vectors are the data points nearest to
the hyperplane, the points of a data set that, if
removed, would alter the position of the
dividing hyperplane. Because of this, they can
be considered the critical elements of a data
set.
What is a hyperplane?
• As a simple example, for a classification task with only two features, you can think of a hyperplane as a line that linearly separates and classifies a set of data.
• Intuitively, the further from the hyperplane our data points lie, the more confident we are that they have been correctly classified. We therefore want our data points to be as far away from the hyperplane as possible, while still being on the correct side of it.
• So when new testing data is added, whichever side of the hyperplane it lands on will decide the class that we assign to it.
How do we find the right hyperplane?
• The distance between the hyperplane and the nearest data point from either set is known as the margin. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly.
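A minimal maximum-margin sketch, assuming scikit-learn (the synthetic data and parameter values are illustrative):

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs stand in for a linearly separable dataset.
X, y = make_blobs(n_samples=100, centers=2, random_state=6)

# A linear SVM with a large C approximates a hard-margin classifier.
clf = SVC(kernel="linear", C=1000)
clf.fit(X, y)

# The support vectors are the points nearest the dividing hyperplane.
print(clf.support_vectors_)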
But what happens when there is no clear hyperplane?
Hyperplane
• Because we are now in three dimensions, our hyperplane can no longer be a line. It must now be a plane. The idea is that the data will continue to be mapped into higher and higher dimensions until a hyperplane can be formed to segregate it.
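This is the kernel trick. A hedged sketch with scikit-learn's RBF kernel, on data that no straight line can separate in 2D (dataset and parameters are illustrative):

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no straight line separates the two classes in 2D.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

# The RBF kernel implicitly maps the data into a higher-dimensional
# space where a separating hyperplane exists.
clf = SVC(kernel="rbf")
print(clf.fit(X, y).score(X, y))  # close to 1.0 on this easy example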
SVM Uses
• SVM is used for text classification tasks such as category assignment, detecting spam and sentiment analysis. It is also commonly used for image recognition challenges, performing particularly well in aspect-based recognition and color-based classification. SVM also plays a vital role in many areas of handwritten digit recognition, such as postal automation services.
Pros & Cons of Support Vector
Machines

• Pros

• Accuracy
• Works well on smaller cleaner datasets
• It can be more efficient because it uses a subset of training points

• Cons

• Isn’t suited to larger datasets as the training time with SVMs can be
high
• Less effective on noisier datasets with overlapping classes
Ensemble Learning
Meaning: a group of things or people acting or taken together as a whole, especially a group of musicians who regularly play together.

Ensemble methods are a machine learning technique that combines several base models in order to produce one optimal predictive model.
Ensemble learners
• Ensemble learners are "meta" models where the model is a combination of several different individual models. If certain conditions are met, ensemble learners can gain from the wisdom of crowds and greatly reduce the generalization error in data science.
Ensemble
• An ensemble is a machine learning model that combines the predictions from two or more models. The models that contribute to the ensemble, referred to as ensemble members, may be the same type or different types, and may or may not be trained on the same training data.
Ensemble Learning
• Ensemble Learning Tutorial | Ensemble Techniques | Machine Learning Training | Edureka
• https://youtu.be/CDewPfLV4Tc
Types of Ensemble Methods
1. BAGGing, or Bootstrap AGGregating
• BAGGing gets its name because it combines Bootstrapping and Aggregation to form one ensemble model. Given a sample of data, multiple bootstrapped subsamples are pulled. A decision tree is formed on each of the bootstrapped subsamples. After each subsample decision tree has been formed, an algorithm is used to aggregate over the decision trees to form the most efficient predictor, as in the sketch below.
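A hedged sketch of BAGGing with decision trees, assuming scikit-learn (dataset and parameter values are illustrative):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 50 decision trees, each fit on a bootstrapped subsample of the data;
# predictions are aggregated by majority vote.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        bootstrap=True, random_state=0)
bag.fit(X, y)
print(bag.score(X, y))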
2. Random Forest Models
• Random Forest models can be thought of as BAGGing with a slight tweak. When deciding where to split and how to make decisions, BAGGed decision trees have the full set of features to choose from. Therefore, although the bootstrapped samples may be slightly different, the data is largely going to break off at the same features throughout each model.
• In contrast, Random Forest models decide where to split based on a random selection of features. Rather than splitting at similar features at each node throughout, Random Forest models implement a level of differentiation because each tree will split based on different features. This level of differentiation provides a greater ensemble to aggregate over, thus producing a more accurate predictor.
• Similar to BAGGing, bootstrapped subsamples are pulled from a larger dataset, and a decision tree is formed on each subsample. However, each decision tree is split on a different random subset of the features, as in the sketch below.
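A corresponding Random Forest sketch, again assuming scikit-learn; max_features controls the random subset of features considered at each split:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Like bagging, each tree sees a bootstrapped subsample, but each split
# also considers only a random subset of the features (max_features).
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)
forest.fit(X, y)
print(forest.score(X, y))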
Classification Model Selection and
Evaluation
• Model Selection and Evaluation is a hugely important
procedure in the machine learning workflow. This is the
section of our workflow in which we will analyse our model.
We look at more insightful statistics of its performance and
decide what actions to take in order to improve this model.
• This step is usually the difference between a model that
performs well and a model that performs very well. When
we evaluate our model, we gain a greater insight into what
it predicts well and what it doesn’t and this helps us turn it
from a model that predicts our dataset with a 65% accuracy
level to closer to 80% or 90%.
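As an illustrative sketch of this step (assuming scikit-learn; the model and data are placeholders), cross-validation and a per-class report provide those more insightful statistics:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Per-class precision, recall and F1 reveal what the model predicts
# well and what it does not.
print(classification_report(y_test, clf.predict(X_test)))

# Cross-validation gives a more stable accuracy estimate than one split.
print(cross_val_score(clf, X, y, cv=5).mean())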
Metrics and Scoring
• Let's say we have two hypotheses for a task, h(x) and h'(x). How would we know which one is better? From a high-level perspective, we might take the following steps:
• Measure the accuracy of both hypotheses.
• Determine whether there is any statistical significance between the two results. If there is, select the better performing hypothesis. If not, we cannot say with any statistical certainty that either h(x) or h'(x) is better. A sketch of this procedure follows.
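A hedged sketch of these two steps (assuming scikit-learn and SciPy; the two classifiers stand in for h(x) and h'(x), and a simple paired t-test on fold scores is only indicative, since cross-validation folds overlap):

from scipy.stats import ttest_rel
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Step 1: measure the accuracy of both hypotheses on the same folds.
scores_h = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
scores_h2 = cross_val_score(KNeighborsClassifier(), X, y, cv=10)

# Step 2: test whether the difference in per-fold scores is
# statistically significant.
t_stat, p_value = ttest_rel(scores_h, scores_h2)
print(scores_h.mean(), scores_h2.mean(), p_value)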
Applications: B2B customer buying stage prediction, Recommender Systems
• A recommender system is a popular strategy for helping customers select desirable products or services. Recommender systems solve this problem by analyzing large volumes of data to provide businesses and users with personalized content and services according to their preferences and tastes.
• This system is applied to help customers in B2C as well as B2B electronic commerce. By helping the buyer and supplier, a recommender system reduces business transaction costs. Let's look at the example of an e-market which includes B2C and B2B transactions to illustrate the recommender system.
• E-Commerce Example
• Amazon's recommendation system is capable of intelligently analyzing and predicting customers' shopping preferences in order to offer them a list of recommended products.
• Although other stores have also introduced similar functionalities to their websites in recent years, Amazon's recommendation engine is considered to be one of the best on the market.
• It is an artificial intelligence and machine
learning service that specializes in developing
recommender system solutions. It
automatically analyzes data, selects functions
and algorithms, optimizes the model based on
the data, and implements and maintains the
model to generate real-time
recommendations.
