Unit V - CART
Machine Learning
Last Updated : 23 Sep, 2022
CART (Classification And Regression Tree) is a variation of the decision tree algorithm. It
can handle both classification and regression tasks. Scikit-Learn uses the Classification
And Regression Tree (CART) algorithm to train Decision Trees (also called "growing"
trees). CART was first produced by Leo Breiman, Jerome Friedman, Richard Olshen, and
Charles Stone in 1984.
CART Algorithm
CART is a predictive algorithm used in machine learning; it describes how the target
variable's values can be predicted from other variables. It is a decision tree in which each
fork splits on a predictor variable and each leaf node holds a prediction for the target
variable.
In the decision tree, nodes are split into sub-nodes on the basis of a threshold value of an
attribute. The root node is taken as the training set and is split into two by considering the
best attribute and threshold value. Further, the subsets are also split using the same logic.
This continues till the last pure sub-set is found in the tree or the maximum number of
splits is reached.
The splitting procedure repeats three steps:
1. The best split point of each input is obtained.
2. Based on the best split points of each input in Step 1, the new best split point is
identified.
3. Splitting continues until a stopping rule is satisfied or no further desirable splitting is
available.
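The search in Steps 1 and 2 can be sketched as a brute-force scan over candidate thresholds, each scored by the weighted Gini impurity of the two resulting sub-nodes. This is a minimal illustration; the function names and toy data below are made up for this example, not taken from any library.

```python
# Exhaustive search for the best binary split of one numeric feature,
# scored by the weighted Gini impurity of the two child nodes.
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Try a threshold midway between each pair of sorted values and
    keep the one with the lowest weighted child impurity."""
    pairs = sorted(zip(xs, ys))
    best = (float("inf"), None)  # (weighted impurity, threshold)
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if cost < best[0]:
            best = (cost, thr)
    return best

# Two clearly separated groups: the best threshold falls between them.
cost, thr = best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
```

On this toy data the scan finds the threshold 6.5, which produces two pure sub-nodes (weighted impurity 0).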
The CART algorithm uses Gini impurity to split the dataset into a decision tree. It does so
by searching for the best homogeneity of the sub-nodes, with the help of the Gini index
criterion.
The Gini index is a metric for classification tasks in CART. It stores the sum of squared
probabilities of each class and computes the degree of probability that a specific element
is wrongly classified when chosen randomly; it is a variation of the Gini coefficient.
- A value of 0 depicts that all the elements belong to a certain class, i.e., only one class
exists there.
- A Gini index of 1 signifies that the elements are randomly distributed across various
classes.
- A value of 0.5 denotes that the elements are uniformly distributed into some classes.
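These boundary values can be checked with a short helper function (illustrative only, not part of any library):

```python
def gini_index(probabilities):
    """Gini index = 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p ** 2 for p in probabilities)

pure = gini_index([1.0])           # a single class: impurity is 0
two_even = gini_index([0.5, 0.5])  # two equally likely classes: impurity is 0.5
```

For a uniform distribution over k classes the index is 1 - 1/k, so it approaches 1 only as the number of classes grows large.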
Mathematically, we can write Gini impurity as follows:

Gini = 1 - sum(p_i^2), for i = 1, ..., n

where p_i is the probability of an element being classified into class i.
Classification tree
A classification tree is an algorithm where the target variable is categorical. The algorithm
is then used to identify the "class" within which the target variable is most likely to fall.
Classification trees are used when the dataset needs to be split into classes that belong to
the response variable.
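Scikit-Learn's DecisionTreeClassifier (mentioned above) grows such a tree with the CART algorithm; the iris dataset and hyperparameter values here are purely illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load a small categorical-target dataset: 3 iris species.
X, y = load_iris(return_X_y=True)

# Grow a shallow CART classification tree using the Gini criterion.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

# Predict the class of the first sample.
pred = clf.predict(X[:1])
```

Setting max_depth bounds the tree's size; without it the tree grows until every leaf is pure.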
Regression tree
A regression tree is an algorithm where the target variable is continuous and the tree is
used to predict its value. Regression trees are used when the response variable is
continuous, for example, the temperature of the day.
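The regression counterpart in Scikit-Learn is DecisionTreeRegressor, which fits a piecewise-constant function to the data. The sine-curve toy data below is just an example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: a noiseless sine curve sampled at random points.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel()

# A shallow regression tree predicts the mean target value in each leaf.
reg = DecisionTreeRegressor(max_depth=3, random_state=0)
reg.fit(X, y)

# Predict the response for a new input.
pred = reg.predict([[1.5]])
```

Each leaf outputs the average of the training targets that fall into it, so the fitted function is a step function over the input range.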
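The tree-growing loop below can also be written out in pseudo-code; the snippet that follows in the article expresses it level by level.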
d = 0, endtree = 0
Node(0) = 1, Node(1) = 0, Node(2) = 0
while endtree < 1
    if Node(2^d - 1) + Node(2^d) + ... + Node(2^(d+1) - 2) = 2 - 2^(d+1)
        endtree = 1
    else
        do i = 2^d - 1, 2^d, ..., 2^(d+1) - 2
            if Node(i) > -1
                Split tree
            else
                Node(2i+1) = -1
                Node(2i+2) = -1
            end if
        end do
    end if
    d = d + 1
end while
CART models are formed by picking input variables and evaluating split points on those
variables until an appropriate tree is produced.

Steps to create a Decision Tree using the CART algorithm:
Greedy algorithm: The input space is divided using a greedy method known as
recursive binary splitting. This is a numerical procedure in which all the values are
lined up and several split points are tried and assessed using a cost function.
Stopping criterion: As it works its way down the tree with the training data, the
recursive binary splitting method described above must know when to stop splitting.
The most frequent halting method is to require a minimum number of training instances
allocated to every leaf node. If the count is smaller than the specified threshold, the
split is rejected and the node is taken as the final leaf node.
Tree pruning: A decision tree's complexity is defined as the number of splits in the tree.
Trees with fewer branches are recommended as they are simpler to grasp and less
prone to overfit the data. Working through each leaf node in the tree and evaluating the
effect of deleting it using a hold-out test set is the quickest and simplest pruning
approach.
Data preparation for the CART: No special data preparation is required for the CART
algorithm.
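Both the stopping criterion and pruning map onto Scikit-Learn parameters: min_samples_leaf enforces a minimum leaf size, and ccp_alpha applies cost-complexity pruning. The parameter values below are arbitrary examples, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A fully grown tree: splits until every leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# The same tree with a stopping criterion and pruning applied.
constrained = DecisionTreeClassifier(
    min_samples_leaf=10,  # stopping criterion: reject splits below this leaf size
    ccp_alpha=0.01,       # cost-complexity pruning strength
    random_state=0,
).fit(X, y)

n_full = full.get_n_leaves()
n_constrained = constrained.get_n_leaves()
```

The constrained tree ends up with fewer leaves than the fully grown one, which is exactly the simplification the pruning step aims for.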
Advantages of CART
Classification and regression trees are nonparametric and nonlinear.
Classification and regression trees implicitly perform feature selection.
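The implicit feature selection can be seen directly in a fitted tree's feature importances. In this made-up example, the label depends only on the first feature, and the tree's splits concentrate there.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Feature 0 fully determines the label; feature 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Nearly all importance lands on the informative feature.
importances = clf.feature_importances_
```

Features the tree never splits on receive an importance of zero, so irrelevant inputs are effectively ignored without any explicit preprocessing.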
Limitations of CART
Overfitting.
High variance.
Low bias.
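The overfitting limitation is easy to demonstrate: a fully grown tree memorizes its training set even when the labels are pure noise. The random data below is constructed solely for this illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Random labels: there is no real pattern to learn.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

# An unpruned tree still fits the training data perfectly (memorization).
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
train_acc = clf.score(X, y)
```

Perfect training accuracy on noise is the low-bias, high-variance behavior listed above; constraints such as max_depth or ccp_alpha trade some of that variance away.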