Unit V - CART


CART (Classification And Regression Tree) in

Machine Learning
Last Updated : 23 Sep, 2022

CART (Classification And Regression Tree) is a variation of the decision tree algorithm that can handle both classification and regression tasks. Scikit-Learn uses the CART algorithm to train Decision Trees (also called "growing" trees). CART was first introduced by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone in 1984.

CART Algorithm

CART is a predictive algorithm used in machine learning that explains how the target variable's values can be predicted from the other variables. It is a decision tree in which each fork is a split on a predictor variable and each leaf node holds a prediction for the target variable.

In the decision tree, nodes are split into sub-nodes on the basis of a threshold value of an attribute. The root node holds the full training set and is split into two by considering the best attribute and threshold value. The resulting subsets are then split using the same logic. This continues until the last pure subset is found or the maximum number of leaves in the growing tree is reached.

The CART algorithm works via the following process:

1. The best split point for each input variable is obtained.
2. Based on the best split points found in Step 1, the overall best split point is identified.
3. The chosen input is split at that split point.
4. Splitting continues until a stopping rule is satisfied or no further desirable split is available.
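This greedy search can be sketched in a few lines of Python. Everything below (the function names and the toy dataset) is invented for illustration and is not Scikit-Learn's actual implementation; the cost function used is the weighted Gini impurity of the two child nodes, described in the next section.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Try every (feature, threshold) pair and return the one whose
    two child nodes have the lowest weighted Gini impurity."""
    best = (None, None, float("inf"))  # (feature index, threshold, cost)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            cost = len(left) / n * gini(left) + len(right) / n * gini(right)
            if cost < best[2]:
                best = (f, t, cost)
    return best

X = [[2.0], [3.0], [10.0], [11.0]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # (0, 3.0, 0.0): split feature 0 at 3.0, pure children
```

In a full CART implementation this search is applied recursively to each resulting subset, which is exactly the "continue splitting" step above.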

The CART algorithm uses Gini impurity to split the dataset into a decision tree. It does that by searching for the most homogeneous sub-nodes, with the help of the Gini index criterion.

Gini index/Gini impurity

The Gini index is the splitting metric for classification tasks in CART. It is based on the sum of squared class probabilities: it measures the probability that a randomly chosen element would be misclassified if it were labelled at random according to the class distribution, and is a variation of the Gini coefficient. It works on categorical targets, gives outcomes as either "success" or "failure", and hence performs binary splits only.

The value of the Gini index varies from 0 to 1, where:

- 0 indicates that all the elements belong to a single class (only one class exists at that node),
- 1 indicates that the elements are randomly distributed across various classes, and
- 0.5 indicates that the elements are uniformly distributed into some classes.

Mathematically, Gini impurity can be written as follows:

Gini = 1 - Σ (p_i)^2

where p_i is the probability of an object being classified into a particular class.
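As a quick check of the formula, here is a small hypothetical helper (the function name and sample labels are invented for illustration):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini = 1 - sum(p_i^2) over the class probabilities p_i."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_impurity(["yes"] * 4))        # 0.0 -> pure node, one class only
print(gini_impurity(["yes", "no"] * 2))  # 0.5 -> two evenly mixed classes
```

The two printed values match the endpoints described above: a pure node scores 0, and a node split evenly between two classes scores 0.5.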

Classification tree

A classification tree is an algorithm where the target variable is categorical. The algorithm is used to identify the "class" within which the target variable is most likely to fall. Classification trees are used when the dataset needs to be split into classes that belong to the response variable (like yes or no).

Regression tree

A regression tree is an algorithm where the target variable is continuous and the tree is used to predict its value. Regression trees are used when the response variable is continuous, for example when the response variable is the temperature of the day.
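Both tree types are available in Scikit-Learn, which, as noted above, trains them with CART. A minimal sketch on toy data invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: categorical target (two classes, 0 and 1).
X_cls = [[0], [1], [10], [11]]
y_cls = [0, 0, 1, 1]
clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_cls, y_cls)
print(clf.predict([[2], [12]]))  # one predicted class per query point

# Regression tree: continuous target (e.g. temperature of the day).
X_reg = [[1], [2], [10], [11]]
y_reg = [15.0, 15.0, 30.0, 30.0]
reg = DecisionTreeRegressor(random_state=0).fit(X_reg, y_reg)
print(reg.predict([[1.5], [10.5]]))  # mean target value of the matching leaf
```

The classifier outputs a class label, while the regressor outputs the mean target value of the leaf each query point falls into.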

Pseudo-code of the CART algorithm

d = 0, endtree = 0
Node(0) = 1, Node(1) = 0, Node(2) = 0
while endtree < 1
    if Node(2^d - 1) + Node(2^d) + ... + Node(2^(d+1) - 2) = 2 - 2^(d+1)
        endtree = 1
    else
        do i = 2^d - 1, 2^d, ... , 2^(d+1) - 2
            if Node(i) > -1
                Split tree
            else
                Node(2i+1) = -1
                Node(2i+2) = -1
            end if
        end do
    end if
    d = d + 1
end while

CART model representation

CART models are formed by picking input variables and evaluating split points on those

variables until an appropriate tree is produced.

Steps to create a Decision Tree using the CART algorithm:

Greedy algorithm: The input space is divided using a greedy method known as recursive binary splitting. This is a numerical procedure in which all the values are lined up and several candidate split points are tried and assessed using a cost function.

Stopping criterion: As it works its way down the tree with the training data, the recursive binary splitting method described above must know when to stop splitting. The most common stopping rule is to require a minimum number of training instances in every leaf node: if a split would leave fewer instances than this threshold, the split is rejected and the node becomes a leaf.

Tree pruning: A decision tree's complexity is defined as the number of splits in the tree. Trees with fewer branches are preferred, as they are easier to interpret and less prone to overfit the data. The quickest and simplest pruning approach is to work through each leaf node in the tree and evaluate the effect of removing it using a hold-out test set.

Data preparation for the CART: No special data preparation is required for the CART

algorithm.
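In Scikit-Learn these steps surface as hyperparameters of the tree estimators; a sketch with illustrative (untuned) values:

```python
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(
    criterion="gini",     # cost function for the greedy split search
    min_samples_leaf=5,   # stopping rule: reject splits leaving < 5 samples in a leaf
    max_depth=4,          # another common stopping rule
    ccp_alpha=0.01,       # cost-complexity pruning strength (0 disables pruning)
    random_state=0,
)

# Toy data invented for illustration: perfectly separable at x = 9.5.
X = [[i] for i in range(20)]
y = [0] * 10 + [1] * 10
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())  # a single split suffices: 1 2
```

With these settings the tree stops after one split, since both children are pure and each holds at least `min_samples_leaf` samples.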

Advantages of CART

Results are simple and easy to interpret.

Classification and regression trees are nonparametric and nonlinear.
Classification and regression trees implicitly perform feature selection.

Outliers have no meaningful effect on CART.

It requires minimal supervision and produces easy-to-understand models.

Limitations of CART

Overfitting.

High variance.

Low bias.

The tree structure may be unstable.

Applications of the CART algorithm

For quick data insights.

In Blood Donors Classification.

For environmental and ecological data.

In the financial sectors.
