
FEATURE SELECTION METHOD IN DECISION TREE INDUCTION
NITHISH KUMAR R (71762351023)

WHAT IS FEATURE SELECTION?
Feature selection is the process of choosing a subset of
relevant features (variables, attributes) from the original set
of features in a dataset. The goal of feature selection is to
improve model performance by reducing dimensionality,
decreasing computational cost, enhancing interpretability,
and mitigating the risk of overfitting.
INFORMATION GAIN (IG):
Information gain measures the reduction in entropy or
impurity achieved by splitting the data on a particular
feature.
Features with higher information gain are considered more
relevant for splitting.
Decision trees typically use information gain (or related
metrics like gain ratio) as the criterion for selecting the best
split at each node.
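
A minimal sketch of this calculation, assuming a small pandas DataFrame with a
categorical feature column and a "target" column (all names here are
illustrative): information gain is the parent entropy minus the weighted
entropy of the subsets created by the split.

import numpy as np
import pandas as pd

def entropy(labels):
    # Shannon entropy of a label column
    probs = labels.value_counts(normalize=True)
    return -np.sum(probs * np.log2(probs))

def information_gain(df, feature, target="target"):
    # parent entropy minus the weighted entropy of the child subsets
    parent = entropy(df[target])
    children = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(feature)
    )
    return parent - children

# toy data: is "outlook" a useful split feature?
data = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rain", "rain"],
    "target":  ["no",    "no",    "yes",      "yes",  "no"],
})
print(information_gain(data, "outlook"))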
GAIN RATIO:
Gain ratio is a modification of information gain that penalizes attributes
with a large number of distinct values. It helps to avoid bias toward
attributes with many categories.

It is calculated as the ratio of information gain to the intrinsic information
(split information), which measures how broadly and evenly an attribute splits
the data.
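
A minimal sketch of gain ratio, reusing the entropy and information_gain
helpers from the previous sketch (names are illustrative): the intrinsic
information is the entropy of the partition induced by the attribute itself,
so attributes with many distinct values are penalized.

def intrinsic_information(df, feature):
    # entropy of the split itself (a.k.a. split information)
    proportions = df[feature].value_counts(normalize=True)
    return -np.sum(proportions * np.log2(proportions))

def gain_ratio(df, feature, target="target"):
    split_info = intrinsic_information(df, feature)
    if split_info == 0:  # attribute has a single value and cannot split the data
        return 0.0
    return information_gain(df, feature, target) / split_info

print(gain_ratio(data, "outlook"))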
GINI INDEX:
• Gini index measures the impurity of a set of examples. It is computed as the sum,
over all classes, of the probability of choosing an item of that class times the
probability of misclassifying it, which is equivalent to 1 minus the sum of the
squared class probabilities.
• Lower Gini index values indicate better splits. Decision trees often use the Gini
index as an alternative splitting criterion.
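
A minimal sketch of the Gini index and of the weighted Gini impurity of a
split, consistent with the helpers above (names are illustrative); lower
values mean purer subsets.

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class probabilities
    probs = labels.value_counts(normalize=True)
    return 1.0 - np.sum(probs ** 2)

def gini_of_split(df, feature, target="target"):
    # weighted Gini impurity of the subsets created by splitting on `feature`
    return sum(
        (len(subset) / len(df)) * gini(subset[target])
        for _, subset in df.groupby(feature)
    )

print(gini_of_split(data, "outlook"))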

CHI-SQUARE (Χ²) TEST:
The chi-square test evaluates the independence between variables by comparing
observed frequencies to expected frequencies.

In the context of decision trees, it assesses whether the distribution of the target
variable differs significantly across the subsets of the data defined by each attribute.

Features with higher chi-square statistics (indicating greater association with the
target variable) are selected.
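
A minimal sketch of scoring a feature with a chi-square test of independence,
assuming SciPy is available and reusing the toy DataFrame from above; the test
is run on the contingency table of feature values versus target values.

from scipy.stats import chi2_contingency

def chi_square_score(df, feature, target="target"):
    # chi-square statistic and p-value for independence between feature and target
    contingency = pd.crosstab(df[feature], df[target])
    statistic, p_value, dof, expected = chi2_contingency(contingency)
    return statistic, p_value

print(chi_square_score(data, "outlook"))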
WRAPPER METHODS:
Wrapper methods use a specific learning algorithm (e.g., a
decision tree) as a black box to evaluate the usefulness of
feature subsets.

They employ a search algorithm to select the optimal feature
subset based on the performance of the learning algorithm.

Examples include forward selection, backward elimination,
and recursive feature elimination.
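
A minimal sketch of a wrapper method using scikit-learn's recursive feature
elimination (RFE) with a decision tree as the black-box estimator; the Iris
dataset and the number of features to keep are illustrative choices.

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# repeatedly fit the tree and drop the least important feature until 2 remain
selector = RFE(estimator=DecisionTreeClassifier(random_state=0),
               n_features_to_select=2)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the selected features
print(selector.ranking_)   # rank 1 marks the selected features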

EMBEDDED METHODS:
Embedded methods incorporate feature selection
within the model construction process. They select
features during the training phase of the learning
algorithm.

Decision trees often have built-in mechanisms to
evaluate feature importance during training, such as
pruning techniques that remove less important
features or regularization methods.
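
A minimal sketch of an embedded approach: a decision tree's impurity-based
feature importances are obtained as a by-product of training, and scikit-learn's
SelectFromModel keeps only the features above a chosen importance threshold
(dataset and threshold here are illustrative).

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.feature_importances_)      # importances computed during training

selector = SelectFromModel(tree, prefit=True, threshold="median")
X_selected = selector.transform(X)    # keep features above the median importance
print(X_selected.shape)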
