

Note: Click here to download the full example code or to run this example in your browser via Binder

Feature importances with a forest of trees


This example shows the use of a forest of trees to evaluate the importance of features on an artificial classification task. The blue bars are the feature importances of the forest, along with their inter-trees variability represented by the error bars.

As expected, the plot suggests that 3 features are informative, while the remaining are not.

import matplotlib.pyplot as plt

Data generation and model fitting


We generate a synthetic dataset with only 3 informative features. We will explicitly not shuffle the dataset to ensure that the informative features will correspond to the three first columns of X. In addition, we will split our dataset into training and testing subsets.

from sklearn.datasets import make_classification


from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    n_repeated=0,
    n_classes=2,
    random_state=0,
    shuffle=False,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
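As a quick sanity check (an addition, not part of the original example), one can confirm the split sizes and that stratify=y preserved the class balance:

import numpy as np

print(X_train.shape, X_test.shape)  # (750, 10) (250, 10) with the default test_size=0.25

# Class proportions should be (almost) identical in both subsets
print(np.bincount(y_train) / len(y_train))
print(np.bincount(y_test) / len(y_test))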

A random forest classifier will be fitted to compute the feature importances.

from sklearn.ensemble import RandomForestClassifier

feature_names = [f"feature {i}" for i in range(X.shape[1])]


forest = RandomForestClassifier(random_state=0)
forest.fit(X_train, y_train)

Out: RandomForestClassifier(random_state=0)
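Before interpreting the importances, it can be useful to verify that the model actually generalises; a poor test score would make any importance ranking hard to trust. This check is not part of the original example:

print(f"Test accuracy: {forest.score(X_test, y_test):.3f}")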

Feature importance based on mean decrease in impurity


Feature importances are provided by the fitted attribute feature_importances_. They are computed as the mean of the accumulated impurity decrease within each tree; the standard deviation across the trees is computed separately below and gives the error bars.

Warning: Impurity-based feature importances can be misleading for high cardinality features (many unique values). See
Permutation feature importance as an alternative below.
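To make the warning concrete, a small side experiment (not part of the original example; results will vary with the random seed) can append two uninformative columns, one continuous (high cardinality) and one binary (low cardinality), and refit the forest. The continuous noise column will typically receive the larger impurity-based score even though neither column carries any signal:

import numpy as np

rng = np.random.RandomState(0)
X_noisy = np.hstack([
    X_train,
    rng.randn(X_train.shape[0], 1),            # continuous noise: ~750 unique values
    rng.randint(0, 2, (X_train.shape[0], 1)),  # binary noise: 2 unique values
])
forest_noisy = RandomForestClassifier(random_state=0).fit(X_noisy, y_train)
print(forest_noisy.feature_importances_[-2:])  # MDI of the two noise columns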

import time
import numpy as np

start_time = time.time()
importances = forest.feature_importances_
std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)
elapsed_time = time.time() - start_time

print(f"Elapsed time to compute the importances: {elapsed_time:.3f} seconds")

Out: Elapsed time to compute the importances: 0.007 seconds
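As a side note (an assumption about how the forest aggregates its trees, not spelled out in the example), forest.feature_importances_ should coincide with the per-tree mean, which is why only the standard deviation needs the explicit list comprehension above:

mean_over_trees = np.mean(
    [tree.feature_importances_ for tree in forest.estimators_], axis=0
)
print(np.allclose(importances, mean_over_trees))  # expected: True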

Let’s plot the impurity-based importance.

import pandas as pd

forest_importances = pd.Series(importances, index=feature_names)

fig, ax = plt.subplots()
forest_importances.plot.bar(yerr=std, ax=ax)
ax.set_title("Feature importances using MDI")
ax.set_ylabel("Mean decrease in impurity")
fig.tight_layout()

We observe that, as expected, the three first features are found important.
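The same observation can also be read off programmatically, for example by sorting the series (a small addition for illustration):

print(forest_importances.sort_values(ascending=False).head(3))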

Feature importance based on feature permutation


Permutation feature importance overcomes limitations of the impurity-based feature importance: it has no bias toward high-cardinality features and can be computed on a left-out test set.

from sklearn.inspection import permutation_importance

start_time = time.time()
result = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=42, n_jobs=2
)
elapsed_time = time.time() - start_time
print(f"Elapsed time to compute the importances: {elapsed_time:.3f} seconds")

forest_importances = pd.Series(result.importances_mean, index=feature_names)

Out: Elapsed time to compute the importances: 0.640 seconds

The computation of the full permutation importance is more costly: each feature is shuffled n times and the fitted model is re-evaluated on the permuted data to estimate its importance. Please see Permutation feature importance for more details. We can now plot the importance ranking.

fig, ax = plt.subplots()
forest_importances.plot.bar(yerr=result.importances_std, ax=ax)
ax.set_title("Feature importances using permutation on full model")
ax.set_ylabel("Mean accuracy decrease")
fig.tight_layout()
plt.show()
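For completeness (not shown in the original example), the result object also stores the raw per-repeat scores in result.importances, an array of shape (n_features, n_repeats), so the spread behind the error bars can be inspected directly, e.g. with a box plot:

fig, ax = plt.subplots()
ax.boxplot(result.importances.T, labels=feature_names, vert=False)
ax.set_title("Permutation importances (individual repeats)")
ax.set_xlabel("Decrease in accuracy score")
fig.tight_layout()
plt.show()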



The same features are detected as most important by both methods, although the relative importances vary. As seen in the plots, MDI is less likely than permutation importance to fully omit a feature.
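A compact way to see both statements at once is to put the two sets of scores side by side in a table (a small addition for illustration):

comparison = pd.DataFrame(
    {"MDI": importances, "Permutation": result.importances_mean},
    index=feature_names,
)
print(comparison.sort_values("Permutation", ascending=False))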

Total running time of the script: ( 0 minutes 1.063 seconds)

launch binder

Download Python source code: plot_forest_importances.py

Download Jupyter notebook: plot_forest_importances.ipynb

Gallery generated by Sphinx-Gallery

© 2007 - 2023, scikit-learn developers (BSD License).
