

A Guide to 21 Feature Importance Methods and Packages in Machine Learning (with Code)

From the OmniXAI, Shapash, and Dalex interpretability packages to the Boruta, Relief, and Random Forest feature selection algorithms

Theophano Mitsa · Published in Towards Data Science · 19 min read · Dec 19, 2023


Image created by the author at DALL-E

“We are our choices.” —Jean-Paul Sartre

We live in the era of artificial intelligence, mostly because of the incredible advancement of Large Language Models (LLMs). As important as it is for an ML engineer to learn about these new technologies, it is equally important to master the fundamental concepts of model selection, optimization, and deployment. Something else is very important: the input to all of the above, which consists of the data features. Data, like people, have characteristics called features. In the case of people, you must understand their unique characteristics to bring out the best in them. The same principle applies to data. Specifically, this article is about feature importance, which measures the contribution of a feature to the predictive ability of a model. We have to understand feature importance for many essential reasons:

Time: Having too many features slows down model training and also model deployment. The latter is particularly important in edge applications (mobile, sensors, medical diagnostics).

Overfitting. If our features are not carefully selected, our model may overfit, i.e., learn the noise, too.

Curse of dimensionality. Many features mean many dimensions, and that makes data analysis exponentially more difficult. For example, k-NN classification, a widely used algorithm, is greatly affected by an increase in dimensions.

Adaptability and transfer learning. This is my favorite reason and actually the reason for writing this article. In transfer learning, a model trained on one task can be used on a second task with some fine-tuning. Having a good understanding of your features in the first and second tasks can greatly reduce the fine-tuning you need to do.

We will focus on tabular data and discuss twenty-one ways to assess feature importance. One might wonder: ‘Why twenty-one techniques? Isn’t one enough?’ It is worth discussing all twenty-one techniques because each one has unique characteristics that are worthwhile learning about. Specifically, there are two ways I will indicate in the article why a particular technique is worth learning about: (a) sections titled “Why this is important,” and (b) highlighting the word unique, to indicate that I am talking about a special characteristic.

The techniques we will discuss come from two distinct areas of machine learning: interpretability and feature selection. Specifically, we will discuss the following:

Interpretability Python packages. These libraries help to make a model’s decision-making process more transparent by providing insights into how input features affect the model’s predictions. We will discuss the following: OmniXAI, Shapash, DALEX, InterpretML, and ELI5.

Feature selection methods. These methods focus on reducing the model’s features by identifying the most informative ones, and they generally fall into the filter, embedded, and wrapper categories. The characteristics of each category will be discussed in the next section. From each category, we will discuss the following:

Wrapper methods: Recursive Feature Elimination, Sequential Feature Selection, Boruta algorithm.

Embedded methods: Logistic Regression, Random Forest, LightGBM, CatBoost, XGBoost, SelectFromModel.

Filter methods: Mutual information, MRMR algorithm, SelectKBest, Relief algorithm.

Other: Featurewiz package, Selective package, PyImpetus package.

Data
To demonstrate the above feature-importance-computation techniques, we will use
tabular data related to heart failure prediction from Kaggle:
https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction/data

The dataset has 918 rows and 12 columns corresponding to the following features:

‘Age’, ‘Sex’ (M, F), ‘ChestPainType’ (TA, ATA, NAP, ASY), ‘RestingBP’, ‘Cholesterol’, ‘FastingBS’ (0, 1), ‘RestingECG’ (Normal, ST, LVH), ‘MaxHR’, ‘ExerciseAngina’ (Y, N), ‘Oldpeak’, ‘ST_Slope’ (Up, Flat, Down), and ‘HeartDisease’ (0, 1). The last one is the target variable: 0 indicates the absence of heart disease, and 1 indicates its presence.

The dataset has no missing values, and the target variable is relatively balanced, with 410 ‘0’ instances and 508 ‘1’ instances. Five of the features are categorical: ‘Sex’, ‘ChestPainType’, ‘ExerciseAngina’, ‘RestingECG’, and ‘ST_Slope’. These features are encoded with the one-hot-encoding Pandas method:

import pandas as pd
heart2 = pd.get_dummies(heart, columns=['Sex', 'ChestPainType',
                                        'RestingECG', 'ExerciseAngina',
                                        'ST_Slope'])


Then, the data is split into training and test sets. Finally, scikit-learn’s StandardScaler
is applied to the numerical data of the train and test data sets. Now, we are ready to
proceed to feature importance assessment.
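
For completeness, here is a minimal sketch of this split-and-scale step; the test size, random seed, and list of numerical columns below are illustrative assumptions, not necessarily the exact values used.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = heart2.drop('HeartDisease', axis=1)
y = heart2['HeartDisease']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=12)

# Scale only the numerical columns; the scaler is fit on the training set
# to avoid leaking information from the test set.
num_cols = ['Age', 'RestingBP', 'Cholesterol', 'MaxHR', 'Oldpeak']
scaler = StandardScaler().fit(X_train[num_cols])
X_train[num_cols] = scaler.transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])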

Feature Importance Assessment

A. Interpretability Packages
Why they are important
In recent years, the interpretability of machine learning algorithms has attracted
significant attention. Machine learning algorithms have recently found use in many
areas, such as finance, medicine, environmental modeling, etc. This broad use of
ML algorithms by people who are not necessarily ML experts begs for more
transparency because:

Trust issues. Black boxes make people nervous and unsure as to whether they
should trust them.

Regulatory and ethical concerns. Governments around the world are increasingly concerned about AI use and are passing legislation to ensure that AI systems make their decisions in a fair way, without any biases. Understanding how ML systems work under the hood is an important prerequisite for fair and unbiased AI.

Interpretability packages are based on the model-independent interpretation frameworks SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) [2]. SHAP uses a game theory approach and provides global explanations. On the other hand, LIME provides local explanations. They offer transparency through the idea of “explainers.” An explainer is a wrapper-type object; it can wrap around a model and provide a door to the internal intricacies of the model.

A.1 Shapash Package


Shapash [1] is such an interpretability package. Below, we see the implementation of
Shapash’s ‘SmartExplainer’ that takes as input an ML model of type
RandomForestClassifier, and the feature names. Then, Shapash’s ‘compile’ function is
invoked, which is the ‘workhorse’ of the whole process: (a) it binds the model to the data, (b) computes the feature importances, and (c) prepares the data for visualization. Finally, we invoke the interactive web application.


from shapash.explainer.smart_explainer import SmartExplainer
from sklearn.ensemble import RandomForestClassifier

model2 = RandomForestClassifier(n_estimators=200, criterion='entropy',
                                max_features='sqrt', random_state=12, max_depth=5)
sw = model2.fit(X_train, y_train)
model2_predicted = model2.predict(X_test)
heart_features_dict = {'Age': 'patient age',
                       'Sex_F': 'Sex F',
                       'Sex_M': 'Sex M',
                       'ChestPainType_ASY': 'chest_pain_type ASY',
                       'ChestPainType_ATA': 'chest_pain_type ATA',
                       'ChestPainType_NAP': 'chest_pain_type NAP',
                       'ChestPainType_TA': 'chest_pain_typeTA',
                       'RestingBP': 'resting blood pressure',
                       'Cholesterol': 'cholesterol',
                       'FastingBS': 'fasting blood sugar',
                       'RestingECG_LVH': 'EKG LVH',
                       'RestingECG_Normal': 'EKG Normal',
                       'RestingECG_ST': 'EKG ST',
                       'MaxHR': 'max heart rate',
                       'ExerciseAngina_N': 'angina N',
                       'ExerciseAngina_Y': 'angina Y',
                       'Oldpeak': 'oldpeak',
                       'ST_Slope_Down': 'ST_segment Down',
                       'ST_Slope_Up': 'St Slope up',
                       'ST_Slope_Flat': 'St Slope Flat'}
xpl = SmartExplainer(model=sw, features_dict=heart_features_dict)
xpl.compile(x=X_test)
app = xpl.run_app(title_story='Heart')


Figure 1, from Shapash’s application interface, shows the feature importances. The
length of the horizontal bar corresponds to the importance of the feature. Thus,
‘ST_Slope_up’ is the most important feature, and ‘chest_pain_typeTA’ is the least
important.


Figure 1. Feature importances

Figure 2 shows an important type of plot provided by Shapash, the feature contribution. The feature examined in the plot is ‘ST_Slope_up’. The upper part contains cases where the contribution of ‘ST_Slope_up’ is positive, whereas the bottom part contains cases where the contribution of ‘ST_Slope_up’ is negative. Also, the upper graph part corresponds to cases where ‘ST_Slope_up’ is 0, and the bottom corresponds to cases where ‘ST_Slope_up’ is 1. When we click on one of the circles in the middle of the displayed structures, the following information is shown: the case number, the ‘ST_Slope_up’ value, the predicted class, and the contribution of ‘ST_Slope_up’.


Figure 2. Feature contributions

Figure 3 shows the local explanations for slice 131, where the predicted class is 1 with a probability of 0.8416. Bars to the right show a positive contribution, and bars to
the left show a negative contribution to the result. ‘St_Slope_up’ has the highest
positive contribution, while ‘max_heart_rate’ has the highest negative contribution.

Figure 3. Local explanations

In summary, Shapash is a very useful package to know because (a) it offers a great interface where the user can gain a deep understanding of global and local explanations, and (b) it offers the unique feature of displaying feature contributions across cases.

A.2. OMNIXAI Package


OMNIXAI (Open-source eXplainable AI) [3], like Shapash, also offers visualization
tools, but its unique strength lies in the significant breadth of its explanation
techniques. Specifically, it offers methods to explain predictions for various data
types, i.e., tabular data, text, and images. Some of its unique features are (a) the
NLPExplainer, (b) the bias examination module, (c) Morris sensitivity analysis for
tabular data, (d) the VisionExplainer for image classification, and (e) counterfactual
Explainers.

The code below shows the creation of an OMNIXAI explainer. The essential steps are: (a) the creation of an OMNIXAI-specific data type (‘Tabular’) to hold the data, (b) data pre-processing through the ‘TabularTransform’, (c) data splitting into training and test sets, (d) training of an XGBClassifier model, (e) inversion of the data back to their original format, (f) setting up a ‘TabularExplainer’ of the XGBClassifier with both SHAP and LIME methods (the explanation will be applied to ‘test_instances’ [130–135]), and (g) generation and display of the predictions.


import numpy as np
import sklearn.model_selection
from omnixai.data.tabular import Tabular
from omnixai.preprocessing.tabular import TabularTransform
from omnixai.explainers.tabular import TabularExplainer
from omnixai.visualization.dashboard import Dashboard

# 'categorical_data' holds the categorical columns of the original dataframe
cc = (categorical_data.columns.values).tolist()
Xomni = heart.drop("HeartDisease", axis=1)
column_headers_orig = list(Xomni.columns.values)
tabular_data = Tabular(
    data=heart,
    categorical_columns=cc,
    target_column='HeartDisease'
)
np.random.seed(1)
transformer = TabularTransform().fit(tabular_data)
class_names = transformer.class_names
x = transformer.transform(tabular_data)
train, test, train_labels, test_labels = \
    sklearn.model_selection.train_test_split(x[:, :-1], x[:, -1], train_size=0.80)

import xgboost as xgb
from xgboost import XGBClassifier
gbtree = xgb.XGBClassifier(n_estimators=300, max_depth=5)
gbtree.fit(train, train_labels)

# Convert the transformed data back to Tabular instances
train_data = transformer.invert(train)
test_data = transformer.invert(test)
preprocess = lambda z: transformer.transform(z)
explainers = TabularExplainer(
    explainers=["lime", "shap"],
    mode="classification",
    data=train_data,
    model=gbtree,
    preprocess=preprocess,
    params={
        "lime": {"kernel_width": 3},
        "shap": {"nsamples": 100},
    }
)
test_instances = test_data[130:135]
local_explanations = explainers.explain(X=test_instances)

index = 1
print("LIME results:")
local_explanations["lime"].ipython_plot(index, class_names=class_names)
print("SHAP results:")
local_explanations["shap"].ipython_plot(index, class_names=class_names)

Figure 4 shows the aggregate local explanation for slices between [130:135] using
LIME. The green bars in the right part show a positive contribution to label class 1,
whereas the red bars in the left part show negative contributions to class 1. The
longer the bar, the more significant the contribution.

Figure 4. LIME explanations

Figure 5 shows the aggregate local explanations for slices [130:135] using SHAP. The
meaning of the green/red bars is the same as in the above graph.

Figure 5. SHAP explanations

A.3. The InterpretML Package


The InterpretML interpretability package [4] has the unique feature of ‘glassbox models,’ which are inherently explainable models.

The implementation of such an inherently explainable model, the ‘ExplainableBoostingClassifier,’ is shown in the code snippet below. Global explanations and local explanations at slice 43 are also set up.

from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show

ebm = ExplainableBoostingClassifier(random_state=42)
ebm.fit(X_train, y_train)
ebm_global = ebm.explain_global(name='EBM')
show(ebm_global)
ebm_local = ebm.explain_local(X_test[:43], y_test[:43], name='EBM')
show(ebm_local)


Figure 6 shows the computed global feature importances.

Figure 6. Global feature importances

Figure 7 shows the computed local explanations for slice at 43. Most features
contribute positively to the prediction of class 1, while only ‘Cholesterol’ and
‘FastingBS’ contribute negatively.


Figure 7. Local explanations

A.4 The Dalex Package

The Dalex package [5] is a library designed to explain and understand machine learning models. Dalex stands for “Descriptive mAchine Learning EXplanations.” It has the following unique characteristics:

It is compatible with both R and Python.

The Aspects module. This allows us to explain a model taking into account
feature inter-dependencies.

The Fairness module. It allows us to evaluate the fairness of a model.

The code snippet below shows the implementation of Dalex’s ‘Explainer.’

import dalex as dx
from sklearn.linear_model import LogisticRegression

model_dx = LogisticRegression(C=0.25, max_iter=50)
s = model_dx.fit(X_train, y_train)
heart_exp = dx.Explainer(model_dx, X_train, y_train,
                         label="Heart Logistic Pipeline")

mp_rf = heart_exp.model_parts()
mp_rf.result
mp_rf.plot()


The feature importances produced by Dalex are shown below in Figure 8.


Figure 8. Feature importances

A.5 The Eli5 package


The final interpretability package we will discuss is Eli5 [6]. It has the following
unique features:

The permutation importance measure. In this technique, the values of each feature are randomly shuffled, and then the resulting drop in model performance is measured. The bigger the drop, the more important the feature.

It works with text data. Specifically, it provides a ‘TextExplainer’ that can explain
predictions of text classifiers.

It is compatible with Keras.

In the code snippet below, the ‘PermutationImportance’ method is applied to the Support Vector Classification (‘svc’) estimator.

import eli5
from eli5.sklearn import PermutationImportance
from sklearn.svm import SVC

perm = PermutationImportance(SVC(), cv=5)
perm.fit(X_train, y_train)
importances = perm.feature_importances_

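As a small follow-up (assuming ‘X_train’ is a pandas DataFrame), the scores can be paired with the feature names and sorted:

import pandas as pd
perm_scores = pd.Series(perm.feature_importances_, index=X_train.columns)
print(perm_scores.sort_values(ascending=False))
# In a notebook, eli5 can also render the weights together with their
# standard deviations across shuffles:
# eli5.show_weights(perm, feature_names=X_train.columns.tolist())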

Figure 9 shows the computed feature importances for the ‘svc’ estimator.


Figure 9.

B. Feature Selection Techniques


Wrapper Methods
As the name suggests, these algorithms wrap the feature selection process around a
machine learning algorithm. They continuously evaluate subsets of features until
they find the subset that yields the best performance according to a criterion. This
criterion can be one of the following: model accuracy, number of subset features,
information gain, etc.

Why they are important


The very nature of these algorithms (criterion optimization, comprehensive search)
suggests that these methods can have very good performance in terms of selecting
the best features. Another very useful characteristic of them is that they consider
feature interactions. However, again, their very nature suggests that they can be
computationally intensive and might overfit. So, if you do not have computational
limitations and accuracy is essential, these are a good choice.
B.1. Sequential Feature Selection
Sequential Feature Selection (SFS) evaluates feature subsets in two modes: forward
selection, which starts with no features and adds them iteratively, and backward
elimination, which starts with all features and removes them one by one.

The code snippet below shows the implementation of SFS wrapped around a
‘KNeighborsClassifier’ model. It also shows how to output the selected features and
their names.


from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
SFS = SequentialFeatureSelector(knn, direction='forward', cv=2,
                                n_features_to_select="auto")
SFS.fit(X_train, y_train)
SFSDF = pd.DataFrame(columns=['FeatureName', 'Filter'])
SFSDF['FeatureName'] = X.columns.values
SFSDF['Filter'] = SFS.get_support().tolist()
SFSDF_n = SFSDF[SFSDF['Filter'] == True]
SFS_top_features = SFSDF_n['FeatureName'].tolist()
print(SFS_top_features)


The selected features are:

['Cholesterol', 'FastingBS', 'MaxHR', 'Sex_F', 'Sex_M',
 'ChestPainType_ASY', 'RestingECG_ST', 'ST_Slope_Down',
 'ST_Slope_Flat', 'ST_Slope_Up']


B.2 The Boruta Algorithm


Boruta is one of the most effective feature selection algorithms. Most impressively, it
does not require any input from the user [7]! It is based on the brilliant idea of
‘shadow features’ (randomized duplicates of all original features). Then, a random
forest classifier is applied to assess the importance of each real feature against these
shadow features. The process is repeated until all important features are identified.

The snippet below shows the implementation of Boruta using the BorutaPy package
and the selected features.

import numpy as np
from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier

model2 = RandomForestClassifier(n_estimators=500, criterion='gini',
                                max_features='sqrt', random_state=12, max_depth=10)
boruta = BorutaPy(estimator=model2, n_estimators='auto', max_iter=50)
boruta.fit(np.array(X_train), np.ravel(y_train))
green_area = X.columns[boruta.support_].to_list()
print('Selected Features:', green_area)


The selected features from Boruta are:


Selected Features: ['Age', 'Cholesterol', 'MaxHR', 'Oldpeak',
'ChestPainType_ASY', 'ExerciseAngina_N', 'ST_Slope_Flat', 'ST_Slope_Up']

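Besides the confirmed (‘green area’) features, BorutaPy also reports ‘tentative’ features, whose importance is too close to that of the shadow features to make a call, as well as a feature ranking. A short, optional follow-up using BorutaPy’s support_weak_ and ranking_ attributes:

blue_area = X.columns[boruta.support_weak_].to_list()   # tentative features
print('Tentative Features:', blue_area)
print('Feature ranking:', dict(zip(X.columns, boruta.ranking_)))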

B.3 The RFECV Algorithm


RFECV (Recursive Feature Elimination with Cross-Validation) is a feature selection
technique that iteratively removes the least important features from a model, using
cross-validation to find the best subset of features. The code implementation is
shown in the snippet below.

import numpy as np
from sklearn.feature_selection import RFECV

# model1 is the logistic regression defined in section B.4
select_engine = RFECV(model1, step=1, cv=5)
select_engine.fit(X_train, y_train)
mask = select_engine.get_support()
features = np.array(column_headers)
sel_features = features[mask]
print("Selected Features: ", sel_features.shape[0])
print(sel_features)


The selected features are:

Selected Features: 15
['Cholesterol' 'FastingBS' 'MaxHR' 'Sex_F' 'Sex_M' 'ChestPainType_ASY'
 'ChestPainType_ATA' 'ChestPainType_NAP' 'ChestPainType_TA' 'RestingECG_LVH'
 'ExerciseAngina_N' 'ExerciseAngina_Y' 'ST_Slope_Down' 'ST_Slope_Flat' 'ST_Slope_Up']


Embedded Methods
These refer to algorithms that have the built-in ability to compute feature
importances or select features, such as Random Forest and lasso regression,
respectively. An important note for these methods is that they do not directly select
features. Instead, they compute feature importances, which can be used in a post
hoc process to choose features. Such a post hoc process is ‘SelectFromModel’
discussed in section B.9.

Why they are important


High-dimensional data are very common today in the form of unstructured text,
images, and time series, especially in the fields of bioinformatics, environment
monitoring, and finance. The greatest advantage of embedded methods is their
ability to handle high-dimensional data. The reason for this ability is that they do
not have separate modeling and feature selection steps. Feature selection and
modeling are combined in one single step, which leads to a significant speed-up.
B.4 Logistic Regression
Logistic Regression is a statistical method used for binary classification. The
coefficients of the model relate to the importance of the features. Each weight indicates the direction (positive or negative) and the strength of a feature’s effect on the log odds of the target variable. A larger absolute value of a weight indicates that the
corresponding feature is more important in predicting the outcome. The code
snippet below shows the creation of the logistic regression. The hyper-parameters
‘C’ (regularization strength) and ‘max_iter’ are learned by applying scikit-learn’s
‘GridSearchCV.’

from sklearn.linear_model import LogisticRegression

model1 = LogisticRegression(C=0.25, max_iter=50)
s = model1.fit(X_train, y_train)

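The grid search itself is not shown above; a minimal sketch of how ‘C’ and ‘max_iter’ might be tuned (the grid values and scoring metric are illustrative assumptions):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.1, 0.25, 0.5, 1.0, 2.0],
              'max_iter': [50, 100, 200]}
grid = GridSearchCV(LogisticRegression(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
print(grid.best_params_)   # e.g., C=0.25, max_iter=50, as used above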

The logistic regression coefficients are shown below.


                      coef
ChestPainType_ASY     1.091887
FastingBS             0.863633
ST_Slope_Flat         0.829780
Sex_M                 0.463849
ExerciseAngina_Y      0.382341
Oldpeak               0.294517
ST_Slope_Down         0.277960
RestingECG_LVH        0.207710
Age                   0.027053
RestingBP             0.000638
RestingECG_ST        -0.097332
RestingECG_Normal    -0.110399
ChestPainType_TA     -0.193232
MaxHR                -0.298876
ExerciseAngina_N     -0.382362
ChestPainType_ATA    -0.388795
Sex_F                -0.463869
Cholesterol          -0.475150
ChestPainType_NAP    -0.509880
ST_Slope_Up          -1.107760


B.5 Random Forest


Random Forest is an ensemble machine learning method used for classification and
regression. It works by building many decision trees and merging their results. It
uses the bagging technique where sampling-with-replacement is applied to the
dataset. Then, each sample is used to train a separate decision tree. A significant
feature of Random Forest is its ability to compute feature importances during the training process. One way to do this is to randomize a feature (while keeping all other features constant) and then check how much the error increases. The most common criterion for computing feature importance, however, is the mean decrease in impurity (MDI) when a feature is used to split a node [8]. The code snippet below
shows the computation of the scikit-learn ‘RandomForestClassifier,’ where the
hyperparameters have been determined as above using scikit-learn’s ‘GridSearchCV.’

model_rf = RandomForestClassifier(n_estimators=500, criterion='gini',
                                  max_features='sqrt', random_state=12, max_depth=10)
model_rf.fit(X_train, y_train)



The code for the computation and display of feature importances is shown below.
The computed feature importances are shown in Figure 10.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

importances = model_rf.feature_importances_
forest_importances = pd.Series(importances, index=column_headers)
sortindices = np.argsort(forest_importances)
plt.title('Random Forest Feature Importances')
plt.barh(range(len(sortindices)), forest_importances[sortindices], color='lime',
         align='center')
plt.yticks(range(len(sortindices)), [column_headers[i] for i in sortindices])
plt.show()


Figure 10. Feature importances

B.6 The LightGBM algorithm


LightGBM (Light Gradient Boosting Machine) is a gradient-boosting algorithm that
combines speed and performance. Developed by Microsoft, it is known for handling
large datasets and for its efficiency in terms of memory and speed. Some of its
unique features are (a) its ability to filter out data instances with small gradients
and focus on more critical instances, and (b) ‘Exclusive Feature Bundling’ (EFB): LightGBM reduces the number of features by bundling mutually exclusive features (those that are rarely non-zero at the same time). In this way, the algorithm increases efficiency on high-dimensional data [9].

The snippet below shows the implementation of LightGBM. The hyperparameters (‘learning_rate’, ‘max_depth’, and ‘n_estimators’) were chosen using scikit-learn’s ‘GridSearchCV’ algorithm. The feature importances computed from LightGBM are shown in Figure 11.

import lightgbm as lgb

# The n_estimators value was truncated in the original snippet; 100 (the library
# default) is used here as a placeholder.
model_lgb = lgb.LGBMClassifier(device="gpu", learning_rate=0.05, max_depth=12,
                               n_estimators=100)
model_lgb.fit(X_train, y_train)


Figure 11. LightGBM feature importances
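
As an aside, the same importances can also be drawn with LightGBM’s built-in plotting helper; a brief sketch (‘gain’ measures the total loss reduction contributed by each feature):

import lightgbm as lgb
import matplotlib.pyplot as plt

lgb.plot_importance(model_lgb, importance_type='gain', max_num_features=20)
plt.tight_layout()
plt.show()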

B.7 The XGBoost Algorithm


XGBoost, which stands for eXtreme Gradient Boosting, is an advanced implementation
of gradient boosting. It has the following unique characteristics:

It can effectively use all available CPU cores or clusters to create the tree in
parallel. It also utilizes cache optimization.

Compared to LightGBM, XGBoost grows trees depth-wise (level-wise), while LightGBM grows trees leaf-wise. This makes XGBoost less efficient with large datasets.

The code snippet shows the implementation of XGBoost, where the hyperparameters [10] shown below were chosen based on Bayesian optimization implemented in the ‘hyperopt’ package. These hyperparameters are:

‘gamma’ (min loss reduction for a split),

‘min_child_weight’ (min required sum of weights of all observations in a child),

‘max_depth’ (max tree depth),

‘reg_lambda’ (L2 regularization handle).

Finally, the hyperparameter ‘reg_alpha,’ which controls L1 regularization, was set manually after experimentation.

from xgboost import XGBClassifier

model_xgb = XGBClassifier(colsample_bytree=0.9137467413171512,
                          gamma=4.316993662876307, min_child_weight=2.0,
                          max_depth=8, reg_alpha=5.0, reg_lambda=0.4414490256174322)
model_xgb.fit(X_train, y_train)

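For reference, a minimal hyperopt-style (TPE) search over a few of these hyperparameters might look like the sketch below; the search space, metric, and evaluation budget are assumptions, not the exact setup used to obtain the values above.

from hyperopt import fmin, tpe, hp, Trials
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

space = {
    'gamma': hp.uniform('gamma', 0, 5),
    'min_child_weight': hp.quniform('min_child_weight', 1, 6, 1),
    'max_depth': hp.quniform('max_depth', 3, 10, 1),
    'reg_lambda': hp.uniform('reg_lambda', 0, 1),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 1.0),
}

def objective(params):
    model = XGBClassifier(gamma=params['gamma'],
                          min_child_weight=params['min_child_weight'],
                          max_depth=int(params['max_depth']),   # quniform returns floats
                          reg_lambda=params['reg_lambda'],
                          colsample_bytree=params['colsample_bytree'])
    score = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy').mean()
    return 1.0 - score                                           # fmin minimizes the loss

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=Trials())
print(best)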

Figure 12 shows the feature importances. Note that some importances are set to
zero because of L1 regularization.

Figure 12. Feature importances

B.8 The CatBoost Algorithm


CatBoost [11] is a high-performance, open-source gradient boosting library,
particularly well-suited for categorical data. Specifically, it does not require any pre-
processing of categorical variables, such as label-encoding or one-hot-encoding.
Instead, it handles categorical variables natively. CatBoost employs symmetric trees
as its base predictors and supports GPU acceleration. Regarding CatBoost
implementation in Python, it is important to note that all non-numeric features must be declared as type ‘category.’ Then, as shown in the snippet below, the categorical features are provided as input to the model’s fit function.

Figure 13 shows the feature importance computed by CatBoost. It is important to


note that the names of the features are the ones in the original data set (not the one-
hot-encoded). Because CatBoost handles categorical data natively, the input to the
CatBoost algorithm was the original data (not one-hot-encoded).

import catboost as cb

categoricalcolumns = Xcat_train.select_dtypes(include=["category"]).columns.tolist()
cat_feat = [Xcat_train.columns.get_loc(col) for col in categoricalcolumns]
model7 = cb.CatBoostClassifier(loss_function="Logloss", iterations=1000,
                               eval_metric="AUC")  # any further arguments were truncated in the original
model7.fit(Xcat_train, ycat_train, cat_features=cat_feat, plot=True)
importances = model7.feature_importances_


Figure 13. Feature importances
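
The snippet above assumes that the original (not one-hot-encoded) data was split and that its text columns were already cast to the pandas ‘category’ dtype; a minimal prep sketch (the split parameters are illustrative assumptions):

from sklearn.model_selection import train_test_split

cat_cols = ['Sex', 'ChestPainType', 'RestingECG', 'ExerciseAngina', 'ST_Slope']
heart_cat = heart.copy()
heart_cat[cat_cols] = heart_cat[cat_cols].astype('category')

Xcat = heart_cat.drop('HeartDisease', axis=1)
ycat = heart_cat['HeartDisease']
Xcat_train, Xcat_test, ycat_train, ycat_test = train_test_split(
    Xcat, ycat, test_size=0.2, stratify=ycat, random_state=12)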

B.9 The SelectFromModel Method


‘SelectFromModel’ is offered by scikit-learn’s feature_selection module. Its unique characteristic is that it is a meta-transformer that can be used with models that assign importances to features, either through coef_ or feature_importances_.

In contrast to the previous embedded methods we discussed, which just computed feature importances, ‘SelectFromModel’ actually selects features. The snippet below shows the code for feature selection using this method.


from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

model_etc = ExtraTreesClassifier(n_estimators=50)
model_etc.fit(X_train, y_train)
selecti = SelectFromModel(model_etc, prefit=True)
new_F = pd.DataFrame(columns=['Feature', 'Selection'])
new_F['Feature'] = X_train.columns
new_F['Selection'] = selecti.get_support().tolist()
new_FF = new_F[new_F['Selection'] == True]
selected_features = new_FF['Feature'].tolist()
print(selected_features)


The selected features are:

['Age', 'RestingBP', 'Cholesterol', 'MaxHR', 'Oldpeak',
 'ChestPainType_ASY', 'ExerciseAngina_N', 'ST_Slope_Flat', 'ST_Slope_Up']


Filter Feature Selection Methods


These are independent of any machine learning model. They typically rely on
statistical measures to evaluate each feature, such as correlation and mutual
information between the target and predictor variables.

Why they are important


Filter methods are straightforward and very easy to compute; therefore, they are used as an initial feature selection step in many fields with large amounts of data, such as bioinformatics [12], environmental studies, and healthcare research [13].

B. 10 Mutual information
The mutual information measures the reduction in uncertainty (entropy) in one
variable, given knowledge of the other. The mutual information between the
predictors and the target variable is computed using scikit-learn’s
mutual_info_classif. The mutual information score of each predictor is shown in
Figure 14.

from sklearn.feature_selection import mutual_info_classif as MIC
mi_scores = MIC(X, y)

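A short follow-up (assuming ‘X’ is a pandas DataFrame) pairs the scores with the feature names and ranks them, which is essentially what Figure 14 visualizes:

import pandas as pd
mi_series = pd.Series(mi_scores, index=X.columns).sort_values(ascending=False)
print(mi_series)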


Figure 14. Mutual information scores.

B.11 The MRMR Algorithm


MRMR stands for Maximum-Relevance-Minimum-Redundancy. As the name indicates, the MRMR algorithm selects features that are (a) maximally relevant, i.e., strongly correlated with the target variable, and (b) minimally redundant, i.e., exhibit high dissimilarity among themselves. Redundancy can be computed using correlation or mutual information measures, and relevance can be calculated using the F-statistic or mutual information [15]. MRMR is a minimal-optimal method because it selects a group of features that, together, have maximum predictive power [14]. This is in contrast to the Boruta algorithm, discussed in section B.2, which is an all-relevant algorithm because it identifies all features relevant to the model’s prediction.

The code snippet below shows the implementation of MRMR with the ‘mrmr’
Python library.

import mrmr
from mrmr import mrmr_classif
selected_features = mrmr_classif(X, y, K=5)
print(selected_features)


The minimal-optimal set of selected features is shown below:


['ST_Slope_Up', 'ChestPainType_TA', 'Sex_M', 'ChestPainType_ASY', 'ST_Slope_Flat']

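To make the relevance/redundancy trade-off concrete, below is a minimal greedy sketch of the idea (an ‘FCQ’-style variant: F-statistic relevance divided by mean absolute correlation redundancy). It is meant for intuition only and is not the mrmr library’s exact implementation.

import pandas as pd
from sklearn.feature_selection import f_classif

def mrmr_sketch(X, y, K=5):
    relevance = pd.Series(f_classif(X, y)[0], index=X.columns)   # F-statistics vs. target
    corr = X.corr().abs()                                        # pairwise |correlation|
    selected, candidates = [], list(X.columns)
    for _ in range(K):
        if not selected:
            scores = relevance[candidates]                       # no redundancy term yet
        else:
            redundancy = corr.loc[candidates, selected].mean(axis=1).clip(lower=1e-6)
            scores = relevance[candidates] / redundancy
        best = scores.idxmax()
        selected.append(best)
        candidates.remove(best)
    return selected

print(mrmr_sketch(X, y, K=5))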

B.12 The SelectKBest Method


As the name suggests, this algorithm selects the K best features according to a user-
defined score. The number K is also user-defined. The algorithm can be applied to
both classification and regression tasks, and it offers a variety of scoring functions.
For example, for classification, the user can apply the following: (a) ‘f_classif,’ which computes the ANOVA F-value, (b) ‘mutual_info_classif,’ which computes mutual information, and (c) ‘chi2,’ which computes chi-squared statistics between the predictors and the target variable [16]. The code snippet below shows the computation of SelectKBest for k=5 and score function ‘f_classif’.

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif

f_selector = SelectKBest(score_func=f_classif, k=5)
f_selector.fit(X_train, y_train)

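A short follow-up retrieves the scores and the names of the K selected features (assuming ‘X_train’ is a DataFrame with named columns):

import pandas as pd
scores = pd.Series(f_selector.scores_, index=X_train.columns)
print(scores.sort_values(ascending=False))                    # what Figure 15 plots
print(X_train.columns[f_selector.get_support()].tolist())     # the K=5 selected features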

Figure 15 below shows the scores (importances) of the features according to the
scoring function ‘f_classif.’ Note that although we chose K=5, Figure 15 displays the
scores for all features.


Figure 15. Feature importances.

B.13 The Relief Algorithm


Relief’s unique characteristic is the following idea: for a data sample, find its closest neighbor in the same class (the ‘near hit’) and its closest neighbor in the other class (the ‘near miss’). Features are then weighted according to how similar the sample is to the ‘near hit’ and how well the features differentiate the sample from the ‘near miss.’ Relief is particularly useful in biomedical informatics because of its sensitivity to complex feature associations [17]. Here, we used an extension of the original Relief algorithm, the ReliefF algorithm, which can be applied to multi-class classification. In contrast, the original Relief algorithm can only be applied to binary classification cases. The snippet below shows the invocation of the ‘ReliefFSelector’ from the ‘kydavra’ Python package.

from kydavra import ReliefFSelector

selector = ReliefFSelector(10, 5)
selected_cols = selector.select(heart2, 'HeartDisease')
print(selected_cols)


The selected features from the algorithm are shown below.

['ChestPainType_NAP', 'ExerciseAngina_Y', 'ChestPainType_ASY', 'ST_Slope_Flat', 'Sex_F']

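For intuition, below is a minimal sketch of the original binary Relief weight update (not the kydavra implementation used above); it assumes numeric/one-hot-encoded features.

import numpy as np

def relief_sketch(X, y, n_iter=100, random_state=0):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(random_state)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                            # avoid division by zero
    weights = np.zeros(X.shape[1])
    for i in rng.integers(0, len(X), n_iter):
        dists = np.abs(X - X[i]).sum(axis=1)         # Manhattan distance to sample i
        dists[i] = np.inf                            # exclude the sample itself
        hit = np.argmin(np.where(y == y[i], dists, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(y != y[i], dists, np.inf))   # nearest other-class sample
        weights += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / (span * n_iter)
    return weights

w = relief_sketch(heart2.drop('HeartDisease', axis=1), heart2['HeartDisease'])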

Misc Feature Selection Techniques


In this final category, we will discuss the Featurewiz, Selective, and PyImpetus
packages.
Why they are important
Each package is important for its unique reasons: (a) Featurewiz is a very convenient AutoML package; it selects features with one line of code. (b) The Selective package offers a wide variety of filter and embedded feature selection methods that can be easily invoked with one line of code. (c) The PyImpetus package is based on an algorithm that is very different from all other feature selection techniques, the Markov blanket.

B.14 The Featurewiz Package


This is an automated feature selection tool [18][19]. Its invocation is as simple as shown in the code snippet below. Under the hood, it uses the ‘SULOV’ algorithm (Searching for Uncorrelated List Of Variables), whose basis is the MRMR algorithm described above in section B.11. ‘SULOV’ selects the features with the highest mutual information score and the smallest correlation among them. Then, the features are passed recursively through XGBoost to find the best subset.

from featurewiz import featurewiz
feats = featurewiz(heart2, 'HeartDisease')


The features selected from Featurewiz are shown below.

Selected 11 important features:
['ST_Slope_Up', 'Cholesterol', 'MaxHR', 'Oldpeak', 'ChestPainType_ASY',
 'Sex_F', 'ExerciseAngina_Y', 'RestingECG_LVH', 'RestingECG_ST',
 'RestingECG_Normal', 'ST_Slope_Down']


B.15. The Selective Feature Selection Library


This library provides numerous feature selection methods for classification and
regression tasks [20]. Some of the methods offered are: correlation, variance,
statistical analysis (ANOVA f-test classification, chi-square, etc.), linear methods
(linear regression, lasso, and ridge regularization, etc.), and tree-based methods
(Random Forest, XGBoost, etc.). An example of this library’s usage is shown below.

from feature.selector import SelectionMethod, Selective

# Various feature selectors offered
# selector = Selective(SelectionMethod.Variance(threshold=0.0))
# selector = Selective(SelectionMethod.Correlation(threshold=0.5, method="pearson"))
# selector = Selective(SelectionMethod.Statistical(num_features=3, method="anova"))
# selector = Selective(SelectionMethod.Linear(num_features=3, regularization="none"))
selector = Selective(SelectionMethod.TreeBased(num_features=5))

# Feature selection
subset = selector.fit_transform(X_train, y_trainorig)
print("Selected Features:", list(subset.columns))


The selected features using the ‘TreeBased’ method are:



Selected Features: ['Cholesterol', 'MaxHR', 'Oldpeak', 'ChestPainType_ASY', 'ST_Slope_Up']


B.16. The PyImpetus Package


The unique idea of this algorithm is the Markov blanket, which is the minimal feature set needed to predict the target variable [21][22]. It can be used for both classification and regression tasks. Its implementation for classification is shown below.

# Import the algorithm. PPIMBC is for classification
from PyImpetus import PPIMBC
from sklearn.svm import SVC

# Initialize the PyImpetus object (any further arguments were truncated in the original)
model = PPIMBC(model=SVC(random_state=27, class_weight="balanced"), p_val_thresh=0.05)
# The fit_transform function is a wrapper for the fit and transform functions, individually.
# The fit function finds the Markov blanket for the given data, while the transform
# function provides the pruned feature set.
df_train = model.fit_transform(X_train, y_train)

# Plot of the feature importance scores
model.feature_importance()


Figure 16 shows the selected features and their relative importance.

Figure 16.

Discussion and Conclusion


In this article, we discussed a broad spectrum of feature importance assessment techniques from two distinct realms: interpretability and feature selection. Given the diversity of the discussed algorithms, a question that arises naturally is: “How similar are the features selected as most important by the various algorithms?”

Let us take a look at Table 1 below. This table has two columns corresponding to the
features ‘ST_Slope_up’ and ‘ST_Slope_flat.’ The rows correspond to the algorithms
and packages we used in the article. The numbers 1,2,3 indicate whether the feature
was selected as the best, second best, or third best by the algorithm.

Table 1.


As discussed in the article, some algorithms simply output a set of features without
any order. In this case, the X in the table indicates that the algorithm selected the
feature. In the case of a gap in the table, the feature was not in the three best
features selected by the corresponding algorithm. For logistic regression, the
absolute value of the coefficients was considered. For CatBoost, we assigned a 1 to both ‘ST_Slope_up’ and ‘ST_Slope_flat’ because CatBoost selected ‘ST_Slope’ as the most important feature. Finally, the OMNIXAI package results were not included
because they provided local explanations for a few rows.

An interesting fact emerges from the observation of Table 1. Except for LightGBM,
the feature ‘ST_Slope_up’ had the highest or second highest importance in the
algorithms that report feature importances. It was also selected by most algorithms
that reported selected features without importances. The feature ‘ST_Slope_Flat’
also performed pretty well because it was either in the first three highest-
importance features or in the selected feature groups for most algorithms.

Now, let us delve into another interesting insight. These two features had the highest
and second-highest mutual information scores. As we saw in section B.10, this is a straightforward measure computed in one line of code. So, with one line of code, we gained insights into the most important features of our data, as reported by the other, significantly more computationally complex algorithms.

This article discussed twenty-one packages and methods that compute feature
importance, a measure of a feature’s contribution to a model’s predictive ability. For
further reading, I recommend [23], which discusses another role of features,
specifically their error contribution to a model.

The entire code can be found at https://github.com/theomitsa/Feature_importance/tree/main.

Thank you for reading!

Footnotes:

Dataset license: Open Data Commons Open Database License (ODbL) v1.0, as mentioned in https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction and described in https://opendatacommons.org/licenses/odbl/1-0/


I certify that all visualizations/images/graphs in the article are created by the author and can be reproduced from the GitHub directory above.

References
1. The Shapash package, https://shapash.readthedocs.io/en/latest/

2. Molnar, C., Interpretable Machine Learning, 2023. https://christophm.github.io/interpretable-ml-book/

3. The OmniXAI package, https://opensource.salesforce.com/OmniXAI/latest/omnixai.html

4. The InterpretML package, https://interpret.ml/

5. The Dalex package, https://dalex.drwhy.ai/

6. The Eli5 package, https://eli5.readthedocs.io/en/latest/index.html

7. Mazzanti, S., Boruta Explained Exactly How You Wished Someone Explained to You, Medium: Towards Data Science, March 2020.

8. Scornet, E., Trees, Forests, and Impurity-Based Variable Importance, 2021, https://hal.science/hal-02436169v3/file/importance_variable.pdf

9. Ke, G. et al., LightGBM: A Highly Efficient Gradient Boosting Decision Tree, NIPS Conference, pp. 3149–3157, December 2017.

10. Banerjee, P., A Guide on XGBoost Hyperparameters Tuning, https://www.kaggle.com/code/prashant111/a-guide-on-xgboost-hyperparameters-tuning

11. Prokhorenkova, L. et al., CatBoost: Unbiased Boosting With Categorical Features, NIPS’18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 6639–6649, December 2018.

12. Urbanowicz, R.J. et al., Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining, Journal of Biomedical Informatics, vol. 85, pp. 168–188, Sept. 2018.

13. Raju, S.K., Evaluation of Mutual Information and Feature Selection for SARS-CoV-2 Respiratory Infection, Bioengineering (Basel), vol. 10, no. 7, July 2023.

14. Mazzanti, S., “MRMR” Explained Exactly How You Wished Someone Explained to You, Medium: Towards Data Science, February 2021.

15. Radovic, M. et al., Minimum Redundancy Maximum Relevance Feature Selection Approach for Temporal Gene Expression Data, BMC Bioinformatics, January 2017.

16. Kavya, D., Optimizing Performance: SelectKBest for Efficient Feature Selection in Machine Learning, Medium, February 2023.

17. Urbanowicz, R.J. et al., Relief-Based Feature Selection: Introduction and Review, Journal of Biomedical Informatics, vol. 85, pp. 189–203, Sept. 2018.

18. The Featurewiz package, https://github.com/AutoViML/featurewiz

19. Sharma, H., Featurewiz: Fast Way to Select the Best Features in Data, Medium: Towards Data Science, Dec. 2020.

20. The Selective Feature Selection Library, https://github.com/fidelity/selective

21. The PyImpetus package, https://github.com/atif-hassan/PyImpetus

22. Hassan, A. et al., PPFS: Predictive Permutation Feature Selection, https://arxiv.org/pdf/2110.10713.pdf

23. Mazzanti, S., Your Features Are Important? It Doesn’t Mean They Are Good, Medium: Towards Data Science, August 2023.

