Practical Application 2
Abstract
This paper proposes a comprehensive study aimed at predicting the survival status of patients
diagnosed with liver cirrhosis using probabilistic and non-probabilistic supervised classification
models. Data are sourced from a Mayo Clinic study on primary biliary cirrhosis. The objectives
include evaluating algorithm performance with the original variables, employing feature selection
methods, assessing their influence on classification outcomes, and evaluating the interpretability of
the models.
1 Introduction
The data provided are sourced from a Mayo Clinic [1] study on primary biliary cirrhosis (PBC)
of the liver carried out from 1974 to 1984. The dataset is a multivariate dataset [2], which, after
preprocessing, comprises 293 instances, each corresponding to an individual patient. It encompasses
25 distinct clinical features offering insights into demographics, comorbidities, treatment, and various
other factors. The target class has two possible values: 0 (Censored) and 1 (Death), with frequencies of
168 and 125, respectively. The study aims to achieve the following objectives:
• Evaluate the performance of the algorithms with the original set of variables.
• Employ feature selection methods and assess their influence on classification outcomes.
• Evaluate the interpretability of the models.
• Finally, compare the above models with those not based on probabilities.
2 Methodology
In this section, the problem of evaluating the models is described in subsection 2.1, the FSS methods
are described in subsection 2.2, model selection is explained in subsection 2.3, and the software
tools used are detailed in subsection 2.4. The proposed solution is introduced in subsection 2.5.
Additionally, a comprehensive description of the experiments is presented in subsection 2.6. Finally,
the importance of model interpretability is underscored in subsection 2.7.
Table 1: Dataset Analysis
Feature Mean Std Min Max Type
Age 50.6271 10.5698 26.2958 78.4931 continuous
Albumin 3.5169 0.4229 1.96 4.64 continuous
Alk Phos 2011.6709 2195.9565 289.0 13862.4 continuous
Ascites N 0.918089 0.274699 0.0 1.0 binary
Ascites Y 0.081911 0.274699 0.0 1.0 binary
Bilirubin 3.264164 4.648182 0.3 28.0 continuous
Cholesterol 365.2108 212.7557 120.0 1775.0 continuous
Copper 95.9395 84.2078 4.0 588.0 continuous
Drug D-penicillamine 0.5051 0.5008 0.0 1.0 binary
Drug Placebo 0.4948 0.5008 0.0 1.0 binary
Edema N 0.8395 0.3676 0.0 1.0 binary
Edema S 0.0921 0.2897 0.0 1.0 binary
Edema Y 0.0682 0.2526 0.0 1.0 binary
Hepatomegaly N 0.4948 0.5008 0.0 1.0 binary
Hepatomegaly Y 0.5051 0.5008 0.0 1.0 binary
N Days 2038.6655 1137.3298 41.0 4556.0 continuous
Platelets 259.5417 95.5279 62.0 563.0 continuous
Prothrombin 10.7488 1.0219 9.0 17.1 continuous
Sex F 0.8873 0.3166 0.0 1.0 binary
Sex M 0.1126 0.3166 0.0 1.0 binary
SGOT 122.0661 57.7574 26.35 457.25 continuous
Spiders N 0.70989 0.4545 0.0 1.0 binary
Spiders Y 0.2901 0.4545 0.0 1.0 binary
Stage 3.0170 0.8854 1.0 4.0 nominal
Tryglicerides 124.1343 61.5558 44.0 598.00 continuous
2.1 Problem Statement
Evaluating the models fairly raises the following questions:
• How many features will the Feature Subset Selection (FSS) select?
• What methods of univariate and multivariate FSS should be employed?
• How much data should be utilized for wrapper FSS?
• Which hyperparameters do we intend to select for the model? For Artificial Neural Networks
(ANN), how many layers, what type of layers, and the number of neurons per layer?
• What constitutes the most equitable validation approach for comparing models with different
subsets of the dataset?
• How should the data be partitioned to facilitate the selection of the best configuration in tuning
parameters, evaluating models, and choosing the most effective features?
• Can all the models be compared with the same preprocessing?
To address these challenges, I propose a simplified scenario to ensure fair evaluation, as depicted
in subsection 2.5. This scenario is built on the following assumptions:
1. Models can only be compared when using the same dataset, with an identical number of features
and instances, except in the wrapper setting.
2. Given that FSS with wrapper methods involves training the model on a portion of the data
and subsequently selecting features with the testing portion, we can compare wrapper models
even when they end up with different feature subsets, because every model has the opportunity
to select its best features. However, it is essential to maintain consistency by employing the
same split of the dataset for wrapper selection in every model. This ensures that each model
starts from the same initial point.
2.2 Feature Subset Selection
• Univariate Filtering: We employ the Mutual Information method [3], which quantifies the
amount of information gained about one random variable by observing another. This method
assesses the dependency between each feature and the target variable independently, which is
particularly useful for identifying the individual features most strongly associated with the
outcome.
• Multivariate Filtering: We use the Relief algorithm [4]. This method extends beyond the
scope of univariate methods by considering the interaction between features. It is adept at
detecting subtle feature interactions, making it valuable for scenarios where the predictive power
comes from the synergy between features rather than isolated effects.
• Wrapper-Based Selection: The Sequential Feature Selector (SFS) [5] is implemented as our
wrapper method. SFS is a search strategy that systematically assesses model performance across
different feature combinations to identify the optimal subset in a greedy fashion. Our approach
adopts a backward elimination strategy, beginning with the full set of features and iteratively
removing variables until the best-performing set of k features is determined. It selects the
subset of features that maximizes the F-score over 5 stratified folds; the process is depicted in
Algorithm 1. A minimal sketch of the three strategies follows this list.
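The sketch below shows one way these three strategies could be wired together. It is not the report's actual code: the skrebate package for Relief, the LogisticRegression base model, and the synthetic data (which stands in for the preprocessed PBC matrix of 293 instances and 25 features) are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       mutual_info_classif)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from skrebate import ReliefF  # Relief-based filter [4]; assumes scikit-rebate

# Stand-in for the preprocessed PBC data (293 instances, 25 features).
X, y = make_classification(n_samples=293, n_features=25, random_state=0)

K = 15  # subset size, capped via the PCA analysis of subsection 2.5

# Univariate filter: mutual information between each feature and the target.
univariate = SelectKBest(mutual_info_classif, k=K).fit(X, y)

# Multivariate filter: ReliefF scores features via nearest-neighbor contrasts.
multivariate = ReliefF(n_features_to_select=K, n_neighbors=10).fit(X, y)

# Wrapper: greedy backward elimination maximizing F-score over 5 stratified folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
wrapper = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),  # any candidate model from Table 2
    n_features_to_select=K, direction="backward", scoring="f1", cv=cv,
).fit(X, y)
```

Fixing the fold seed, as in the `cv` object above, implements the assumption that every wrapper model starts from the same split.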
2.4 Software Tools
The project was implemented in Python, using Jupyter Notebooks to systematically document all
experimental procedures. The codebase and corresponding experiment records are accessible via the
repository link: https://github.com/Alvaro8gb/PracticalApplicationsML. The primary libraries
utilized are scikit-learn [6], TensorFlow [7], and wittgenstein [8].
2.5 Evaluation
In this section, we explain the pipeline through which the models are evaluated. This pipeline represents
a simplified version of the complex space, taking into account the following statements:
• The FSS methods consider only the best 15 features, due to the high computational cost. This
number was selected by means of a Principal Component Analysis (a sketch follows this list;
for more details, see the source code).
• The evaluation is configured with 5 folds, and the 7-fold option is excluded due to the dataset
size, as it would result in a test set with only 15 samples, which appears to be insufficient.
• The models chosen for the experiments serve as representatives of the taxonomy elucidated in
class, with the exception of those for which I did not find Python implementations, such as TAN
or Naive Bayes Tree.
• The hyperparameter grid for each model was defined by sampling candidate values homogeneously over plausible ranges.
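As a minimal sketch of how the 15-feature cap referenced in the first bullet could be derived, the snippet below counts the principal components needed to reach a variance threshold. The 95% threshold and the synthetic stand-in data are assumptions; the report only states that PCA was used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in for the preprocessed PBC matrix (293 instances, 25 features).
X, _ = make_classification(n_samples=293, n_features=25, random_state=0)

X_std = StandardScaler().fit_transform(X)        # PCA is scale-sensitive
cumulative = np.cumsum(PCA().fit(X_std).explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.95)) + 1   # smallest k reaching 95%
print("components for 95% variance:", k)
```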
The automated pipeline (Algorithm 2) takes as input a binary classification model with its parameter
grid, together with the dataset. It then outputs the mean and standard deviation of the F-score and
Brier score across the folds.
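Since Algorithm 2 itself is not reproduced here, the following is a hedged reconstruction of what it describes, assuming numpy arrays X and y and a scikit-learn-compatible (model, param_grid) pair; the inner 5-fold tuning mirrors the procedure of subsection 2.6.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import brier_score_loss, f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

def evaluate(model, param_grid, X, y, n_splits=5, seed=0):
    """Mean/std of F-score and Brier score over stratified outer folds."""
    outer = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    f1s, briers = [], []
    for train_idx, test_idx in outer.split(X, y):
        # Inner 5-fold CV on the training portion tunes hyperparameters by F-score.
        search = GridSearchCV(clone(model), param_grid, scoring="f1", cv=5)
        search.fit(X[train_idx], y[train_idx])
        best = search.best_estimator_
        f1s.append(f1_score(y[test_idx], best.predict(X[test_idx])))
        # Brier score needs probability estimates; models without predict_proba
        # (e.g. an SVM without probability calibration) get no Brier entry,
        # matching the "-" cells of Table 4.
        if hasattr(best, "predict_proba"):
            p = best.predict_proba(X[test_idx])[:, 1]
            briers.append(brier_score_loss(y[test_idx], p))
    brier_stats = (np.mean(briers), np.std(briers)) if briers else (None, None)
    return (np.mean(f1s), np.std(f1s)), brier_stats
```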
2.6 Experiments
For each final dataset derived from the feature selection process outlined in Table 3, a 5-fold
cross-validation procedure is employed, as explained in subsection 2.5. The chosen metric for evaluating
the model is the F-score (see Equation 3). This choice is motivated by the presence of class imbalance,
as the F-score enables a balanced assessment of both false positives and false negatives. Furthermore,
in the realm of medicine, where precise probability estimation is crucial, model calibration becomes
paramount. To assess the accuracy of these probability estimates, we utilize the Brier Score (refer
to Equation 4). Lower Brier Scores indicate better-calibrated models, making this metric valuable in
evaluating the reliability of probability estimates generated by the model.
Each type of model exhibits distinct characteristics that necessitate different parameter configurations.
While the primary objective of this project is not to obtain the optimal configuration, to
ensure a fair comparison, model parameters and hyperparameters have been selected through a 5-fold
cross-validation on the training set of the outer validation loop, scored by F-score. The configuration
of every model is displayed in Table 2.
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \tag{1}$$

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \tag{2}$$

$$\text{F-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$

$$\text{Brier-score} = \frac{1}{N} \sum_{i=1}^{N} (p_i - o_i)^2 \tag{4}$$
where \(p_i\) is the predicted probability of the positive class for instance \(i\), \(o_i\) is its observed outcome (0 or 1), and \(N\) is the number of instances.
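As an illustrative check of Equations 1 to 3 (the counts are made up for the example), suppose a fold yields 40 true positives, 10 false positives, and 20 false negatives:

$$\text{Precision} = \frac{40}{50} = 0.80, \qquad \text{Recall} = \frac{40}{60} \approx 0.67, \qquad \text{F-score} = 2 \times \frac{0.80 \times 0.67}{0.80 + 0.67} \approx 0.73$$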
3 Results
In this section, we show the results of evaluating the models with four different feature subsets: all
the features, the univariate subset, the multivariate subset, and the wrapper subset. A fair comparison
can only be established within each subset due to differences in the data. These results are displayed in Table 4.
4 Discussion
In this section, we discuss the results in three ways: the subsets selected for the FSS (subsection 4.1),
model performance (subsection 4.2), and model interpretability (subsection 4.3).
Table 2: Model configurations and Hyperparameter grid
Model Hyperparameter Values
sklearnMLP [6] hidden layer sizes [50], [100], [50, 50]
activation logistic, tanh, relu
solver adam
alpha 0.0001, 0.001, 0.01
learning rate constant
max iter 300, 400
kerasMLP [7]: 3 layers (input linear layer, 32-neuron ReLU layer, output linear layer)
epochs 500, 750, 1000
batch size 64, 128
learning rate 0.001
lambda regu 0.01, 0.1
hidden neurons [50], [100], [50, 50]
RandomForest [6] n estimators 50, 70, 90
max features auto, sqrt, log2
max depth 4, 8, 12
min samples split 2, 5
min samples leaf 1, 2, 3
bootstrap True, False
AdaBoost [6] n estimators 50, 70, 90
learning rate 0.01, 0.1
algorithm SAMME, SAMME.R
Bagging [6], with DecisionTree as estimator
n estimators 10, 50, 100
max samples 0.5, 1.0
max features 0.5, 1.0
base estimator max depth 3, 5, 10
DecisionTree [6] max depth None, 4, 8, 10
min samples split 2, 8, 16, 32
min samples leaf 2, 8, 16, 32, 48
max features None, sqrt, log2
criterion gini, entropy, log loss
KNearestNeighbors [6] n neighbors 10, 25, 50, 65, 80
metric minkowski
p 2
LDA [6] solver svd, lsqr, eigen
priors None, [0.1, 0.9], [0.5, 0.5],
[0.9, 0.1]
tol 0.0001, 1e-05, 1e-06
LogisticRegression [6] C 0.001, 0.01, 0.1, 1
penalty l1, l2
max iter 1000, 2000, 3000, 4000,
5000
QDA [6] priors None, [0.1, 0.9], [0.5, 0.5],
[0.9, 0.1]
reg param 0.0, 0.1, 0.5
store covariance False, True
tol 0.0001, 1e-05
RIPPER [8] prune size 0.1, 0.3, 0.5
k 2, 3, 4
alpha 0.1, 1.0, 2.0
n discretize bins 10, 15, 20
SVM [6] C 0.1, 1
kernel linear, rbf, sigmoid
gamma scale, auto
Stacking [6], two base classifiers: SVM and DT; final classifier: Logistic Regression
svm C 0.1, 1
dt max depth 3, 5, 7
final estimator C 0.1, 1
Voting [6], 3 classifiers: SVM, Logistic Regression and RandomForest
lr C 0.1, 1
svc C 0.1, 1
rf n estimators 10, 50, 100
voting soft, hard
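To make the table concrete, here is one way a row could be transcribed for the grid-search tuning described in subsection 2.6, using the RandomForest entry as an illustration; the dictionary name is arbitrary.

```python
# Parameter grid for RandomForest, transcribed from Table 2. Note that
# max_features="auto" (also listed in the table) is deprecated in recent
# scikit-learn releases, so only the still-valid options are kept here.
param_grid_rf = {
    "n_estimators": [50, 70, 90],
    "max_features": ["sqrt", "log2"],
    "max_depth": [4, 8, 12],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2, 3],
    "bootstrap": [True, False],
}
```

A grid like this can be passed directly to the `evaluate` sketch shown in subsection 2.5.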
Table 3: Selected features by each method
Wrapper
Feature Univariate Multivariate
LDA QDA LR GNB RF AB Bagging Stacking Voting
Age ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Albumin ✓ ✓ ✓ ✓ ✓ ✓ ✓
Alk Phos ✓ ✓ ✓ ✓ ✓ ✓
Ascites N ✓ ✓ ✓ ✓ ✓ ✓
Ascites Y ✓ ✓ ✓ ✓ ✓ ✓ ✓
Bilirubin ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Cholesterol ✓ ✓ ✓ ✓ ✓
Copper ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Drug D-penicillamine ✓ ✓ ✓ ✓
Drug Placebo ✓ ✓ ✓
Edema N ✓ ✓ ✓ ✓ ✓ ✓
Edema S ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Edema Y ✓ ✓ ✓ ✓ ✓ ✓ ✓
Hepatomegaly N ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Hepatomegaly Y ✓ ✓ ✓ ✓ ✓ ✓ ✓
N Days ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Platelets ✓ ✓
Prothrombin ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Sex F ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Sex M ✓ ✓ ✓ ✓ ✓ ✓ ✓
SGOT ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Spiders N ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Spiders Y ✓ ✓ ✓ ✓ ✓ ✓
Stage ✓ ✓ ✓ ✓ ✓
Tryglicerides ✓ ✓ ✓
* LDA: Linear Discriminant Analysis; QDA: Quadratic Discriminant Analysis; LR: Logistic Regression;
GNB: Gaussian Naive Bayes; RF: Random Forest; AB: AdaBoost.
Figure 2: Interpretability of GNB with the feature SGOT
Table 4: Models Comparative Results
All Univariate Multivariate Wrapper
Model
F-score Brier-score F-score Brier-score F-score Brier-score F-score Brier-score
DT 0.6881 ± 0.06 0.1908 ± 0.03 0.6877 ± 0.06 0.1951 ± 0.04 0.6988 ± 0.05 0.1895 ± 0.02 0.6644 ± 0.05 0.2162 ± 0.03
KNN 0.7190 ± 0.07 0.1747 ± 0.02 0.7134 ± 0.07 0.1736 ± 0.02 0.7134 ± 0.07 0.1744 ± 0.02 0.7190 ± 0.07 0.1752 ± 0.02
RIPPER 0.5595 ± 0.11 0.2251 ± 0.05 0.5461 ± 0.09 0.2078 ± 0.03 0.6264 ± 0.07 0.1970 ± 0.03 0.6067 ± 0.07 0.1905 ± 0.02
kerasMLP 0.6485 ± 0.03 − 0.6586 ± 0.04 − 0.5983 ± 0.10 − 0.6900 ± 0.05 −
sklearnMLP 0.7081 ± 0.03 0.1543 ± 0.02 0.6945 ± 0.05 0.1611 ± 0.03 0.7012 ± 0.06 0.1660 ± 0.01 0.7109 ± 0.05 0.1547 ± 0.02
SVM 0.7715 ± 0.04 − 0.7604 ± 0.05 − 0.7340 ± 0.06 − 0.7244 ± 0.05 −
LR 0.7593 ± 0.05 0.1484 ± 0.02 0.7596 ± 0.06 0.1484 ± 0.02 0.7507 ± 0.04 0.1591 ± 0.02 0.7728 ± 0.04 0.1563 ± 0.02
LDA 0.7516 ± 0.04 0.1467 ± 0.01 0.7452 ± 0.06 0.1491 ± 0.02 0.7577 ± 0.04 0.1483 ± 0.01 0.7714 ± 0.05 0.1464 ± 0.01
QDA 0.7351 ± 0.07 0.1800 ± 0.04 0.7563 ± 0.08 0.1765 ± 0.05 0.7422 ± 0.02 0.1885 ± 0.03 0.6583 ± 0.09 0.2700 ± 0.13
GNB 0.6885 ± 0.03 0.2115 ± 0.03 0.7098 ± 0.03 0.1969 ± 0.03 0.7355 ± 0.04 0.1874 ± 0.02 0.7659 ± 0.05 0.1756 ± 0.04
RF 0.7497 ± 0.04 0.1360 ± 0.02 0.7461 ± 0.04 0.1388 ± 0.02 0.7589 ± 0.05 0.1355 ± 0.02 0.7881 ± 0.05 0.1389 ± 0.02
AB 0.7247 ± 0.06 0.1972 ± 0.01 0.7380 ± 0.08 0.1932 ± 0.01 0.7467 ± 0.04 0.1890 ± 0.01 0.7215 ± 0.09 0.2026 ± 0.01
Bagging 0.7521 ± 0.03 0.1462 ± 0.01 0.7340 ± 0.04 0.1437 ± 0.02 0.7332 ± 0.04 0.1408 ± 0.02 0.7586 ± 0.04 0.1495 ± 0.02
Stacking 0.7261 ± 0.06 0.1704 ± 0.03 0.7133 ± 0.06 0.1674 ± 0.02 0.7086 ± 0.07 0.1694 ± 0.02 0.7520 ± 0.04 0.1630 ± 0.03
Voting 0.7351 ± 0.04 − 0.7601 ± 0.05 − 0.7471 ± 0.04 − 0.7524 ± 0.06 −
* DT: Decision Tree; MLP: Multilayer Perceptron; SVM: Support Vector Machine;
LDA: Linear Discriminant Analysis; QDA: Quadratic Discriminant Analysis; LR: Logistic Regression;
GNB: Gaussian Naive Bayes; RF: Random Forest; AB: AdaBoost.
Drug Placebo is chosen by LDA, QDA, and some ensemble methods but not by others, suggesting
that certain features may only demonstrate their predictive power under specific model conditions or
when interacting with other features in complex ways. Furthermore, ensemble methods like Random
Forest (RF), Ada Boost (AB), bagging, stacking, and voting display a broader range of feature selec-
tion compared to many individual models. This diversity is expected, as ensemble methods combine
the predictions of several base estimators to enhance generalizability and robustness over a single
estimator.
5 Conclusion
This project has successfully demonstrated the predictability of outcomes for patients with primary
biliary cirrhosis using state-of-the-art classification models. Our findings reveal that the Support Vector
Machine (SVM) achieves the best F-score when employing all available features and when using the
univariate feature subset. Conversely, the RandomForest model excels with the multivariate and
wrapper feature subsets, and it also attains the lowest Brier scores, indicating effective calibration
and reliability.
Furthermore, our experiments highlight the significant impact that different feature selection al-
gorithms can have on a model’s performance, either enhancing or diminishing its effectiveness. This
observation underscores the importance of careful feature selection in model optimization.
A key takeaway from this study is the ability of certain models, particularly Logistic Regression and
Gaussian Naive Bayes, to identify which features are most influential in the classification decision. This
capability is especially valuable in the medical domain, where understanding the factors contributing
to a model’s predictions is crucial for clinical decision-making.
In summary, this research emphasizes the importance of model selection, feature selection, and
the interpretability of models in medical applications. The nuanced differences in model performance
across various feature selection techniques offer insights into the complexities of predictive modeling
in healthcare.
References
[1] M. Clinic, 2023. https://www.mayoclinic.org.
[2] E. Dickson, P. Grambsch, T. Fleming, L. Fisher, and A. Langworthy, “Cirrhosis patient survival
prediction.” UCI Machine Learning Repository, 2023. https://doi.org/10.24432/C5R02G.
[3] A. Kraskov, H. Stögbauer, and P. Grassberger, “Estimating mutual information,” Physical review
E, vol. 69, no. 6, p. 066138, 2004.
[4] R. J. Urbanowicz, R. S. Olson, P. Schmitt, M. Meeker, and J. H. Moore, “Benchmarking relief-
based feature selection methods for bioinformatics data mining,” Journal of biomedical informat-
ics, vol. 85, pp. 168–188, 2018.
[5] F. J. Ferri, P. Pudil, M. Hatef, and J. Kittler, “Comparative study of techniques for large-scale
feature selection,” in Machine intelligence and pattern recognition, vol. 16, pp. 403–413, Elsevier,
1994.
[6] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Pret-
tenhofer, R. Weiss, V. Dubourg, et al., “Scikit-learn: Machine learning in Python,” Journal of
machine learning research, vol. 12, no. Oct, pp. 2825–2830, 2011.
[7] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis,
J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Joze-
fowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah,
M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Va-
sudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng,
“TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015. Software available
from tensorflow.org.
[8] I. Moscovitz, “How to perform explainable machine learning classification — without any trees.”
https://github.com/imoscovitz/wittgenstein, 2019.
[9] J. A. Cohen and M. M. Kaplan, “The SGOT/SGPT ratio—an indicator of alcoholic liver disease,”
Digestive diseases and sciences, vol. 24, pp. 835–838, 1979.
[10] S. Gupta, C. Konradt, A. Corken, J. Ware, B. Nieswandt, J. Di Paola, M. Yu, D. Wang, M. T.
Nieman, S. W. Whiteheart, et al., “Hemostasis vs. homeostasis: Platelets are essential for pre-
serving vascular barrier function in the absence of injury or inflammation,” Proceedings of the
National Academy of Sciences, vol. 117, no. 39, pp. 24316–24325, 2020.