3 Selecting The Best Machine Learning Algorithm To Support The Diagnosis
Summary:
This study aimed to find the best machine learning algorithm to support the diagnosis of non-
alcoholic fatty liver disease (NAFLD) using easily available clinical and laboratory parameters.
The researchers compared eight different machine learning algorithms (Boosting Tree Classifier,
Decision Tree Classifier, Naive Bayes Classifier, K-Nearest Neighbors Classifier, Neural
Network Classifier, Random Forest Classifier, Regularized Multinomial Classifier, and Support
Vector Machine Classifier) using three different predictive models: 1) Fatty Liver Index (FLI)
plus glucose, age, and sex; 2) Abdominal Volume Index (AVI) plus glucose, gamma-glutamyl
transferase (GGT), age, and sex; and 3) Body Roundness Index (BRI) plus glucose, GGT, age,
and sex.
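The three anthropometric indices named above can be computed from routine measurements. The
sketch below uses the commonly published definitions of FLI (Bedogni et al.), AVI
(Guerrero-Romero and Rodriguez-Moran), and BRI (Thomas et al.). This summary does not state the
formulas, so treat the coefficients, units, and function names here as assumptions rather than
the study's exact implementation.

```python
import math


def fatty_liver_index(triglycerides_mg_dl, bmi, ggt_u_l, waist_cm):
    """Fatty Liver Index on a 0-100 scale (commonly published formulation).

    Assumed units: triglycerides in mg/dL, GGT in U/L, waist in cm.
    """
    logit = (
        0.953 * math.log(triglycerides_mg_dl)
        + 0.139 * bmi
        + 0.718 * math.log(ggt_u_l)
        + 0.053 * waist_cm
        - 15.745
    )
    # Logistic transform of the linear predictor, scaled to 0-100.
    return 100.0 / (1.0 + math.exp(-logit))


def abdominal_volume_index(waist_cm, hip_cm):
    """Abdominal Volume Index from waist and hip circumference (both in cm)."""
    return (2.0 * waist_cm ** 2 + 0.7 * (waist_cm - hip_cm) ** 2) / 1000.0


def body_roundness_index(waist_m, height_m):
    """Body Roundness Index; waist and height must share the same unit."""
    # Squared eccentricity of an ellipse modelling the body outline.
    eccentricity_sq = 1.0 - (waist_m / (2.0 * math.pi)) ** 2 / (0.5 * height_m) ** 2
    return 364.2 - 365.5 * math.sqrt(eccentricity_sq)


# Hypothetical subject, for illustration only (not data from the study).
fli = fatty_liver_index(triglycerides_mg_dl=150, bmi=27, ggt_u_l=40, waist_cm=95)
avi = abdominal_volume_index(waist_cm=100, hip_cm=100)
bri = body_roundness_index(waist_m=0.90, height_m=1.75)
```

Each index would then be combined with glucose, age, sex, and (for AVI and BRI) GGT to form the
feature set of the corresponding model.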
The study included 2,970 subjects, with 2,920 in the training set and 50 randomly selected for
the test phase. The researchers compared the algorithms' accuracy, variance, model weight, and
the number of ultrasound examinations that could be avoided.
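A random hold-out split of this size can be sketched as follows. The seed, the function name,
and the use of integer subject IDs are illustrative assumptions; the summary does not describe
how the study drew its 50 test subjects beyond saying they were randomly selected.

```python
import random


def split_subjects(subject_ids, n_test=50, seed=0):
    """Randomly hold out n_test subjects; the rest form the training set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test_ids = set(rng.sample(subject_ids, n_test))
    train = [s for s in subject_ids if s not in test_ids]
    test = [s for s in subject_ids if s in test_ids]
    return train, test


# Placeholder IDs standing in for the 2,970 study subjects.
subjects = list(range(2970))
train, test = split_subjects(subjects)
print(len(train), len(test))
```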
The results showed that the Support Vector Machine (SVM) algorithm performed best across all
three models: it had the lowest variance and the highest model weight, although it did not
always have the highest accuracy. Specifically, the SVM algorithm using the AVI plus glucose,
GGT, age, and sex model had 68% accuracy, 1% variance, and a model weight of 32.62%, and could
potentially avoid 81.9% of unnecessary ultrasound examinations.
Key Findings:
1. The Support Vector Machine (SVM) algorithm emerged as the best machine learning
algorithm for predicting NAFLD using easily available clinical and laboratory
parameters. Although it did not have the highest accuracy in all three models, it had
the lowest variance and the highest model weight, indicating more consistent and
reliable performance.
2. The model composed of Abdominal Volume Index (AVI) plus glucose, gamma-glutamyl
transferase (GGT), age, and sex, when used with the SVM algorithm, performed the best
overall. This combination had 68% accuracy, 1% variance, and a model weight of 32.62%,
the highest model weight among the three models tested.
3. The SVM algorithm with the AVI plus glucose, GGT, age, and sex model had the
potential to avoid 81.9% of unnecessary ultrasound examinations. This finding is
significant because it could reduce healthcare costs and waiting times associated with
ultrasound scans for NAFLD diagnosis.
4. The study showed that machine learning algorithms, particularly the SVM algorithm, can
be an effective tool for supporting NAFLD diagnosis using readily available clinical and
laboratory data. This approach could be valuable in epidemiological studies and
screening programs, where cost-effectiveness and efficiency are important
considerations.
5. The researchers found that the SVM algorithm made fewer prediction errors in the test
phase compared to other algorithms, even when the accuracy percentages were similar.
This finding highlights the importance of considering not only accuracy but also variance
and model weight when evaluating machine learning algorithms.
6. The study demonstrated the potential of using a meta-learner approach to compare and
select the best machine learning algorithm for a specific task. This approach allowed the
researchers to evaluate the performance of various algorithms across different models and
choose the most suitable one based on predefined criteria.
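The idea of choosing an algorithm by predefined criteria rather than by accuracy alone can be
sketched as a simple ranking rule. The ordering used here (lowest variance first, then highest
model weight, then highest accuracy) is an assumption inferred from this summary, not the
study's documented procedure, and `AlgorithmResult` and `select_best` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmResult:
    algorithm: str
    model: str
    accuracy: float  # percent
    variance: float  # percent
    weight: float    # model weight, percent


def select_best(results):
    """Pick the result with the lowest variance, breaking ties by
    higher model weight and then by higher accuracy (assumed ordering)."""
    if not results:
        raise ValueError("no results to compare")
    return min(results, key=lambda r: (r.variance, -r.weight, -r.accuracy))


# The only fully reported combination in this summary; other candidates
# would be added from your own evaluation runs.
svm_avi = AlgorithmResult("SVM", "AVI + glucose + GGT + age + sex", 68.0, 1.0, 32.62)
best = select_best([svm_avi])
```

Under this rule, a candidate with slightly higher accuracy but higher variance would rank below
a more stable one, which mirrors the trade-off described in findings 1 and 5.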
Limitations:
1. The sample size was relatively small, with only 2,970 subjects included in the study, and
a test set of 50 randomly selected individuals. A larger sample size could have provided
more robust results and allowed for better generalization of the findings.
2. The diagnosis of NAFLD was performed using ultrasound scans, which have limited
sensitivity in detecting mild cases of fatty liver disease. More accurate diagnostic
methods, such as magnetic resonance imaging (MRI) or liver biopsy, were not used due
to ethical considerations and the population-based nature of the study.
3. The study was conducted in a specific geographical region (district of Bari, Apulian
Region, Italy), which may limit the generalizability of the results to other populations
with different demographic and clinical characteristics.
4. The study focused on identifying the best algorithm and model for predicting NAFLD
based on readily available parameters. However, it did not explore the potential
integration of these algorithms into clinical decision support systems or their impact on
patient outcomes.
5. The researchers did not investigate the potential impact of different feature selection
methods or data preprocessing techniques on the performance of the machine learning
algorithms, which could have influenced the results.
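To make the first limitation concrete, the sketch below computes a 95% Wilson score interval
for an accuracy of 68% on 50 test subjects. It assumes the reported 68% was measured on the
50-subject test set, which this summary does not state explicitly; the Wilson interval is a
standard choice for small samples, not a method attributed to the study.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


correct = round(0.68 * 50)  # 68% of 50 subjects = 34 correct predictions
low, high = wilson_interval(correct, 50)
print(f"{correct}/50 correct, 95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval spans roughly 54% to 79%, which illustrates how little a
50-subject test set can distinguish between algorithms whose accuracies differ by a few points.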
In conclusion, while the study provides valuable insights into the use of machine learning
algorithms for NAFLD diagnosis, further research with larger and more diverse populations, as
well as more rigorous diagnostic methods, is necessary to validate and extend these findings.