Keywords: retinal disorders, support vector machine, K-nearest neighbor, decision tree, ensemble model, optical coherence tomography
Abstract
The prevalence of vision impairment is increasing at an alarming rate. The goal of the study was to
create an automated method that uses optical coherence tomography (OCT) to classify retinal
disorders into four categories: choroidal neovascularization, diabetic macular edema, drusen, and
normal cases. This study proposed a new framework that combines machine learning and deep
learning-based techniques. The utilized classifiers were support vector machine (SVM), K-nearest
neighbor (K-NN), decision tree (DT), and ensemble model (EM). A feature extractor, the InceptionV3
convolutional neural network, was also employed. The performance of the models was evaluated
against nine criteria using a dataset of 18000 OCT images. For the SVM, K-NN, DT, and EM classifiers,
the analysis exhibited state-of-the-art performance, with classification accuracies of 99.43%, 99.54%,
97.98%, and 99.31%, respectively. A promising methodology has been introduced for the automatic
identification and classification of retinal disorders, leading to reduced human error and saved time.
This accumulation of fluid causes swelling and thickening of the macula, leading to vision impairment. In OCT images, DME manifests as cystoid spaces or pockets of fluid within the retina, resulting in increased retinal thickness and loss of the normal retinal architecture.

Drusen are small, yellowish deposits that accumulate beneath the retina and are often associated with age-related macular degeneration (AMD). These deposits are composed of lipids, proteins, and cellular debris and can interfere with normal retinal function, leading to vision loss. In OCT images, drusen appear as distinct, hyper-reflective deposits between the retinal pigment epithelium and Bruch's membrane, with varying sizes and shapes depending on the stage and severity of AMD.

The question that arises here is whether combining machine learning and deep learning improves the detection and classification of retinal disorders. Additionally, it is unclear whether introducing a deep learning algorithm enhances the detection and/or classification results. According to the related works mentioned, Inception V3 is rarely considered; this study therefore set out to answer both questions, with the use of Inception-v3 constituting the second. The aim of this study was thus to propose a hybrid model for the classification of retinal disorders, specifically the CNV, DME, Drusen, and normal categories. This approach introduces an innovative combination of machine learning and deep learning methods, and four candidate classifiers are considered for addressing the highlighted challenge. This study proposed an automated diagnostic tool that holds promise for facilitating early recognition and diagnosis of retinal abnormalities by ophthalmologists. In doing so, this solution has the potential to significantly mitigate human error, saving both time and effort. The study's contributions are multifaceted, encompassing several significant advancements. These include: (i) the development of an automated detection system that integrates both machine learning and deep learning algorithms for enhanced diagnostic accuracy; (ii) the introduction of the InceptionV3 deep learning algorithm to effectively address the classification problem at hand; (iii) the establishment of new performance evaluation criteria to provide a more comprehensive assessment of the model's effectiveness; (iv) the implementation and comparative analysis of four distinct machine learning methods, leading to the identification of the most effective approach; (v) the demonstration of superior performance outcomes, notably in terms of accuracy and sensitivity, surpassing existing methodologies; and (vi) the proposal of a user-friendly platform designed to assist ophthalmologists, thereby reducing human error and optimizing time management in clinical practice.

The following is an outline of the article's structure. Section 2 discusses the materials and methods used. Section 3 depicts the study's findings, defining the classification of retinal abnormalities based on the technique used and a set of criteria. Section 4 highlights and discusses the research outcomes and creates a benchmark based on the pertinent studies. Section 5 concludes the paper and discusses future research.

Related works

Numerous studies have proposed diverse approaches to address this underlying challenge through the implementation of various artificial intelligence techniques (Saleh and Salaheldin 2022). Hussain et al (Hussain et al 2018), for instance, introduced a method for classifying OCT images into three categories: normal, age-related macular degeneration (AMD), and diabetic macular edema (DME). Employing the random forest (RF) algorithm and validation through a 15-fold cross-validation method, their model achieved an accuracy of 95% and an impressive area under the curve (AUC) of 0.99. Alsaih et al (Alsaih et al 2017) presented a study focusing on the classification of spectral domain OCT images into DME and normal classes. Employing three distinct classifiers, namely a linear support vector machine (SVM), RF, and a kernel SVM, alongside principal component analysis (PCA) for dimensionality reduction, their most effective model, the linear SVM with PCA, demonstrated a sensitivity and specificity of 87.5%.

In yet another contribution, Abdulrahman and Khatib (Abdulrahman and Khatib 2020) designed an algorithm to classify OCT images into four categories: CNV, DME, Drusen, and normal. Utilizing a genetic algorithm for feature extraction and an SVM-based classifier, their model exhibited an accuracy of 90.65% through local and global feature extraction. The pursuit of accurate classification techniques continued with Srinivasan et al (Srinivasan et al 2014), who proposed a method for categorizing AMD, DME, and normal classes. Employing a one-versus-one SVM classifier and cross-validation, their model correctly identified 100% of AMD patients, 100% of DME patients, and 86.67% of normal individuals.

Liu et al (Liu et al 2011) introduced a novel computerized approach for diagnosing macular pathologies. Their model categorized spectral domain OCT images into three groups: normal macula, macular disease, and glaucoma. Leveraging a nonlinear SVM, their model demonstrated robust performance, as evidenced by high cross-validation AUC values on both the development and testing datasets. In a similar vein, Dash and Sigappi (Dash and Sigappi 2018) developed a model for DME classification utilizing OCT images. This study utilized two techniques, local binary pattern (LBP) with RF and scale-invariant feature transform (SIFT) with RF, which achieved accuracy rates of 100% and 88%, respectively.
In that work (Saleh et al 2022a, 2022b), the authors devised a hybrid strategy to categorize retinal disorders. The classification of CNV, DME, drusen, and normal was carried out by amalgamating the SqueezeNet CNN for feature extraction and employing diverse classifiers. This approach achieved 97.47% classification accuracy with the K-NN classifier. Khan et al (Khan et al 2023) introduced a deep learning model tailored for the automatic identification of distinct retinal disorders, encompassing age-related macular degeneration, branch retinal vein occlusion, central retinal vein occlusion, central serous chorioretinopathy, and diabetic macular edema. Their model entails three key stages: initial feature extraction by training pretrained models (DenseNet-201, InceptionV3, and ResNet-50); subsequent enhancement of features through ant colony optimization; and eventual classification training using the K-NN classifier. Notably, the proposed model achieved a remarkable accuracy of 99.1%.

Cen et al (Cen et al 2021) harnessed the power of CNN techniques for the detection of a diverse spectrum of 39 distinct retinal diseases from fundus photographs. Following this, the authors employed a heat map to pinpoint the regions contributing to the diagnostic process. The performance of their system was meticulously assessed through various evaluation criteria, including an impressive area under the curve (AUC) score of 0.9984, a sensitivity rate of 97.8%, and an exceptional specificity rate of 99.6%. In another study, Farazdaghi et al (Farazdaghi et al 2021) introduced a strategy for distinguishing between papilledema and pseudopapilledema. Their study incorporated both B-scan ultrasound (US) images and OCT images. Through the extraction of retinal nerve fiber layer (RNFL) thickness from the OCT images, medical experts were able to make disease diagnoses without the need for a computer-aided detection (CAD) model. In evaluating the B-scan ultrasound images, the study achieved an 86% sensitivity and an 88% specificity. In the context of RNFL thickness measurement, the reported sensitivity was 83%, accompanied by a specificity of 76%.

The study by He et al (He et al 2023) employed an interpretable Swin poly transformer for retinal disorder classification and also demonstrated remarkable performance across various evaluation criteria. Song et al (Song et al 2021) proposed a deep learning model for retinal disorder detection, including glaucoma, using a deep transformer mechanism, which attained 88.3% classification accuracy.

Despite the significant advancements in utilizing artificial intelligence (AI) techniques for the classification of retinal disorders in OCT images, there still exists a notable research gap in the development of models capable of handling a broader range of retinal pathologies with high accuracy and robustness. Furthermore, there remains a lack of comprehensive comparative analyses among different AI methodologies, making it challenging to identify the most effective approach for retinal disorder classification. Thus, there is a pressing need for further research to develop interpretable, comprehensive, and generalizable AI models capable of accurately classifying a wide spectrum of retinal disorders while ensuring clinical relevance and practical utility.

Materials and methods

The present methodology aims to categorize retinal disorders across four distinct classes: CNV, DME, Drusen, and the normal state. This endeavor employs both machine learning-based and deep learning-based approaches. To ascertain the optimal approach, four machine learning-based methods, namely SVM, K-NN, DT, and EM, are evaluated. The selection of optimal hyperparameters is facilitated through Bayesian optimization, thus streamlining the optimization process. For feature extraction rooted in deep learning, Inception V3 is harnessed. A schematic representation of the proposed methodology is illustrated in figure 1, depicting the sequential stages detailed in the subsequent subsections.

Data preparation

The dataset employed in this study originates from Kermany et al (Kermany et al 2018) and spans the timeframe from July 1st, 2013, to March 1st, 2017. The authors implemented the preprocessing methodology advocated by the study (Saleh et al 2022a, 2022b). An example of the targeted classes for the classification process is shown in figure 2. This procedure entails a sequence of stages, commencing with the enhancement of image contrast, followed by the application of anisotropic diffusion filtration (Perona and Malik 1990), which prioritizes high-contrast edges over their low-contrast counterparts. This technique offers the advantage of noise reduction without significant loss of image content. Notably, edges, lines, and other distinctive features are crucial for comprehending OCT images in their entirety, ultimately facilitating the creation of multidimensional images tailored to match the input layer dimensions of the CNN. An illustrative instance of a processed image is shown in figure 3.

Feature extraction

A method for automated feature extraction is employed, leveraging the transfer learning capabilities of the InceptionV3 CNN, as recommended by the authors of that work (Saleh et al 2022a, 2022b). Specifically, the network operates as a feature extractor, with the output taken from the global average pooling layer. For each image, this network yields 2,048 context-tailored features. The feature map of the training set therefore consists of 12,240 samples, each described by 2,048 features, as illustrated in table 1. However, due to the high volume of features generated, a thoughtful
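To make the preprocessing and feature-extraction stages described above concrete, the following minimal sketch applies a Perona-Malik style anisotropic diffusion filter and then extracts 2,048-dimensional global-average-pooling features with a pretrained InceptionV3. It is an illustrative Python/Keras sketch only: the filter parameters, the synthetic input batch, and the use of ImageNet weights are assumptions for the example, not the exact configuration reported in the paper.

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input


def anisotropic_diffusion(img, n_iter=10, kappa=30.0, gamma=0.2):
    """Simplified Perona-Malik anisotropic diffusion on a 2-D grayscale image."""
    img = img.astype(np.float32)
    for _ in range(n_iter):
        # Finite differences towards the four neighbours (wrap-around borders
        # are a simplification of this sketch).
        d_n = np.roll(img, -1, axis=0) - img
        d_s = np.roll(img, 1, axis=0) - img
        d_e = np.roll(img, -1, axis=1) - img
        d_w = np.roll(img, 1, axis=1) - img
        # Conduction coefficients are small across strong gradients, so
        # high-contrast edges are preserved while flat regions are smoothed.
        c = lambda d: np.exp(-(d / kappa) ** 2)
        img = img + gamma * (c(d_n) * d_n + c(d_s) * d_s + c(d_e) * d_e + c(d_w) * d_w)
    return img


# InceptionV3 truncated at the global average pooling layer: 2,048 features per image.
extractor = InceptionV3(weights="imagenet", include_top=False,
                        pooling="avg", input_shape=(299, 299, 3))


def extract_features(batch_rgb):
    """batch_rgb: float array of shape (N, 299, 299, 3) with values in [0, 255]."""
    x = preprocess_input(batch_rgb.astype(np.float32))
    return extractor.predict(x, verbose=0)  # shape (N, 2048)


if __name__ == "__main__":
    # Tiny synthetic batch standing in for preprocessed OCT B-scans.
    rng = np.random.default_rng(0)
    scans = rng.uniform(0, 255, size=(2, 299, 299)).astype(np.float32)
    filtered = np.stack([anisotropic_diffusion(s) for s in scans])
    rgb = np.repeat(filtered[..., None], 3, axis=-1)  # replicate channel to RGB
    print(extract_features(rgb).shape)                # (2, 2048)
```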
Results
Figure 3. A sample of the CNV class (a) before processing and (b) after processing.
Table 1. An illustrative example of the extracted feature map.

Sample   F1      F2      F3      ...   FN
S1       0.13    0.47    0.52    ...   0.33
S2       0.09    0.44    -0.8    ...   0.36
S3       0.20    0.50    0.63    ...   -0.3
...      ...     ...     ...     ...   ...
SM       0.09    0.82    -0.4    ...   0.96

The classification accuracy, sensitivity, specificity, and precision were computed using equations (1)-(4) (Salaheldin et al 2022, Salaheldin et al 2024a).

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{1} \]

\[ \mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{2} \]

\[ \mathrm{Specificity} = \frac{TN}{TN + FP} \tag{3} \]

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{4} \]

The overall performance evaluation of each classifier additionally considers five further criteria, namely the error rate, false-positive rate, false-negative rate, negative predictive value, and F1-score, computed using equations (5)-(9) (Taylor 1997, Salaheldin et al 2022).

\[ \mathrm{Error\ rate} = \frac{FP + FN}{TP + FN + FP + TN} \tag{5} \]

\[ \mathrm{False\ positive\ rate} = \frac{FP}{FP + TN} \tag{6} \]

\[ \mathrm{False\ negative\ rate} = \frac{FN}{FN + TP} \tag{7} \]

\[ \mathrm{Negative\ predictive\ value} = \frac{TN}{TN + FN} \tag{8} \]

\[ \mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Sensitivity} \times \mathrm{Precision}}{\mathrm{Sensitivity} + \mathrm{Precision}} \tag{9} \]
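As an illustration of how equations (1)-(9) translate into code, the sketch below computes the nine criteria for one class treated one-versus-rest. The function name, the scikit-learn usage, and the dummy labels are assumptions made for this example rather than details taken from the paper.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def nine_criteria(y_true, y_pred, positive_class):
    """Compute the criteria of equations (1)-(9) with `positive_class` one-versus-rest."""
    t = np.asarray(y_true) == positive_class
    p = np.asarray(y_pred) == positive_class
    tn, fp, fn, tp = confusion_matrix(t, p, labels=[False, True]).ravel()
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,                 # eq. (1)
        "sensitivity": sensitivity,                    # eq. (2)
        "specificity": tn / (tn + fp),                 # eq. (3)
        "precision": precision,                        # eq. (4)
        "error_rate": (fp + fn) / total,               # eq. (5)
        "false_positive_rate": fp / (fp + tn),         # eq. (6)
        "false_negative_rate": fn / (fn + tp),         # eq. (7)
        "negative_predictive_value": tn / (tn + fn),   # eq. (8)
        "f1_score": 2 * sensitivity * precision / (sensitivity + precision),  # eq. (9)
    }


# Example with dummy labels for the four target classes.
labels_true = ["CNV", "DME", "Drusen", "Normal", "CNV", "Drusen"]
labels_pred = ["CNV", "DME", "Normal", "Normal", "CNV", "Drusen"]
print(nine_criteria(labels_true, labels_pred, "CNV"))
```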
The system demonstrated remarkable classification performance, achieving overarching accuracies of 99.43% for SVM, 99.56% for K-NN, 97.58% for DT, and 99.31% for EM. SVM demonstrated consistent performance across all classes, with the CNV class exhibiting an exceptional accuracy of 99.67%, along with noteworthy specificity and precision values of 99.75% and 99.26%, respectively. K-NN also achieved exceptional accuracy, most prominently in the CNV class, with an impressive sensitivity of 99.78%, specificity of 99.80%, and precision of 99.41%. DT maintained consistent performance, with only slight decreases in accuracy and precision for the Drusen and normal classes.

EM demonstrated consistent accuracy levels, with the CNV class showing noteworthy specificity (99.65%) and precision (98.96%). Collectively, the classifiers demonstrated favorable performance, with the CNV class consistently attaining the highest accuracy, underscoring its pivotal role in the classification process. The variation in performance metrics across classifiers underscores the need for a comprehensive evaluation approach encompassing diverse criteria to holistically judge the efficacy of these methods.

The total performance of the model for each classifier was determined by considering the nine criteria, as shown in table 4. The classification accuracy, sensitivity, specificity, and precision were calculated for all classes for each classifier. In addition, a further evaluation measure was considered by calculating the AUC from the receiver operating characteristic (ROC) curve, as shown in figures 5-8.

To ensure the reliability and robustness of the proposed models, thorough testing was carried out using a variety of publicly available datasets. Following a review of the literature, an externally validated dataset (Sotoudeh-Paima et al 2021) was selected for this purpose. The dataset was meticulously organized into primary classes relevant to the targeted issue, including CNV, Drusen, and normal cases, with respective distributions of 161 CNV cases, 160 drusen cases, and 120 normal cases. The testing sample comprised a balanced selection of 1200 OCT images across these specified classes. The comprehensive performance results of the proposed models were obtained through this rigorous validation process, and the balanced distribution of the dataset contributed to a comprehensive understanding of the models' capabilities and their potential for generalizability across various scenarios. The obtained results are shown in table 5.
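The per-class ROC analysis mentioned above can be reproduced along the following lines. The one-versus-rest binarization and the assumption that each classifier exposes per-class scores (for example via predict_proba) are illustrative choices for this sketch, not a description of the authors' exact procedure.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.preprocessing import label_binarize

CLASSES = ["CNV", "DME", "Drusen", "Normal"]


def per_class_auc(y_true, y_score):
    """One-versus-rest ROC AUC per class.

    y_true: iterable of class labels; y_score: array (n_samples, 4) of class scores.
    """
    y_bin = label_binarize(y_true, classes=CLASSES)
    results = {}
    for i, name in enumerate(CLASSES):
        fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])   # ROC curve points
        results[name] = roc_auc_score(y_bin[:, i], y_score[:, i])
    return results


# Dummy labels/scores standing in for classifier outputs on a small test set.
rng = np.random.default_rng(1)
y_true = np.repeat(CLASSES, 10)                # ensures every class is present
y_score = rng.random((len(y_true), len(CLASSES)))
print(per_class_auc(y_true, y_score))
```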
Discussion
This research introduces an innovative model for diagnosing and categorizing prevalent ophthalmological disorders: CNV, DME, Drusen, and normal cases. Employing SVM, K-NN, DT, and EM as classifiers, each model's performance undergoes comprehensive evaluation across nine criteria: accuracy, sensitivity, specificity, precision, error rate, false positive rate, false negative rate, negative predictive value, and F1 score. Hyperparameter tuning employs the Bayesian optimization algorithm, with validation encompassing 3060 OCT images and testing involving 2700 OCT images, categorized into four classes.
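The hyperparameter search can be sketched as follows. Scikit-optimize's BayesSearchCV is used here as one convenient stand-in for the Bayesian optimization step, and the feature matrix, search ranges, and iteration budget are placeholders rather than the settings used in the study; K-NN, DT, and the ensemble model could be tuned the same way by swapping the estimator and search space.

```python
import numpy as np
from sklearn.svm import SVC
from skopt import BayesSearchCV
from skopt.space import Real

# Placeholder features/labels standing in for the InceptionV3 feature maps
# (the paper's training feature map is 12,240 x 2,048; a tiny random subset here).
rng = np.random.default_rng(0)
X_train = rng.random((200, 2048))
y_train = rng.integers(0, 4, size=200)  # 0=CNV, 1=DME, 2=Drusen, 3=Normal

# Bayesian optimization of SVM hyperparameters with cross-validation.
search = BayesSearchCV(
    estimator=SVC(),
    search_spaces={
        "C": Real(1e-3, 1e3, prior="log-uniform"),
        "gamma": Real(1e-4, 1e1, prior="log-uniform"),
    },
    n_iter=25,       # number of Bayesian optimization evaluations
    cv=3,
    n_jobs=-1,
    random_state=0,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```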
The proposed model adopts a cascaded approach, comprising two distinct phases aimed at optimizing performance. Initially, feature extraction is executed utilizing the sophisticated InceptionV3 CNN. This stage is pivotal for discerning intricate patterns and subtle nuances within retinal images. Subsequently, the extracted features undergo classification via four diverse classifiers, further refining the identification process. The amalgamation of these phases yields a model of remarkable efficacy, significantly bolstering the detection capabilities for the targeted retinal disorders. Notably, the proposed DT model not only demonstrates exceptional accuracy in classification but also boasts an impressive reduction in training time. This dual feat underscores its reliability and positions it as a valuable assistive tool for expedited and accurate diagnosis across a spectrum of retinal pathologies.

Delving into class-specific performance, the CNV class emerges as a standout, boasting the highest accuracy across all classifiers. Remarkably, this class consistently commands the best sensitivity across classifiers, while the DME class tends to yield the highest specificity values, except for CNV. The CNV class also attained the highest precision, confirming its superiority over the other classes. This dominance is attributed to the distinct hyperreflective appearance of CNV on OCT images, which distinguishes it from its counterparts.

After analyzing the overall performance, the K-NN classifier prevailed across all nine criteria, as indicated in table 3. Moreover, table 4 shows the results of a comparative analysis with related studies, encompassing factors such as classifier selection, sample size, accuracy, sensitivity, and specificity. Impressively, the proposed models exhibit elevated accuracy levels compared to those of the cited works (Liu et al 2011, Srinivasan et al 2014, Alsaih et al 2017, Dash and Sigappi 2018, Hussain et al 2018), with SVM notably surpassing its counterparts. Furthermore, in contrast to the findings of Srinivasan et al (Srinivasan et al 2014), our SVM model achieves a 4% improvement in accuracy. This is due to their use of histograms of oriented gradients, which rely on a sliding window technique to extract features from every pixel of an image. Hence, the accuracy of these methods is not highly reliable compared to that of CNNs.
In addition to its comprehensive approach, this study expands the horizons of related research. While existing studies primarily deploy SVMs and RFs with limited sample sizes, our work introduces four classifiers, three of which are novel in this context, and leverages a substantial 2700 OCT images. Notably, in a four-class setting, we achieved remarkable classification accuracies of 99.43%, 99.54%, 97.98%, and 99.31%. Compared to Abdulrahman and Khatib (Abdulrahman and Khatib 2020) and Liu et al (Liu et al 2011), who achieved 90.65% and 90.5%, respectively, our models shine brighter. Obviously, the large number of OCT images used in the proposed study compared with related studies positively impacts the results.

Table 4. Overall performance evaluation matrix per classifier.

Criteria                     SVM       K-NN      DT        EM
Classification accuracy      99.43%    99.56%    97.98%    99.31%
Sensitivity                  98.85%    99.11%    95.96%    98.63%
Specificity                  99.62%    99.70%    98.65%    99.54%
Precision                    98.71%    98.93%    95.15%    98.45%
Error rate                   0.57%     0.44%     2.02%     0.69%
False positive rate          0.38%     0.30%     1.35%     0.46%
False negative rate          1.15%     0.89%     4.04%     1.37%
Negative predictive value    99.62%    99.70%    98.66%    99.54%
F1-score                     98.78%    99.02%    95.55%    98.54%

Table 5. Summary of the external validation results.

Criteria                     SVM       K-NN      DT        EM
Classification accuracy      94.46%    94.58%    93.08%    94.34%
Sensitivity                  93.91%    94.15%    91.16%    93.70%
Specificity                  94.64%    94.72%    93.72%    94.56%
Precision                    93.77%    93.98%    90.39%    93.53%
Error rate                   0.54%     0.42%     1.92%     0.66%
False positive rate          0.36%     0.29%     1.28%     0.44%
False negative rate          1.09%     0.85%     3.84%     1.30%
Negative predictive value    94.64%    94.72%    93.73%    94.56%
F1-score                     93.84%    94.07%    90.77%    93.61%

Additionally, the study introduces five evaluation criteria that have not yet been employed in relevant works: the error rate, false positive rate, false negative rate, negative predictive value, and F1-score. Almost all of the related works use only accuracy, sensitivity, specificity, and precision. In another context, while Dash and Sigappi achieve 100% accuracy using LBP, their scope is limited to two classes and only 40 OCT images (Dash and Sigappi 2018). This study, however, demonstrates a pioneering leap in complexity and accuracy across a multiclass scenario, demonstrating significant progress in retinal disorder classification.

Through meticulous comparison with pertinent studies that adopted hybrid techniques for classification, our proposed models notably outperform the results attained in the study (Saleh et al 2022a, 2022b). This achievement can be attributed to the enhanced complexity and the advanced utilization of the InceptionV3 model as a feature extractor. This facet empowers the extraction of an extensive array of features, prioritizing high-level attributes that significantly contribute to the classification process.

According to table 6, Model (1), in which the SVM was implemented as a classifier, attained superior evaluation metrics, such as accuracy and sensitivity, in comparison to those of Khan et al's (Khan et al 2023) study. Additionally, it yields a greater number of tested samples.
It is important to note that the increased sample size might influence the outcome, warranting further investigation into the interplay between sample size and performance.

Moreover, expanding the research to include multimodal data (De Fauw et al 2018, Yoo et al 2019), such as combining OCT images with fundus photography or integrating patient demographic and clinical data, could enhance the model's diagnostic capabilities. This holistic approach would allow for a more comprehensive understanding of retinal disorders and improve the robustness of the AI models in real-world clinical settings.

Studying diagnostic techniques with less data is crucial for developing robust and efficient AI models that can operate effectively even when large datasets are not available (Yoo et al 2021). This approach emphasizes the importance of data augmentation, transfer learning, and few-shot learning methods to enhance model performance. By focusing on techniques that maximize the utility of limited data, researchers can create diagnostic tools that are more accessible and practical for real-world applications, particularly in resource-constrained settings where acquiring extensive datasets may be challenging. This can lead to more adaptable and scalable solutions for diagnosing various medical conditions.

Conclusion

This study proposed a hybrid framework that combines InceptionV3-based feature extraction with four machine learning classifiers for the classification of retinal disorders, and the four models achieved significant results when compared to those of relevant studies. According to the results, the K-NN model achieved the best classification accuracy of 99.56%. This study introduces nine performance evaluation criteria, some of which had not previously been used for this purpose. This study provides a novel platform to assist ophthalmologists in the diagnosis and classification of retinal disease, thereby reducing effort and time. Future research directions for this study are abundant and varied, extending its applicability to several areas of interest. One potential application is in the field of neuro-ophthalmic diseases, such as papilledema and pseudo-papilledema. By adapting the proposed AI models, we can explore their effectiveness in detecting and diagnosing these conditions, which are characterized by optic disc swelling and can lead to significant vision impairment if not properly managed. This extension would involve training the models on datasets specific to neuro-ophthalmic disorders and validating their performance to ensure accuracy and reliability. Additionally, the study can benefit from experimenting with a wider array of classifiers on the same dataset to benchmark the results. This comparative analysis would provide valuable insights into the strengths and weaknesses of different classifiers, guiding the development of more sophisticated and precise diagnostic tools.