OPINION

Automated deep learning in ophthalmology: AI that can build AI
Ciara O’Byrne a,b, Abdallah Abbas a,c, Edward Korot a,d, and Pearse A. Keane a,e
Purpose of review
The purpose of this review is to describe the current status of automated deep learning in healthcare and to
explore and detail the development of these models using commercially available platforms. We highlight
key studies demonstrating the effectiveness of this technique and discuss current challenges and future
directions of automated deep learning.
Recent findings
There are several commercially available automated deep learning platforms. Although specific features
differ between platforms, they utilise the common approach of supervised learning. Ophthalmology is an
exemplar speciality in the area, with a number of recent proof-of-concept studies exploring classification of
retinal fundus photographs, optical coherence tomography images and indocyanine green angiography
images. Automated deep learning has also demonstrated impressive results in other specialities such as
dermatology, radiology and histopathology.
Summary
Automated deep learning allows users without coding expertise to develop deep learning algorithms. It is
rapidly establishing itself as a valuable tool for those with limited technical experience. Despite residual
challenges, it offers considerable potential in the future of patient management, clinical research and
medical education.
Video abstract
http://links.lww.com/COOP/A44
Keywords
artificial medical intelligence, automated deep learning, code-free deep learning, deep learning
1040-8738 Copyright © 2021 Wolters Kluwer Health, Inc. All rights reserved. www.co-ophthalmology.com 407
FIGURE 1. The development process of an automated deep learning image classifier model.
necessary. Metrics are also supplied alerting the user to the distribution of images per label. The dataset can then be split, either manually or automatically, into three parts. The training set receives approximately 60–80% of the images and is used to learn the network parameters. The remainder are divided between a validation set, used to tune the model during training, and a held-out independent test set, which ultimately assesses model performance.

Once the user is satisfied with the uploaded dataset, the model may be trained. Following the training process, detailed statistics on model performance are provided; these vary between platforms. A confidence threshold is provided by Amazon, Clarifai, Google and Microsoft, and can be altered to generate new precision and recall values on all of these platforms except Amazon. Confusion matrices (Apple, Clarifai, Google and MedicMind) allow the user to visualise true positives and false negatives and are essential for model evaluation. Precision-recall curves are provided by Amazon, Clarifai and Google. MedicMind is the only automated deep learning platform examined that offers saliency maps, an approach being explored within the subfield of explainable AI [15]. Although both MedicMind and Google offer external validation, Google is the only platform familiar to the authors that allows for external validation via batch prediction, enabling predictions to be generated efficiently on a large external dataset. Download of the final model is facilitated by the Google and Microsoft platforms.

AUTOMATED DEEP LEARNING IN THE LITERATURE

Ophthalmology

In 2019, our group published one of the earliest demonstrations of automated deep learning for medical imaging classification [6]. Two clinicians with no coding expertise developed automated deep learning models using five publicly available datasets of retinal fundus images, optical coherence tomography (OCT) images, dermatological skin lesion images and chest x-ray images. The models were developed using Google AutoML Vision. Sensitivity (recall), specificity, positive predictive value (precision) and area under the precision-recall curve were used to evaluate model discriminative performance and diagnostic properties. Aside from the multilabel model trained using one of the chest x-ray datasets, we were able to demonstrate accuracy comparable to state-of-the-art bespoke deep learning systems.

Similarly, Kim and colleagues used Google AutoML Vision to train two models for the classification of pachychoroid disease [11]. A dataset of 783 ultra-widefield indocyanine green (ICG) angiography images was curated and labelled by two retina specialists. Model performance was assessed using precision and recall, with accuracy then compared against both ophthalmic residents and retina specialists. The authors reported that their second model demonstrated better precision and accuracy than the retina specialists, with comparable recall and specificity. In comparison to the ophthalmic residents, the second model demonstrated inferior recall and specificity but greater precision and accuracy.
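As an illustration of the split-and-evaluate workflow described above, the following sketch computes the same metrics the platforms report (recall/sensitivity, precision, specificity) from a confusion matrix, and shows how raising the confidence threshold trades recall for precision. This is plain Python for illustration only, on invented toy scores; it does not reproduce any platform's actual API:

```python
import random

def split_dataset(items, train=0.7, val=0.15, seed=42):
    """Shuffle and split into train / validation / held-out test sets."""
    rng = random.Random(seed)
    items = items[:]
    rng.shuffle(items)
    n = len(items)
    n_train, n_val = int(n * train), int(n * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

def confusion(y_true, y_score, threshold):
    """Binary confusion counts at a given confidence threshold."""
    tp = fp = tn = fn = 0
    for t, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred and t:
            tp += 1
        elif pred and not t:
            fp += 1
        elif not pred and not t:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    recall = tp / (tp + fn) if tp + fn else 0.0       # sensitivity
    precision = tp / (tp + fp) if tp + fp else 0.0    # positive predictive value
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"recall": recall, "precision": precision, "specificity": specificity}

# Toy example: true labels and model confidence scores for 8 test images.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

low = metrics(*confusion(y_true, y_score, 0.5))
high = metrics(*confusion(y_true, y_score, 0.75))
# Raising the threshold here increases precision but lowers recall.
```

Sweeping the threshold over all observed scores and plotting precision against recall at each value yields exactly the precision-recall curve several platforms display.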
More recently, our group published a comprehensive performance and feature set review of six commercially available platforms using four open-source ophthalmic imaging datasets, including two retinal fundus photograph datasets and two OCT datasets [10]. Twenty-four automated deep learning models were trained by clinicians with no or limited coding expertise, and the specific features and performance of each application programming interface were evaluated. Notably, only Amazon, Apple, Google and Microsoft had the ability to process large imaging datasets and, of these, Apple's performance was considerably worse than Amazon's. We postulate that this may be because Apple Create ML runs locally, rather than utilising large cloud computing resources. We also observed improved performance with the OCT classification models across all platforms in comparison to the fundus photograph models. We suspect this may be due to the increased dimensionality of colour fundus photographs. As we have previously highlighted, Google AutoML Vision is the only commercial deep learning platform allowing the user to carry out external validation via batch prediction. The caveat is that this must be carried out using the command line interface, thus requiring some degree of coding experience [16].

Other specialities

Automated deep learning has also been applied in a number of other specialities. As discussed, the automated deep learning model we described in 2019 demonstrated impressive results in the classification of chest x-rays, with one model showing performance comparable to bespoke deep learning models [6]. More recently, chest x-ray classification has been explored by other research groups using Microsoft Custom Vision [7] and Google AutoML Vision [17]. Google AutoML Vision has also been used to develop image classifier models in histopathology [18], neuro-histopathology [8] and otolaryngology [19], whereas Wang et al. utilised the Google AutoML object detection tool to develop a system capable of identifying and risk-stratifying high-risk mutations in thyroid nodules [20]. Borkowski and colleagues compared Google AutoML Vision with Apple Create ML across a variety of lung and colon diagnostic pathology scenarios [21]. The authors trained twelve deep learning models in total (six on each platform) to differentiate between a variety of lung and colon pathologies. Although they did not find any statistically significant differences in model performance between the platforms, they observed that whereas Apple Create ML models are limited to the local computer, Google AutoML Vision utilises Google Cloud, resulting in computing fees.

Tabular data

Current automated deep learning-based ophthalmology research has focused on interpreting fundus photographs, ICG angiography and OCT scans [10,11,22,23,24]. Structured data, based on a tabular format of columns and rows, represents an additional rich source of information relating to patient histories, diagnoses and prognoses. The potential benefit of such data within ophthalmology research is exemplified by projects such as the Intelligent Research in Sight (IRIS) Registry, which contains information from nearly 66 million patients [25]. Thus, the diversification of automated deep learning-based ophthalmology research to include models that take advantage of structured inputs represents a significant step forward.

Though the current literature is scarce, initial models built using structured datasets have shown promise. A recent study by Antaki et al. demonstrated that ophthalmologists with no programming experience could use electronic health record data to build predictive models for proliferative vitreoretinopathy, using an interactive application in MATLAB [26]. These code-free models achieved F1 scores comparable to manually coded models built on the same datasets. Moreover, novel tools specifically engineered for structured data have now been developed, such as Google Cloud's AutoML Tables. This platform enables clinicians to build classification, regression and time-series machine learning models without needing to code. Early work using this platform includes a model that predicts visual outcomes in patients receiving treatment for neovascular age-related macular degeneration, achieving an area under the receiver operating characteristic curve of 0.892 [27]. To assist in scheduling, our group has trained a cataract surgery time prediction model which predicts operating time with a mean absolute error of 5 min. Future work should aim to further examine the feasibility of such tools in comparison to conventional machine learning methods, emulating the numerous comparative studies exploring automated deep learning for image classification [6].
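To make the structured-data workflow concrete, the sketch below fits a trivial classifier to a tabular dataset and reports the two metric types quoted above: an F1 score for classification and a mean absolute error for regression. It is an illustrative stand-in using a hand-rolled nearest-centroid rule on invented numbers, not AutoML Tables or any study's actual model or data:

```python
# Hypothetical tabular rows of (feature_1, feature_2, label) -- NOT real
# patient data; the features and labels are made up for illustration.
rows = [
    (55, 14, 0), (60, 15, 0), (48, 13, 0), (62, 16, 0),
    (70, 24, 1), (66, 22, 1), (74, 26, 1), (68, 23, 1),
]

def centroid(group):
    """Mean feature vector of a group of rows."""
    n = len(group)
    return tuple(sum(r[i] for r in group) / n for i in range(2))

def nearest_centroid_fit(rows):
    """'Train' by computing one centroid per class label."""
    by_label = {}
    for r in rows:
        by_label.setdefault(r[2], []).append(r)
    return {lbl: centroid(grp) for lbl, grp in by_label.items()}

def predict(model, x):
    """Classify a feature vector by its nearest class centroid."""
    def dist(lbl):
        return sum((a - b) ** 2 for a, b in zip(x, model[lbl]))
    return min(model, key=dist)

def f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

model = nearest_centroid_fit(rows)
y_true = [r[2] for r in rows]
y_pred = [predict(model, r[:2]) for r in rows]
score = f1(y_true, y_pred)

# Mean absolute error, the regression metric quoted for the surgery-time
# model (times in minutes; values here are invented):
actual = [30, 45, 25, 40]
predicted = [33, 41, 28, 44]
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

A real tabular AutoML run replaces the nearest-centroid rule with an automated search over model families and hyperparameters, but the evaluation step is the same.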
1040-8738 Copyright ß 2021 Wolters Kluwer Health, Inc. All rights reserved. www.co-ophthalmology.com 409
LIMITATIONS OF AUTOMATED DEEP LEARNING

Automated deep learning is not a panacea. Though outside the scope of this article, there are barriers common to all AI applications in healthcare. These include dataset curation, ethical and medicolegal considerations, data governance and regulatory issues, as well as patient and clinician acceptance of these systems.

The 'black box' phenomenon is well documented as a limitation in the implementation of artificial medical intelligence tools [28–30]. It is further intensified by the inability to select, or obtain information about, the neural architecture chosen for the model. Given that minimal technical expertise is required to develop these automated systems, it is imperative that robust tools are developed to allow the clinician to understand how the model has reached its decision.

Discriminatory bias is another issue that must be highlighted to clinicians with limited deep learning experience when developing these models. Discriminatory bias describes the situation in which a model is selected to optimally represent the majority population, which may result in inferior performance for under-represented groups. Although public datasets represent a valuable resource, they may be particularly prone to discriminatory bias, depending on how the dataset was collected and who the deep learning model is being developed for. Clinicians must be aware of the perils associated with overfitting and develop models with their target population in mind. External validation, with datasets representing varied real-world image acquisition environments and patient demographics, remains key.

There are also limitations specific to automated deep learning platforms. These platforms do not offer flexibility in selecting between the model architectures used for training. The evaluation metrics vary between platforms, which can make it difficult to accurately assess and compare model performance across platforms. Many platforms do not offer the facility to externally validate the model; this is an essential step, and one which must be incorporated if a system is to be considered for implementation. Finally, there are costs associated with these commercial platforms, particularly those that are cloud-based. Although automated deep learning is an important step towards the democratisation of AI in healthcare, it still presents financial burdens which may be challenging for small research groups with minimal funding.

FUTURE DIRECTIONS

Automated deep learning is likely to see wider clinical use as its effectiveness improves and ethical and governance regulations are established. Clinicians are use-case experts, best suited to train models specified for patient-relevant endpoints and, consequently, to apply the relevant labels for model training. By allowing physicians to independently devise and develop deep learning models, patient needs may be uniquely and efficiently addressed. Image recognition models may greatly enhance screening programmes, particularly in under-resourced areas [31]. Structured data approaches may prove useful in the prediction of patient outcomes, whereas natural language processing may alleviate the significant administrative burdens clinicians currently face. Despite these advantages, hospital management and clinicians must be aware that the use of such models for direct patient care would be subject to the same clinical validation and regulatory requirements as bespoke deep learning systems.

Clinical research

Automated deep learning may radically enhance the clinical research landscape. With the capacity to play a number of different roles within the research toolkit, it has the potential both to alleviate the strain of laborious administrative tasks and to identify patterns within data previously unknown to humans, leading the way towards clinical trial selection, drug discovery and development. Automated deep learning models may also be trained as a proof of concept, to first ascertain whether there is sufficient signal to justify investing in further custom model development via coding.

Improved technology

Although cloud computing has alleviated some of the challenges associated with deep learning, it still depends on high bandwidth, low latency and robust privacy safeguards [32]. Further advances in mobile technology, such as 5G, may address these issues through the use of automated deep learning systems via local edge models (i.e., compact, low-power models which do not require a continuous internet connection to run) [33,34]. Combined with telemedicine and wearable sensors, these models may considerably improve the quality of healthcare in under-resourced communities.
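The discriminatory-bias concern raised under the limitations above lends itself to a simple check that any clinician-built model can undergo: compare recall within each demographic subgroup of the held-out set rather than only in aggregate. A minimal sketch on hypothetical labels and predictions (the groups and data are invented for illustration):

```python
from collections import defaultdict

def recall_by_group(records):
    """records: (group, y_true, y_pred) triples; returns per-group recall."""
    tp = defaultdict(int)  # true positives per group
    fn = defaultdict(int)  # false negatives per group
    for group, y_true, y_pred in records:
        if y_true == 1:
            if y_pred == 1:
                tp[group] += 1
            else:
                fn[group] += 1
    return {g: tp[g] / (tp[g] + fn[g]) for g in set(tp) | set(fn)}

# Hypothetical held-out results in which the minority group is under-served.
records = [
    ("majority", 1, 1), ("majority", 1, 1), ("majority", 1, 1),
    ("majority", 1, 0), ("majority", 0, 0),
    ("minority", 1, 1), ("minority", 1, 0), ("minority", 1, 0),
    ("minority", 0, 0),
]

per_group = recall_by_group(records)
gap = per_group["majority"] - per_group["minority"]
# A large gap flags inferior performance in the under-represented group,
# even when the aggregate recall looks acceptable.
```

Reporting such per-subgroup metrics alongside the aggregate figures is one practical way for non-specialist model builders to surface the bias described above before a model reaches patients.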
FIGURE 2. Overview of the potential applications of automated deep learning in medical education.