
Computers in Biology and Medicine 155 (2023) 106649
Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review
Elias Hossain a,∗, Rajib Rana b, Niall Higgins c,d,e, Jeffrey Soar f, Prabal Datta Barua f, Anthony R. Pisani g, Kathryn Turner d
a School of Engineering & Physical Sciences, North South University, Dhaka 1229, Bangladesh
b School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield Central QLD 4300, Australia
c School of Management and Enterprise, University of Southern Queensland, Darling Heights QLD 4350, Australia
d School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia
e Metro North Mental Health, Herston QLD 4029, Australia
f School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
g Center for the Study and Prevention of Suicide, University of Rochester, Rochester, NY, United States
ARTICLE INFO

Keywords: Machine learning; Electronic Health Records; Medical natural language processing; Artificial intelligence in medicine; Automated tools; State-of-the-art deep learning

ABSTRACT

Background: Natural Language Processing (NLP) is widely used to extract clinical insights from Electronic Health Records (EHRs). However, the lack of annotated data, automated tools, and other challenges hinder the full utilisation of NLP for EHRs. Various Machine Learning (ML), Deep Learning (DL) and NLP techniques are studied and compared to understand the limitations and opportunities in this space comprehensively.
Methodology: After screening 261 articles from 11 databases, we included 127 papers for full-text review covering seven categories of articles: (1) medical note classification, (2) clinical entity recognition, (3) text summarisation, (4) deep learning (DL) and transfer learning architecture, (5) information extraction, (6) medical language translation and (7) other NLP applications. This study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
Result and Discussion: EHR was the most commonly used data type among the selected articles, and the datasets were primarily unstructured. Various ML and DL methods were used, with prediction or classification being the most common application of ML or DL. The most common use cases were: the International Classification of Diseases, Ninth Revision (ICD-9) classification, clinical note analysis, and named entity recognition (NER) for clinical descriptions and research on psychiatric disorders.
Conclusion: We find that the adopted ML models were not adequately assessed. In addition, the data imbalance problem is quite important, and techniques are still needed to address this underlying problem. Future studies should address key limitations in existing studies, primarily in identifying lupus nephritis, suicide attempts and perinatal self-harm, and in ICD-9 classification.
1. Introduction

Electronic Health Records (EHRs), which are automated compilations of health care activities and assessments, are increasingly prevalent and essential for healthcare provision, administration, and research [1]. The data found in EHRs can be both structured and unstructured [2]. Structured EHR data comprises heterogeneous sources in fixed numerical or categorical fields, such as diagnoses, prescriptions, and laboratory values. Clinical documentation, notes and discharge summaries produced by healthcare personnel, on the other hand, are instances of unstructured data. Clinical documentation or notes are input as free text into EHRs, offering a complete picture of the patient's condition.

The adoption of EHRs has increased rapidly around the world. In the United States, it rose from 10% to nearly 96% in just 10 years (2008–2017). In China, this increase is slightly more than 85% [3]. A similar trend has been observed in general practices, large hospitals, and health services in Australia [4,5].

With the increased adoption of EHRs, the volume of information can now be considered "Big Data", extending to the modification and application of substantial data accumulated in EHRs. However, the capacity of human cognition to study, comprehend, and interpret data is
∗ Corresponding author.
E-mail address: [email protected] (E. Hossain).
https://doi.org/10.1016/j.compbiomed.2023.106649
Received 3 October 2022; Received in revised form 4 January 2023; Accepted 7 February 2023
Available online 10 February 2023
0010-4825/© 2023 Elsevier Ltd. All rights reserved.
constrained; therefore, there is a need to devise computer-based tools that can organise, evaluate, and recognise patterns within these data. The subsequent step is to convert these extensive healthcare data into knowledge by implementing data mining and natural language processing methods as essential components of big-data analytics on EHRs, to aid the development of an EHR ecosystem.

Most innovations for this growing volume of unstructured free text in the medical domain are based on novel Machine Learning (ML) and Deep Learning (DL) techniques [6,7]. Numerous healthcare and medical applications [8], including the detection of cardiovascular risk factors and heart conditions [9], the diagnosis [10] and prognosis of oral diseases [11], and the detection of cancer tumours from radiology images [12], have made extensive use of machine learning. Recently, the concept of AutoML, a type of ML-integrated tool [13], has been presented as a way to expand the applications of ML algorithms and simplify their implementation in a range of industries, including healthcare [14]. Although AutoML is still an emerging technology, it has already been applied in bioinformatics, translational medicine, diabetes diagnosis, Alzheimer diagnosis, electronic health record (EHR) analysis, and imaging for medical purposes [15]. However, how it might be used to process clinical notes, a significant component of EHRs, has not been extensively investigated. In addition, patient data protection is a significant concern when employing ML-enabled automated systems; yet no research has identified patient data protection difficulties or comprehensively explored the strategies that may be implemented to assure medical data privacy.

Aside from these, several solutions have been developed for EHRs to handle clinical tasks; however, there remain challenges for health information research because of the unique language and clinical idioms used by clinicians [16,17]. Natural Language Processing (NLP), a subfield of Artificial Intelligence (AI) that includes techniques such as entity recognition, has been used for clinical text mining [18,19], notably clinical note analysis. These techniques are still at an early stage, and it will take some time before they can yield accurate and precise models for real-world applications. This leads to the most significant problem in the field of NLP: the processing of medical text data and decision-making utilising computer technologies. There is a need for novel ways to classify NLP approaches to facilitate their effective use in contemporary healthcare. This project's first and foremost objective is to address the identified shortcomings in EHR-NLP applications for healthcare and find effective methods for analysing EHRs, which will have a positive influence on the research community.

This study provides a thorough review of the numerous healthcare uses of NLP. The objectives of our review are as follows. First, we aim to review NLP techniques in EHRs with a specific focus on different state-of-the-art models. Second, we explain the DL and ML paradigms used to analyse EHRs, mainly clinical free text. Third, we identify core challenges in categorising clinical notes. Last, we examine how researchers have implemented their models for managing clinical notes in the healthcare industry.

We present the difference between our review and existing ones in Table 1. A high proportion of review articles covered NLP and EHRs. Still, few seem to have adhered to the PRISMA structure, an evidence-based minimal set of reporting elements for comprehensive meta-analyses and reviews. Recent reviews have covered DL and ML-based strategies, but reviews published before 2020 have not emphasised the use of state-of-the-art models or explained many potential challenges in clinical NLP. Similarly, information regarding model validation or evaluation metrics was missing. Finally, none of the existing reviews discussed clinical tools or settings, let alone advanced NLP methods such as the transformer model.

This paper contributes a comprehensive systematic review that fills a gap in the existing research. We focus on (1) the commonly utilised ML and DL-based models, including their importance in healthcare NLP; (2) the popular ML and DL models with their feature extraction or word embedding and evaluation metrics; (3) various applications of NLP, including the transformer model, applied to EHRs; (4) commonly used data types, the clinical free-text preprocessing pipeline and study settings; and (5) existing automated ML-enabled tools used by health professionals and healthcare industries. We further highlight the core challenges of medical NLP, the trend of current research, and the shortcomings of the existing literature. We end this work by addressing our review's findings and highlighting study shortcomings and future goals.

The remaining sections of the review article are structured as follows. We explain the literature search and selection strategy in Section 2.1. The techniques used for analysing EHR are illustrated in Section 3. Discussions and research viewpoints are delineated in Sections 4 and 5. Finally, the paper is concluded in Section 6.

2. Methodology

2.1. Literature search and selection strategy

We searched eleven electronic databases for literature published from 2016 to 2022, a period in which significant contributions have been made to NLP research [30]. The search was done through well-established outlets hosting a wide range of high-quality peer-reviewed articles for NLP research: Google Scholar, PubMed, Elsevier, IEEE, Springer, Oxford University Press, Nature Publishing Group, Wiley Online Library, BioMed Central, and American Medical Informatics Association. We developed several key terms to identify the studies, such as "NLP in Clinical Narratives", "Medical NLP", "ML in EHRs", "DL in Medical Text" and "Automated ML in EHRs".

2.2. Selection criteria

(A) Inclusion Criteria: We have included literature that described: ML/DL-based free text classification, word-embedding approaches in the context of medical text data, automatic clinical narrative summarisation, healthcare dialogue systems, medical concept embedding, delirium risk identification, ICD-9 multi-label classification, clinical entity recognition, and machine learning and deep learning architectures for EHRs. Only peer-reviewed journal articles or full conference papers were included. To be included, a study must have used an ML or DL-based model or framework designed solely for analysing EHRs. Studies must also have been focused on analysing and identifying clinical narratives through ML or DL methods.
(B) Exclusion Criteria: Research works that were published as a preprint, with preliminary work or without peer review, were excluded. Editorials and review papers were also on the exclusion list. After the initial screening, the articles retrieved for full-text analysis were also examined for quality.

2.3. Search output

Fig. 1 illustrates that in the initial search, 261 titles were identified for the title and abstract screening, comprising 15 from Springer, 12 from PubMed, 15 from IEEE, 51 from Elsevier, 17 from Oxford University Press, 10 from Nature Publishing Group, 25 from American Medical Informatics Association (AMIA), 7 from BioMed Central, 8 from Wiley Online Library. Of these, 119 papers were excluded based on our exclusion criteria. An additional 101 papers were identified from the reference lists of retrieved articles. Four of these titles were duplicates and thus excluded, one article was not available for review, and ten did not match our criteria. The remaining 127 papers were retrieved for full-text review.

Studies included in the review described different traditional and hybrid methods for analysing clinical free text. Some proposed new
Table 1
Comparison of our paper with existing review articles. Each review is marked against ten criteria: PRISMA review, ML, DL, NLP, EHR, evaluation metrics, word embeddings, feature extraction, clinical tools, and transformer model.

Year  Authors                 Criteria marked √  Criteria marked χ
2022  This paper              10                 0
2022  Tyagi et al. [20]       6                  4
2021  Chowdhury et al. [21]   5                  5
2020  Juhn et al. [22]        2                  8
2020  Ahmed et al. [23]       4                  6
2020  Wu et al. [24]          6                  4
2019  Alzoubi et al. [25]     5                  5
2019  Juhn et al. [26]        2                  8
2019  Koleck et al. [27]      2                  8
2018  Wang et al. [28]        2                  8
2017  Luo et al. [29]         3                  7
Fig. 1. Literature inclusion and exclusion results, following the PRISMA method.
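The counts reported in Section 2.3 can be checked with a few lines of arithmetic. The sketch below is only a consistency check, and it rests on one reading that the text does not state explicitly: that the 261 screened titles are the sum of the per-publisher counts and the 101 papers found through reference lists. All variable names are illustrative.

```python
# Per-publisher counts as listed in Section 2.3.
database_hits = {
    "Springer": 15,
    "PubMed": 12,
    "IEEE": 15,
    "Elsevier": 51,
    "Oxford University Press": 17,
    "Nature Publishing Group": 10,
    "AMIA": 25,
    "BioMed Central": 7,
    "Wiley Online Library": 8,
}
from_reference_lists = 101
excluded_by_criteria = 119
duplicates = 4
unavailable = 1
not_matching = 10

from_databases = sum(database_hits.values())

# Assumption: the 261 screened titles combine both sources.
screened = from_databases + from_reference_lists

full_text = (screened - excluded_by_criteria
             - duplicates - unavailable - not_matching)

print(from_databases, screened, full_text)
```

Under that assumption the listed figures are mutually consistent: the per-publisher counts plus the reference-list papers give the 261 screened titles, and removing the four exclusion groups leaves the 127 papers retrieved for full-text review.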
approaches and others tried to improve existing techniques. The majority of studies originated from North America, followed by the United Kingdom, Asia, Europe and Australia, as illustrated in Fig. 2. No articles from the African continent or South America met our criteria. The sample size used in the retrieved articles varied between 150 and 823,627. EHR was the most used data type and was used as the sole data source by almost all studies. Most datasets were unstructured, and only two studies used structured free-text data. The study designs were mostly experimental (n = 24), followed by cohort (n = 9) and case study (n = 1). Among the existing articles we reviewed, some studies (n = 11) used diagnostic tools without specifying the name of the diagnostic tool, while others (n = 9) explicitly stated that they used the International Classification of Diseases (ICD) tool, which is used to classify disease and mortality codes.

3. Techniques used in the literature for Analysing EHR

Techniques used in EHR can be categorised into the following groups: patient risk analysis/prediction, advanced architectures to analyse EHRs, medical text summarising and other NLP applications.

3.1. Patient risk analysis/prediction

This section presents an overview of the essential concepts of machine learning and natural language processing applied to analyse and predict the risk condition of patients. We will discuss articles concerning patient risk analysis and prediction, including (1) perinatal self-harm detection, (2) suicide attempt detection, (3) automated HIV risk assessment, (4) delirium detection, and finally, (5) diagnosing lupus nephritis.

(A) Perinatal Self-harm
Mental illness, drug abuse, singleness, and obstetric and neonatal complications are major risk factors for self-harm throughout pregnancy and the first year following delivery [31], that is, for perinatal self-harm. It is worthwhile to note that NLP has been used to identify suicidality in EHRs [32], including those of children suffering from autism spectrum disorders [33] and primary care physicians [34].

Several studies have focused on women with Serious Mental Illness (SMI) during the perinatal stage. Ayre et al. [35] proposed a Natural
Fig. 2. Medical NLP research around the globe.
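The perinatal self-harm tool discussed in this subsection is described as passing text through lexical rules and negation detection, emitting XML annotations, and being scored with precision, recall and F1. The sketch below illustrates that general idea only; it is not the tool of Ayre et al. The lexicon, the negation cues, the three-token look-back window and the `self_harm` tag name are all hypothetical choices made for this example.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical lexicon and negation cues, for illustration only.
SELF_HARM_TERMS = ["self-harm", "overdose", "cut her wrists"]
NEGATION_CUES = ["no", "denies", "denied", "without"]
PATTERN = re.compile("|".join(re.escape(t) for t in SELF_HARM_TERMS), re.I)


def annotate(sentence):
    """Wrap each self-harm mention in an XML tag with a negation attribute."""
    out, last = [], 0
    for m in PATTERN.finditer(sentence):
        # Negation rule: look at up to three tokens before the mention.
        window = re.findall(r"[a-z]+", sentence[:m.start()].lower())[-3:]
        negated = any(tok in NEGATION_CUES for tok in window)
        out.append(escape(sentence[last:m.start()]))
        out.append('<self_harm negated="%s">%s</self_harm>'
                   % (str(negated).lower(), escape(m.group())))
        last = m.end()
    out.append(escape(sentence[last:]))
    return "".join(out)


def is_positive(sentence):
    """A sentence counts as positive if it has a non-negated mention."""
    return 'negated="false"' in annotate(sentence)


def precision_recall_f1(gold, predicted):
    """Precision, recall and F1 for two parallel lists of booleans."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up sentences and gold labels (True = patient self-harmed).
notes = [
    "She took an overdose",
    "Previous episodes of self-harm",
    "She denies any self-harm",
    "No history of overdose",
    "Mother reports past overdose by sibling",
]
gold = [True, True, False, False, False]
predicted = [is_positive(s) for s in notes]

print(annotate(notes[0]))
print(precision_recall_f1(gold, predicted))
```

The last sentence is included deliberately: the mention refers to someone other than the patient, which simple lexical and negation rules cannot tell apart, so it is counted as a false positive. This is the kind of case that contextual rules are meant to handle.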
Language Processing (NLP) tool that can effectively identify those who have perinatal self-harmed. Clinical Record Interactive Search (CRIS) was used in this study, which enabled researchers to access women's de-identified medical health records. The tool investigated by Ayre et al. takes text as input (for example, "She took an overdose", "Previous episodes of self-harm", "Current episode of self-harm") and sequentially runs it through five processing layers before generating an Extensible Markup Language (XML) file in which XML tags annotate each detected instance of self-harm and its associated attributes. The processing layers comprise linguistic preprocessing, lexical rules, token sequence rules, negation detection and contextual search. The developed tool was validated through precision, recall and F1-score, and its performance in detecting perinatal self-harm was found satisfactory. However, there remain several shortcomings in this study. The sample size of the dataset was relatively small, which may lead to overfitting. Overfitting occurs when a model acquires so much information and noise from the training data that its performance on new data is impaired. In addition, this tool is currently running as a beta version, and full development is not yet complete, making it difficult to measure efficiency.

(B) Identify Suicide Attempts
Identifying first-time suicide attempts has always been challenging since prediction models generally demand huge data sets [36]. In addition, risk assessment mainly depends on patient-reported data [37], and patients may be prone to hide suicidal notions [38]. These factors have hampered the accurate identification of suicide risk over the years. In light of this context, a group of researchers [39] conducted a study using electronic health records to detect patients who are at risk of their first suicide attempt using machine learning and natural language processing.

An open-source NLP tool, "cTAKES", was used to extract clinical outcomes from medical notes, requiring no preprocessing. This tool has been widely used and thoroughly tested to process numerous descriptive notes, including discharge summaries, radiology notes, history and physical progress. When extracting clinical concepts from large-scale medical records, Concept Unique Identifiers (CUIs) from the Unified Medical Language System (UMLS) were used to annotate each concept. Furthermore, Tsui et al. [39] employed traditional machine learning models to predict suicide risk by exploiting the retrieved features of EHRs: Random Forest (RF), Least Absolute Shrinkage and Selection Operator (LASSO) regression, Naïve Bayes (NB), as well as the Ensemble of Extreme Gradient Boosting (EXGB). This study adopts three feature-optimisation frameworks based on feature engineering: wrapper, filter and embedded. The main idea of this optimisation is to reduce the number of input variables in order to lower the computational cost of modelling, thus improving the predictive model's performance. Note that the proposed algorithms were not compared with more established models for predicting suicide, making it hard to gauge the improvement they offer (see Fig. 3).

Using a different approach, Carson et al. [40] developed and evaluated an ML method using NLP in EHRs to detect suicide attempts in adolescents. In order to categorise teenagers according to their history of suicide attempts, the authors illustrate the implementation of an ML system that creates a classification model from codes created by NLP analysis of EHR notes. In this work, Invenio software was employed to encode the unstructured textual information of EHRs. Invenio is built upon the open-source Apache cTAKES platform [41] and analyses unstructured descriptions of medical notes. Compared to the cTAKES system, Invenio was found to be more successful when converting free clinical text. More specifically, this system uses several components such as a sentence boundary detector, tokeniser, normaliser and part-of-speech tagger, shallow parser, and named entity recognition annotation. In addition, Invenio's performance in capturing negative phrases in electronic health records proved to be satisfactory. Furthermore, a random forest (RF) classifier was applied to classify individual patients according to the history of prior suicide attempts. The authors used five-fold cross-validation to optimise the features of the proposed model. Finally, Area Under the Curve (AUC) was used to assess the performance of the model.

(C) Automated HIV Risk Assessment
Many studies have proposed solutions for automated Human Immunodeficiency Virus (HIV) risk assessment. These studies primarily rely on structured medical free text and have many limitations in capturing important information on HIV risk factors. Usually, narrative or semi-narrative formats are used to gather precise descriptions of social and behavioural factors, such as sexual orientation and sexual activity.

Utilising machine learning and natural language processing, Feller et al. [42] suggested a strategy to detect persons at high risk for HIV using EHRs. This automated diagnosis was carried out in four steps: keyword identification, topic modelling, variable selection and statistical modelling. The first stage aims to identify words with potentially rich information value by representing each word in the clinical note based on its Term Frequency-Inverse Document Frequency (TF-IDF) weight. The second stage focuses on topic modelling, through which large amounts of text can be analysed and their content characterised by hidden features with a certain weight. This was done using the Latent Dirichlet Allocation (LDA) algorithm, which takes a corpus of notes
Fig. 3. Architecture diagram for clinical notes analysis.
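Two of the stages that this subsection attributes to Feller et al., TF-IDF keyword weighting and variable selection by mutual information, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenised notes and risk labels are made up, the TF-IDF variant (relative term frequency times the natural log of inverse document frequency) is one common choice among several, and the topic-modelling and random-forest stages are left out.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


# Made-up tokenised notes and high-risk flags, for illustration only.
notes = [
    ["unprotected", "sex", "partner"],
    ["cough", "fever"],
    ["partner", "hiv", "positive"],
    ["fever", "headache"],
]
labels = [1, 0, 1, 0]

weights = tf_idf(notes)

# Score each term by the mutual information between its presence and the label.
vocabulary = sorted({t for doc in notes for t in doc})
scores = {t: mutual_information([int(t in doc) for doc in notes], labels)
          for t in vocabulary}

# Keep the two highest-scoring terms (ties broken alphabetically).
top_terms = sorted(vocabulary, key=lambda t: (-scores[t], t))[:2]
print(top_terms)
```

In this toy data both "partner" and "fever" are selected, because mutual information measures how informative a variable is about the label regardless of the direction of the association. The selected variables would then be passed to the statistical model.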
as input and learns K clusters, representing the distribution of words in each corpus. The third step is essential when working with a large number of clinical notes. Redundant variables often reduce the performance of predictive models, so it is recommended to eliminate irrelevant features and keep the corpus simple before feeding the data to the ML system. Hence, Feller et al. identified a selection of the most valuable variables using mutual information criteria. The relationship between two random variables is quantified by mutual information, which may account for both linear and non-linear correlations.

The final step illustrates the development and assessment of the statistical model. Feller et al. utilised random forest classifiers to predict the risk factors of HIV as they are simple to tune and provide a measure of variable importance; their use of bagging also reduces the chance of being affected by outliers and thus allows interpretation. The suggested model was then evaluated through model validation indicators such as precision, recall and F1-score.

(D) Delirium Identification
Delirium is a sudden onset of confusion that can last for hours or even days. Delirium is often not classified for billing and is under-diagnosed in clinical practice. Although manual chart review can be employed to detect the presence of delirium, it is time-consuming and inappropriate for large-scale investigations. Since NLP has the ability to process and interpret raw text data, Fu et al. [43] were motivated to implement and evaluate NLP algorithms to detect delirium events from EHRs.

The authors developed two NLP techniques, NLP-CAM and NLP-mCAM, using the Confusion Assessment Method (CAM). CAM is a standardised, evidence-based method that allows physicians without psychiatric training to effectively identify delirium in clinical and research contexts. The NLP models examine patient charts for clear indications of delirium and associated medical information that fits the CAM standards to determine an individual's delirium condition. The CAM has four features that help assess delirium: sudden onset and variable course, lack of attention, disordered thoughts, and altered level of awareness. Each delirium mention and its associated concept were normalised to the correct format based on these features. In addition, any erroneous examples found during training were inspected through a manual process and repeatedly corrected until all errors were rectified, paving the way for improvement of the model. Thus, the proposed NLP techniques have worked admirably to identify patients experiencing delirium using health records in a timely and cost-effective manner.

(E) Identify Lupus Nephritis
Lupus nephritis is a kind of kidney condition that is triggered by Systemic Lupus Erythematosus (SLE or lupus). Much of the information needed to detect SLE, such as histology notes for kidney biopsies, is only available in text-based notes, making it difficult to analyse with rule-based detection algorithms and text string searches. Researchers have come up with innovative solutions to diagnose lupus nephritis, but their solutions are not very effective. Deng et al. [44] designed an NLP system to analyse clinical notes to detect the early onset of nephritis. In this study, the authors utilised two datasets, inpatient and outpatient, and implemented four algorithms: a rule-based algorithm that utilises only structured data (baseline algorithm) and three other algorithms utilising various NLP-based models. Each of the three NLP models is built on L2-regularised logistic regression utilising a separate feature set, comprising positive mention of Concept Unique Identifiers (CUIs), number of occurrences of CUIs, and a blend of all three components, respectively.

Furthermore, Deng et al. preprocessed the medical records by removing identical entries and lemmatising phrases. MetaMap was used to tag medical terms within these phrases. In addition, the SHAP decision plot was applied for assessing feature relevance to present which features were more important during diagnosis. The proposed algorithms were compared with the three independent NLP models and a baseline algorithm to ensure that this method could efficiently identify individuals with lupus nephritis. The suggested approach makes it easier to accurately diagnose this disease, which helps researchers better understand the SLE characteristics of individuals. Yet, missing laboratory tests from EHRs with this small sample size (50) affected prediction accuracy.

3.2. Advanced architectures to analyse EHRs

Deep learning and transfer learning architectures have revolutionised clinical NLP in recent times. The recent results of several pre-trained models against known benchmarks solidify transfer learning's position as an indispensable technique in modern NLP. More specifically, state-of-the-art DL and transformer-based models have been used in various NLP tasks over the past years. Examples include automated ICD-9 coding, multi-classification problems, medical text summarisation, language translation, and clinical data de-identification. The following subsection summarises deep learning and transfer learning architectures frequently applied to analyse EHRs.

(A) Convolutional Neural Networks
DL methods are beginning to lead a wide range of clinical NLP applications due to their low complexity, fast processing, and state-of-the-art results in automated International Classification of Diseases (ICD-9) classification. Li et al. [45] designed a deep learning system (DeepLabeler) to classify ICD-9 automatically. This framework consists of a Convolutional Neural Network (CNN) with the Document to Vector (D2V) technique to retrieve and encode local and global features. The proposed model performs its task by following two steps: (1) feature extraction and (2) multi-label classification. In the feature extraction phase, Li et al. effectively extracted global and local features
from the Medical Information Mart for Intensive Care (MIMIC) dataset, drawing on the recent success of D2V techniques. Li et al. demonstrated that the adopted D2V approach keeps all words in one document for training; thus, it does not eliminate any useful information. By comparison, the CNN model cannot retain all the words in the document during training because it tends to ignore semantic information while extracting features. Secondly, the multi-label classification step utilises a Fully Connected Neural Network (FCNN), a sigmoid activation function and backpropagation to estimate the likelihood of each ICD-9 code. On the other hand, due to the small number of documents used in this study [45], the F-measure was not exceptionally high. The imbalanced distribution of ICD-9 codes in the MIMIC dataset is primarily the reason for the low F-measure in ICD-9 automatic coding.

In addition to the wide variety of clinical NLP applications, researchers do not limit themselves to contemporary NLP solutions, as the research community continuously strives to solve several significant problems in the current NLP space. Among the existing articles, clinical named entity recognition (NER) is not widely available; however, recently, a customised NER model has been adopted to extract several medical entities from a large number of medical records. Kormilitzin et al. [46] proposed a NER model to identify clinical entities in seven categories: Drug Name, Route of Administration, Frequency, Dosage, Strength, Form and Duration. The recommended model was developed based on the spaCy Python-based open-source library. While several excellent libraries are available in full versions, including NLTK [47], Stanford CoreNLP [48], Hugging Face [49], and NLP4J [50], the spaCy library is optimised for CPU speed. The core architecture of the proposed NER model is based on a CNN. The token representations are hashed Bloom embeddings of specific word prefixes, suffixes and lemmatisations, complemented by a transition-based chunking model. In terms of model performance, it received a micro-averaged F1 score of 0.957 across the seven categories. In addition, the transferability of the created model from United States critical care records to the United Kingdom Secondary Care Mental Health Record (CRIS) was evaluated.

(B) Long Short-Term Memory
A hybrid model of Gated Attention incorporated Bi-Directional Long Short-Term Memory (ABLSTM) and attention-based bi-directional LSTM was proposed by Li et al. to classify clinical text [51]. In this study, Li et al. applied a three-stage hybrid system that incorporates the threshold-gated neural network model with the attention-guided rule-based approach to solve a multi-class clinical text classification problem. To begin with, a Recurrent Neural Network (RNN) was applied in this study because it is effective for modelling time-sensitive sequences.

A hybrid deep learning model was proposed in [52] to classify ICD-9 codes as a multi-class classification task. This study applied RNN, LSTM, baseline logistic regression, feed-forward neural network and Gated Recurrent Unit (GRU) as part of the hybrid approach. Nigam et al. collected the Medical Information Mart for Intensive Care (MIMIC III) dataset, consisting of de-identified medical records and various clinical abbreviations, common misspellings, clinical phrases, and so on. In the first phase, this study removed noise and less important information from the dataset to create a clean corpus. Secondly, creating the vocabulary was a primary concern when selecting features. Note that abbreviations, misspellings and idiosyncrasies were ignored during feature selection as they were less significant. Nigam et al. applied the Bag of Words (BOW) approach to extract features from the text documents. For model selection, the baseline Logistic Regression (LR) model was first applied by training a separate model for each class, with each model predicting a binary value (0 or 1). Afterwards, the RNN model was created, and this model operated differently from the baseline LR model. For example, instead of summing the entire BOW note vector, each vector was separated and a normalised vector was input at each time step. To avoid losing semantic information from previous notes, the labels of the notes were replaced with LSTM units. Although the proposed model successfully classified ICD-9 codes, there is still a limitation. The model appears to be overfitting the training data and probably requires a higher dropout value.

(D) Transformer-based Architectures
Recent advances in transfer learning have gained popularity in clinical NLP. More specifically, the Bidirectional Encoder Representations from Transformers (BERT) model combines bidirectional transformers and transfer learning to create state-of-the-art models for various NLP tasks [53]. Mulyar et al. [54] proposed Multitask-Clinical BERT (MT-Clinical BERT), a single model that combines individual tasks. For example, it conducts multi-task learning on 8 different information retrieval tasks, including entity retrieval and identification of Protected Health Information (PHI), in addition to clinical text embedding learning. These embeddings are fed as input to the prediction functions. This multi-task strategy is competitive against task-specific information extraction algorithms due to its capacity to exchange data across several inconsistently annotated datasets.

Another transformer model called BEHRT was suggested by Rao et al. [55], which learns about the previous illnesses of patients as well as the interconnections between them. This algorithm is specifically designed to forecast a patient's future diagnosis (if any), given their past symptoms. The proposed BEHRT creates a definitive embedding through information on disease progression and care delivery as well as maintaining event timing. Compared to previous techniques such
On the other hand, the fundamental concept behind LSTM was to as RETAIN [56], an assessment of the model revealed that BEHRT
implement ‘‘gates’’ to regulate the data flow to RNN units. Furthermore, had greater predictive power, as shown by an increase of 8.0–13.2%
the attentive recurrent architecture was introduced because Li et al. in average accuracy scores for tasks such as sickness trajectories and
observed that when dealing with medical multi-class classification illness prediction.
problems, the notable drawbacks of ‘‘black box’’ methods cannot be In another case, it is often seen that actual clinical data is under-
ignored. In order to overcome this problem, the authors included a utilised in most studies because researchers often do not have access
bi-directional LSTM framework, including an attention layer to enable to actual data due to data scarcity and confidentiality. Note that most
the network to weigh the words in a phrase based on their perceived studies focused primarily on using the MIMIC corpus; nevertheless, MS-
relevance. Besides, the weighted average and occurrence filter method BERT was created by Costa et al. [57] as the first publicly accessible
was prioritised for calculating word weight. Finally, a three-stage transformer model that was trained on actual clinical data. The MS-
hybrid method was developed that applied three subsequent modules to BERT model is publicly accessible and has been trained on more than
obtain the final output: GATED ABLSTM classifier, regular expression- 70,000 consultation notes for Multiple Sclerosis (MS) patients. It is to
based classifier, and ABLSTM classifier. However, Li et al. found that be mentioned that the notes were de-identified before training. Further-
the traditional LSTM network limited the ability to receive significant more, the model was evaluated using a classification task in order to
scores for a particular word in the input document (see Fig. 4). forecast the Expanded Disability Status Scale (EDSS). In the macro-F1
score, the model outperforms competing models using word2vec, CNN
(C) Recurrent Neural Networks and rule-based techniques.
Another pioneering effort has shown remarkable achievement in the In addition, a group of researchers [58] developed CheXbert, which
context of multi-class classification problems. Recent breakthroughs in employs BERT to classify free text radiological records. Existing ma-
advanced deep learning models show great utility in the NLP space. chine learning models in this study use feature engineering or human
Fig. 4. Architecture of the attention-based bi-directional LSTM.
annotation. Although of excellent quality, such annotations are limited and expensive to produce. CheXbert overcomes this issue by learning to classify radiography reports from both annotations and existing rule-based techniques. It first learns to predict the output of a rule-based labeller and is then fine-tuned on a set of expert annotations. It achieved a new state-of-the-art result by improving F1 scores for a report labelling task on the MIMIC-CXR dataset [59], which contains large-scale labelled chest radiographs.

3.3. Medical text summarising

Creating a summary system from medical narratives has become challenging as few effective tools have been developed in the healthcare sector. When large amounts of data are collected, such as in the Intensive Care Unit (ICU), displaying the data efficiently becomes a key concern for strategic planning. While the most common strategy would be visually displaying information, text summarisation has already been found to aid strategic planning. This section will review the abstractive and extractive text summarisation approaches researchers have developed in the retrieved articles.

(A) Extractive Summarisation Models
Portet et al. [60] proposed an extractive summarisation model called BT-45, which uses EHR data to generate textual summaries of approximately forty-five minutes of continuous clinical information and random events. This study developed text summaries in four steps, each accessing area-specific knowledge containing conceptual content in neonatal intensive care. Signal analysis (1), which extracts the basic characteristics of the physiological time-series data (artefact, pattern and trend), is the first phase of processing. In order to infer more abstract clinical findings and linkages derived from data pertaining to signal features and random occurrences, data interpretation (2) employs a variety of temporal and logical reasoning. The third phase, document planning (3), organises the most relevant occurrences from the preceding phases into a tree of related occurrences. Eventually, this tree is transformed into a coherent text through microplanning and realisation. However, human evaluators find model summaries inefficient because of the difficulty in integrating these disparate data into one model. As a result, most of the following initiatives focus primarily on textual information. Despite these limitations, the authors demonstrate that it is feasible to construct concise paragraphs from huge, complicated information that may be used as useful planning instruments.

Moradi et al. [61] introduced a graph-based algorithm that analysed words and phrases using biomedical text. In the first phase, the primary concern was extracting the content of technical reports. This task was accomplished according to the document’s formatting and logical structure. In the case of scientific articles, the main body was extracted by eliminating those parts of the text that seemed unnecessary to include in the summary. These parts included the title, author information, abstract, keywords, section and subsection headings, bibliography, and other elements. Subsequently, Moradi et al. used the Helmholtz principle from Gestalt theory to determine the concepts that convey primary information from the text and then constructed a graph from that information. Finally, the degree of each node in the generated graph was calculated by a summation, and the nodes were then ordered by descending degree. In addition, the ROUGE score was used to evaluate the proposed model; ROUGE calculates different ratings indicating the content similarity between a reference summary and the summary made by a machine learning model. A comparative analysis of this method with other summarisers was also carried out during the evaluation. This model had the best ROUGE value among the compared techniques. However, the authors intend to extend this research by using increasingly advanced methods at different stages of the summarisation process.

Another extractive summarisation model, developed by McInerney et al. [62], selects sentences most likely related to a potential diagnosis. First, McInerney et al. compiled a list of individual reporting forms and diagnostic codes from each individual, chronologically arranged by date and time in various collected EHR datasets. Then, their strategy was to train a deep learning model incorporating a transformer-based approach to select short phrases from EHRs. For this, the authors developed and evaluated systems for distant supervision, which only require grouping diagnostic codes. Such systems employ medical BERTs to encode questions and comments simultaneously and then identify groups of ICD codes that correlate with specific illness diagnoses used to train the model. Although the proposed distantly supervised model significantly outperforms unsupervised baseline models, McInerney et al. intend to expand this research to determine whether adding a little direct supervision can improve the model’s performance further.

(B) Abstractive Summarisation Model
Radiological reports have been increasingly summarised using seq2seq and related models in abstractive summarisation research. The first study using seq2seq to develop radiological impressions was conducted by Zhang et al. [63]. Zhang et al. suggest using the neural seq2seq method for making radiological assessments. The authors also propose a particular deep learning model for this activity that learns to encode prior research knowledge and uses it to guide the decoder. Additionally, a pointer-generator model was used in the decoding part.
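The degree-based node ranking described above for the graph-based summariser of Moradi et al. can be illustrated with a short sketch. This is not the authors' implementation: the concept names and edges are invented for illustration, and the graph is assumed to be undirected and unweighted.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Return (node, degree) pairs ordered by descending degree."""
    degree = defaultdict(int)
    for u, v in edges:
        # Each undirected edge contributes one to the degree of both endpoints.
        degree[u] += 1
        degree[v] += 1
    # Highest degree first; ties are broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical concept graph (names are assumptions, not from the reviewed paper).
edges = [
    ("sepsis", "fever"),
    ("sepsis", "antibiotic"),
    ("sepsis", "lactate"),
    ("fever", "antibiotic"),
]
ranking = rank_by_degree(edges)
```

In a summariser of this kind, the highest-ranked concepts would then guide which sentences are selected; that selection step is not shown here.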
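Several of the summarisation studies in this section are evaluated with ROUGE. As a rough illustration of what such a score measures, the sketch below computes a simplified ROUGE-1 (unigram overlap) between a reference summary and a candidate summary. It assumes whitespace tokenisation and lower-casing only; published ROUGE implementations add stemming and further variants (ROUGE-2, ROUGE-L), and the example sentences are invented.

```python
from collections import Counter


def rouge_1(reference, candidate):
    """Simplified ROUGE-1: clipped unigram overlap between two texts."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical radiology impression and a shorter machine-generated summary.
scores = rouge_1("no acute cardiopulmonary abnormality", "no acute abnormality")
```

Here the candidate recovers three of the four reference tokens, so recall is 0.75 while precision is 1.0.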
Consequently, this model outperforms state-of-the-art baseline models when evaluated with ROUGE on larger datasets of radiological records collected from real hospital studies. Although the model encodes the background part of the report to guide the abstractive summary, radiology specialists’ follow-up treatments are often absent from the results part and require an extensive understanding of research and field expertise; as a result, the model often misses follow-up treatments.

3.4. Other NLP applications

The additional applications of NLP can be clustered under the following headings: blockchain-based EHRs, identifying goals-of-care conversations, clinical chart review, medical language translation and medical disease prognosis.

(A) Blockchain-based EHRs
The significance of having a reliable record-tracking and communications mechanism has been highlighted worldwide during the COVID-19 pandemic, which exposed the current inadequacy in this field. Bharimalla et al. [64] focused on a blockchain and NLP-based approach to build a communication and record-tracking system [33]. The proposed prototype system is based on Hyperledger Fabric, a distributed ledger technology that is open source and designed for enterprise use. It is a popular choice for private blockchains. Bharimalla et al. categorised their methodology into system architecture, data pulling and sharing, patient data management and converting paper prescriptions to text. More specifically, Bharimalla et al. focused on converting paper prescriptions into text using NLP methods to integrate old paper-based clinical records into the new system through a mobile application-based interface.

In the data extraction phase, CNN, LSTM and Residual Networks (ResNet) were applied to extract handwritten data, while the Google Tesseract model was used to extract data from printed prescriptions. Moreover, Bharimalla et al. carried out some preprocessing in the extraction process. Firstly, the image was converted to grayscale to create a more functional model. The next step was to apply Otsu thresholding, which binarises the grayscale image into ones and zeros. Some pixels are usually lost during thresholding; to mitigate this problem, erosion and dilation were used to restore certain pixels, where dilation enlarges foreground regions and erosion shrinks them. Bharimalla et al. explain that the images are forwarded to Tesseract after preprocessing. Finally, all text is extracted from the image by Tesseract and sent over the network. Fig. 5 illustrates the proposed framework for a blockchain-based healthcare system. It shows the main participants, elements, and transaction procedures.

(B) Identify Goals of Care Conversations
Goals-of-care conversations aid patients with severe illnesses in articulating what they value most and wish to occur with their medical treatment. Medical professionals can use this information to create a care plan based on the patient’s values and preferences. In this context, Lee et al. [65] developed an automated method for identifying goals-of-care discussions using NLP approaches. In brief, a sample of 3183 EHR notes was collected from 1426 patients with severe illnesses, and each note was manually evaluated for documentation of goals-of-care discussions. The EHR notes were randomly divided into 100 training and test set pairs. Each note was tokenised into unigrams (i.e., one-word tokens), and common stop words and negation terms were removed. A logistic regression classifier was trained on each training set, and its performance was measured using the area under the receiver operating characteristic curve (AUC). The authors divided the data samples into inpatient and outpatient datasets and used the same methodology for training and testing the model in both subgroups to investigate the suggested model’s effectiveness. However, Lee et al. stated that additional research is required to validate the proposed approach, particularly in identifying outpatients’ goals-of-care discussions.

(C) Clinical Chart Review
Periprosthetic joint infection (PJI) data elements exist in both unstructured and structured EHR records and must be collected manually. The study in [66] aimed to create an NLP technique that replicates manual chart review for PJI data elements. The suggested strategy was based on expert rules that focused on textual cues (i.e., PJI-related terms) identified in the clinical narratives of orthopaedic surgeons and infectious disease experts. Text preprocessing, concept extraction, and classification are the three primary parts of the NLP method. Sentence segmentation, assertion detection, and temporal extraction were the critical elements of the textual data processing workflow. Concept extraction is a knowledge-driven annotation and indexing technique that recognises phrases in the unstructured text that correspond to topics of interest. In developing the NLP algorithm, a training sample of 1208 total joint arthroplasty (TJA) surgeries (170 PJI cases) and a test sample of 1179 TJA surgeries (150 PJI cases) were selected randomly. To predict PJI status based on MSIS criteria, the NLP technique was applied to all consultation notes, surgical notes, pathology reports, and microbiological reports. When extracting the existence of sinus tract, purulence, pathologic evidence of inflammation, and growth of bacterial isolates from the affected TJA, the algorithm obtained an F1-score between 0.771 and 0.909.

(D) Medical Language Translation
Most researchers, except specialists, have limited knowledge of EHRs because they contain specialised medical terms, acronyms, and a distinct structure and writing style. Translating medical writings into a more understandable form for laypeople is known as medical language translation. For example, the term "peripheral edema" might be substituted with "ankle swelling". Only a few studies have been conducted on the topic of EHR simplification. Weng et al. [67] applied unsupervised text simplification to medical documentation. Using unsupervised algorithms helps to alleviate the lack of text that has been manually annotated with simplified versions. They employ skip-gram embeddings learnt from two different clinical corpora: MIMIC-III, which has a substantial amount of medical terminology, and MedlinePlus [68], which is oriented towards laypeople. These complex and basic phrase embeddings are aligned using a bilingual dictionary induction model, which also initialises a denoising autoencoder. This autoencoder takes as input a sentence written by a doctor, converts it into a simplified translation using a language model, and then reconstructs the original sentence through the translation. On the other hand, a human-annotated medical language translation dataset called MedLane was introduced by Luo et al. [69]. It aligns professional medical language with expressions that the average person can understand. For training, validation, and testing, it contains 12,801/1,015/1,016 samples, respectively. In addition, they presented the PMBERT-MT model, which employs the pre-trained PubMedBERT [70] and carries out translation training using MedLane.

(E) Medical Disease Prognosis
Additionally, research in medical imaging is currently in the limelight and was very useful in the early stages of the COVID-19 outbreak. CNN-based methods have come to the attention of the research community, and numerous concepts are currently being developed to solve specific cases. Recently, Bhosale et al. [71] introduced a unique CNN model (PulDi-COVID) for the CXI-based detection of nine illnesses (atelectasis, bacterial pneumonia, cardiomegaly, COVID-19, effusion, infiltration, no-finding, pneumothorax, viral pneumonia). Utilising COVID-19 and CXI data for chronic lung diseases, a variety of transfer-learning models are trained, including VGG16, ResNet50, VGG19, DenseNet201, MobileNetV2, NASNetMobile, ResNet152V2, and DenseNet169. The complete dataset contains
Fig. 5. Framework for Blockchain-based Healthcare System [64].
Fig. 6. Analysing EHRs using DL and ML algorithms.
a subset of CXI associated with a variety of lung diseases and COVID-19 as well as healthy patients. Furthermore, Bhosale et al. [71] select six illnesses from fourteen ChestX-ray8 classifications for the sake of experimentation: atelectasis, cardiomegaly, effusion, infiltration, no-finding/healthy, and pneumothorax. The suggested framework achieved the highest accuracy on the dataset utilised in the experiment, with an accuracy of 99.70%, precision of 98.68%, recall of 98.67%, F1-score of 98.67%, minimal zero–one loss of 12 and error rate of 1.33%. The proposed PulDi-COVID model has demonstrated superior performance to earlier developed methods. The suggested SSE method with PulDi-COVID can therefore meet the requirement for rapid detection of COVID-19 alongside various lung diseases, which helps to reduce patient severity and mortality.

4. Analysis of the literature

This section will discuss findings based on the retrieved articles. First, we discuss data types and quantities in 4.1. Second, the clinical free text preprocessing pipeline is shown in 4.1.2. The most frequently used ML and DL models are illustrated in 4.2. A comparison of frequently used models is explained in 4.3. Model evaluation metrics and commonly used feature extraction methods are illustrated in 4.4 and 4.5. Finally, clinical settings are presented in 4.6 (see Fig. 6).

4.1. Data type and quantity

The Clinical Practice Research Datalink (CPRD) dataset has been used only in one article [55]; all other articles have used electronic
Table 2
Data type and size with relevant parameters.
Dataset Sample Funded Research Publicly Available Data Type Diagnostic Tool Design/Setting Structured
EHRs [72] 302 𝜒 𝜒 Clinical Notes ICD Cohort 𝜒
√
EHRs [18] 792 𝜒 Clinical Notes ICD Cohort 𝜒
EHRs [73] 681 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
EHRs [74] 16665 𝜒 Clinical Notes 𝜒 Experimental Research 𝜒
√
EHRs [75] 8000 𝜒 Clinical Notes Apache cTAKES Experimental Research 𝜒
√
EHRs [76] 1237 𝜒 Clinical Discharge Summaries Phenotyping Framework. Case Study 𝜒
Forensic EHRs [77] 6865 𝜒 𝜒 Case Notes 𝜒 Experimental Research 𝜒
√ √
EHRs [78] 291 𝜒 Clinical Notes ACS NSQIP Cohort
√ √
EHRs [79] 781 𝜒 Clinical Notes 𝜒 Experimental Research
EHRs [80] 7853 𝜒 𝜒 𝜒 ICD Cohort 𝜒
EHRs [81] 7149 𝜒 𝜒 Physician Notes 𝜒 Experimental Research 𝜒
√ √
I2B2 [82] 4605 Medication Administration Record 𝜒 Experimental Research 𝜒
√
EHRs [82] 6861 𝜒 Clinical Notes ICD Cohort 𝜒
EHRs [83] 𝜒 𝜒 𝜒 Medical Language 𝜒 Experimental Research 𝜒
√
CRIS [84] 𝜒 𝜒 Medication Administration Record 𝜒 Experimental Research 𝜒
EHRs [85] 154 𝜒 𝜒 Clinical Notes 𝜒 Experimental Research 𝜒
EHRs [86] 700,000 𝜒 𝜒 Medication Administration Record 𝜒 Experimental Research 𝜒
EHRs [87] 𝜒 𝜒 𝜒 Patient Demographic Data 𝜒 Experimental Research 𝜒
√
EHRs [65] 92151 𝜒 Mental Health Disorder 𝜒 Experimental Research 𝜒
√
EHRs [88] 76 643 𝜒 Lung Cancer ICD Cohort 𝜒
√
EHRs [43] 150 Clinical Notes 𝜒 Experimental Research 𝜒
√
EHRs [89] 1003 𝜒 Echocardiogram Reports 𝜒 Experimental Research 𝜒
√
√
EHRs [39] 798 665 𝜒 Clinical Notes ICD Cohort 𝜒
EHRs [64] 17 235 𝜒 𝜒 Clinical Narratives 𝜒 Experimental Research 𝜒
√
EHRs [90] 586 𝜒 Clinical Narratives 𝜒 Experimental Research 𝜒
EHRs [91] 𝜒 𝜒 𝜒 Clinical Narratives 𝜒 Experimental Research 𝜒
EHRs [92] 823,627 𝜒 𝜒 Clinical Narratives 𝜒 Experimental Research 𝜒
√
CPRD [55] 674 𝜒 Clinical Narratives 𝜒 Experimental Research 𝜒
EHRs [93] 1000 𝜒 𝜒 Medication Administration Record ICD Cohort 𝜒
EHRs [94] 300 𝜒 𝜒 𝜒 𝜒 Experimental Research 𝜒
√
√
EHRs [96] 198 𝜒 Echocardiography Reports ICD Cohort 𝜒
√ √
EHRs [95] 820 𝜒 Patient Demographics 𝜒 Experimental Research
√
EHRs [33] 𝜒 𝜒 Clinical Narratives 𝜒 Experimental Research 𝜒
health record data. This section describes two essential factors: (a) a comparison of overall insights of data and (b) details of preprocessing pipelines used.

4.1.1. Comparison of overall insights of data
Table 2 presents studies that reported the following eight parameters: (1) dataset, (2) sample, (3) funding status, (4) universal availability, (5) data type, (6) diagnostic tool, (7) design/settings, and (8) data format. Most of these studies (𝑛 = 21) were well funded, indicating that research on clinical NLP has become essential for improving clinical outcomes in recent years. Table 2 groups the types of information recorded in EHRs, which include clinical notes (𝑛 = 14), clinical narratives (𝑛 = 5), echocardiography reports (𝑛 = 2), medication administration records (𝑛 = 4), lung cancer data (𝑛 = 1), mental health (𝑛 = 1), demographic data (𝑛 = 2) and discharge summary (𝑛 = 1). Due to patients’ confidentiality, the availability of clinical data is quite challenging. Most of the EHR data studied was not open to public access and was not made available upon study completion.

We also retrieved the parameters of the studies such as diagnostic tool, research design/setting, and data format. These studies can further be clustered into observational studies (𝑛 = 8), experimental studies (𝑛 = 24), and case studies (𝑛 = 1) with a variety of diagnostic tools utilised in both cohort and empirical research. The International Classification of Diseases (ICD) was one of the most widely used approaches to assign codes against a specific disease. The data format of each observational and experimental study was unstructured when obtained from the various EHR sources.

4.1.2. Clinical free text preprocessing pipeline
Understanding what techniques researchers frequently use in clinical text processing is essential. Finding the right direction to process clinical free text is vital to understanding the free text processing settings. Table 3 presents data preprocessing methods used in the reviewed articles. We compare commonly used data preprocessing techniques such as commercial, manual, electronic, and distributed. Our analysis reveals that the researchers did not explain any details of the clinical text preprocessing settings in the articles listed in Table 3. Although structured data was used in two manuscripts [78,79], we did not find comprehensive approaches to clinical text preprocessing.

4.2. Models

4.2.1. Frequently used ML models
Experimental techniques that were effective when implemented on EHRs were Logistic Regression (LR), Support Vector Machine (SVM), eXtreme Gradient Boosting (XGBoost), AdaBoost, Random Forest (RF), Linear Regression, Naïve Bayes (NB), Gradient Boosting (GB), and Decision Tree (DT) models. These ML and DL-based algorithms were applied to conduct various NLP tasks, including classification, prediction, word embedding, text summarisation, language modelling, ICD-10 classification, clinical notes analysis, mental health issue identification and medical dialogue analysis. The two most prominent NLP tasks in recent years have been classification (𝑛 = 15) and prediction (𝑛 = 14).

The bar graph in Fig. 7 compares the widely accepted ML models for medical NLP used for EHRs. In short, Support Vector Machine (SVM) and boosting algorithms have been the most widely utilised models applied to electronic health record data for many years. In more detail, Fig. 7 shows that the use of the SVM model for clinical free text analysis has increased rapidly by 95% in recent years, indicating that scholars have concentrated more on this approach. On the other hand, the use of the Decision Tree (DT) model was the lowest among other classical ML algorithms
Table 3
Comparison of data preprocessing methods.
Dataset Data Type Structured Commercial Data Processing Manual Processing Electronic Processing Distributed Processing
EHRs [72] Clinical Notes 𝜒 𝜒 𝜒 𝜒 𝜒
EHRs [73] 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
EHRs [76] Clinical Discharge Summaries 𝜒 𝜒 𝜒 𝜒 𝜒
Forensic EHRs [77] Case Notes 𝜒 𝜒 𝜒 𝜒 𝜒
√
EHRs [78] Clinical Notes 𝜒 𝜒 𝜒 𝜒
√
EHRs [79] Clinical Notes 𝜒 𝜒 𝜒 𝜒
EHRs [81] Physician Notes 𝜒 𝜒 𝜒 𝜒 𝜒
EHRs [83] Medical Language 𝜒 𝜒 𝜒 𝜒 𝜒
CRIS [84] Medication Administration Record 𝜒 𝜒 𝜒 𝜒 𝜒
EHRs [86] Medication Administration Record 𝜒 𝜒 𝜒 𝜒 𝜒
EHRs [87] Patient Demographic Data 𝜒 𝜒 𝜒 𝜒 𝜒
EHRs [88] Lung Cancer 𝜒 𝜒 𝜒 𝜒 𝜒
EHRs [89] Physician-adjudicated echocardiogram reports 𝜒 𝜒 𝜒 𝜒 𝜒
EHRs [64] Clinical Narratives 𝜒 𝜒 𝜒 𝜒 𝜒
CPRD [55] Clinical Narratives 𝜒 𝜒 𝜒 𝜒 𝜒
EHRs [93] Medication Administration Record 𝜒 𝜒 𝜒 𝜒 𝜒
EHRs [33] Echocardiography Reports 𝜒 𝜒 𝜒 𝜒 𝜒
EHRs [95] Patient Demographics 𝜒 𝜒 𝜒 𝜒 𝜒
Fig. 7. Visual presentation of recognised machine learning models and mapping their cumulative frequencies.
at 40%. Boosting strategies such as AdaBoost and XGBoost were also used extensively in the selected articles. It is also noticeable that, in recent years, about four-fifths of the Logistic Regression (LR) model has been applied to analyse medical free text, as can be seen from Fig. 7. Finally, model evaluation indicators and automated software tools were used in a very small number of articles.

4.2.2. Frequently used DL models
In addition to ML models, various DL-based strategies are also applied to health data to make automated decisions. Table 4 explains the frequently used DL models and compares them with the evaluation metrics techniques. Moreover, we were unable to determine why evaluation metrics were not discussed in many research articles. One possible reason is that the model’s performance was rather satisfactory; therefore, evaluation via metrics was not used.

The bar chart in Fig. 8 illustrates the recognised DL models and their cumulative frequency mapping. Overall, it can be seen that the Neural Network (NN) was the most frequently used model applied to electronic health records for analysing clinical free text. Artificial neural networks (ANN), commonly referred to as neural networks or neural nets, are inspired by biological brain networks. An ANN comprises a network of interconnected units or nodes known as artificial neurons, which loosely resemble the neurons of a biological brain.

The other common models identified in the included studies were Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BI-LSTM), Convolutional Neural Network (CNN), Residual Neural
Fig. 8. Deep learning models and their cumulative frequency mapping.
Network (ResNet), Transfer Learning (TL), Recurrent Neural Network Examples include TF-IDF, BOW, Word2vec, Glove, and FastText. The
(RNN), Gated Recurrent Units (GRU) and Representation Learning (RL). TF-IDF and BOW methods were incorporated as a word weighting
technique, while the Word2vec and Glove were favoured as a word
4.3. Comparison of frequently utilised models embedding approach. Among the selected papers, TF-IDF (𝑛 = 7), BOW
(𝑛 = 5), and Glove (𝑛 = 5) were more frequently utilised than the other
In this sub-section, we compare commonly used ML and DL-based methods, such as FastText or BERT transformer models.
models and discuss their advantages and disadvantages in general. It
will provide readers with an understanding of the core information of
4.6. Automated tools
each model in a clinical free text context.
4.4. Model evaluation metrics This sub-section will explain the automated tools currently being
utilised in healthcare. Tables 7 and 8 illustrate automated machine
Model evaluation metrics take a key role in evaluating the accu- learning integrated tools employed for commercial and open-source
racy and performance of a trained model. Our analyses reveal that researchers focused primarily on AUC, Accuracy, Precision, Recall and F1-score among the articles we reviewed. It is noteworthy that the AUC measures how well a model differentiates between the classes of a dataset: the higher the AUC, the better the model distinguishes between positive and negative classes. Furthermore, the confusion matrix underlies most measures of classification performance. The confusion matrix has four distinct values: True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN). A False Positive is called a Type 1 Error, and a False Negative is called a Type 2 Error. Several approaches are used to evaluate a model's accuracy, and TP, TN, FP, and FN are the main determinants of the model's performance (see Table 5). Equations (1), (2), (3) and (4) are primarily applied to determine the precision, recall, F1-score and accuracy [113].
$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$
$$F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$
4.5. Word embedding/feature extraction methods
Table 6 explains feature extraction approaches used with EHR. From the studies we reviewed, various feature extraction techniques were adopted, most of which are traditional approaches. Feature extraction methods of medical narratives such as word weighting, word embedding, and some open-source tools were considered in various studies.
applications. Overall, it is evident that several automated ML solutions produced by Google, Amazon, Microsoft, and JADBIO are chargeable and do not require coding. Likewise, technologies that do not require a fee necessitate minimum coding in the local environment and have some limits compared to fee-based solutions. On the other hand, existing AutoML technologies are not commercially available for structured data and focus primarily on well-defined unstructured data.
As Table 7 shows, numerous platforms have created automated ML-enabled solutions for a variety of activities, with Auto-Sklearn [13] being the most popular because it is integrated into the Sklearn library and is designed to select algorithms and optimise hyperparameters. This approach also uses Bayesian optimisation techniques and meta-learning to perform its tasks. Another open-source platform developed by the University of British Columbia is Auto-Weka [115], sometimes known as the Automated Waikato Environment for Knowledge Analysis. Auto-Weka uses Bayesian optimisation for hyperparameter optimisation. The AutoML platform, which utilises statistical algorithms, can only analyse structured data, such as stock market prices, student grades, hotel occupancy, etc.
It can also be seen that the majority of unstructured data is processed with commercial AutoML solutions, while structured data is often processed with open-source tools and clinical AutoML platforms. Because it is process-ready, structured information is easier to manipulate, which is an important consideration in this regard. On the other hand, business organisations that benefit from AutoML platforms have created more dynamic text and image processing capabilities. Some examples are Amazon's Recognizer, Apple's CreateML, Microsoft's AutoML, and Google's AutoML. These industries have invested heavily in the implementation of ML-enabled automated platforms. As a result, the majority of their products require no coding, and their tools have become relatively user-friendly.
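As a concrete illustration of Eqs. (1)–(4), the following minimal Python sketch computes the four metrics from confusion-matrix counts. The `ConfusionCounts` container, the function names, the example counts and the convention of returning 0.0 for a zero denominator are all assumptions made for illustration; they are not taken from the reviewed studies.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts (illustrative container)."""
    tp: int  # true positives
    fp: int  # false positives (Type 1 errors)
    tn: int  # true negatives
    fn: int  # false negatives (Type 2 errors)


def _ratio(numerator: float, denominator: float) -> float:
    # Assumed convention: return 0.0 when the denominator is zero.
    return numerator / denominator if denominator else 0.0


def precision(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fp)  # Eq. (1)


def recall(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fn)  # Eq. (2)


def f1_score(c: ConfusionCounts) -> float:
    p, r = precision(c), recall(c)
    return _ratio(2 * p * r, p + r)  # Eq. (3)


def accuracy(c: ConfusionCounts) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)  # Eq. (4)


# Invented counts, used only to exercise the formulas.
example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
print(precision(example), recall(example), f1_score(example), accuracy(example))
```

Because F1 is the harmonic mean of precision and recall, it can equivalently be written as 2TP / (2TP + FP + FN), which is a convenient cross-check on any implementation.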
Table 4
Most frequently utilised traditional models, with the evaluation metrics and validation techniques reported in the reviewed research articles.
Model used Evaluation metrics Validation
AUC Accuracy P R F1 Score
Elastic Net [73] 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
NLP, EPA [74] 0.98 𝜒 𝜒 𝜒 𝜒 𝜒
XGBoost [75] 0.91 𝜒 𝜒 𝜒 𝜒 𝜒
√
BEHRT [76] 0.91 𝜒 𝜒 𝜒 𝜒
NLP, LCA [77] 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
K-NN, SVM, NB, RF [78] 0.98 𝜒 𝜒 𝜒 𝜒 𝜒
UMLS, MedLEE, NV [79] 0.9 𝜒 𝜒 𝜒 𝜒 𝜒
RF [80] 0.92 𝜒 𝜒 𝜒 𝜒 𝜒
√
N/A [81] 𝜒 𝜒 𝜒 𝜒 𝜒
N/A [97] 𝜒 97 𝜒 𝜒 𝜒 𝜒
N/A [84] 𝜒 86.81 𝜒 𝜒 𝜒 𝜒
√
N/A [85] 𝜒 𝜒 𝜒 85 𝜒
N/A [98] 𝜒 93.8 𝜒 𝜒 𝜒 𝜒
√
N/A [99] 𝜒 𝜒 𝜒 𝜒 𝜒
CLM BR [87] 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
N/A [100] 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
N/A [101] 𝜒 𝜒 𝜒 𝜒 𝜒
NLP and supervised ML [65] 93.3 𝜒 𝜒 𝜒 𝜒 𝜒
LASSO [102] 0.58 𝜒 𝜒 𝜒 𝜒 𝜒
Sentiment Analysis, Cognition Engine and NLP techniques [103] 0.58 N/A 𝜒 𝜒 𝜒 𝜒
√
PheCAP [104] 𝜒 𝜒 𝜒 𝜒 𝜒
VHA [88] 𝜒 94.4 𝜒 𝜒 𝜒 𝜒
√
N/A [43] 𝜒 𝜒 𝜒 𝜒 𝜒
Topic modelling+LDA [105] 𝜒 81 𝜒 𝜒 𝜒 𝜒
cTAKES [106] 𝜒 96.7 𝜒 𝜒 𝜒 𝜒
N/A [107] 𝜒 98.2 𝜒 𝜒 𝜒 𝜒
LR, RF, SVM [66] 𝜒 71 𝜒 𝜒 𝜒 𝜒
Sag, Meta map, SHAP [105] 𝜒 96 𝜒 𝜒 𝜒 𝜒
CNN-LSTM, ResNet-LSTM [44] 𝜒 99 𝜒 𝜒 𝜒 𝜒
LF, Sense2vec, OxCRIS [44] 𝜒 95.7 𝜒 𝜒 𝜒 𝜒
NLP, DL [46] 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
RF, LASSO, EXGB, NV [108] 0.95 𝜒 𝜒 𝜒 𝜒 𝜒
N/A [109] 𝜒 𝜒 97 𝜒 𝜒 𝜒
ASUDS, LRM [35] 𝜒 94 𝜒 𝜒 𝜒 𝜒
√ √
Cohort study [110] 𝜒 𝜒 𝜒 𝜒
√
ICD9-CM, CPT and NLP techniques [72] 𝜒 𝜒 𝜒 85.7–92.9 𝜒
√
Expert-driven Queries+NLP [18] 𝜒 𝜒 𝜒 𝜒 𝜒
√
Rule-based NLP [73] 𝜒 𝜒 𝜒 𝜒 𝜒
cTAKES NLP Software [74] 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√ √ √ √ √ √
LR, SVM, DT and RF [75]
√ √ √
LMT, LR, Linear Regression and SVM [75] 𝜒 𝜒 𝜒
√ √ √ √ √ √
Automated Clinical Follow-up Tool [77]
√
Regression, SVMs, DT, RF [111] 𝜒 𝜒 𝜒 𝜒 𝜒
√ √ √
RF, SVM, LR [112] 𝜒 𝜒 𝜒
√ √ √ √
Supervised and Unsupervised Model [78] 𝜒 𝜒
√ √
RF, Gradient Boosting, Neural Network, and Linear Regression [78] 𝜒 𝜒 𝜒 𝜒
Transfer Learning and Neural Networks [82] 𝜒 𝜒 𝜒 𝜒 82.4 𝜒
√ √
PAD-ML and LASSO approach [83] 0.801 and 0.888 70 90 𝜒
5. Research viewpoint
This section discusses current research trends, core challenges in medical NLP, research gaps, and potential future directions.
5.1. Trend of current clinical NLP research
Current trends in medical NLP research are illustrated in Fig. 9. Analysis of clinical notes from five studies [35,39,42,42,44] revealed that a substantial effort was put into patient risk assessment. We found several papers that explored ICD-9 code classification; however, most of the articles did not emphasise machine learning or deep learning techniques. As a result, we discarded them because they did not meet our criteria for literature selection. We only found two publications [45,52] that emphasised ICD-9 code classification based on machine learning and deep learning.
In addition, the scientific community is currently paying considerable attention to clinical Named Entity Recognition (NER) and medical text summarisation. Five studies [60–62,130,131] created the medical text summarisation model, whereas two studies [82,132] offered the medical NER model. The most prevalent physiological illnesses in recent years were dementia and geriatric mental health. There were five studies on dementia [133–137] and three on geriatric mental health [138–140]. Many studies have come up with solutions to alleviate such disorders; yet there is a significant research opportunity in this field.
5.2. Core challenges in clinical NLP
In clinical NLP, the core challenge is information overload, which poses a substantial problem in accessing a particular, significant piece of information from vast datasets. In addition, semantic and contextual understanding is essential yet challenging for summarisation systems, which suffer from quality deficiencies and usability issues. Another significant problem is the wide variety of text formats that an NLP programme has to deal with to answer queries from several sources. The following subsections provide a detailed description of the challenges in clinical natural language processing.
Table 5
Comparison of popular ML and DL models applied in the EHRs.
Model | Advantage | Disadvantage
SVM | (a) SVM is very efficient for high dimensional space. (b) This algorithm uses relatively less memory. | (a) Support vector machine is not very efficient for very large datasets. (b) The model is easily affected if the dataset contains overlapping classes and noise.
Gradient Boosting | (a) Does not require data scaling and can handle missing values. (b) The algorithm is relatively flexible due to loss function optimisation and hyperparameter tuning. | (a) Outliers may overfit the model. (b) Comparatively more time-consuming/slower and requires more memory.
XGBoost | (a) Can build decision trees in parallel. (b) Can use distributed computing method for complex models. | (a) XGBoost does not perform so well on sparse and unstructured data.
Logistic Regression | (a) Logistic regression is easy to implement and interpret. At the same time, it uses relatively less computational resources. (b) Logistic regression works well when the data are linearly separable. | (a) The overfitting tendency of logistic regression is generally low, but the model may overfit if the dataset becomes too dimensional, in which case, dimension reduction should be done before modelling. (b) If the number of observations becomes less than the number of features, the model will not be valid, and the problem of overfitting will arise.
Random Forest | (a) Following the random forest bagging method reduces the probability of being influenced by outliers. (b) It works well for both categorical and continuous data. | (a) Complex models require more computational resources when the number of learners is large.
Naïve Bayes | (a) It is easy to implement and relatively fast. (b) It also works well for small datasets. | (a) Gives slightly lower accuracy than other algorithms.
Decision Tree | (a) Decision trees do not require the dataset to be scaled. (b) Decision tree can be explained very easily. | (a) Decision trees are not very effective for continuous value prediction in many cases. (b) Decision tree model takes comparatively more time in training.
Neural Network | (a) Multitasking is a common advantage of neural networks. | (a) Black Box Nature (b) Hardware dependent
LSTM | (a) The complexity of updating each weight is reduced to O(1). | (a) Dropout is much harder to implement in LSTMs.
BI-LSTM | (a) Enable additional training by traversing the input data twice | (a) Since BiLSTM has double LSTM cells, so it is costly
CNN | (a) Without any human oversight, it automatically discovers significant features. | (a) Large training data required
ResNet | (a) Large number of layers can be trained easily without increasing the training error percentage. | (a) Deeper network usually requires weeks of training.
Transfer Learning (TL) | (a) Overcome cost- and time-consuming issues. | (a) Problem of negative transfer, i.e., utilising source domain data/knowledge reduces unfavourably learning performance in the target domain.
Recurrent Neural Network (RNN) | (a) When processing temporal, sequential data, like text or videos, RNNs perform better. | (a) Gradient vanishing and exploding problems.
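To make the Naïve Bayes row of Table 5 more tangible, the sketch below shows how little machinery a multinomial Naïve Bayes text classifier needs. It is a minimal illustration under stated assumptions: the function names, the add-one (Laplace) smoothing choice, the whitespace tokenisation and the four invented training notes are ours and do not come from any reviewed study.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, tokens):
    """Return the class with the highest log-posterior for the tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy notes; real clinical text would need proper preprocessing.
train_docs = [
    "chest pain radiating to left arm".split(),
    "elevated troponin and chest pain".split(),
    "productive cough and fever".split(),
    "wheezing cough shortness of breath".split(),
]
train_labels = ["cardiac", "cardiac", "respiratory", "respiratory"]

model = train_naive_bayes(train_docs, train_labels)
print(predict_naive_bayes(model, "chest pain with elevated troponin".split()))
print(predict_naive_bayes(model, "fever and cough".split()))
```

Training amounts to counting, which is why the method is fast and can work with small datasets; the conditional-independence assumption between words is the usual explanation for its lower accuracy on harder tasks.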
Fig. 9. Trend of current medical NLP research.
5.2.1. Medical abbreviations
Sometimes abbreviations are misread, misinterpreted or misunderstood. Their use increases the amount of time needed to train health care professionals, wastes time in determining their meaning, often delays patient care, and occasionally results in patient harm. As shown in Fig. 10, many studies have combined medical abbreviations, confusing terms, or data from different diseases when analysing clinical descriptions as a strategy to overcome NLP challenges. Researchers have used these strategies to address frequent clinical phrases used to describe patient care: time of admission, time of initial examination, time of hospitalisation, time to discharge planning, and throughout the coding and billing process [141]. Similar challenges are presented by medical abbreviations and acronyms, such as when prescribing medications. Physicians sometimes use Latin-derived abbreviations to specify the frequency of drug administration, such as "BD" (bis in die), meaning twice daily. Computers had trouble correctly recognising patterns when
Table 6
Feature extraction approaches used with EHRs.
Context Word Weighting Word Embedding Transformer Approach Automated Tools
TF-IDF BOW CountVectorizer Word2vec Glove FastText cTakes Metamap
√ √ √ √
Clinical notes classification 𝜒 𝜒 𝜒 𝜒 𝜒
√ √ √
DL assessment for ICD 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Free text classification 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Medical text labelling 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
recognising alcohol consumption 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Computerised ICD coding 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Indexing biomedical literature 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Multi-label classification 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Clinical coding 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
ML-based encoding 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Feature identification 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Extracting medication 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
ICD encoding 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√ √
Clinical coding 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Note classification 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Note embedding 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Classifying diagnosis 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Pre-screening for paediatric oncology patients 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
DL comparison 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Feature engineering 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Knowledge extraction 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Text mining of cancer 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Free text analysis 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Extraction of drugs indications 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
√
Diagnosis codes to free-text 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒 𝜒
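The word-weighting columns of Table 6 (BOW and TF-IDF) can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace-tokenised, non-empty documents and the unsmoothed definition idf = log(N / df); library implementations often use smoothed or normalised variants, so their numbers will differ. The function names and the three example notes are invented for illustration.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocab):
    """Return raw term counts for one document, ordered by vocab."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def tf_idf(docs):
    """Return (vocab, vectors) using tf = count / doc length and
    idf = log(N / document frequency). Assumes non-empty documents."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors


# Invented example notes.
notes = [
    "patient denies chest pain".split(),
    "patient reports chest pain".split(),
    "patient reports mild fever".split(),
]

vocab, vectors = tf_idf(notes)
print(vocab)
print(bag_of_words(notes[0], vocab))
print(vectors[0])
```

In this toy corpus the word "patient" appears in every note, so its TF-IDF weight is zero, whereas "denies" appears in only one note and receives the highest weight in the first vector. Neither representation captures word order or meaning, which is the limitation that embedding methods aim to address.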
Table 7
Automated machine learning integrated tools.
Platform Chargeable Coding Environment Dataset Domain
√
Auto ML [114] 𝜒 Google Cloud Images, Text, and Tabular Nonspecific
√
Create ML [114] 𝜒 Local Images, Text, and Tabular Nonspecific
√
Amazon Auto ML [114] 𝜒 Cloud Images, Text, and Tabular Nonspecific
√
Microsoft Auto ML [114] 𝜒 Cloud Images, Text, and Tabular Nonspecific
√
Auto-Sklearn [13] 𝜒 Local Tabular Nonspecific
√
Auto-WEKA [115] 𝜒 Local Tabular Nonspecific
√
Auto-Keras [116] 𝜒 Local Tabular Nonspecific
√
TPOT [117] 𝜒 Local Tabular Nonspecific
√
JADBIO [118] 𝜒 Cloud Tabular Biomedical and Multi-omics
√
AutoPrognosis [119] 𝜒 Local Tabular Biomedical
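The core loop that AutoML tools such as those in Table 7 automate, namely choosing an algorithm together with its hyperparameters, can be sketched as follows. This is a deliberately simplified illustration and not the API of any listed platform: it replaces Bayesian optimisation and meta-learning with an exhaustive search over a tiny hand-written search space, and the two toy model families, the data and all names are hypothetical.

```python
def knn_predict(train, x, k):
    """k-nearest-neighbour vote on 1-D features; train is [(x, label)]."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)


def threshold_predict(train, x, t):
    """Predict class 1 when x >= t (train is unused; kept so both
    model families share one interface)."""
    return 1 if x >= t else 0


MODEL_FAMILIES = {"knn": knn_predict, "threshold": threshold_predict}

# Hypothetical search space: (model family, hyperparameters).
SEARCH_SPACE = [
    ("knn", {"k": 1}),
    ("knn", {"k": 3}),
    ("threshold", {"t": 2.0}),
    ("threshold", {"t": 5.0}),
]


def validation_accuracy(name, params, train, valid):
    predict = MODEL_FAMILIES[name]
    hits = sum(predict(train, x, **params) == y for x, y in valid)
    return hits / len(valid)


def select_model(train, valid, search_space=SEARCH_SPACE):
    """Return (family, params, score) of the best configuration;
    ties keep the configuration encountered first."""
    best = None
    for name, params in search_space:
        score = validation_accuracy(name, params, train, valid)
        if best is None or score > best[2]:
            best = (name, params, score)
    return best


# Invented 1-D toy data: label 1 for larger feature values.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
valid = [(2.5, 0), (3.5, 0), (6.5, 1), (8.5, 1), (4.4, 0)]

print(select_model(train, valid))
```

Real systems differ mainly in how they explore the search space: rather than trying every configuration, Bayesian optimisation uses the scores observed so far to decide which configuration to evaluate next.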
Table 8
Commercially available and open-source automated machine learning tools used.
Dataset Format Category Feature Platform
Health Associated Open Source Commercial
√
Hearing Aid [120] 𝜒 𝜒
Audio √
Unstructured Lung cancer [15] 𝜒 𝜒
√
Generic [121][114] 𝜒 𝜒
Images √
Liver Injury [122] 𝜒 𝜒
√
Ophthalmic syndrome [123] 𝜒 𝜒
√
Alzheimer’s disease [124] 𝜒 𝜒
√ √
BioSignature [125] 𝜒
√
Brain Age [126] 𝜒 𝜒
√
Structured Tabular Brain Tumour [127] 𝜒 𝜒
√ √
Cardiovascular disease prognosis [119] 𝜒
√
Diabetes [115] 𝜒 𝜒
√ √
Generic [128] 𝜒
√
Metabolic [129] 𝜒 𝜒
identifying such complex abbreviations [142]. However, no study has effectively filtered medical acronyms by removing stop words.
5.2.2. Spelling correction and negation detection
Terms in medical summary documents are misspelt for two reasons: clerical errors and Optical Character Recognition (OCR) errors. The Levenshtein distance is a straightforward method for replacing all misspelt words with dictionary words. However, because many dictionary words can have the same edit distance from a misspelt word, replacing words based solely on Levenshtein distance [143] does not result in higher accuracy. Correcting misspelt words with those the language model suggests significantly improves accuracy. Since medical texts use a different language from
Fig. 10. Potential challenges of clinical natural language processing.
other texts, language models developed using general English do not work well in this case.
Shannon's Noisy Channel Model [144] is the most effective development in NLP in terms of recent advances in medical spelling correction. This technique uses an extensive dictionary built from numerous sources. This model employs named entity recognition so that misspellings are not incorrectly corrected. This spell checker was applied to three distinct forms of free-text data: clinical notes, allergy entries, and pharmaceutical orders. The strength of this model is that it is capable of high-performance spelling correction in various clinical narratives. To the best of our knowledge, no suitable tools for medical spelling checking have yet been widely produced; therefore, it is essential to build further tools, such as Shannon's noisy channel model [144], to overcome the highlighted issues.
On the other hand, a linguistic phenomenon known as negation causes sentences to have their meanings reversed. The negation term determines if a finding in the clinical narrative has to be annotated as a finding or should be excluded. For example, Kundeti et al. [145] demonstrated that the significance of the findings is altered by the use of qualifiers and negation terms. For instance, a cyst is a finding in the statement "cyst detected in the lungs", but it is no longer a finding in the sentence "No cyst identified in the lungs". Another example is NegEx, discussed by Mehrabi et al. [146]. NegEx is an algorithm for negation detection that has proven effective in clinical NLP. However, NegEx fails to accurately determine the negation status of concepts in complex phrases because it disregards the contextual relationship between words inside a sentence.
5.2.3. Lack of medical data
Data shortages have become a significant obstacle for medical NLP research. Supervised ML models have successfully solved a variety of healthcare challenges, and sufficient training data is a precondition for deploying supervised machine learning algorithms. However, many health systems are hesitant to share confidential patient data due to ethical, privacy, and liability concerns.
Most of the data in the studies we reviewed were acquired directly from hospitals, rather than from more convenient and accessible online repositories such as Kaggle or the UCI Machine Learning Repository; hence, data scarcity is now a hurdle for the scientific community, particularly for clinical NLP research. Therefore, modern techniques such as transformer-based models and cutting-edge deep learning algorithms are not commonly utilised in this field. This leads us to the following question: how much data is required to perform research in medical NLP? The minimum amount of data needed for an AI study cannot be established with any degree of accuracy. It goes without saying that the nature of a project significantly impacts how much data is needed. Text, images, and videos, for instance, typically require considerable data. To generate an accurate estimate, however, additional criteria, such as the number of anticipated categories and model performance, must be addressed.
5.2.4. Sensitivity of medical data and privacy
Clinical notes contain detailed information about patient–physician interactions. During these exchanges, patients reveal their health difficulties, eating habits and potentially stigmatising disorders. The Health Insurance Portability and Accountability Act (HIPAA) privacy law guarantees the privacy of personal medical information in the United States. In addition, the European Union's General Data Protection Regulation (GDPR) establishes regulations for using health data for scientific purposes. These legislative moves have immediate implications for NLP research, with informed consent from individuals and sanitisation of sensitive data categories being paramount. The GDPR outlines broad principles regarding the processing of confidential data, including that the processing must be fair, transparent, and lawful (i.e., with consent), carried out for specific and legitimate purposes, and the data should be retained for no longer than is required. This is known as data minimisation, which includes sanitisation. "Special categories of personal data are primarily employed in the scientific analysis". The processing of private information is only permissible with the subject's express consent or after the person has made the information public. Generally, "scientific usage" refers to basic, applied, privately funded research and technological innovation.
Methods of sanitisation are frequently regarded as the bare minimum for protecting the privacy of individuals when collecting data. The goal is to utilise a technology that generates entirely new copies of the dataset that appear real for data analysis while protecting the privacy of the individuals in the dataset to a certain extent, depending on the technique. The sanitisation approach has been criticised for numerous reasons, even though it is a vital step in protecting patient privacy. First, both the data's value and integrity are compromised. Second, while sanitisation promotes data access and sharing, it is not always adequate. This is primarily due to the possibility that deductive disclosure could result in the re-identification of the original sensitive data.
5.3. Future research directions
This sub-section highlights gaps and provides future research directions for various aspects of natural language processing in electronic health records.
5.3.1. Model assessment and point of view of adopted models
The first point in this regard is that much of the existing literature did not validate its proposed models using model validation techniques such as K-Fold Cross Validation. A model may contain both generalisation and overfitting errors. When a model is overfitted, it performs exceptionally well on the training data but fails when presented with new data. Generalisation is the term used to describe a model's performance on new data. Therefore, evaluating ML models before applying them in practice is essential. Note that K-Fold Cross Validation is an effective
method for mitigating these underlying problems [147]. The K-Fold Cross Validation method is used to develop an ML model on multiple subsets of the same dataset, resulting in a different prediction accuracy for each subset. This approach allows us to assess how much the accuracy of a model varies across distinct data and to estimate its average accuracy on unseen data. Because different folds yield different results, it is possible to estimate how well a proposed model will perform overall, and overfitting can be identified and reduced using this technique [148]. Also, regularisation strategies such as dropout, L1/L2, capacity reduction, and early stopping are primarily used to combat overfitting.
In contrast, TF-IDF and BOW approaches were used in many of the reviewed research articles to extract features from the free text in EHRs, indicating that relatively little research has utilised cutting-edge word embedding techniques. The word2vec method was utilised in a few articles, but its prevalence was negligible compared to traditional feature extraction methods. With conventional feature extraction methods, acquiring semantic information is challenging; however, this difficulty can be mitigated by introducing other advanced word embedding methods such as FastText, Glove, and BERT.
In recent years, this has been a top priority for clinical note analyses, including the identification of goal-of-care documentation in EHRs and suicide prediction from large-scale clinical records. A number of deficiencies must be addressed in order to improve the adopted solutions, despite the satisfactory performance achieved in these contexts. Concerning the identification of goal-of-care discussions in EHRs, it is evident that algorithms with extremely high positive predictive values may not achieve sufficient positive predictive values. Future research should consider using a word- or phrase-level annotation of training data (such as identifying specific sections of a note containing documentation of goals of care), developing ontologies to interpret goals-of-care discussions, and investigating cutting-edge techniques to eliminate training data biases that differ from real-world data.
Regarding suicide prediction, a multitude of obstacles has hindered understanding, forecasting, and preventing suicidal behaviour. First, there is a lack of knowledge regarding the actual suicide predictors [38]. Numerous risk factors have been identified, including mental illness, youth, and a history of suicidal behaviour. However, these characteristics have limited predictive accuracy for suicidal conduct and account for a minor percentage of the variance. Second, there is no recognised model for identifying the relationship between risk factors and suicidal conduct. Apart from these, suicide is rare, and rare events are harder to predict. It is important to make a distinction between suicide and suicide attempts, which are more common but might not share the same predictors. This is an issue for researchers and clinicians because there is no technique that doctors can use to combine data when determining if a patient is likely to attempt suicide in the near future. Medical practitioners must rely on intuition, which is no more reliable than random chance in predicting suicidal behaviour. On the basis of the above discussion, it is necessary to obtain longitudinal data from large data samples that can be used to create and test novel models of suicide risk. However, if suicide researchers examine the notion, it will indicate a possible new direction for future research and clinical treatment.
5.3.2. Perspectives on ML integration and AutoML
Various businesses hire data scientists for data processing and decision-making [149]. Data scientists sometimes lack interdisciplinary experience, particularly in clinical natural language processing [150]. It is uncommon to find experts with such insight.
Although AI-integrated tools are widely accepted, several flaws, such as business challenges and security, have been identified. Security is one of the most frequently discussed areas of research in AutoML. Regarding the security of AutoML, businesses are investigating various technological solutions, such as automatic machine learning for privacy protection, automated multi-party machine learning, automated federation, etc. However, the current implementation does not support laws, regulations, and industry standards. Specifically, federated learning, also known as collaborative learning, is an ML technique that uses several distributed edge devices or servers that store local data samples to train an algorithm without transferring the data samples. Multi-Party Computing (MPC) is a cryptographic technique that allows multiple parties to perform computations using combined data without revealing their inputs. Currently, these methods address various privacy and security concerns [151]. Therefore, organisations must encourage developing and enhancing standards for federated learning and secure multi-party computing.
In addition to these issues, AutoML typically encounters problems with data and model applications. For instance, insufficient high-quality labelled data and data inconsistencies will hurt offline data analysis. Automating the processing of unstructured and semi-structured data by machine learning is necessary but technically challenging. The current optimisation objectives for the AutoML system are predefined. Multiple objectives, such as balancing decision-making and cost, frequently present a challenge. This multi-objective investigation has limited analysis options before yielding effective results. The actual business may have specialised data processing requirements for the existing machine learning process. In the current Black Box AutoML solution, such conditions are poorly handled. Consequently, it may be necessary to adopt a new solution capable of mitigating the shortcomings of the current Black Box system.
Moreover, many studies indicate that researchers frequently concentrate on developing a model rather than deploying it in a production environment. Implementing a simulation and deploying it in production are two distinct processes, as a model may perform well during the simulation phase but contain numerous errors when analysing real-time data. Therefore, model deployment is essential for determining the efficacy of proposed models. If an end-to-end model can be developed, it will significantly contribute to clinical settings and provide an effective tool for more efficiently analysing complex patient data.
5.3.3. Medical data imbalance and data shortage
In Table 3, we demonstrate that researchers did not specify which text preprocessing pipeline settings were selected to handle unstructured EHRs. This is an important deficiency in reporting this type of research, as cleaning medical text differs significantly from cleaning other data. For example, text normalisation or stop word removal methods are available that perform correctly, but due to various abbreviations and terms in medical free text, applying current stop word or normalisation approaches is somewhat challenging and limits the ability to obtain accurate results. Likewise, when working with any text data, an imbalanced dataset creates several issues [152]. Typically, "imbalanced data" refers to a classification issue in which the classes are not represented equally. Before applying the data to the ML system, it is recommended to address the imbalance issue in clinical free text in EHRs. The problem of data imbalance can be alleviated by utilising techniques such as the Synthetic Minority Oversampling Technique (SMOTE). SMOTE increases the number of minority-class cases by generating synthetic examples, and it can be combined with undersampling of the majority class to reduce the medical data imbalance problem [153]. In addition, class disparity can be addressed through cost-sensitive training and sampling techniques. However, a significant deficiency in the reviewed literature was that the solutions to address these issues were not described.
On the other hand, the scarcity of medical data is one of the current obstacles. Utilising synthetic data is a possible solution [154–156]. It may provide a safer method of development for clinical data. Synthetic data is frequently employed when there is insufficient actual data or not enough to identify specific patterns. Both training and testing datasets utilise it in the same manner. Transfer learning techniques can be used as a substitute when there is an absence of training data for the target domain, and there are few or no exact matches between the
Table 9
Commonly utilised technical terms used in this systematic review.
Technical Terms Definition
ROC-AUC Receiver Operating Characteristics-Area Under The Curve (ROC-AUC).
These curve plots the true positive ratio against the false positive rate at
different threshold values.
K– Fold Cross Validation ML models are created by creating multiple subsets of the same dataset using K– Fold Cross
Validation.
Traditional ML Traditional ML models can utilised used to resolve classification, regression, clustering, dimension
reduction problems. For examples: linear regression, logistic regression, naive bayes.
AutoML Automated machine learning-enabled tools are used to complete a variety of tasks,
especially from clinical tasks to other classification and prediction tasks.
Feature Extraction It is used for transforming raw data into numerical features
Word Embedding Word embedding is used for the representation of words for text analysis
TF-IDF, Word2vec, Glove Term Frequency-Inverse Document Frequency (TF-IDF).
Global Vectors for word representation (Glove).
These are used to convert text data to numeric form to to apply the ML algorithm.
FastText Fasttext is an open source, free of charge, lightweight library that lets users learn text representation
and text classification.
BOW Bag of Words (BOW) is a classical word representation technique.
Free Text Free text of electronic medical notes is considered a rich source for healthcare operations and research
Overfitting and Underfitting Overfitting indicates that a model performs satisfactorily in training data, but performs poorly
in new data. However, underfitting works poorly on both datasets.
Bayesian Optimisation Bayesian optimisation methods are effective because they choose hyper parameters in a known
manner.
Stop Words In the case of classifying text documents, some terms do not contain the actual meaning
to be used in the classification model. For example: {‘‘a’’, ‘‘however moreover’’, ‘‘is the’’,
‘‘afterwards’’,‘‘again’’, etc. .}
source and target domains. Lastly, the Naive Bayes algorithm, one of health records. We reviewed recent research on the following EHRs-
the simplest classifiers, should be more widely recognised for its utility when dealing with clinical data, as it learns surprisingly well from relatively small data sets.

5.4. Limitations of the study

A limitation of our study is that we did not consider grey literature, which consists of academic papers such as theses and essays, research and committee reports, official reports, conference articles, and ongoing research [157]. Compared with published scientific research, grey literature can be a more detailed source of information: because no typical structure constrains it, it can be longer and include more information. Due to the heterogeneity of the papers, no meta-analyses were included in this review. Another major weakness of this study is that we did not assess publication bias, which can occur for several reasons. Some researchers may decide not to publish their findings if they discover that the data sets do not support their hypothesis, preferring instead to present only study reports that support it. When publication bias becomes widespread, favourable results are overrepresented in the scientific literature, impairing our comprehension of any systematic investigation. Comparing the findings of published and unpublished research on the same subject is an efficient method for identifying publication bias, as the comparison can reveal whether there is a positive result bias across studies. As understanding current clinical NLP challenges and the techniques used to analyse EHRs was our primary objective, we did not assess publication bias at this stage.

6. Conclusion

This study finds that recent advancements in Machine Learning and Deep Learning models can facilitate health informatics tasks on Electronic Health Records (EHR). We have concentrated on conducting a thorough analysis of natural language processing in electronic health records, covering the following NLP tasks: patient risk analysis/prediction, state-of-the-art architectures for analysing EHRs, medical text summarisation, and other NLP applications such as clinical named entity recognition, blockchain-based EHRs, mental health research, goals of care conversations, clinical chart review, negation identification, and medical language translation. In addition, we provide a list of automated ML-enabled tools used by the healthcare industry and medical experts to support EHR-NLP research. The highlights of our findings are as follows:

1. Physiological disorders, such as dementia and geriatric mental health, have been identified as promising research areas and are the subject of ongoing research in which various models and methods for extracting features suited to these tasks are being explored.
2. The literature review performed in this work shows that SVM, boosting techniques, LR, LSTM, RNN, and CNNs are appropriate for analysing unstructured free text data for downstream EHR applications.
3. We find that while deep learning algorithms have achieved great success in the NLP sector, their application in the biological realm remains difficult. In contrast to classic ML models, which are frequently used for health records, DL models present a number of disadvantages relating to data availability, the difficulty of domain-specific textual data, and interpretability. Notably, DL-based algorithms require a large amount of data to outperform other methods, as well as expensive GPUs and hundreds of workstations.
4. Cutting-edge NLP methods, such as transformer-based models for free text analysis, are yet to be used extensively, and conventional methods are currently preferred. It therefore remains an open question whether transformer-based techniques will become the de facto standard for clinical NLP.
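
As a rough illustration of the classic ML baselines that points 2 and 3 contrast with DL models, the following minimal sketch classifies free-text notes with a bag-of-words Naive Bayes model (BOW and NB in Table 10). It is not a method taken from any reviewed study: Naive Bayes is swapped in for the models named in point 2 purely for brevity, the notes and labels are invented toy data, and the function names `tokenize`, `train_nb`, and `predict_nb` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokeniser)."""
    return text.lower().split()


def train_nb(notes, labels):
    """Collect per-class document counts and bag-of-words token counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for note, label in zip(notes, labels):
        tokens = tokenize(note)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, note):
    """Return the class with the highest log-posterior under multinomial NB."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(note):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy notes, invented for illustration only.
notes = [
    "patient reports chest pain and shortness of breath",
    "chest pain radiating to left arm",
    "low mood and poor sleep for two weeks",
    "patient describes low mood and loss of appetite",
]
labels = ["cardiac", "cardiac", "mental_health", "mental_health"]

model = train_nb(notes, labels)
print(predict_nb(model, "chest pain on exertion"))
print(predict_nb(model, "persistent low mood"))
```

Even on four toy notes the model separates the two classes, which is the sense in which such baselines cope with small data sets. Real clinical text would need far more careful tokenisation, negation handling, and de-identification than this sketch shows.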
Table 10
List of the abbreviations used in this manuscript.
Abbreviations Meaning
ROC-AUC Receiver Operating Characteristics-Area Under The Curve
ML Machine Learning
DL Deep Learning
NLP Natural Language Processing
TF-IDF Term Frequency-Inverse Document Frequency
GloVe Global Vectors for word representation
BOW Bag of Words
GDPR General Data Protection Regulation
HIPAA The Health Insurance Portability and Accountability Act
BERT Bidirectional Encoder Representations from Transformers
CNN Convolutional Neural Network
TL Transfer Learning
ResNet Residual Neural Network
LSTM Long Short-Term Memory
BI-LSTM Bidirectional Long Short-Term Memory
RNN Recurrent Neural Network
GRU Gated Recurrent Units
RL Representation Learning
LR Logistic Regression
SVM Support Vector Machine
XGBoost eXtreme Gradient Boosting
RF Random Forest
LR Linear Regression
NB Naïve Bayes
GB Gradient Boosting
DT Decision Tree
ICD-9 The International Classification of Diseases, Ninth Revision
ICD-10 International Classification of Diseases, 10th Revision
PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses
ICU Intensive Care Unit
ABLSTM Gated Attention incorporated Bi-Directional Long Short-Term Memory
FCNN Fully Connected Neural Network
MIMIC Medical Information Mart for Intensive Care
D2V Document to Vector
CUIs Concept Unique Identifiers
CAM Confusion Assessment Method
LDA Latent Dirichlet Allocation
HIV Human Immunodeficiency Virus
LASSO Least Absolute Shrinkage and Selection Operator
SMI Serious Mental Illness
EHR Electronic Health Record
XML Extensible Markup Language
CRIS Clinical Record Interactive Search
AMIA American Medical Informatics Association
Funding statement

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

CRediT authorship contribution statement

Elias Hossain: Proposed the study, Conducted a literature search and completed a full-text review, finding gaps, challenges, citations, data analysis, illustrations, and future directions, Notified additional corrections and clarifications. Rajib Rana: Given instructions to Rajib Rana for conducting a literature search, Reviewed and edited the study. Niall Higgins: Subsequently outlined the inclusion and exclusion criteria for downloading and organising the papers, Reviewed and edited the study. Jeffrey Soar: Reviewed and edited the study. Prabal Datta Barua: Reviewed and edited the study. Anthony R. Pisani: Reviewed and edited the study. Kathryn Turner: Reviewed and edited the study.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix

See Tables 9 and 10.

References

[1] M.R. Cowie, J.I. Blomster, L.H. Curtis, S. Duclaux, I. Ford, F. Fritz, S. Goldman, S. Janmohamed, J. Kreuzer, M. Leenay, et al., Electronic health records to facilitate clinical research, Clin. Res. Cardiol. 106 (1) (2017) 1–9, http://dx.doi.org/10.1007/s00392-016-1025-6.
[2] H. Consultant, Why unstructured data holds the key to intelligent healthcare systems [Internet], 2015, Atlanta (GA): HIT Consultant. URL https://hitconsultant.net/2015/03/31/tapping-unstructured-data-healthcares-biggest-hurdle-realized/.
[3] J. Liang, Y. Li, Z. Zhang, D. Shen, J. Xu, X. Zheng, T. Wang, B. Tang, J. Lei, J. Zhang, Adoption of electronic health records (EHRs) in China during the past 10 years: Consecutive survey data analysis and comparison of Sino-American challenges and experiences, J. Med. Internet Res. 23 (2) (2021) e24813, http://dx.doi.org/10.2196/24813.
[4] A. Hodgkins, J. Mullan, D. Mayne, C. Boyages, A. Bonney, Australian general practitioners’ attitudes to the extraction of research data from electronic health records, Aust. J. Gen. Prac. 49 (3) (2020) 145–150, http://dx.doi.org/10.31128/AJGP-07-19-5024.
[5] K. Cairns, M. Rawlins, S. Unwin, F. Doukas, R. Burke, E. Tong, A. Henderson, A.C. Cheng, Building on antimicrobial stewardship programs through integration with electronic medical records: The Australian experience, Infect. Dis. Ther. 10 (1) (2021) 61–73, http://dx.doi.org/10.1007/s40121-020-00392-5.
[6] U. Naseem, M. Khushi, S.K. Khan, K. Shaukat, M.A. Moni, A comparative analysis of active learning for biomedical text mining, Appl. Syst. Innov. 4 (1) (2021) 23, http://dx.doi.org/10.3390/asi4010023.
[7] Y.H. Bhosale, K.S. Patnaik, Application of deep learning techniques in diagnosis of COVID-19 (Coronavirus): A systematic review, Neural Process. Lett. (2022) 1–53, URL https://link.springer.com/article/10.1007/s11063-022-11023-0.
[8] Y.H. Bhosale, K.S. Patnaik, IoT deployable lightweight deep learning application for COVID-19 detection with lung diseases using RaspberryPi, in: 2022 International Conference on IoT and Blockchain Technology, ICIBT, IEEE, 2022, pp. 1–6, URL https://ieeexplore.ieee.org/document/9807725.
[9] A.L. Beam, I.S. Kohane, Big data and machine learning in health care, JAMA 319 (13) (2018) 1317–1318, URL https://jamanetwork.com/journals/jama/article-abstract/2675024.
[10] Y.H. Bhosale, S. Zanwar, Z. Ahmed, M. Nakrani, D. Bhuyar, U. Shinde, Deep convolutional neural network based COVID-19 classification from radiology X-Ray images for IoT enabled devices, in: 2022 8th International Conference on Advanced Computing and Communication Systems, Vol. 1, ICACCS, IEEE, 2022, pp. 1398–1402, URL https://ieeexplore.ieee.org/document/9785113.
[11] A.F. Leite, K.d.F. Vasconcelos, H. Willems, R. Jacobs, Radiomics and machine learning in oral healthcare, PROTEOMICS–Clin. Appl. 14 (3) (2020) 1900040, URL https://onlinelibrary.wiley.com/doi/full/10.1002/prca.201900040.
[12] A. Esteva, A. Robicquet, B. Ramsundar, V. Kuleshov, M. DePristo, K. Chou, C. Cui, G. Corrado, S. Thrun, J. Dean, A guide to deep learning in healthcare, Nat. Med. 25 (1) (2019) 24–29, http://dx.doi.org/10.1038/s41591-018-0316-z.
[13] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, F. Hutter, Efficient and robust automated machine learning, Adv. Neural Inf. Process. Syst. 28 (2015) URL https://proceedings.neurips.cc/paper/2015/file/11d0e6287202fced83f79975ec59a3a6-Paper.pdf.
[14] J. Waring, C. Lindvall, R. Umeton, Automated machine learning: Review of the state-of-the-art and opportunities for healthcare, Artif. Intell. Med. 104 (2020) 101822, http://dx.doi.org/10.1016/j.artmed.2020.101822.
[15] A.A. Borkowski, C.P. Wilson, S.A. Borkowski, L.B. Thomas, L.A. Deland, S.J. Grewe, S.M. Mastorides, Google AutoML versus Apple CreateML for histopathologic cancer diagnosis; which algorithms are better? 2019, arXiv preprint arXiv:1903.08057. http://dx.doi.org/10.48550/arXiv.1903.08057.
[16] A. Choudhary, A. Choudhary, S. Suman, NLP applications for big data analytics within healthcare, in: S. Mishra, H. Tripathy, P. Mallick, K. Shaalan (Eds.), Augmented Intelligence in Healthcare: A Pragmatic and Integrated Analysis, Springer, Singapore, 2022, pp. 237–257, http://dx.doi.org/10.1007/978-981-19-1076-0_13.
[17] C. Ulrich, C. Grady, G. Demiris, T. Richmond, The competing demands of patient privacy and clinical research, Ethics Hum. Res. 4 (1) (2021) 25–31, http://dx.doi.org/10.1002/eahr.500076.
[18] N. Afzal, V.P. Mallipeddi, S. Sohn, H. Liu, R. Chaudhry, C.G. Scott, I.J. Kullo, A.M. Arruda-Olson, Natural language processing of clinical notes for identification of critical limb ischemia, Int. J. Med. Inform. 111 (2018) 83–89, http://dx.doi.org/10.1016/j.ijmedinf.2017.12.024.
[19] B. Galatzan, J. Carrington, S. Gephart, Testing the use of natural language processing software and content analysis to analyze nursing hand-off text data, Comput. Inform. Nurs. 39 (8) (2021) 411–417, http://dx.doi.org/10.1097/CIN.0000000000000732.
[20] T. Tyagi, NeuraHealthNLP: An automated screening pipeline to detect undiagnosed cognitive impairment in electronic health records with deep learning and natural language processing, 2022, arXiv preprint arXiv:2202.00478. http://dx.doi.org/10.48550/arXiv.2202.00478.
[21] M. Chowdhury, E.G. Cervantes, W.-Y. Chan, D.P. Seitz, Use of machine learning and artificial intelligence methods in geriatric mental health research involving electronic health record or administrative claims data: A systematic review, Front. Psychiatry 12 (2021) http://dx.doi.org/10.3389/fpsyt.2021.738466.
[22] Y. Juhn, H. Liu, Artificial intelligence approaches using natural language processing to advance EHR-based clinical research, J. Allergy Clin. Immunol. 145 (2) (2020) 463–469.
[23] T. Ahmed, M.M.A. Aziz, N. Mohammed, De-identification of electronic health record using neural network, Sci. Rep. 10 (1) (2020) 1–11, http://dx.doi.org/10.1038/s41598-020-75544-1.
[24] S. Wu, K. Roberts, S. Datta, J. Du, Z. Ji, Y. Si, S. Soni, Q. Wang, Q. Wei, Y. Xiang, H. Xu, Deep learning in clinical natural language processing: a methodical review, J. Am. Med. Inform. Assoc. 27 (3) (2020) 457–470, http://dx.doi.org/10.1093/jamia/ocz200.
[25] H. Alzoubi, R. Alzubi, N. Ramzan, D. West, T. Al-Hadhrami, M. Alazab, A review of automatic phenotyping approaches using electronic health records, Electronics 8 (11) (2019) 1235, http://dx.doi.org/10.3390/electronics8111235.
[26] Y. Juhn, H. Liu, Natural language processing to advance EHR-based clinical research in Allergy, Asthma, and Immunology, J. Allergy Clin. Immunol. 145 (2019) http://dx.doi.org/10.1016/j.jaci.2019.12.897.
[27] T.A. Koleck, C. Dreisbach, P.E. Bourne, S. Bakken, Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review, J. Am. Med. Inform. Assoc. 26 (4) (2019) 364–379, http://dx.doi.org/10.1093/jamia/ocy173.
[28] Y. Wang, L. Wang, M. Rastegar-Mojarad, S. Moon, F. Shen, N. Afzal, S. Liu, Y. Zeng, S. Mehrabi, S. Sohn, et al., Clinical information extraction applications: a literature review, J. Biomed. Inform. 77 (2018) 34–49, http://dx.doi.org/10.1016/j.jbi.2017.11.011.
[29] Y. Luo, W.K. Thompson, T.M. Herr, Z. Zeng, M.A. Berendsen, S.R. Jonnalagadda, M.B. Carson, J. Starren, Natural language processing for EHR-based pharmacovigilance: a structured review, Drug Saf. 40 (11) (2017) 1075–1089, http://dx.doi.org/10.1007/s40264-017-0558-6.
[30] A.K. Jabali, A. Waris, D.I. Khan, S. Ahmed, R.J. Hourani, Electronic health records: Three decades of bibliometric research productivity analysis and some insights, Inform. Med. Unlocked (2022) 100872, http://dx.doi.org/10.1016/j.imu.2022.100872.
[31] K. Ayre, H.G. Gordon, R. Dutta, J. Hodsoll, L.M. Howard, The prevalence and correlates of self-harm in the perinatal period: a systematic review, J. Clin. Psychiatry 81 (1) (2019) 15343, http://dx.doi.org/10.4088/JCP.19r12773.
[32] A. Bittar, S. Velupillai, A. Roberts, R. Dutta, Text classification to inform suicide risk assessment in electronic health records, in: MedInfo, 2019, pp. 40–44, http://dx.doi.org/10.3233/SHTI190179.
[33] J. Downs, S. Velupillai, G. George, R. Holden, M. Kikoler, H. Dean, A. Fernandes, R. Dutta, Detection of suicidality in adolescents with autism spectrum disorders: developing a natural language processing approach for use in electronic health records, in: AMIA Annual Symposium Proceedings, Vol. 2017, American Medical Informatics Association, 2017, p. 641, URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5977628/.
[34] H.D. Anderson, W.D. Pace, E. Brandt, R.D. Nielsen, R.R. Allen, A.M. Libby, D.R. West, R.J. Valuck, Monitoring suicidal patients in primary care using electronic health records, J. Am. Board Fam. Med. 28 (1) (2015) 65–71, URL https://www.jabfm.org/content/28/1/65.
[35] K. Ayre, A. Bittar, J. Kam, S. Verma, L.M. Howard, R. Dutta, Developing a natural language processing tool to identify perinatal self-harm in electronic healthcare records, PLoS One 16 (8) (2021) e0253809, http://dx.doi.org/10.1371/journal.pone.0253809.
[36] B.E. Belsher, D.J. Smolenski, L.D. Pruitt, N.E. Bush, E.H. Beech, D.E. Workman, R.L. Morgan, D.P. Evatt, J. Tucker, N.A. Skopp, Prediction models for suicide attempts and deaths: a systematic review and simulation, JAMA Psychiatry 76 (6) (2019) 642–651, http://dx.doi.org/10.1001/jamapsychiatry.2019.0174.
[37] G.E. Simon, E. Johnson, J.M. Lawrence, R.C. Rossom, B. Ahmedani, F.L. Lynch, A. Beck, B. Waitzfelder, R. Ziebell, R.B. Penfold, et al., Predicting suicide attempts and suicide deaths following outpatient visits using electronic health records, Am. J. Psychiatry 175 (10) (2018) 951–960, http://dx.doi.org/10.1176/appi.ajp.2018.17101167.
[38] C.G. Walsh, J.D. Ribeiro, J.C. Franklin, Predicting suicide attempts in adolescents with longitudinal clinical data and machine learning, J. Child Psychol. Psychiatry 59 (12) (2018) 1261–1270, http://dx.doi.org/10.1111/jcpp.12916.
[39] F.R. Tsui, L. Shi, V. Ruiz, N.D. Ryan, C. Biernesser, S. Iyengar, C.G. Walsh, D.A. Brent, Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts, JAMIA Open 4 (1) (2021) ooab011, http://dx.doi.org/10.1093/jamiaopen/ooab011.
[40] N.J. Carson, B. Mullin, M.J. Sanchez, F. Lu, K. Yang, M. Menezes, B.L. Cook, Identification of suicidal behavior among psychiatrically hospitalized adolescents using natural language processing and machine learning of electronic health records, PLoS One 14 (2) (2019) e0211116, http://dx.doi.org/10.1371/journal.pone.0211116.
[41] G.K. Savova, J.J. Masanz, P.V. Ogren, J. Zheng, S. Sohn, K.C. Kipper-Schuler, C.G. Chute, Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications, J. Am. Med. Inform. Assoc. 17 (5) (2010) 507–513, http://dx.doi.org/10.1136/jamia.2009.001560.
[42] D.J. Feller, J. Zucker, M.T. Yin, P. Gordon, N. Elhadad, Using clinical notes and natural language processing for automated HIV risk assessment, J. Acquir. Immune Defic. Syndr. (1999) 77 (2) (2018) 160, URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5762388/.
[43] S. Fu, G.S. Lopes, S.R. Pagali, B. Thorsteinsdottir, N.K. LeBrasseur, A. Wen, H. Liu, W.A. Rocca, J.E. Olson, J. St. Sauver, et al., Ascertainment of delirium status using natural language processing from electronic health records, J. Gerontol. Ser. A 77 (3) (2022) 524–530, http://dx.doi.org/10.1093/gerona/glaa275.
[44] Y. Deng, J.A. Pacheco, A. Chung, C. Mao, J.C. Smith, J. Zhao, W.-Q. Wei, A. Barnado, C. Weng, C. Liu, et al., Natural language processing to identify lupus nephritis phenotype in electronic health records, 2021, arXiv preprint arXiv:2112.10821. http://dx.doi.org/10.48550/arXiv.2112.10821.
[45] M. Li, Z. Fei, M. Zeng, F.-X. Wu, Y. Li, Y. Pan, J. Wang, Automated ICD-9 coding via a deep learning approach, IEEE/ACM Trans. Comput. Biol. Bioinform. 16 (4) (2018) 1193–1202, http://dx.doi.org/10.1109/TCBB.2018.2817488.
[46] A. Kormilitzin, N. Vaci, Q. Liu, A. Nevado-Holgado, Med7: a transferable clinical natural language processing model for electronic health records, Artif. Intell. Med. 118 (2021) 102086, http://dx.doi.org/10.1016/j.artmed.2021.102086.
[47] S. Bird, E. Klein, E. Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, O’Reilly Media, Inc, 2009, URL https://www.amazon.com/Natural-Language-Processing-Python-Analyzing/dp/0596516495.
[48] C.D. Manning, M. Surdeanu, J. Bauer, J.R. Finkel, S. Bethard, D. McClosky, The Stanford CoreNLP natural language processing toolkit, in: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014, pp. 55–60, URL https://aclanthology.org/P14-5010.pdf.
[49] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al., Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020, pp. 38–45, http://dx.doi.org/10.18653/v1/2020.emnlp-demos.6.
[50] J.D. Choi, Dynamic feature induction: The last gist to the state-of-the-art, in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 271–281, URL https://aclanthology.org/N16-1031.pdf.
[51] X. Li, M. Cui, J. Li, R. Bai, Z. Lu, U. Aickelin, A hybrid medical text classification framework: Integrating attentive rule construction and neural network, Neurocomputing 443 (2021) 345–355, http://dx.doi.org/10.1016/j.neucom.2021.02.069.
[52] P. Nigam, Applying Deep Learning to ICD-9 Multi-Label Classification from Medical Records, Technical report, Stanford University, 2016, URL http://cs224d.stanford.edu/reports/priyanka.pdf.
[53] D. Kici, G. Malik, M. Cevik, D. Parikh, A. Basar, A BERT-based transfer learning approach to text classification on software requirements specifications, in: Canadian Conference on AI, 2021, URL https://assets.pubpub.org/rrc10aja/31621568150182.pdf.
[54] A. Mulyar, O. Uzuner, B. McInnes, MT-clinical BERT: scaling clinical information extraction with multitask learning, J. Am. Med. Inform. Assoc. 28 (10) (2021) 2108–2115, http://dx.doi.org/10.48550/arXiv.2004.10220.
[55] Y. Li, S. Rao, J.R.A. Solares, A. Hassaine, R. Ramakrishnan, D. Canoy, Y. Zhu, K. Rahimi, G. Salimi-Khorshidi, BEHRT: transformer for electronic health records, Sci. Rep. 10 (1) (2020) 1–12, http://dx.doi.org/10.1038/s41598-020-62922-y.
[56] L. Liu, J. Shen, M. Zhang, Z. Wang, J. Tang, Learning the joint representation of heterogeneous temporal events for clinical endpoint prediction, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, No. 1, 2018, http://dx.doi.org/10.1609/aaai.v32i1.11307.
[57] A.D. Costa, S. Denkovski, M. Malyska, S.Y. Moon, B. Rufino, Z. Yang, T. Killian, M. Ghassemi, Multiple sclerosis severity classification from clinical text, 2020, arXiv preprint arXiv:2010.15316. http://dx.doi.org/10.48550/arXiv.2010.15316.
[58] A. Smit, S. Jain, P. Rajpurkar, A. Pareek, A.Y. Ng, M.P. Lungren, CheXbert: combining automatic labelers and expert annotations for accurate radiology report labeling using BERT, 2020, arXiv preprint arXiv:2004.09167. http://dx.doi.org/10.48550/arXiv.2004.09167.
[59] A.E. Johnson, T.J. Pollard, N.R. Greenbaum, M.P. Lungren, C.-y. Deng, Y. Peng, Z. Lu, R.G. Mark, S.J. Berkowitz, S. Horng, MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs, 2019, arXiv preprint arXiv:1901.07042. http://dx.doi.org/10.48550/arXiv.1901.07042.
[60] F. Portet, E. Reiter, A. Gatt, J. Hunter, S. Sripada, Y. Freer, C. Sykes, Automatic generation of textual summaries from neonatal intensive care data, Artificial Intelligence 173 (7–8) (2009) 789–816, http://dx.doi.org/10.1016/j.artint.2008.12.002.
[61] M. Moradi, Small-world networks for summarization of biomedical articles, 2019, arXiv preprint arXiv:1903.02861. http://dx.doi.org/10.48550/arXiv.1903.02861.
[62] D.J. McInerney, B. Dabiri, A.-S. Touret, G. Young, J.-W. Meent, B.C. Wallace, Query-focused EHR summarization to aid imaging diagnosis, in: Machine Learning for Healthcare Conference, PMLR, 2020, pp. 632–659, URL https://proceedings.mlr.press/v126/mcinerney20a.html.
[63] Y. Zhang, D.Y. Ding, T. Qian, C.D. Manning, C.P. Langlotz, Learning to summarize radiology findings, 2018, arXiv preprint arXiv:1809.04698. http://dx.doi.org/10.48550/arXiv.1809.04698.
[64] P.K. Bharimalla, H. Choudhury, S. Parida, D.K. Mallick, S.R. Dash, A blockchain and NLP based electronic health record system: Indian subcontinent context, Informatica 45 (4) (2021) http://dx.doi.org/10.31449/inf.v45i4.3503.
[65] R.Y. Lee, L.C. Brumback, W.B. Lober, J. Sibley, E.L. Nielsen, P.D. Treece, E.K. Kross, E.T. Loggers, J.A. Fausto, C. Lindvall, et al., Identifying goals of care conversations in the electronic health record using natural language processing and machine learning, J. Pain Symp. Manag. 61 (1) (2021) 136–142, http://dx.doi.org/10.1016/j.jpainsymman.2020.08.024.
[66] S. Fu, C.C. Wyles, D.R. Osmon, M.L. Carvour, E. Sagheb, T. Ramazanian, W.K. Kremers, D.G. Lewallen, D.J. Berry, S. Sohn, et al., Automated detection of periprosthetic joint infections and data elements using natural language processing, J. Arthroplasty 36 (2) (2021) 688–692, http://dx.doi.org/10.1016/j.arth.2020.07.076.
[67] W.-H. Weng, Y.-A. Chung, P. Szolovits, Unsupervised clinical language translation, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 3121–3131, http://dx.doi.org/10.1145/3292500.3330710.
[68] A. Al-Aiad, R. Duwairi, M. Fraihat, Survey: deep learning concepts and techniques for electronic health record, in: 2018 IEEE/ACS 15th International Conference on Computer Systems and Applications, AICCSA, IEEE, 2018, pp. 1–5, http://dx.doi.org/10.1109/AICCSA.2018.8612827.
[69] J. Luo, Z. Zheng, H. Ye, M. Ye, Y. Wang, Q. You, C. Xiao, F. Ma, A benchmark dataset for understandable medical language translation, 2020, arXiv preprint arXiv:2012.02420. http://dx.doi.org/10.48550/arXiv.2012.02420.
[70] Y. Gu, R. Tinn, H. Cheng, M. Lucas, N. Usuyama, X. Liu, T. Naumann, J. Gao, H. Poon, Domain-specific language model pretraining for biomedical natural language processing, ACM Trans. Comput. Healthc. (HEALTH) 3 (1) (2021) 1–23, http://dx.doi.org/10.1145/3458754.
[71] Y.H. Bhosale, K.S. Patnaik, PulDi-COVID: Chronic obstructive pulmonary (lung) diseases with COVID-19 classification using ensemble deep convolutional neural network from chest X-ray images to minimize severity and mortality rates, Biomed. Signal Process. Control 81 (2023) 104445, http://dx.doi.org/10.1016/j.bspc.2022.104445.
[72] C. Lindvall, E.J. Lilley, S.N. Zupanc, I. Chien, B.V. Udelsman, A. Walling, Z. Cooper, J.A. Tulsky, Natural language processing to assess end-of-life quality indicators in cancer patients receiving palliative surgery, J. Palliat. Med. 22 (2) (2019) 183–187, http://dx.doi.org/10.1089/jpm.2018.0326.
[73] D. Dorr, C.A. Bejan, C. Pizzimenti, S. Singh, M. Storer, A. Quinones, Identifying patients with significant problems related to social determinants of health with natural language processing, in: MEDINFO 2019: Health and Wellbeing E-Networks for All, IOS Press, 2019, pp. 1456–1457, http://dx.doi.org/10.3233/SHTI190482.
[74] E.T. Sholle, L.C. Pinheiro, P. Adekkanattu, M.A. Davila, S.B. Johnson, J. Pathak, S. Sinha, C. Li, S.A. Lubansky, M.M. Safford, et al., Underserved populations with missing race ethnicity data differ significantly from those with structured race/ethnicity documentation, J. Am. Med. Inform. Assoc. 26 (8–9) (2019) 722–729, http://dx.doi.org/10.1093/jamia/ocz040.
[75] T.A. Miller, P. Avillach, K.D. Mandl, Experiences implementing scalable, containerized, cloud-based NLP for extracting biobank participant phenotypes at scale, JAMIA Open 3 (2) (2020) 185–189, http://dx.doi.org/10.1093/jamiaopen/ooaa016.
[76] N. Hong, A. Wen, D.J. Stone, S. Tsuji, P.R. Kingsbury, L.V. Rasmussen, J.A. Pacheco, P. Adekkanattu, F. Wang, Y. Luo, et al., Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries, J. Biomed. Inform. 99 (2019) 103310, http://dx.doi.org/10.1016/j.jbi.2019.103310.
[77] D. Van Le, J. Montgomery, K.C. Kirkby, J. Scanlan, Risk prediction using natural language processing of electronic mental health records in an inpatient forensic psychiatry setting, J. Biomed. Inform. 86 (2018) 49–58, http://dx.doi.org/10.1016/j.jbi.2018.08.007.
[78] J. Shi, S. Liu, L.C. Pruitt, C.L. Luppens, J.P. Ferraro, A.V. Gundlapalli, W.W. Chapman, B.T. Bucher, Using natural language processing to improve EHR structured data-based surgical site infection surveillance, in: AMIA Annual Symposium Proceedings, 2019, American Medical Informatics Association, 2019, p. 794, URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7153106/.
[79] S. Rajendran, U. Topaloglu, Extracting smoking status from electronic health records using NLP and deep learning, in: AMIA Summits on Translational Science Proceedings, 2020, American Medical Informatics Association, 2020, p. 507, URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7233082/.
[80] S.S. Zhao, C. Hong, T. Cai, C. Xu, J. Huang, J. Ermann, N.J. Goodson, D.H. Solomon, T. Cai, K.P. Liao, Incorporating natural language processing to improve classification of axial spondyloarthritis using electronic health records, Rheumatology 59 (5) (2020) 1059–1065, http://dx.doi.org/10.1093/rheumatology/kez375.
[81] E. Kogan, K. Twyman, J. Heap, D. Milentijevic, J.H. Lin, M. Alberts, Assessing stroke severity using electronic health record data: a machine learning approach, BMC Med. Inform. Decis. Mak. 20 (1) (2020) 1–8, http://dx.doi.org/10.1186/s12911-019-1010-x.
[82] L. Gligic, A. Kormilitzin, P. Goldberg, A. Nevado-Holgado, Named entity recognition in electronic health records using transfer learning bootstrapped neural networks, Neural Netw. 121 (2020) 132–139, http://dx.doi.org/10.1016/j.neunet.2019.08.032.
[83] E.H. Weissler, J. Zhang, S. Lippmann, S. Rusincovitch, R. Henao, W.S. Jones, Use of natural language processing to improve identification of patients with peripheral artery disease, Circul. Cardiovasc. Interv. 13 (10) (2020) e009447, http://dx.doi.org/10.1161/CIRCINTERVENTIONS.120.009447.
[84] N. Vaci, Q. Liu, A. Kormilitzin, F. De Crescenzo, A. Kurtulmus, J. Harvey, B. O’Dell, S. Innocent, A. Tomlinson, A. Cipriani, et al., Natural language processing for structuring clinical text data on depression using UK-CRIS, Evidence Based Ment. Health 23 (1) (2020) 21–26, http://dx.doi.org/10.1136/ebmental-2019-300134.
[85] R.E. Leiter, E. Santus, Z. Jin, K.C. Lee, M. Yusufov, I. Chien, A. Ramaswamy, E.T. Moseley, Y. Qian, D. Schrag, et al., Deep natural language processing to identify symptom documentation in clinical notes for patients with heart failure undergoing cardiac resynchronization therapy, J. Pain Symp. Manag. 60 (5) (2020) 948–958, http://dx.doi.org/10.1016/j.jpainsymman.2020.06.010.
[86] P. Suryanarayanan, E.A. Epstein, A. Malvankar, B.L. Lewis, L. DeGenaro, J.J. Liang, C.-H. Tsou, D. Pathak, Timely and efficient AI insights on EHR: System design, in: AMIA Annual Symposium Proceedings, 2020, American Medical Informatics Association, 2020, p. 1180, URL https://pubmed.ncbi.nlm.nih.gov/33936494/.
[87] E. Steinberg, K. Jung, J.A. Fries, C.K. Corbin, S.R. Pfohl, N.H. Shah, Language models are an effective representation learning technique for electronic health record data, J. Biomed. Inform. 113 (2021) 103637, http://dx.doi.org/10.1016/j.jbi.2020.103637.
[88] Q. Yuan, T. Cai, C. Hong, M. Du, B.E. Johnson, M. Lanuti, T. Cai, D.C. Christiani, Performance of a machine learning algorithm using electronic health record data to identify and estimate survival in a longitudinal cohort of patients with lung cancer, JAMA Netw. Open 4 (7) (2021) e2114723, http://dx.doi.org/10.1001/jamanetworkopen.2021.14723.
[89] M.D. Solomon, G. Tabada, A. Allen, S.H. Sung, A.S. Go, Large-scale identification of aortic stenosis and its severity using natural language processing on electronic health records, Cardiovasc. Digit. Health J. 2 (3) (2021) 156–163, http://dx.doi.org/10.1016/j.cvdhj.2021.03.003.
[90] A.J. Steele, S.C. Denaxas, A.D. Shah, H. Hemingway, N.M. Luscombe, Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease, PLoS One 13 (8) (2018) e0202344, http://dx.doi.org/10.1371/journal.pone.0202344.
[91] B.S. Glicksberg, R. Miotto, K.W. Johnson, K. Shameer, L. Li, R. Chen, J.T. Dudley, Automated disease cohort selection using word embeddings from electronic health records, in: PACIFIC SYMPOSIUM on BIOCOMPUTING 2018: Proceedings of the Pacific Symposium, World Scientific, 2018, pp. 145–156, http://dx.doi.org/10.1142/9789813235533_0014.
[92] C. Ye, T. Fu, S. Hao, Y. Zhang, O. Wang, B. Jin, M. Xia, M. Liu, X. Zhou, Q. Wu, et al., Prediction of incident hypertension within the next year: prospective study using statewide electronic health records and machine learning, J. Med. Internet Res. 20 (1) (2018) e22, http://dx.doi.org/10.2196/jmir.9268.
[93] M. Afshar, C. Joyce, D. Dligach, B. Sharma, R. Kania, M. Xie, K. Swope, E. Salisbury-Afshar, N.S. Karnik, Subtypes in patients with opioid misuse: A prognostic enrichment strategy using electronic health record data in hospitalized patients, PLoS One 14 (7) (2019) e0219717, http://dx.doi.org/10.1371/journal.pone.0219717.
[94] T. Zheng, W. Xie, L. Xu, X. He, Y. Zhang, M. You, G. Yang, Y. Chen, A machine learning-based framework to identify type 2 diabetes through electronic health records, Int. J. Med. Inform. 97 (2017) 120–127, http://dx.doi.org/10.1016/j.ijmedinf.2016.09.014.
[95] C.-S. Wu, C.-J. Kuo, C.-H. Su, S.-H. Wang, H.-J. Dai, Using text mining to extract depressive symptoms and to validate the diagnosis of major depressive disorder from electronic health records, J. Affect. Disord. 260 (2020) 617–623, http://dx.doi.org/10.1016/j.jad.2019.09.044.
[96] S.R. Jonnalagadda, A.K. Adupa, R.P. Garg, J. Corona-Cox, S.J. Shah, Text mining of the electronic health record: an information extraction approach for automated identification and subphenotyping of HFpEF patients for clinical trials, J. Cardiovasc. Transl. Res. 10 (3) (2017) 313–321, http://dx.doi.org/10.1007/s12265-017-9752-2.
[97] D. Gavrilov, A. Gusev, I. Korsakov, R. Novitsky, L. Serova, Feature extraction method from electronic health records in Russia, in: Conference of Open Innovations Association, FRUCT, No. 26, FRUCT Oy, 2020, pp. 497–500, URL https://webiomed.ru/media/publications_files/feature-extraction-method-from-electronic-health-records-in-russia.pdf.
[98] R. Zhu, X. Tu, J. Huang, Using deep learning based natural language processing techniques for clinical decision-making with EHRs, in: Deep Learning Techniques for Biomedical and Health Informatics, Springer, 2020, pp. 257–295, http://dx.doi.org/10.1007/978-3-030-33966-1_13.
[99] M. Weiner, J. Weaver, P. Dexter, A. Roberts, Z. Liu, S. Hui, A. Church, I. Doshi, K. Heithoff, A semi-automated approach to identifying chronic cough in electronic health records, Ann. Allergy Asthma Immunol. 121 (5) (2018) S57, http://dx.doi.org/10.1016/j.anai.2018.09.187.
[100] R. Sivarethinamohan, S. Sujatha, P. Biswas, Envisioning the potential of Natural Language Processing (NLP) in Health Care Management, in: 2021 7th International Engineering Conference “Research & Innovation Amid Global Pandemic”, IEC, IEEE, 2021, pp. 189–193, http://dx.doi.org/10.1109/IEC52205.2021.9476131.
[101] C. Dymek, B. Kim, G.B. Melton, T.H. Payne, H. Singh, C.-J. Hsiao, Building the evidence-base to reduce electronic health record–related clinician burden, J. Am. Med. Inform. Assoc. 28 (5) (2021) 1057–1061, http://dx.doi.org/10.1093/jamia/ocaa238.
[102] Y.-C. Shen, T.-C. Hsia, C.-H. Hsu, Analysis of electronic health records based on deep learning with natural language processing, Arab. J. Sci. Eng. (2021) 1–11, http://dx.doi.org/10.1007/s13369-021-05596-6.
[103] M. Levis, C.L. Westgate, J. Gui, B.V. Watts, B. Shiner, Natural language processing of clinical mental health notes may add predictive value to existing suicide risk models, Psychol. Med. 51 (8) (2021) 1382–1391, http://dx.doi.org/10.1017/S0033291720000173.
[104] J. Irving, R. Patel, D. Oliver, C. Colling, M. Pritchard, M. Broadbent, H. Baldwin, D. Stahl, R. Stewart, P. Fusar-Poli, Using natural language processing on electronic health records to enhance detection and prediction of psychosis risk, Schizophr. Bull. 47 (2) (2021) 405–414, http://dx.doi.org/10.1093/schbul/sbaa126.
[105] N. Viani, R. Botelle, J. Kerwin, L. Yin, R. Patel, R. Stewart, S. Velupillai, A natural language processing approach for identifying temporal disease onset information from mental healthcare text, Sci. Rep. 11 (1) (2021) 1–12, http://dx.doi.org/10.1038/s41598-020-80457-0.
[106] S.K. Tedeschi, T. Cai, Z. He, Y. Ahuja, C. Hong, K.A. Yates, K. Dahal, C. Xu, H. Lyu, K. Yoshida, et al., Classifying pseudogout using machine learning approaches with electronic health record data, Arthritis Care Res. 73 (3) (2021) 442–448, http://dx.doi.org/10.1002/acr.24132.
[107] C.R. Moore, S. Jain, S. Haas, H. Yadav, E. Whitsel, W. Rosamand, G. Heiss, A.M. Kucharska-Newton, Ascertaining Framingham heart failure phenotype from inpatient electronic health record data using natural language processing: a multicentre Atherosclerosis Risk in Communities (ARIC) validation study, BMJ Open 11 (6) (2021) e047356, URL https://bmjopen.bmj.com/content/11/6/e047356.abstract.
[108] K. Jain, V. Prajapati, NLP/Deep learning techniques in healthcare for decision making, Prim. Health Care: Open Access 11 (3) (2021) 1–4, URL https://www.iomcworld.org/open-access/nlpdeep-learning-techniques-in-healthcare-for-decision-making-66608.html.
[109] W.-H. Weng, K.B. Wagholikar, A.T. McCray, P. Szolovits, H.C. Chueh, Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach, BMC Med. Inform. Decis. Mak. 17 (1) (2017) 1–13, http://dx.doi.org/10.1186/s12911-017-0556-8.
[110] Y. Ni, A. Bachtel, K. Nause, S. Beal, Automated detection of substance use information from electronic health records for a pediatric population, J. Am. Med. Inform. Assoc. 28 (10) (2021) 2116–2127, http://dx.doi.org/10.1093/jamia/ocab116.
[111] M.D. Kovacs, J. Mesterhazy, D. Avrin, T. Urbania, J. Mongan, Correlate: a PACS- and EHR-integrated tool leveraging natural language processing to provide automated clinical follow-up, Radiographics 37 (5) (2017) 1451–1460, http://dx.doi.org/10.1148/rg.2017160195.
[112] Z. Zeng, Y. Deng, X. Li, T. Naumann, Y. Luo, Natural language processing for EHR-based computational phenotyping, IEEE/ACM Trans. Comput. Biol. Bioinform. 16 (1) (2018) 139–153, http://dx.doi.org/10.1109/TCBB.2018.2849968.
[113] R. Yacouby, D. Axman, Probabilistic extension of precision, recall, and F1 score for more thorough evaluation of classification models, in: Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, 2020, pp. 79–91, http://dx.doi.org/10.18653/v1/2020.eval4nlp-1.9.
[114] L. Faes, S.K. Wagner, D.J. Fu, X. Liu, E. Korot, J.R. Ledsam, T. Back, R. Chopra, N. Pontikos, C. Kern, et al., Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study, Lancet Digit. Health 1 (5) (2019) e232–e242, http://dx.doi.org/10.1016/S2589-7500(19)30108-6.
[115] S. Kocbek, P. Kocbek, T. Zupanic, G. Stiglic, B. Gabrys, Using (automated) machine learning and drug prescription records to predict mortality and polypharmacy in older type 2 diabetes mellitus patients, in: International Conference on Neural Information Processing, Springer, 2019, pp. 624–632, http://dx.doi.org/10.1007/978-3-030-36808-1_68.
[116] L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, A. Talwalkar, Hyperband: A novel bandit-based approach to hyperparameter optimization, J. Mach. Learn. Res. 18 (1) (2017) 6765–6816, URL https://www.jmlr.org/papers/volume18/16-558/16-558.pdf.
[117] R.S. Olson, J.H. Moore, TPOT: A tree-based pipeline optimization tool for automating machine learning, in: Workshop on Automatic Machine Learning, PMLR, 2016, pp. 66–74, URL https://proceedings.mlr.press/v64/olson_tpot_2016.
[118] I. Tsamardinos, P. Charonyktakis, K. Lakiotaki, G. Borboudakis, J.C. Zenklusen, H. Juhl, E. Chatzaki, V. Lagani, Just Add Data: Automated Predictive Modeling and BioSignature Discovery, Cold Spring Harbor Laboratory, 2020, http://dx.doi.org/10.1101/2020.05.04.075747, BioRxiv. arXiv:https://www.biorxiv.org/content/early/2020/05/05/2020.05.04.075747.full.pdf. URL https://www.biorxiv.org/content/early/2020/05/05/2020.05.04.075747.
[119] A. Alaa, M. Schaar, Autoprognosis: Automated clinical prognostic modeling via bayesian optimization with structured kernel learning, in: International Conference on Machine Learning, PMLR, 2018, pp. 139–148, URL https://proceedings.mlr.press/v80/alaa18b.html.
[120] G.S. Bhat, N. Shankar, I.M. Panahi, Automated machine learning based speech classification for hearing aid applications and its real-time implementation on smartphone, in: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, EMBC, IEEE, 2020, pp. 956–959, http://dx.doi.org/10.1109/EMBC44109.2020.9175693.
[121] Y. Weng, T. Zhou, Y. Li, X. Qiu, Nas-unet: Neural architecture search for medical image segmentation, IEEE Access 7 (2019) 44247–44257, http://dx.doi.org/10.1109/ACCESS.2019.2908991.
[122] M. Puri, Automated machine learning diagnostic support system as a computational biomarker for detecting drug-induced liver injury patterns in whole slide liver pathology images, Assay Drug Dev. Technol. 18 (1) (2020) 1–10, http://dx.doi.org/10.1089/adt.2019.919.
[123] I.K. Kim, K. Lee, J.H. Park, J. Baek, W.K. Lee, Classification of pachychoroid disease on ultrawide-field indocyanine green angiography using auto-machine learning platform, Br. J. Ophthalmol. 105 (6) (2021) 856–861, URL https://bjo.bmj.com/content/105/6/856.
[124] M. Karaglani, K. Gourlia, I. Tsamardinos, E. Chatzaki, Accurate blood-based diagnostic biosignatures for Alzheimer’s disease via automated machine learning, J. Clin. Med. 9 (9) (2020) 3016, http://dx.doi.org/10.3390/jcm9093016.
[125] I. Tsamardinos, P. Charonyktakis, K. Lakiotaki, G. Borboudakis, J.C. Zenklusen, H. Juhl, E. Chatzaki, V. Lagani, Just Add Data: Automated Predictive Modeling and Biosignature Discovery, Cold Spring Harbor Laboratory, 2020, http://dx.doi.org/10.1101/2020.05.04.075747, BioRxiv.
[126] J. Dafflon, W.H. Pinaya, F. Turkheimer, J.H. Cole, R. Leech, M.A. Harris, S.R. Cox, H.C. Whalley, A.M. McIntosh, P.J. Hellyer, An automated machine learning approach to predict brain age from cortical anatomical measures, Hum. Brain Mapp. 41 (13) (2020) 3555–3566, http://dx.doi.org/10.1002/hbm.25028.
[127] X. Su, N. Chen, H. Sun, Y. Liu, X. Yang, W. Wang, S. Zhang, Q. Tan, J. Su, Q. Gong, et al., Automated machine learning based on radiomics features predicts H3 K27M mutation in midline gliomas of the brain, Neuro-Oncol. 22 (3) (2020) 393–401, http://dx.doi.org/10.1093/neuonc/noz184.
[128] G. Luo, B.L. Stone, M.D. Johnson, P. Tarczy-Hornoch, A.B. Wilcox, S.D. Mooney, X. Sheng, P.J. Haug, F.L. Nkoy, et al., Automating construction of machine learning models with clinical big data: proposal rationale and methods, JMIR Res. Prot. 6 (8) (2017) e7757, http://dx.doi.org/10.2196/resprot.7757.
[129] R. Ooms, M. Spruit, Self-service data science in healthcare with automated machine learning, Appl. Sci. 10 (9) (2020) 2992, http://dx.doi.org/10.3390/app10092992.
[130] X. Liu, K. Xu, P. Xie, E. Xing, Unsupervised pseudo-labeling for extractive summarization on electronic health records, 2018, arXiv preprint arXiv:1811.08040. http://dx.doi.org/10.48550/arXiv.1811.08040.
[131] D. Molla, C. Jones, V. Nguyen, Query focused multi-document summarisation of biomedical texts, 2020, arXiv preprint arXiv:2008.11986. http://dx.doi.org/10.48550/arXiv.2008.11986.
[132] A. Kormilitzin, Med7: a transferable clinical natural language processing model for electronic health records, Artif. Intell. Med. 118 (2021) 102086, http://dx.doi.org/10.1016/j.artmed.2021.102086.
[133] Z.B. Miled, K. Haas, C.M. Black, R.K. Khandker, V. Chandrasekaran, R. Lipton, M.A. Boustani, Predicting dementia with routine care EMR data, Artif. Intell. Med. 102 (2020) 101771, http://dx.doi.org/10.1016/j.artmed.2019.101771.
[134] G. Tsang, S.-M. Zhou, X. Xie, Modeling large sparse data for feature selection: hospital admission predictions of the dementia patients using primary care electronic health records, IEEE J. Transl. Eng. Health Med. 9 (2020) 1–13, http://dx.doi.org/10.1109/JTEHM.2020.3040236.
[135] Y. Shao, Q.T. Zeng, K.K. Chen, A. Shutes-David, S.M. Thielke, D.W. Tsuang, Detection of probable dementia cases in undiagnosed patients using structured and unstructured electronic health records, BMC Med. Inform. Decis. Mak. 19 (1) (2019) 1–11, http://dx.doi.org/10.1186/s12911-019-0846-4.
[136] E. Ford, J. Sheppard, S. Oliver, P. Rooney, S. Banerjee, J.A. Cassell, Automated detection of patients with dementia whose symptoms have been identified in primary care but have no formal diagnosis: a retrospective case–control study using electronic primary care records, BMJ Open 11 (1) (2021) e039248, URL https://bmjopen.bmj.com/content/11/1/e039248.abstract.
[137] L. Wang, L. Sha, J.R. Lakin, J. Bynum, D.W. Bates, P. Hong, L. Zhou, Development and validation of a deep learning algorithm for mortality prediction in selecting patients with dementia for earlier palliative care interventions, JAMA Netw. Open 2 (7) (2019) e196972, http://dx.doi.org/10.1001/jamanetworkopen.2019.6972.
[138] L.J. Anzaldi, A. Davison, C.M. Boyd, B. Leff, H. Kharrazi, Comparing clinician descriptions of frailty and geriatric syndromes using electronic health records: a retrospective cohort study, BMC Geriatr. 17 (1) (2017) 1–7, http://dx.doi.org/10.1186/s12877-017-0645-7.
[139] H. Kharrazi, L.J. Anzaldi, L. Hernandez, A. Davison, C.M. Boyd, B. Leff, J. Kimura, J.P. Weiner, The value of unstructured electronic health record data in geriatric syndrome case identification, J. Am. Geriatr. Soc. 66 (8) (2018) 1499–1507, http://dx.doi.org/10.1111/jgs.15411.
[140] T. Chen, M. Dredze, J.P. Weiner, H. Kharrazi, Identifying vulnerable older adult populations by contextualizing geriatric syndrome information in clinical notes of electronic health records, J. Am. Med. Inform. Assoc. 26 (8–9) (2019) 787–795, http://dx.doi.org/10.1093/jamia/ocz093.
[141] S. Sinha, F. McDermott, G. Srinivas, P. Houghton, Use of abbreviations by healthcare professionals: what is the way forward? Postgrad. Med. J. 87 (1029) (2011) 450–452, URL https://pmj.bmj.com/content/87/1029/450.
[142] A. Jaber, P. Martínez, Disambiguating clinical abbreviations using a one-fits-all classifier based on deep learning techniques, Methods Inf. Med. (2022) http://dx.doi.org/10.1055/s-0042-1742388.
[143] P.E. Black, Levenshtein distance, Dictionary of Algorithms and Data Structures [online], US National Institute of Standards and Technology, 2008, URL https://www.researchgate.net/publication/246951886_Dictionary_of_Algorithms_and_Data_Structures.
[144] K.H. Lai, M. Topaz, F.R. Goss, L. Zhou, Automated misspelling detection and correction in clinical free-text records, J. Biomed. Inform. 55 (2015) 188–195, http://dx.doi.org/10.1016/j.jbi.2015.04.008.
[145] S.R. Kundeti, J. Vijayananda, S. Mujjiga, M. Kalyan, Clinical named entity recognition: Challenges and opportunities, in: 2016 IEEE International Conference on Big Data, Big Data, IEEE, 2016, pp. 1937–1945, http://dx.doi.org/10.1109/BigData.2016.7840814.
[146] S. Mehrabi, A. Krishnan, S. Sohn, A.M. Roch, H. Schmidt, J. Kesterson, C. Beesley, P. Dexter, C.M. Schmidt, H. Liu, et al., DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx, J. Biomed. Inform. 54 (2015) 213–219, http://dx.doi.org/10.1016/j.jbi.2015.02.010.
[147] T.-T. Wong, P.-Y. Yeh, Reliable accuracy estimates from k-fold cross validation, IEEE Trans. Knowl. Data Eng. 32 (8) (2019) 1586–1594, http://dx.doi.org/10.1109/TKDE.2019.2912815.
[148] M.S. Santos, J.P. Soares, P.H. Abreu, H. Araujo, J. Santos, Cross-validation for imbalanced datasets: avoiding overoptimistic and overfitting approaches [research frontier], IEEE Comput. Intell. Mag. 13 (4) (2018) 59–76, http://dx.doi.org/10.1109/MCI.2018.2866730.
[149] F. Smaldone, A. Ippolito, J. Lagger, M. Pellicano, Employability skills: Profiling data scientists in the digital labour market, Eur. Manag. J. (2022) http://dx.doi.org/10.1016/j.emj.2022.05.005.
[150] A. Sarker, M.A. Al-Garadi, Y.-C. Yang, J. Choi, A.A. Quyyumi, G.S. Martin, et al., Defining patient-oriented natural language processing: A new paradigm for research and development to facilitate adoption and use by medical experts, JMIR Med. Inform. 9 (9) (2021) e18471, URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8512184/.
[151] Y. Li, Y. Zhou, A. Jolfaei, D. Yu, G. Xu, X. Zheng, Privacy-preserving federated learning framework based on chained secure multiparty computing, IEEE Internet Things J. 8 (8) (2020) 6178–6186, URL.
[152] H. Lu, L. Ehwerhemuepha, C. Rakovski, A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance, BMC Med. Res. Methodol. 22 (1) (2022) 1–12, http://dx.doi.org/10.1186/s12874-022-01665-y.
[153] Y. Wang, L. Sun, S. Subramani, CAB: classifying arrhythmias based on imbalanced sensor data, KSII Trans. Internet Inform. Syst. (TIIS) 15 (7) (2021) 2304–2320, http://dx.doi.org/10.3837/tiis.2021.07.001.
[154] J. Guan, R. Li, S. Yu, X. Zhang, Generation of synthetic electronic medical record text, in: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM, IEEE, 2018, pp. 374–380, http://dx.doi.org/10.1109/BIBM.2018.8621223.
[155] S. Latif, R. Rana, S. Khalifa, R. Jurdak, J. Qadir, B.W. Schuller, Survey of deep representation learning for speech emotion recognition, IEEE Trans. Affect. Comput. (2021).
[156] S. Latif, M. Asim, R. Rana, S. Khalifa, R. Jurdak, B.W. Schuller, Augmenting generative adversarial networks for speech emotion recognition, 2020, arXiv preprint arXiv:2005.08447.
[157] A. Paez, Gray literature: An important resource in systematic reviews, J. Evid. Based Med. 10 (3) (2017) 233–240, http://dx.doi.org/10.1111/jebm.12265.

Elias Hossain holds a B.Sc. degree in software engineering and is presently enrolled at North South University (NSU) in Dhaka to pursue an M.Sc. in computer science and engineering. His research interests are biomedical imaging, natural language processing, and machine learning. Prior to starting his Master's at NSU, he worked in industry for a few years, primarily designing algorithms and architectures for artificial intelligence-based systems. He has published a number of scientific papers in reputable journals and conferences, including Q1 and Q2 venues. Throughout his undergraduate studies, he competed in numerous national and international software project competitions and received various awards. His record of publications attests to his creative engineering and research abilities.

Rajib Rana received the B.Sc. degree in computer science and engineering from Khulna University, with the Prime Minister and President’s Gold Medal for outstanding achievements, and the Ph.D. degree in computer science and engineering from the University of New South Wales, Sydney, Australia, in 2011. He received his postdoctoral training with the Autonomous System Laboratory, CSIRO, before joining the University of Southern Queensland (UniSQ) as a Research Scientist in 2015. Rajib started as an academic when he accepted the lecturer position at UniSQ in 2017.
He is currently a Senior Advance Queensland Research Fellow and an Associate Professor with UniSQ. He is also the Director of the IoT Health Research Program at UniSQ, which capitalises on advancements in technology and sophisticated information and data processing to better understand disease progression in chronic health conditions and to develop predictive algorithms for chronic diseases, such as mental illness and cancer. His current research interests include Unsupervised Representation Learning, Adversarial Machine Learning, Reinforcement Learning, Federated Learning, Emotional Speech Generation, Natural Language Processing, and Domain Adaptation.

Niall Higgins has worked as a senior clinician for 30 years and is an Adjunct Associate Professor with Queensland University of Technology and Metro North Hospital and Health Services. His background is in acute care and mental health research. His cross-discipline Ph.D. from the University of Queensland, School of Medicine focused on
the use of technology applications in healthcare. He undertakes research using both quantitative and qualitative research methodologies, including knowledge translation studies. His current interest is in applying assistive technology approaches to clinical care in a mental health setting.

Jeffrey Soar is Personal Chair in Human-Centred Technology at the University of Southern Queensland. He spent most of his career at senior executive levels in state, federal and New Zealand government, where he held state-wide and national responsibility for information technology. He managed some of the largest public sector ICT development projects in Australia and New Zealand and consulted to industry and governments around the world. In his more recent 20-year career in academia he produced over 200 publications, supervised 30 doctoral degree research students to completion and attracted the support of over 40 major research grants, including 7 prestigious ARC grants. He has been a Head of two different schools, research director, foundation director of eHealth research centres at two universities, and member of university academic board, research committee and chair of education committee. He has degrees in information systems, social science, agriculture, education and a Ph.D. in business. His current research interests are focused on the development of AI and its application in healthcare and other domains.

Prabal Datta Barua obtained his Ph.D. in information systems from the University of Southern Queensland. He is an academic and accredited research supervisor at the University of Southern Queensland with 15 years of teaching experience. Dr Barua received research support from the Queensland Government Innovation Connections under the Entrepreneurs program to research “Cancer Recurrence Using Innovative Machine Learning Approaches”. Dr Barua is interested in AI technology development in health, education, agriculture, and environmental science and has published more than 50 publications in various journals. He is an industry leader in ICT entrepreneurship in Australia and sits as an ICT advisory panel member of many organisations. Dr Barua is an Academic Dean at Queensland Institute of Higher Education, an Adjunct Professor at the University of Southern Queensland and an Honorary Industry Fellow at the University of Technology Sydney.

Anthony R. Pisani, Ph.D., is an Associate Professor of Psychiatry and Pediatrics at the University of Rochester Center for the Study and Prevention of Suicide and the founder of SafeSide Prevention. Dr. Pisani’s career is devoted to preventing suicide and promoting strength, recovery, and wellbeing. His work spans the suicide prevention continuum: upstream (enhancing school and military suicide prevention with technology); healthcare (workforce education and quality improvement to support Zero Suicide); and crisis and treatment intervention for individuals at greatest risk. He currently leads a major randomised controlled trial funded by the US National Institutes of Health to test a brief intervention for suicide attempt survivors.

Kathryn Turner, MBBS, FRANZCP, is a Psychiatrist and Executive Director of Metro North Mental Health, a large mental health and addiction service in Brisbane, Australia. She has had a longstanding interest in education, training, and continuous improvement of quality and safety in healthcare systems, evaluation, and a focus on culture in the workplace. Dr Turner led the implementation of the Zero Suicide framework in her previous service, the Gold Coast Mental Health and Specialist Services. Her publications have demonstrated significant positive outcomes of this work. Dr Turner implemented a Restorative Just Culture framework into the service, providing support for consumers, carers, clinicians and the organisation to heal, learn and improve following the loss of a consumer to suicide.