
2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Automated Clinical Diagnosis: The Role of Content in Various Sections of a Clinical Document

Vivek Datla, Sadid A. Hasan, Ashequl Qadir, Kathy Lee, Yuan Ling, Joey Liu, and Oladimeji Farri
Artificial Intelligence Laboratory, Philips Research North America, Cambridge, MA
Email: {firstname.lastname,kathy.lee_1,dimeji.farri}@philips.com

Abstract—Clinical diagnosis is a critical aspect of patient care that is typically driven by expert medical knowledge and intuition. An automated system for clinical diagnosis could reduce the cognitive burden of clinicians during patient care and medical education. In this paper, we describe a Knowledge Graph (KG)-based clinical diagnosis system that leverages publicly available knowledge sources to infer possible diagnoses from free-text clinical narratives. We experiment with the content in various sections of a clinical document within the electronic health record (EHR) to investigate the contribution of each section to the performance of automated diagnosis systems. Evaluation on the MIMIC-III dataset demonstrates that the content of the "history of present illness" and "past medical history" sections can play a greater role in clinical diagnosis inference than other sections and than all sections combined. Comparison with a state-of-the-art deep learning-based clinical diagnosis system confirms the effectiveness of our system.

Keywords—clinical diagnosis; knowledge graph; electronic health record

I. INTRODUCTION

Clinical diagnosis is a critical and non-trivial aspect of patient care. Intuition based on past professional experiences and knowledge gained from formal medical training typically drives the clinician's ability to make a diagnosis [1]. Although mimicking the intuition of clinicians can be very challenging, an automated system designed for clinical diagnosis can support expert reasoning based on available knowledge sources, especially when trying to resolve complicated clinical scenarios. Such a system could significantly reduce the cognitive burden of clinicians during patient care, leaving them better informed and able to adequately engage their patients towards achieving the desired health outcomes [2], [3].

Available text-based knowledge sources for medicine include scientific publications and textbooks. However, a significant proportion of these sources are proprietary and require formal commercial agreements for widespread use in automated systems for clinical decision support. Instead, we use Wikipedia as our knowledge source: it is publicly available, and it represents both formal and colloquial usage of medical terms, a feature that may help build a robust computational model for automated diagnostic inferencing. Furthermore, Wikipedia is used by several researchers in the field of natural language processing (NLP) as a rich multilingual knowledge base useful in various tasks including question answering (QA) and automated reasoning [4], [5], [6].

In this paper, we leverage articles under the "Clinical Medicine" category of Wikipedia to build a knowledge-driven clinical diagnosis system. We use a Knowledge Graph (KG)-based approach to accomplish this goal. Our system takes the free-text description of a medical problem (a clinical narrative) as input and returns the most likely diagnoses. We convert the link structure of Wikipedia into a knowledge graph whose nodes represent Wikipedia pages, hyperlinked concepts, and redirect pages, and whose edges represent the relationships between them. We develop a query system over the knowledge graph that uses both the content of Wikipedia and its link structure to identify the most probable diagnoses: the link structure determines relationships among diseases and symptoms, while the content of Wikipedia pages is used to rank their strength of association. Based on this strength of association, the system generates a ranked list of diagnoses for a given medical problem.

Our experiments on MIMIC (Medical Information Mart for Intensive Care)-III [7] discharge summaries demonstrate that identifying relevant sections in these documents can lead to a substantial gain in performance when inferring the most probable diagnoses. We observe that providing the content of all sections of a document to the system works poorly compared to using specific sections. We also compare the KG-based system's performance to the state-of-the-art Condensed Memory Networks (C-MemNN)-based clinical diagnosis system [8], also trained on the MIMIC-III dataset. The evaluation results reveal that our system performs better in some of the experiments.

Given the increasing interest in artificial intelligence (AI) and clinical decision support systems within the machine learning and health informatics communities, our work helps identify the most appropriate information within electronic clinical documents to drive automated diagnostic solutions towards optimal accuracy and better-informed clinical decisions. Researchers in these communities can utilize the findings of our paper to improve the quality of training data when developing AI models for complex reasoning tasks in patient care.

The main contributions of this paper can be summarized as follows:
• Construction of a KG-based clinical diagnosis inference engine.
• Experiments with different sections of an EHR note to identify the sections that contribute the most to accurate clinical diagnosis inference.
• Comparison with a state-of-the-art system [8] to show the effectiveness of the proposed approach.

In the next sections, we link our work to the literature and describe the proposed KG-based clinical diagnosis inference approach, followed by the detailed experimental setup, results, and discussion. Finally, we conclude the paper in Section VI.

II. RELATED WORK

AI systems for clinical decision support have previously been developed using bio-signals from patients [9], [10], [11]. Such structured clinical data contain raw signals without much context for accurate interpretation, whereas unstructured free-text clinical documents contain detailed descriptions of the overall clinical scenario. EHRs typically store structured clinical data (including physiological signals, vital signs, lab tests, etc.) in addition to unstructured text documents that present a relatively more complete picture of the associated clinical events.

Diagnostic inferencing from medical narratives has gained much attention in recent times [2], [3], [12], [13], [14], [15], [16], [17], [18]. Researchers have formulated diagnostic inferencing from free text either as a document retrieval task (medical literature retrieval) [19], [20] or as a multiclass-multilabel classification task [8], [14]. In a document retrieval task, the goal is to obtain documents from a given database that mention and/or describe possible diagnoses for the underlying clinical scenario. In the multiclass-multilabel classification approach, the classes are predefined to represent the most frequent diagnoses in the training set, and classification models are developed to analyze clinical scenarios and generate a list of differential diagnoses.

To infer clinical diagnoses, a few research works have explored graph-based reasoning methods in which the graphs incorporate relevant medical concepts and their associations. Shi et al. [21] organized textual medical knowledge into conceptual graphs and proposed a contextual information pruning algorithm to conduct semantic reasoning over the graph. Geng et al. [22] constructed a causal graph for medical knowledge representation and inferred diagnoses over the graph, but focused on selected diseases.

Overcoming the current accuracy limitations of clinical QA, especially for scenario-based analysis [23], [24], may require leveraging domain expertise from a variety of sources (e.g., domain-specific knowledge bases). However, knowledge representation is one of the fundamental problems in Artificial Intelligence, and over decades of research several solutions and ontologies have been proposed to address it. Examples of successful open-domain knowledge graph (KG) representations include Freebase [25], YAGO [26], and DBpedia [27]. These KG representations have been successfully used to answer factoid questions such as "Who won the 2017 Super Bowl?", but it is not clear how to address non-factoid questions such as "Why is there a drop in blood pressure when a person reaches higher altitudes?". Many methods [28] convert the natural language question into a structured query and then search efficiently over a large collection of Resource Description Framework (RDF) triples. These methods carry the underlying assumption that the answer is a node or a path in the knowledge graph. Also, while they are shown to work well in open domains, it is not clear how they can be adapted to specific domains such as clinical medicine.

To facilitate the development of artificially intelligent systems assisting the clinical diagnosis inferencing process, the Text Retrieval Conference (TREC) recently initiated the clinical decision support (CDS) track (http://www.trec-cds.org/), which requires retrieval of relevant biomedical articles for a given clinical case narrative. The task involves answering three types of generic clinical questions: 1) Diagnosis ("what is the patient's diagnosis?"), 2) Tests ("what tests should the patient receive?"), and 3) Treatment ("how should the patient be treated?"). The organizers provided a collection of 30 topics for the task. One of the major challenges in building effective models for such intelligent clinical decision support applications is the unavailability of a large annotated corpus for training and testing the models [29].

More recently, recurrent neural networks (RNNs) have been implemented in systems for clinical inferencing using structured clinical data [9], [10]. Prakash et al. [8] created a novel C-MemNN model based on RNNs with a memory component to enhance the possibility of arriving at the correct diagnosis for a given medical problem. We benchmark our KG-based diagnosis inference system against this state-of-the-art deep learning system to determine its effectiveness and to recommend future directions for performance improvements.

Our KG-based system differs from existing reasoning models with respect to the construction of the knowledge graph, which we build from the medical concepts in Wikipedia represented by their corresponding pages. We also employ a hybrid activation-based querying mechanism that is both knowledge-driven and data-driven. Furthermore, although a few systems [30], [31] extract section information directly from free-text clinical narratives, to the best of our knowledge this is the first study that explores the impact of different sections of free-text medical notes on predicting diagnoses from clinical narratives.

III. INFERRING CLINICAL DIAGNOSIS WITH KNOWLEDGE GRAPH

In this work, we introduce a novel hybrid approach to address the clinical diagnostic inferencing problem. Figure 1 shows the overall architecture of our system. We first build a structured knowledge graph (KG) using content from Wikipedia that is relevant to this problem. Given a clinical narrative, we then identify the patient's symptoms in the narrative using an information extraction engine. The extracted symptoms are used to query the knowledge graph to predict a set of diagnoses for the given narrative. The following sections discuss the details of the knowledge graph construction and our method for predicting diagnoses from a clinical narrative.

A. Knowledge Graph Construction

For constructing our knowledge graph, we used Wikipedia as our knowledge source. We collected all documents under the clinical medicine category in Wikipedia. This category served as the root node of our knowledge graph. The subcategories and any pages under this root category became the initial children nodes in our graph. The nodes representing subcategories might not have any content text, whereas the nodes representing pages carry their page content. We further expanded the nodes recursively up to a depth of 10 using breadth-first search and extracted all subcategories and pages, mining a total of 188,139 Wikipedia pages from 17,121 categories. These pages and categories were then added to the graph. Some of the categories were verified by domain experts as unrelated to clinical medicine, so we pruned the graph at these categories. Furthermore, we created an edge for every hyperlink associated with a term in any of the retrieved pages. These edges connect the page node that contains the term with the page node that is the hyperlink destination. The resulting knowledge graph contains a total of 381,964 nodes and 1,906,302 edges. The constructed knowledge graph is represented by the 4th component in Figure 1.

B. Inferring Diagnoses

1) Symptom Extraction from Clinical Narratives: The diagnosis inferencing process begins with a clinical narrative that describes the symptoms and any demographic information of a patient. The clinical narrative is written in unstructured text, so these concepts need to be identified and extracted before we can query the knowledge graph. For example, the clinical narrative may contain sentences such as "A 5-year-old boy has fever, cough, drooling, stridor, and dysphagia with voice change." We use a hybrid clinical NLP engine [32] to first identify and extract the symptoms (fever, cough, drooling, stridor, dysphagia with voice change) and the demographic information (5-year-old boy). We also use the NLP engine to normalize the symptoms so that they can be mapped to their corresponding Wikipedia pages. The NLP engine uses medical ontologies such as SNOMED [33], UMLS [34], and RadLex [35] for normalization. Components 1-2 in Figure 1 represent these steps.

2) Querying the Knowledge Graph: Next, we query the knowledge graph with the extracted symptoms to predict a set of diagnoses. However, not all symptoms contribute equally to the prediction process. For example, a symptom such as "fever" is very common and can occur with many diseases, whereas "stridor" is more uniquely associated with respiratory diseases. So, from the list of extracted symptoms, we need to identify the symptoms that are the most distinctive for determining potential diagnoses and weight them accordingly. For each extracted symptom, we query the PubMed corpus of the 2014 TREC CDS track using Elasticsearch (https://www.elastic.co/products/elasticsearch) to retrieve its term frequency in the corpus, and we use the inverse of the term frequency as its weight. This means that symptoms that are relatively rare in the PubMed corpus are assigned higher weights than symptoms that are more frequent. Component 3 in Figure 1 represents this step. The calculated weights are used to activate these nodes in the knowledge graph at a later step.

3) Building the Solution Space: Our next step is to create a solution space within the knowledge graph that consists of nodes representing symptoms, leading to nodes representing candidate diagnoses. We conduct this in two stages: a) building a bare-bones subspace, and b) expanding the subspace to have a connected path between any two nodes.

a) Building the initial subspace: The solution space initially contains only the nodes representing the input query symptoms. We then include all of the immediate neighbors of the input symptom nodes in the initial solution space. This process gives us several trees, which may or may not be connected (a scattered forest). If the trees are connected, i.e., if there is an existing path between any two nodes of the graph, then we identify it as a connected forest. This is represented by component 5 of Figure 1. If there is no connected forest, then we perform the next step.

b) Expanding the initial subspace: If the resulting forest is not connected, we expand the subspace. For this, we identify nodes that share common entities such as diseases, medications, procedures, and symptoms. We use a greedy approach to minimize the number of new nodes added to the solution space, expanding the nodes that provide the maximum connectivity with a minimal number of added nodes. To implement the greedy approach, we identify two nodes in the knowledge graph that have the smallest number of children and share at least one child between them. This common child acts as a path between the two unconnected trees, making them a single connected graph. We repeat the process until the whole graph becomes connected.
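Concretely, the knowledge graph construction of Section III-A amounts to a depth-bounded breadth-first traversal of Wikipedia's category structure. The Python sketch below illustrates the idea using the public MediaWiki API and networkx; it is a minimal reconstruction under stated assumptions, not the authors' code, and it omits the hyperlink edges and the expert-driven category pruning described above.

import requests
from collections import deque
import networkx as nx

API = "https://en.wikipedia.org/w/api.php"
MAX_DEPTH = 10  # expansion depth reported in the paper

def category_members(category):
    """List (title, is_category) pairs under a category via the public
    MediaWiki API (namespace 14 = subcategory, namespace 0 = article)."""
    params = {"action": "query", "list": "categorymembers",
              "cmtitle": category, "cmlimit": "500", "format": "json"}
    members, cont = [], {}
    while True:
        resp = requests.get(API, params={**params, **cont}).json()
        members += [(m["title"], m["ns"] == 14)
                    for m in resp["query"]["categorymembers"]
                    if m["ns"] in (0, 14)]
        cont = resp.get("continue")
        if not cont:
            return members

def build_clinical_kg(root="Category:Clinical medicine"):
    """Breadth-first expansion of the clinical medicine category tree,
    adding category-membership edges up to MAX_DEPTH."""
    kg = nx.DiGraph()
    kg.add_node(root, kind="category")
    queue, seen = deque([(root, 0)]), {root}
    while queue:
        node, depth = queue.popleft()
        if depth >= MAX_DEPTH or kg.nodes[node]["kind"] != "category":
            continue  # only categories have members to expand
        for title, is_cat in category_members(node):
            if title not in seen:
                seen.add(title)
                kg.add_node(title, kind="category" if is_cat else "page")
                queue.append((title, depth + 1))
            kg.add_edge(node, title)  # category-membership edge
    return kg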

Figure 1. Knowledge graph based clinical diagnosis system
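The symptom weighting of Component 3 can be sketched as an inverse-frequency lookup against an Elasticsearch index of the TREC 2014 CDS PubMed corpus. The index name, field name, and add-one smoothing below are assumptions, and a document count stands in for the term frequency the paper describes; the call style follows the elasticsearch-py 8.x client.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
INDEX = "trec-cds-2014"  # hypothetical index of the PubMed corpus

def symptom_weight(symptom):
    """Weight a symptom by the inverse of its corpus frequency so that
    rare, discriminative symptoms (e.g., 'stridor') outweigh common
    ones (e.g., 'fever')."""
    resp = es.count(index=INDEX,
                    query={"match_phrase": {"body": symptom}})
    freq = resp["count"]
    return 1.0 / (freq + 1)  # +1 smoothing guards against unseen terms

weights = {s: symptom_weight(s)
           for s in ["fever", "cough", "drooling", "stridor"]}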

The solution space can grow exponentially in size if an expanded node is a very common symptom. Moreover, expanding a very common symptom adds many unrelated diagnoses and procedures to the solution space. By following the expansion strategy described above, we avoid adding such unrelated nodes. Component 6 in Figure 1 represents this process.

4) Activating Nodes in the Solution Space: Next, we activate nodes in the solution space to find a set of probable diagnoses. We start with the weights of the input query symptoms. These weights are then spread across the knowledge graph by an activation module and a control module.

The activation module takes the weight of a symptom node and propagates it to the node's immediate neighbors. The propagated weight is dampened by a factor, so that the weight weakens (i.e., lessens) as the propagation moves farther away from the initial symptom. All nodes keep propagating the activation to their neighbors, except back to their parent node. The motivation for this step is to accumulate the weights from the symptoms onto the nodes representing diagnoses.

The control module is responsible for stopping an activation when the propagated weight falls below a certain threshold. We use a very small value, 0.001, as our threshold, below which an activation does not contribute much to the accumulation process. The control module also ensures that there is no cyclic propagation of weights by keeping track of the nodes to which the current node has passed its activation.

The end result of this stage is a weighted graph, where each node is weighted based on the accumulated proportion of the weights propagated from the symptoms. Components 7-8 in Figure 1 represent this stage.
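One way to realize the greedy expansion described above: while the solution subgraph is disconnected, pick a pair of nodes from different components that have few children and share at least one child in the full knowledge graph, and admit that shared child as a bridge. The networkx sketch below is a simplified reading of the procedure, not the authors' implementation.

import itertools
import networkx as nx

def connect_forest(kg, solution_nodes):
    """Greedily add bridge nodes from the full KG until the solution
    subgraph is connected (or no further bridge exists)."""
    und = kg.to_undirected(as_view=True)
    nodes = set(solution_nodes)
    while True:
        comps = list(nx.connected_components(und.subgraph(nodes)))
        if len(comps) <= 1:
            return nodes  # a single connected forest
        best = None
        # Consider pairs that straddle the first component, preferring
        # nodes with few children so the expansion stays small.
        for a, b in itertools.product(comps[0], set().union(*comps[1:])):
            shared = (set(kg.successors(a)) & set(kg.successors(b))) - nodes
            if not shared:
                continue
            cost = kg.out_degree(a) + kg.out_degree(b)
            if best is None or cost < best[0]:
                best = (cost, next(iter(shared)))
        if best is None:
            return nodes  # cannot connect any further
        nodes.add(best[1])  # admit the shared child as a bridge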

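The activation and control modules reduce to a damped breadth-first spread of each symptom's weight, cut off below the 0.001 threshold and prevented from flowing back to the sending node or around cycles. A minimal sketch under those assumptions follows; the damping factor of 0.5 is illustrative, since the paper does not specify its value.

from collections import defaultdict, deque

DAMPING = 0.5      # illustrative damping factor (not given in the paper)
THRESHOLD = 0.001  # cutoff applied by the control module

def spread_activation(graph, symptom_weights):
    """Accumulate damped symptom weights over the solution space.
    `graph` maps each node to its neighbors (e.g., a networkx graph);
    returns a node -> accumulated activation mapping."""
    activation = defaultdict(float)
    for source, weight in symptom_weights.items():
        queue = deque([(source, weight, None)])
        visited = {source}        # control module: no cyclic propagation
        while queue:
            node, w, parent = queue.popleft()
            if w < THRESHOLD:     # control module: stop weak activations
                continue
            activation[node] += w
            for nbr in graph[node]:
                if nbr == parent or nbr in visited:
                    continue      # never send activation back or twice
                visited.add(nbr)
                queue.append((nbr, w * DAMPING, node))
    return activation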
5) Identifying and Ranking Diagnoses: Since our ultimate goal is to infer a set of diagnoses, any node in the weighted graph that is not a disease/syndrome node is filtered out of the solution space. The remaining nodes form the set of possible diagnoses for the input symptoms retrieved from the clinical narrative. For each disease node, we check the signs and symptoms mentioned in Wikipedia for that disease, and score the node based on the overlap between the symptoms in the clinical narrative and the content of that disease/syndrome Wikipedia page. The diseases are then re-ranked based on the overlap score, and they form the candidate set of diagnoses for the symptoms.

As the final ranking step, if the demographic information of the patient is retrieved from the clinical narrative, we mine the epidemiology of each candidate disease as described in Wikipedia to identify the prevalence of that diagnosis in the patient's age group. For example, if a disease is very prevalent in children but not in adults and the patient is mentioned as an adult, then the disease is ranked below the adult diseases in the list. Once this epidemiology-based re-ranking is completed, we obtain the final ranked list of diagnoses inferred for the given clinical narrative. These final steps are represented by components 9-10 in Figure 1.

IV. EXPERIMENTS AND EVALUATION

A. Dataset

We evaluate the system on discharge notes in the MIMIC-III database [36]. MIMIC-III contains physiological signals and various measurements captured from patient monitors, as well as comprehensive clinical data obtained from hospital medical information systems, for over 58K hospital admissions. We use the note events table from MIMIC-III v1.3, which contains the free-text clinical notes for patients. We use discharge summaries rather than admission notes, as the former contain the actual ground-truth diagnoses along with the free text. Table I shows an example discharge note used in this paper. The diagnoses present in the MIMIC-III notes are very specific and are not evenly distributed, as shown in Figure 2; many diseases appear only a few times in the corpus.

Figure 2. Distribution of diagnoses in MIMIC-III [8]

For the experiments, we used a subset of 14K notes. We processed the notes to extract medical concepts based on SNOMED [37] using our hybrid clinical NLP engine [32]. For example, after processing a discharge note (e.g., the one in Table I), we get the concepts shown in Table II.

There are 4,186 unique diagnoses in the MIMIC-III discharge notes. However, many diagnoses (labels) occur in only a single note. The 50 most common labels cover 97% of the notes, and the 100 most common labels cover 99.97%. We present experiments for both the 50 and the 100 most common labels.

B. Comparison of Sections of EHR as Clinical Narrative

We conducted extensive experiments to understand the role of the content of each section of a clinical note in inferring the correct diagnoses. Given the unstructured free text of a section of the medical note as input, we measure the accuracy of the system in identifying the diagnoses. In this study, we consider the following sections of a MIMIC-III note: social history (i.e., behavioral information such as smoking, drinking, diabetes, etc.), chief complaint (i.e., symptoms such as chest pain, headache, dizziness, etc.), history of present illness, past medical history, brief hospital course (i.e., information about procedures and medications provided during the hospital stay), and discharge medications.

We also compare our system to a state-of-the-art clinical diagnosis inference system on the MIMIC-III dataset, which uses Condensed Memory Neural Networks (C-MemNNs) [8] to formulate the task as a multiclass-multilabel classification problem. Due to the large number of diagnoses (class labels) in the dataset, the C-MemNN model simplifies the task by considering only the most frequent N diagnoses for training. We adopt similar settings in our experiments.

C. Metrics

We use precision and recall to evaluate our systems. For a meaningful comparison, we consider two variations of these metrics: 1) strict (exact word match with the ground-truth diagnosis), and 2) relaxed (allowing paraphrases and disease synonyms based on the human disease network [38]).

Recall that our knowledge graph is built from the medical concepts in Wikipedia, where the clinical concepts are mostly standardized and may differ from the abbreviated or colloquial usage of medical terms in a clinical note. For example, a MIMIC note may refer to "diabetes mellitus type 2" as "dm type 2", "diabetes type 2", "db 2", or "diabetes 2". Since our approach considers diagnosis concepts based on Wikipedia page titles, a strict measure of precision and recall based on exact word overlap with the ground-truth diagnosis may be insufficient to measure the effectiveness of our systems. Hence, we introduce the relaxed alternatives of the metrics.

Precision at 5 (P@5) is the ratio of correct diagnoses over the top five predictions. It should be noted that a MIMIC chart note can have many diagnoses, which makes P@5 a very strict measure.
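A compact sketch of the overlap scoring and epidemiology-based re-ranking (components 9-10 in Figure 1) described at the start of this section. Here, wiki_section() is a hypothetical accessor for a page's section text, and the substring tests are a deliberately crude stand-in for the matching performed by the hybrid NLP engine [32].

def wiki_section(disease_page, section):
    """Hypothetical accessor for the plain text of one section (e.g.,
    'Signs and symptoms', 'Epidemiology') of a Wikipedia page."""
    raise NotImplementedError

def overlap_score(disease, symptoms):
    # Count the narrative symptoms that appear in the disease page's
    # 'Signs and symptoms' section.
    text = wiki_section(disease, "Signs and symptoms").lower()
    return sum(s.lower() in text for s in symptoms)

def rank_diagnoses(candidates, symptoms, age_group=None):
    ranked = sorted(candidates, key=lambda d: overlap_score(d, symptoms),
                    reverse=True)
    if age_group:
        # Demote diseases whose 'Epidemiology' text does not mention the
        # patient's age group (a crude proxy for prevalence); the sort is
        # stable, so the overlap-based order is preserved within buckets.
        ranked.sort(key=lambda d: age_group.lower() not in
                                  wiki_section(d, "Epidemiology").lower())
    return ranked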
Table I
EXAMPLE OF A DISCHARGE NOTE IN MIMIC-III

Discharge Note (partially shown)

CHIEF COMPLAINT: Chest pain.

HISTORY OF PRESENT ILLNESS: A 41-year-old female with a history of coronary artery bypass graft x3 in [**3216**] who has experienced substernal chest pain over the past two days. Patient initially attributed her discomfort to a cold. This afternoon pain worsened then spread to her arms and neck. She planned to see her doctor tomorrow, but due to this worsening of the pain, the patient decided to come to the Emergency Department.

Discharge Diagnosis:
1. Three vessel coronary artery disease.
2. Occluded saphenous vein graft to obtuse marginal.
3. Mild systolic and diastolic left ventricular dysfunction.
4. Acute inferior myocardial infarction managed by acute PTCA.
5. Successful Angio-Jet and stenting of the distal right coronary artery beyond the saphenous vein graft-right coronary artery anastomosis.

Table II
MEDICAL CONCEPTS EXTRACTED USING A HYBRID CLINICAL NLP ENGINE [32]

Discharge Note:
CHIEF COMPLAINT: chest pain.
HISTORY OF PRESENT ILLNESS: 41-year-old female, coronary artery bypass, chest pain, discomfort, cold, pain

Diagnosis:
coronary artery disease, saphenous vein graft, myocardial infarction, stenting, coronary artery, coronary artery anastomosis.

Recall at 5 (R@5) measures how well the top five predictions cover all of the diagnoses documented in the MIMIC note. In the relaxed setting of either measure, a predicted diagnosis is considered correct even when the ground-truth diagnosis and the predicted diagnosis are synonyms or paraphrases of each other.

D. Results and Discussion

Table III shows the results of our experiments with the various sections of the medical chart note. From these results, we can understand the role of each section's content in inferring the correct diagnoses.

We can see that the individual sections perform better than the combination of all sections (All). This can be attributed to the generality of some of the sections in the MIMIC notes, where the procedures/medications apply to many diseases. Specifically, the brief hospital course section lists many procedures that are common to several diseases, which may explain its lower scores. On the other hand, the discharge medications section often covers only the pain medications and may not be representative of the surgery or of complications the patient had due to pre-existing chronic conditions.

Further analysis shows that social history scores higher on the relaxed measure of precision when we consider the top 100 classes. However, this may be an aberration: people with a social history of alcohol and smoking have higher chances of having diabetes, hypertension, and other lifestyle diseases, which are among the most common diseases in MIMIC notes. Not surprisingly, the results show that history of present illness and past medical history carry the most relevant information for identifying the diagnoses.

We also compare our results with the C-MemNN model [8]. In their paper, the authors report results using three metrics: P@5, Area Under the Curve (AUC), and Hamming loss. AUC and Hamming loss are not appropriate metrics for our experimental settings, so we use the precision- and recall-based metrics for this comparison. The results show that our systems have lower strict precision scores in the top-50-classes experiments. However, when we consider the top 100 classes, the All-sections variant performs better than the C-MemNN system, which also uses all sections (except the diagnosis section) as the input to its model. Under the relaxed precision metric, the proposed KG-based system can perform better than the C-MemNN model with the selective use of content from various sections.

From our experiments, it is clear that not all sections contribute equally to clinical diagnosis inference. Hence, it may be difficult for a machine learning system to learn the complex relationships between the medical concepts and the diagnoses present in a full clinical note. For the MIMIC-III dataset, our experiments suggest that training on the past medical history and history of present illness sections could help a machine learning system improve the accuracy of clinical diagnosis inference compared to using the full clinical note.
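The four metric variants reported in Table III can be computed as sketched below; is_synonym() is a hypothetical stand-in for the synonym/paraphrase lookup backed by the human disease network [38].

def is_synonym(pred, gold):
    """Hypothetical synonym/paraphrase check backed by the human disease
    network [38]; the strict variants never call it."""
    raise NotImplementedError

def matches(pred, gold, relaxed):
    return pred == gold or (relaxed and is_synonym(pred, gold))

def precision_at_5(predicted, gold_diagnoses, relaxed=False):
    # Fraction of the top five predictions that match some gold diagnosis.
    top5 = predicted[:5]
    hits = sum(any(matches(p, g, relaxed) for g in gold_diagnoses)
               for p in top5)
    return hits / len(top5)

def recall_at_5(predicted, gold_diagnoses, relaxed=False):
    # Fraction of the gold diagnoses covered by the top five predictions.
    top5 = predicted[:5]
    covered = sum(any(matches(p, g, relaxed) for p in top5)
                  for g in gold_diagnoses)
    return covered / len(gold_diagnoses)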
Table III
EVALUATION RESULTS (All = combination of all considered sections; P = precision; R = recall; (s) = strict; (r) = relaxed; the best score in each column is marked with *)

                              ------------ Top 50 ------------   ------------ Top 100 -----------
Sections of EHR               R@5(r)  R@5(s)  P@5(r)  P@5(s)     R@5(r)  R@5(s)  P@5(r)  P@5(s)
Social history                0.667   0.177   0.403   0.082      0.798   0.104   0.469*  0.048
Chief complaint               0.717   0.289   0.426   0.111      0.729   0.229   0.432   0.087
History of present illness    0.735*  0.309   0.460*  0.135      0.731   0.235   0.449   0.103
Past medical history          0.728   0.312*  0.453   0.152      0.726   0.257*  0.440   0.128
Brief hospital course         0.663   0.236   0.380   0.091      0.714   0.177   0.404   0.068
Discharge medications         0.589   0.183   0.343   0.087      0.533   0.113   0.300   0.055
All                           0.669   0.286   0.436   0.135      0.799*  0.219   0.429   0.103
C-MemNN [8]                   -       -       -       0.420*     -       -       -       0.320*

V. LIMITATIONS

Because a thorough evaluation of the NLP engine [32] used in our KG-based clinical diagnosis system has not been conducted, especially with respect to information extraction from the MIMIC-III discharge summaries, we could not adequately control for errors in predicting the most likely diagnoses that may be attributable to errors in the extracted clinical concepts. However, since the same NLP engine was used to process all sections of the discharge summaries, we expect the results of our experiments to be largely unbiased with respect to such extraction errors. We intend to evaluate the accuracy of the NLP engine on information extraction from the MIMIC-III discharge summaries in the near future.

VI. CONCLUSION

In this paper, we described our Knowledge Graph (KG)-based clinical diagnosis inference system. We conducted extensive experiments on the MIMIC-III benchmark dataset considering various sections of a clinical note. The results demonstrated that the content of the history of present illness and past medical history sections can contribute the most to clinical diagnosis inference, compared to all sections combined. Furthermore, we showed that the proposed KG-based system performs well in comparison to the state-of-the-art C-MemNN model under the relaxed precision metric.

In the future, we plan to improve the current KG-based diagnosis inference system by adding more properties (e.g., relationships among the clinical concepts) to the edges of the knowledge graph. We would also like to utilize the findings of this study to improve training data sets for machine learning models that infer clinical diagnoses from free-text narratives.

REFERENCES

[1] G. Norman, M. Young, and L. Brooks, "Non-analytical models of clinical reasoning: the role of experience," Medical Education, vol. 41, no. 12, pp. 1140-1145, 2007. [Online]. Available: http://dx.doi.org/10.1111/j.1365-2923.2007.02914.x
[2] Y. Ling, S. A. Hasan, V. Datla, A. Qadir, K. Lee, J. Liu, and O. Farri, "Learning to diagnose: Assimilating clinical narratives using deep reinforcement learning," in Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP), 2017.
[3] Y. Ling, S. A. Hasan, V. Datla, A. Qadir, K. Lee, J. Liu, and O. Farri, "Diagnostic inferencing via improving clinical concept extraction with deep reinforcement learning: A preliminary study," in Proceedings of the 2nd Conference on Machine Learning for Health Care (MLHC), 2017.
[4] F. Wu and D. S. Weld, "Open information extraction using Wikipedia," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), 2010, pp. 118-127. [Online]. Available: http://dl.acm.org/citation.cfm?id=1858681.1858694
[5] D. Milne and I. H. Witten, "An open-source toolkit for mining Wikipedia," Artificial Intelligence, vol. 194, pp. 222-239, 2013.
[6] B. Katz, G. Marton, G. C. Borchardt, A. Brownell, S. Felshin, D. Loreto, J. Louis-Rosenberg, B. Lu, F. Mora, S. Stiller et al., "External knowledge sources for question answering," in TREC, 2005.
[7] A. E. Johnson, T. J. Pollard, L. Shen, L.-w. H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. A. Celi, and R. G. Mark, "MIMIC-III, a freely accessible critical care database," Scientific Data, vol. 3, 2016.
[8] A. Prakash, S. Zhao, S. A. Hasan, V. Datla, K. Lee, A. Qadir, J. Liu, and O. Farri, "Condensed memory networks for clinical diagnostic inferencing," AAAI, 2016.
[9] Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzel, "Learning to diagnose with LSTM recurrent neural networks," arXiv preprint arXiv:1511.03677, 2015.
[10] E. Choi, M. T. Bahadori, and J. Sun, "Doctor AI: Predicting clinical events via recurrent neural networks," arXiv preprint arXiv:1511.05942, 2015.
[11] E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, and W. Stewart, "RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism," in Advances in Neural Information Processing Systems, 2016, pp. 3504-3512.

[12] M. S. Simpson, E. M. Voorhees, and W. Hersh, "Overview of the TREC 2014 clinical decision support track," DTIC Document, Tech. Rep., 2014.
[13] K. Roberts, M. S. Simpson, E. Voorhees, and W. R. Hersh, "Overview of the TREC 2015 clinical decision support track," in TREC, 2015.
[14] S. A. Hasan, S. Zhao, V. Datla, J. Liu, K. Lee, A. Qadir, A. Prakash, and O. Farri, "Clinical question answering using key-value memory networks and knowledge graph," in TREC, 2016.
[15] S. A. Hasan, Y. Ling, J. Liu, and O. Farri, "Using neural embeddings for diagnostic inferencing in clinical question answering," 2015.
[16] T. R. Goodwin and S. M. Harabagiu, "Medical question answering for clinical decision support," in Proceedings of the 25th ACM International Conference on Information and Knowledge Management. ACM, 2016, pp. 297-306.
[17] Y. Ling, Y. An, and S. A. Hasan, "Improving clinical diagnosis inference through integration of structured and unstructured knowledge," in Proceedings of the 1st EACL Workshop on Sense, Concept and Entity Representations and their Applications (SENSE), 2017.
[18] Y. Ling, Y. An, M. Liu, S. A. Hasan, Y. Fan, and X. Hu, "Integrating extra knowledge into word embedding models for biomedical NLP tasks," in Proceedings of the 30th International Joint Conference on Neural Networks (IJCNN), 2017.
[19] Z. Zheng and X. Wan, "Graph-based multi-modality learning for clinical decision support," in Proceedings of the 25th ACM International Conference on Information and Knowledge Management. ACM, 2016, pp. 1945-1948.
[20] S. Balaneshin-kordan and A. Kotov, "Optimization method for weighting explicit and latent concepts in clinical decision support queries," in Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval. ACM, 2016, pp. 241-250.
[21] L. Shi, S. Li, X. Yang, J. Qi, G. Pan, and B. Zhou, "Semantic health knowledge graph: Semantic integration of heterogeneous medical knowledge and services."
[22] S. Geng and Q. Zhang, "Clinical diagnosis expert system based on dynamic uncertain causality graph," in 2014 IEEE 7th Joint International Information Technology and Artificial Intelligence Conference (ITAIC). IEEE, 2014, pp. 233-237.
[23] D. Ferrucci, A. Levas, S. Bagchi, D. Gondek, and E. T. Mueller, "Watson: beyond Jeopardy!" Artificial Intelligence, vol. 199, pp. 93-105, 2013.
[24] A. Lally, S. Bagchi, M. A. Barborak, D. W. Buchanan, J. Chu-Carroll, D. A. Ferrucci, M. R. Glass, A. Kalyanpur, E. T. Mueller, J. W. Murdock et al., "WatsonPaths: scenario-based question answering and inference over unstructured information," Yorktown Heights: IBM Research, 2014.
[25] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, "Freebase: a collaboratively created graph database for structuring human knowledge," in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008, pp. 1247-1250.
[26] F. M. Suchanek, G. Kasneci, and G. Weikum, "YAGO: a core of semantic knowledge," in Proceedings of the 16th International Conference on World Wide Web. ACM, 2007, pp. 697-706.
[27] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives, "DBpedia: A nucleus for a web of open data," in The Semantic Web. Springer, 2007, pp. 722-735.
[28] S. Yang, Y. Xie, Y. Wu, T. Wu, H. Sun, J. Wu, and X. Yan, "SLQ: a user-friendly graph querying system," in Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 2014, pp. 893-896.
[29] S. Jonnalagadda, T. Cohen, S. Wu, and G. Gonzalez, "Enhancing clinical concept extraction with distributional semantics," Journal of Biomedical Informatics, vol. 45, no. 1, pp. 129-140, 2012.
[30] Y. Li, S. Lipsky Gorman, and N. Elhadad, "Section classification in clinical notes using supervised hidden Markov model," in Proceedings of the 1st ACM International Health Informatics Symposium. ACM, 2010, pp. 744-750.
[31] R. Pivovarov and N. Elhadad, "A hybrid knowledge-based and data-driven approach to identifying semantically similar concepts," Journal of Biomedical Informatics, vol. 45, no. 3, pp. 471-481, 2012.
[32] S. A. Hasan, X. Zhu, Y. Dong, J. Liu, and O. Farri, "A hybrid approach to clinical question answering," in Proceedings of the Twenty-Third Text REtrieval Conference (TREC 2014), Gaithersburg, Maryland, USA, 2014.
[33] K. A. Spackman, K. E. Campbell, and R. A. Côté, "SNOMED RT: a reference terminology for health care," in Proceedings of the AMIA Annual Fall Symposium. American Medical Informatics Association, 1997, p. 640.
[34] O. Bodenreider, "The Unified Medical Language System (UMLS): integrating biomedical terminology," Nucleic Acids Research, vol. 32, no. suppl 1, pp. D267-D270, 2004.
[35] C. P. Langlotz, "RadLex: a new method for indexing online educational materials," 2006.
[36] A. E. Johnson, T. J. Pollard, L. Shen, L.-w. H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. A. Celi, and R. G. Mark, "MIMIC-III, a freely accessible critical care database," Scientific Data, vol. 3, 2016.
[37] M. Q. Stearns, C. Price, K. A. Spackman, and A. Y. Wang, "SNOMED Clinical Terms: overview of the development process and project status," in Proceedings of the AMIA Symposium. American Medical Informatics Association, 2001, p. 662.
[38] L. M. Schriml, C. Arze, S. Nadendla, Y.-W. W. Chang, M. Mazaitis, V. Felix, G. Feng, and W. A. Kibbe, "Disease Ontology: a backbone for disease semantic integration," Nucleic Acids Research, vol. 40, no. D1, pp. D940-D946, 2012.
