SeDIE: A Semantic-Driven Engine for Integration of Healthcare Data
Houssein Dhayne, ESIB, Saint Joseph University of Beirut, Mar Roukos, Beirut, Lebanon
Rima Kilany, ESIB, Saint Joseph University of Beirut, Mar Roukos, Beirut, Lebanon
Rafiqul Haque, Cognitus, 34 Av. des Champs Elysees, Paris, France
Yehia Taher, David lab, 45 Avenue des etats unis, Versailles, France

Abstract: The wider adoption of Electronic Health Records (EHR) has produced massive-scale clinical data that are stored in distributed and heterogeneous repositories using different formats, both structured and unstructured. These data are critically important for the analysis of patient health. However, for an effective analysis, integration of data is a sine qua non because it provides a unified view which enables healthcare professionals to extract meaningful information from data collected from various sources. Several standards have been developed to improve data integration and system interoperability. HL7 messaging is the most successful standard for medical data exchange and facilitates interoperability between health systems. Therefore, converting HL7 messages to RDF enables data scientists to develop graph-based analytics models. However, unstructured data such as physicians' notes and pathology reports very often hinder the process of integrating data. Unfortunately, state-of-the-art solutions are not able to perform such integration efficiently. In this paper, we propose a semantic-driven engine called SeDIE for integrating healthcare data. It is built on a novel approach using a statistical method and a Multiple-Criteria Decision-Making model to overcome the barriers of integrating unstructured and structured data. Moreover, we use the UMLS Metathesaurus to provide a link between different terminologies. Our experimental results show that the use of semantic techniques and healthcare interoperability tools does enhance data integration, enabling researchers to discover data regardless of provenance and format by using a Scored-SPARQL query, thereby promoting improved outcomes in the healthcare industry.

Index Terms: Data integration, information extraction, healthcare data

I. INTRODUCTION

Clinical information gathered from various hospital systems is a valuable resource for building platforms for secondary use of the electronic health record (EHR) [1]. Such platforms could help researchers and clinicians to assess patient care quality, determine the feasibility of research studies and guide clinical trials [2]. Unfortunately, the aggregation and integration of medical data does not always succeed and often remains a daunting task that faces many challenges. The rapid adoption of EHRs has led to the generation of a huge amount of data relating to the diagnosis, monitoring, testing, treatment and health management of patients [3]. These data are stored in different, distributed Health Information Systems. Unfortunately, inconsistencies are found in these data in terms of semantics, schema, terminologies, data types, and formats. In fact, the variety of medical data stemming from heterogeneous sources results in considerable complexity for data interoperability and, by extension, for the integration of data [4]. However, various standards could be adapted to meet interoperability and data integration needs in healthcare. For instance, the Semantic Web is gaining maturity to meet the challenges of data variety [5] by providing standards and technologies able to address critical challenges in data integration from heterogeneous datasets. Furthermore, HL7 messaging is the most common standard for the exchange of medical data between health systems. HL7 messages often contain structured and unstructured data, which are a wealth of potentially exploitable information for smarter health operations and are deemed a valuable data source for health data integration [6]. Moreover, UMLS is a terminology integration project [7] that brings together more than 203 families of biomedical vocabularies, which enables it to play an important role in semantic integration by identifying lexically similar concepts [8].

Our Contributions. Relying on the standards discussed above, in this paper we propose a solution called SeDIE for the semantic integration of heterogeneous and distributed data sources. SeDIE integrates data stemming from different medical data sources in order to reconstruct a patient's entire medical record and to query patient data across different data sources. We summarize our contributions as follows:
• We propose a method that recognizes patterns from HL7 segments so that they can be filled with information extracted from medical free text, in order to integrate structured and unstructured data.
• We propose and implement an algorithm that efficiently converts data from HL7 v2.x messages into RDF, which seamlessly integrates data by linking patients' data, including disease, medical and environmental data.

The remainder of this paper is organized as follows: Section II provides fundamental information on healthcare standards. Section III describes the proposed approach and details the main steps needed to convert free text into structured data. The experimental model and results are discussed in Section IV. Finally, conclusions and future work are presented in Section V.
II. AN OVERVIEW OF INTEROPERABILITY STANDARDS IN HEALTHCARE

The use of standardized information models has advantages in healthcare. Due to the lack of interoperability between health information systems, many eHealth initiatives could not be effectively implemented. A number of interoperability standards have been developed to address issues related to the seamless integration of data across a wide variety of healthcare information systems.

A. HL7 messaging standards

HL7 is a Standards Developing Organization accredited by the American National Standards Institute (ANSI) to develop standards, such as HL7 v2.x, v3 and FHIR, for the exchange of administrative and clinical data between healthcare systems, regardless of the software solutions, operating systems or database management systems chosen by each institution. HL7's Version 2.x (V2) messaging standard is still the most commonly used approach to exchange data in the clinical domain. It has grown over time from version 2.1, dating from around 1981, to version 2.8.2 in 2016 [9], whereas HL7 v3 and FHIR represent a very small portion of real-world interfaces [10].

The main building block of a message in HL7 v2.x is a segment. A segment always begins with a three-character designation (e.g. PID, OBR, OBX, ORC and NTE) that indicates the segment type. Each HL7 segment includes one or more composites (or fields). A pipe character (|) is used to separate one composite from another [11]. Fig. 1 illustrates the structure of an HL7 v2 message and details the main components of an OBX segment.

Fig. 1: An example of the HL7 v2 message

Consequently, in this paper, HL7 v2.x is adopted as the messaging standard at the data ingestion level.

B. Unified Medical Language System

The key problem facing data integration in the medical field is the use of several classifications and terminologies (ICD-10, LOINC, RxNorm, SNOMED CT, etc.). Transforming various codes and free text into a single coding system could improve data collection and analysis. Achieving an effective transformation requires a complete controlled terminology whereby medical data from disparate sources can be systematically integrated.

The Unified Medical Language System (UMLS) provides a mechanism for integrating all the major biomedical vocabularies into a unified global terminology [12]. With three knowledge sources (the Metathesaurus, the Semantic Network and the SPECIALIST Lexicon), UMLS builds a mapping among these diverse terminologies that allows the translation of concepts from one terminology to another [7]. UMLS uses the notion of a Concept Unique Identifier (CUI) to map terms with similar meaning in different terminologies (see Fig. 2). In addition to data, UMLS offers a number of tools to help users; for instance, MetaMap [13] allows users to extract Metathesaurus concepts from a text.

Fig. 2: The same CUI from UMLS is used to map Diabetes Mellitus terms in SNOMED CT and ICD-10

Moreover, BioPortal (http://bioportal.bioontology.org) is a web portal that provides access to a library of biomedical ontologies and terminologies via the NCBO Web services. BioPortal contained 509 distinct ontologies from a number of different groups, including 31 UMLS terminologies (e.g., SNOMED CT, ICD-10) mapped through their common CUI in UMLS [14]. BioPortal includes RESTful Web services to access ontologies and individual ontology terms in RDF; therefore, the entire content of BioPortal ontologies can be exposed as Linked Open Data.

The hierarchical structure of UMLS could be used to group related records, making the configuration of clinical decision support much simpler than would be possible if users were to select individual records or clinical concepts. For these reasons, UMLS is used in this engine as a reference for information extraction and data integration.

III. SEDIE INTEGRATION ENGINE

HL7 segments such as OBX, RXE and PID are formatted in a predefined structure that can be converted to RDF. On the other hand, an HL7 message also includes note segments, denoted by "NTE", that describe information related, in most cases, to the patient's illness or treatment [15] in a free-text format that cannot be used as-is in data integration and processing [16]. Therefore, extracting entities and relational facts from "NTE" segments makes medical notes more convenient for automatic processing.

Different approaches have been adopted to extract RDF from unstructured data. In [17] the authors propose an approach called triplet extraction and build parse trees to extract subject-predicate-object triplets from English sentences. The system in [18] automatically extracts RDF triples describing entity relations and properties from unstructured text by applying Semantic Role Labeling (SRL), Coreference Resolution (CR) and EEL with respect to DBpedia. Our approach makes use of structural historical data as well as medical ontologies to extract entities and find relationships between them.

In this section, we introduce SeDIE, the solution that we designed and developed for healthcare data integration. We detail how medical data transmitted from different providers to SeDIE as HL7 messages are translated and integrated by the different modules of SeDIE. Since the HL7 v2.x standard is supported by many different systems, we developed SeDIE to be interoperable with various systems.
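To make the distinction between structured segments and free-text NTE segments concrete, the following minimal Python sketch splits an HL7 v2.x message into segments, fields and components using the pipe and caret delimiters described in Section II-A. It is only an illustration under stated assumptions: the sample message and the helper names (`parse_segments`, `split_components`, `partition`) are invented here and are not part of SeDIE, which relies on the Java-based HAPI parser.

```python
# Minimal sketch: split an HL7 v2.x message into segments and fields.
# The sample message below is invented for illustration only.

SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|LAB|HOSP|SeDIE|HOSP|20180701120000||ORU^R01|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP||Doe^Jane",
    "OBX|1|NM|2345-7^Glucose^LN||110|mg/dL",
    "NTE|1||Her diabetes is well-controlled with metformin 500 mg/dl twice daily",
])


def parse_segments(message):
    """Return a list of (segment_type, fields) pairs.

    Segments are separated by carriage returns and fields by pipes.
    Caveat: for MSH the field numbering is shifted by one, because
    MSH-1 is the field separator character itself.
    """
    segments = []
    for line in message.replace("\n", "\r").split("\r"):
        line = line.strip()
        if not line:
            continue
        fields = line.split("|")
        segments.append((fields[0], fields[1:]))
    return segments


def split_components(field):
    """Split one field into its components (caret-separated)."""
    return field.split("^")


def partition(segments):
    """Separate structured segments from free-text notes (NTE-3)."""
    structured = [seg for seg in segments if seg[0] != "NTE"]
    notes = [fields[2] for seg_type, fields in segments
             if seg_type == "NTE" and len(fields) > 2]
    return structured, notes


if __name__ == "__main__":
    structured, notes = partition(parse_segments(SAMPLE_MESSAGE))
    for seg_type, fields in structured:
        print(seg_type, fields)
    print("Free text to be processed:", notes)
```

In this sketch the OBX and PID segments can be mapped field by field, whereas the NTE comment is returned as plain text that still needs entity recognition and reconstruction before it can be integrated.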
A. SeDIE in a Nutshell

SeDIE consists of several components that perform functions such as pre-processing and processing of data. Fig. 3 shows the high-level architecture of SeDIE.

Fig. 3: SeDIE – a semantic-driven engine for health data integration

SeDIE performs pattern recognition (Fig. 3 A) in the data pre-processing phase. Three components are employed for this task: the HL7 message repository, the pattern recognition engine, and the HL7 patterns repository (a detailed description of these components is given later in this section). The data ingestion interface feeds data into components such as the entity recognition module and the HL7 message converter. Messages are ingested in text format and may contain structured (or semi-structured) data, such as patient diagnoses and lab results, and unstructured data, such as physician notes, pathology reports and imaging radiology reports. The entity recognizer (Fig. 3 B) detects healthcare entities in the messages. A search interface (Fig. 3 C) is provided to query patterns in the patterns repository and reconstruct a structured segment. The converter (Fig. 3 D & E) converts HL7 message segments into RDF triples and consolidates all codifications used in the messages according to one standard coding system. Finally, the RDF triples are stored in the triplestore. The new representation of HL7 data and the triplestore allow for semantic querying and intelligent retrieval of data in research-oriented scenarios.

B. Functional Modules of SeDIE

This section details the implementation of the SeDIE functional modules and highlights the methods used for linking extracted terms to segment patterns and for converting them to RDF.

1) HL7 Pattern Recognition (PR) module: In our approach, we assume that each sentence in an HL7 NTE segment can be represented by one structured segment (OBX, RXE, etc.) and that each extracted term corresponds to one data element (field or sub-field). Therefore, this module recognizes patterns from the HL7 historical repository (see Fig. 3). Once the identified patterns have been confirmed, they are registered in the HL7 patterns repository for later use in automatically reconstructing an NTE segment into a structured segment.

Consider a simple example to illustrate PR. The RXE segment is used as a record of medications and is attached to the patient. The RXE segment often consists of four pieces of information (as shown in Listing 1.a). Thus, the existence of these four concepts in a phrase allows the construction of the RXE segment.

In order to develop our PR module, we used the Statistical Pattern Recognition technique [19], which is based on frequency count summarization. According to [20], statistical pattern recognition draws from established concepts in statistical decision theory to discriminate among data from different groups based upon quantitative features of the data. In our model, the extracted features are the percentages of occurrence of each field in a set of segments. For instance, the occurrence (or weight) of the field "Quantity/Timing" in a set of RXE segments is 80%. Then, the model recognizes the segment patterns through medical concepts by replacing each field of a segment with a concept from the Semantic Types Ontology STY (https://bioportal.bioontology.org/ontologies/STY). In this way, the RXE segment can be written with concepts from an extended version of STY (as presented in Listing 1.b).

    a) RXE||quantity/timing|medication_code^text^coding_system|amount_to_be_given||units
    b) RXE||Temporal_Concept|cui^Pharmacologic_Substance^UMLS|Quantitative_Concept||Measurement_Unit

Listing 1: RXE segment with minimum fields

This method enables the solution to recognize the pattern of each segment, using semantic types from STY extended with a "Measurement Unit" class. The "Measurement Unit" class is a subClassOf "Quantitative Concept"; it is used for terms that have the type "Quantitative Concept" and are derived from the UCUM ontology [21]. This class was added to eliminate the ambiguity between numbers and units. Table I represents the Patterns Repository, which contains several HL7 segments, their most used fields, and the semantic types and occurrences of these fields.

TABLE I: PR of most widely used segments in the HL7 standard.

Segment code | Segment desc. | Field name | Semantic type | Occ.
OBX | Observation Result | Observation Identifier | Organic Chemical, Pharmacologic Substance, Laboratory or Test Result | 100%
OBX | Observation Result | Observation Value | Quantitative Concept, Pathologic Function | 100%
OBX | Observation Result | Units | Measurement Unit | 90%
OBX | Observation Result | Date/Time | Temporal Concept | 50%
AL1 | Allergy | Allergy Id | Antibiotic, Organic Chemical | 100%
AL1 | Allergy | Allergy Reaction | Sign or Symptom | 95%
RXE | Pharmacy, Treatment Order | Quantity/Timing | Temporal Concept | 80%
RXE | Pharmacy, Treatment Order | Give Code | Organic Chemical, Pharmacologic Substance | 100%
RXE | Pharmacy, Treatment Order | Give Amount | Quantitative Concept | 100%
RXE | Pharmacy, Treatment Order | Units | Measurement Unit | 94%
DG1 | Diagnosis | Diagnosis Id | Disease or Syndrome | 100%
DG1 | Diagnosis | Diagnosis Data | Temporal Concept | 42%
DG1 | Diagnosis | Diagnosis Type | Pathologic Function | 35%
PR1 | Procedure | Procedure Id | Diagnostic Procedure | 100%
PR1 | Procedure | Date/Time | Temporal Concept | 100%
PR1 | Procedure | Diagnosis | Disease or Syndrome | 60%

2) Medical Entity Recognition (ER) module: ER from unstructured data is the process of detecting and delimiting sentence information that refers to medical entities and mapping
these mentions to a knowledge base. Terminological variation is the main problem of ER. MetaMap (https://metamap.nlm.nih.gov/), provided by the National Institutes of Health (NIH), is considered a baseline in ER [22]. MetaMap uses an approach based on natural language processing (NLP) and computational linguistic techniques. We use MetaMap to discover medical terms in text and map them to UMLS terminologies. Table II shows the semantic type, CUI and score of the terms extracted with MetaMap from the sentence "Her diabetes is well-controlled with metformin 500 mg/dl twice daily".

TABLE II: ER for a phrase using MetaMap; the higher the score, the greater the relevance of the STY concept.

Score | Term | CUI | Semantic Type
764 | Well controlled | C3853142 | Functional Concept
575 | Metformin | C0025598 | Pharmacologic Substance
575 | 500 | C3816747 | Quantitative Concept
598 | mg/dl | C0439269 | Measurement Unit
598 | twice daily | C0585361 | Temporal Concept

3) Reconstruction module: In this module, SeDIE semantically reconstructs the extracted medical entities into a structured segment by applying an appropriate pattern (Fig. 4 illustrates the process). The pattern is searched for in the HL7 Patterns Repository using a scoring mechanism. Then, a new segment instance containing the extracted entities is created and associated with a score value representing the reconstruction accuracy. The scoring mechanism is based on the weight (occurrence) of the fields in the pattern and the score of the extracted entities. To carry out this computation, extracted entities and patterns are represented as vectors as follows:
• $ST = \{t_i \mid t_i$ represents a semantic type used to annotate terms (or entities), $i \le n\}$
• $Seg = \{seg_j \mid seg_j$ represents a segment of an HL7 message, $j \le m\}$
• Pattern $P_{seg} = \{(t_i, w_i) \mid t_i \in ST \wedge w_i$ represents the weight (percentage of occurrence) of $t_i$ in a set of $seg_j\}$
• Phrase $E = \{(t_i, s_i) \mid t_i \in ST \wedge s_i$ represents the score of $t_i$ returned by MetaMap$\}$

Fig. 4: An example to convert phrase to structured HL7 segment.

Furthermore, to search for an appropriate pattern for a given phrase, we use the Multi-Criteria Decision-Making (MCDM) approach, which is suited to decision problems in which multiple decision criteria are considered. The objective of MCDM is to calculate a scoring value that provides an order of alternatives, from the most preferred to the least preferred option [23]. There are a number of MCDM methods that deal with the estimation of a set of alternatives in terms of numerous decision criteria [24]. Despite these available methods, there is no single method that could be considered the most appropriate for all types of decision-making problems [25]. The weighted sum model (WSM) is probably the best known and most widely used MCDM method, especially in single-dimensional problems, in which all the units are the same [26]. In our approach, all units are represented by the score; therefore, WSM fits our problem of finding a suitable segment pattern. We consider the segment patterns ($P_{seg}$) as alternatives and the semantic types ($ST$) as criteria, and the data for this WSM problem are as follows:

\[
\begin{array}{c|cccccc}
ST & t_1 & t_2 & \cdots & t_i & \cdots & t_n \\
\text{Alt} & s_1 & s_2 & \cdots & s_i & \cdots & s_n \\ \hline
seg_1 & w_{11} & w_{12} & \cdots & w_{1i} & \cdots & w_{1n} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
seg_j & w_{j1} & w_{j2} & \cdots & w_{ji} & \cdots & w_{jn} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
seg_m & w_{m1} & w_{m2} & \cdots & w_{mi} & \cdots & w_{mn}
\end{array}
\]

Technically speaking, the Best Alternative (BA) is the one that satisfies the following expression:

\[
BA(E) = \max_{seg_j}^{seg_m} \sum_{i=0}^{n} |(t_i, s_i)| \times |(t_i, w_{ji})| = \max_{seg_j}^{seg_m} \sum_{i=0}^{n} (s_i \times w_{ji}) \tag{1}
\]

In order to normalize the result and use the score as an accuracy value, we suggest dividing $BA(E)$ by the sum of the function $F(t_i)$. The objective of $F(t_i)$ is twofold: i) to normalize the score to a value between 0 and 1000, and ii) to give the score a more realistic value by taking into account the semantic types that are extracted from a phrase but not mapped to a field of the segment pattern. For all $(t_i, s_i) \in E$ with $s_i \neq 0$:

\[
F(t_i \mid seg_j) =
\begin{cases}
w_{ji} & \text{if } (t_i, w_{ji}) \in P_{seg_j} \wedge w_{ji} \neq 0 \\
100 & \text{if } w_{ji} = 0
\end{cases} \tag{2}
\]

Thus, the best pattern is the segment that yields the maximum score ($BA_s$) as follows:

\[
BA_s(E) = \max_{seg_j}^{seg_m} \frac{\sum_{i=0}^{n} (s_i \times w_{ji})}{\sum_{i=0}^{n} F(t_i)} \tag{3}
\]

Finally, a predefined threshold $\beta$ is used to evaluate $BA_s(E)$. That is, the $seg_j$ that attains the value $BA_s(E)$ is the appropriate pattern for the phrase represented by $E$, provided that $BA_s(E) > \beta$.

4) Conversion module (From HL7 to RDF): Converting data to RDF is critically important in data integration, since adopting the RDF format for describing data offers various advantages, such as: i) no need for a common schema for data as long as they are described as RDF triples, and ii) the ability to query data from heterogeneous data sources as long as they are transformed into RDF. The World Wide Web Consortium (W3C) presents a list of helpful RDF converter tools on their Wiki (https://www.w3.org/wiki/ConverterToRdf), for sources such as RDBMS, XML and JSON. Despite the efforts being made to convert HL7 to RDF, there is no suitable
off-the-shelf tool available. There were some attempts to perform the conversion; however, we found that either there was no usable tool to achieve the conversion [27], or it was mandatory to go through an intermediate XML transformation [28], which has an impact on performance. Therefore, a converter for transforming HL7 data into RDF instances is a sine qua non in order to bridge the gap between medical data and semantic web technologies. In this section, we describe a highly flexible open-source tool for converting HL7 messages to RDF instances, which we implemented as a Java library under the name HL7toRDF (https://github.com/housseindh/HL7ToRDF). HL7toRDF uses the Java-based parser provided by the HL7 Application Programming Interface (HAPI) (https://hapifhir.github.io/hapi-HL7v2/) for parsing HL7 v2.x messages.

At the basic level, the algorithm works by processing every HL7 message received by the engine. For each message, it creates an appropriate instance URI based on the message identifier; then, it searches the PID segment of the message for the patient identifier to find out whether the patient URI has already been created in the RDF model, in order to link this message to the existing patient. If the URI is not found, a new patient URI is created. Next, the algorithm iterates over every segment of the HL7 message. For each segment, the converter converts the fields and sub-fields to a set of RDF triples as appropriate for that value. Fig. 5(A) illustrates an HL7 message converted to an RDF graph; each segment is accompanied by a score, which is 1000 for structured segments and the result of equation (3) for NTE segments.

However, in practice, various clinical codification standards are used in HL7 messages, which makes it more complex to share health information among different providers and complicates the analysis of clinical trial data. Thus, SeDIE binds each codification to its corresponding CUI using a "umls_cui" predicate, as shown in Fig. 5(B).

Fig. 5: An HL7 message converted to RDF and bound to CUI

IV. EXPERIMENTAL RESULTS

This section presents the results of experiments with SeDIE. The goal of the experiment is to evaluate the quality of the HL7 segments created from NTE segments. Since there is no gold standard for the extraction of HL7 segments from notes, we randomly sampled 100 sentences from the MTSamples website (http://www.mtsamples.com/). The website provides a large collection of publicly available transcribed medical records. The sampled sentences were manually examined for relevant segments and compared to the results obtained from our engine. We use the standard equations (4) for the measures of Precision (P) and Recall (R), from which the F-measure (F) is derived.

\[
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \tag{4}
\]

We evaluated each newly created segment based on the correctness of the chosen pattern associated with the semantic types extracted from the phrase. Examining the True Positive (TP), False Negative (FN) and False Positive (FP) results given by our method, we found that the errors stemmed from the FN and FP produced by MetaMap. Therefore, updated evaluation measures are defined in equations (5), which divide the standard P and R measures by the P and R of MetaMap ($P_M$ and $R_M$), respectively.

\[
P = \frac{TP}{(TP + FP) \times P_M}, \qquad R = \frac{TP}{(TP + FN) \times R_M} \tag{5}
\]

To determine the R and P measures of MetaMap, an entity recognition experiment [29] was applied to a corpus to automatically annotate it with UMLS concepts. The experimental results show that MetaMap obtained 53.5% for R and 83.9% for P. The result of our experiment using equations (4) shows an F1 score of 44.5%, a P of 66.7%, and an R of 53.4%, whereas the experiment using equations (5) resulted in an F1 score of 85.1%, a P of 88.1%, and an R of 82.3%.

The analysis of the FN and FP in the results shows that MetaMap performed an incorrect or ambiguous extraction of medical entities, and that the number of semantic types extracted from a phrase was sometimes greater than the number of semantic types in a segment pattern. Another limitation of our approach is that there is not always a segment pattern able to represent the meaning of a phrase. However, the HL7 standard defines about 140 different segments (http://HL7-definition.caristix.com:9010/HL7%20v2.5.1/segment). Therefore, recognizing the patterns of all these segments could eliminate many of these issues.

    PREFIX HL7: <http://www.HL7.org/segment#>
    SELECT DISTINCT ?patient ?GivenName ?FamilyName
    WHERE {
      ?patient HL7:PID ?PID .
      ?PID HL7:PatientName ?PatientName .
      ?patient HL7:DG1 ?DG1 .
      ?DG1 HL7:DiagnosisCodeDG1 ?DiagnosisCodeDG1 .
      ?DiagnosisCodeDG1 HL7:Identifier ?Identifier .
      ?Identifier HL7:umls_cui ?cui .
      ?DG1 HL7:SegScore ?score
      # keep only diagnoses bound to CUI C0022661 whose segment score is at least 400
      FILTER (?score >= 400 && regex(?cui, 'C0022661'))
    }

Listing 2: Searching the RDF graph for all patients with 'Chronic renal failure' with an accuracy of at least 400, regardless of the initially used coding.
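The score filtered in Listing 2 is, for reconstructed NTE segments, the value of equation (3). As a rough illustration of how such a score could be computed, the Python sketch below applies the weighted sum of equations (1) to (3) to the entities of Table II, using occurrence weights taken from Table I. Several points are assumptions made here and not statements about the SeDIE implementation: the function names are invented, a field that lists several semantic types gives its weight to each of them, a phrase is reduced to one score per semantic type, and the threshold value 400 is simply borrowed from the filter in Listing 2.

```python
# Sketch of the weighted-sum scoring of equations (1)-(3).
# Weights: occurrence percentages from Table I (subset of segments).
# Assumption: each semantic type listed for a field carries that field's weight.
PATTERNS = {
    "OBX": {
        "Organic Chemical": 100,
        "Pharmacologic Substance": 100,
        "Laboratory or Test Result": 100,
        "Quantitative Concept": 100,
        "Pathologic Function": 100,
        "Measurement Unit": 90,
        "Temporal Concept": 50,
    },
    "RXE": {
        "Temporal Concept": 80,
        "Organic Chemical": 100,
        "Pharmacologic Substance": 100,
        "Quantitative Concept": 100,
        "Measurement Unit": 94,
    },
    "DG1": {
        "Disease or Syndrome": 100,
        "Temporal Concept": 42,
        "Pathologic Function": 35,
    },
}

# Phrase E: semantic type -> MetaMap score (Table II).
PHRASE = {
    "Functional Concept": 764,
    "Pharmacologic Substance": 575,
    "Quantitative Concept": 575,
    "Measurement Unit": 598,
    "Temporal Concept": 598,
}


def pattern_score(phrase, pattern):
    """Normalized score of one segment pattern for a phrase (equation 3)."""
    numerator = 0.0
    denominator = 0.0
    for sem_type, score in phrase.items():
        if score == 0:
            continue  # F is only defined for extracted types with s_i != 0
        weight = pattern.get(sem_type, 0)
        numerator += score * weight                      # s_i * w_ji
        denominator += weight if weight != 0 else 100    # F(t_i), equation 2
    return numerator / denominator if denominator else 0.0


def best_pattern(phrase, patterns, beta):
    """Return (segment, score); segment is None if the score is not above beta."""
    scores = {seg: pattern_score(phrase, p) for seg, p in patterns.items()}
    segment = max(scores, key=scores.get)
    score = scores[segment]
    return (segment, score) if score > beta else (None, score)


if __name__ == "__main__":
    print(best_pattern(PHRASE, PATTERNS, beta=400))
```

Under these assumptions the RXE pattern obtains the highest score for the example phrase (about 462 on the 0 to 1000 scale, just above OBX at about 452), so the reconstructed segment would pass a filter such as `?score >= 400`. The unmatched "Functional Concept" entity lowers the score through the constant 100 in equation (2).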
Finally, it is worth noting that the proposed method allows researchers to discover data using a SPARQL query, regardless of the source or medical coding of the data and whether they were originally structured or unstructured. The researcher only needs to write the query and specify the target accuracy (see Listing 2).

V. CONCLUSION & FUTURE WORKS

In this paper, we described a semantic-driven engine for health data integration called SeDIE. Our solution integrates HL7 messages coming from heterogeneous healthcare systems. We explained the integration workflow, which is composed of four main steps: i) recognition of patterns from HL7 segments using statistical pattern recognition, ii) extraction of medical entities using MetaMap, iii) reconstruction of the extracted entities into structured segments by applying an MCDM approach, and iv) conversion of HL7 segments to RDF. The example SPARQL query in Listing 2 shows that the combined use of semantic web technologies and healthcare standards, including HL7 and UMLS, serves as an effective infrastructure for semantic medical data integration at different levels of granularity, thereby enabling the extraction of hidden patterns and relationships from a wide variety of data.

A list of tasks is lined up to extend SeDIE. We will focus on the most compelling and critical task in the next version of SeDIE. We plan to improve our approach by proposing a method that could map one phrase to multiple segment patterns, as well as by finding an ER tool more suited to our entity extraction approach, in order to decrease the FN and FP rates. This will make it possible to convert a medical textual document into an RDF graph data model.

ACKNOWLEDGEMENTS

The authors would like to thank Mrs Jocelyne Ziadeh, the Director of Information Systems for HDF, and Mr Joseph Geagea, HL7 expert at HDF (http://www.hdf.usj.edu.lb/), for providing the domain knowledge and expert advice.

REFERENCES

[1] B. A. Goldstein, A. M. Navar, M. J. Pencina, and J. Ioannidis, “Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review,” Journal of the American Medical Informatics Association, vol. 24, no. 1, pp. 198–208, 2017.
[2] H. Prokosch and T. Ganslandt, “Perspectives for medical informatics. Reusing the electronic medical record for clinical research,” Methods of Information in Medicine, vol. 48, no. 1, p. 38, 2009.
[3] A. R. Tate, N. Beloff, B. Al-Radwan, J. Wickson, S. Puri, T. Williams, T. Van Staa, and A. Bleach, “Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface,” Journal of the American Medical Informatics Association, vol. 21, no. 2, pp. 292–298, 2013.
[4] O. Iroju, A. Soriyan, I. Gambo, and J. Olaleke, “Interoperability in healthcare: benefits, challenges and resolutions,” 2013.
[5] X. Zenuni, B. Raufi, F. Ismaili, and J. Ajdari, “State of the art of semantic web for healthcare,” Procedia-Social and Behavioral Sciences, vol. 195, pp. 1990–1998, 2015.
[6] P. Jayaratna and K. Sartipi, “HL7 v3 message extraction using semantic web techniques,” International Journal of Knowledge Engineering and Data Mining, vol. 2, no. 1, pp. 89–115, 2012.
[7] O. Bodenreider, “The unified medical language system (UMLS): integrating biomedical terminology,” Nucleic Acids Research, vol. 32, no. suppl 1, pp. D267–D270, 2004.
[8] A. Névéol, J. Li, and Z. Lu, “Linking multiple disease-related resources through UMLS,” in Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium. ACM, 2012, pp. 767–772.
[9] “HL7 standards product brief - HL7 version 2 product suite,” http://www.hl7.org/implement/standards/product_brief.cfm?product_id=185, (Accessed on 07/01/2018).
[10] D. Bender and K. Sartipi, “HL7 FHIR: An agile and RESTful approach to healthcare information exchange,” in Computer-Based Medical Systems (CBMS), 2013 IEEE 26th International Symposium on. IEEE, 2013, pp. 326–331.
[11] T. Benson and G. Grieve, “HL7 version 2,” in Principles of Health Interoperability. Springer, 2016, pp. 223–242.
[12] D. A. Lindberg, B. L. Humphreys, and A. T. McCray, “The unified medical language system,” Methods of Information in Medicine, vol. 32, no. 04, pp. 281–291, 1993.
[13] A. R. Aronson and F.-M. Lang, “An overview of MetaMap: historical perspective and recent advances,” Journal of the American Medical Informatics Association, vol. 17, no. 3, pp. 229–236, 2010.
[14] M. R. Kamdar, T. Tudorache, and M. A. Musen, “A systematic analysis of term reuse and term overlap across biomedical ontologies,” Semantic Web, vol. 8, no. 6, pp. 853–871, 2017.
[15] J. Friedlin, S. Grannis, and J. M. Overhage, “Using natural language processing to improve accuracy of automated notifiable disease reporting,” in AMIA Annual Symposium Proceedings, vol. 2008. American Medical Informatics Association, 2008, p. 207.
[16] M.-C. Lin, D. J. Vreeman, and S. M. Huff, “Investigating the semantic interoperability of laboratory data exchanged using LOINC codes in three large institutions,” in AMIA Annual Symposium Proceedings, vol. 2011. American Medical Informatics Association, 2011, p. 805.
[17] D. Rusu, L. Dali, B. Fortuna, M. Grobelnik, and D. Mladenic, “Triplet extraction from sentences,” in Proceedings of the 10th International Multiconference “Information Society-IS”, 2007, pp. 8–12.
[18] P. Exner and P. Nugues, “Entity extraction: From unstructured text to DBpedia RDF triples.”
[19] K. Fukunaga, Introduction to Statistical Pattern Recognition. Elsevier, 2013.
[20] R. T. Olszewski, “Generalized feature extraction for structural pattern recognition in time-series data,” Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, Tech. Rep., 2001.
[21] G. Schadow, C. J. McDonald, J. G. Suico, U. Föhring, and T. Tolxdorff, “Units of measure in clinical information systems,” Journal of the American Medical Informatics Association, vol. 6, no. 2, pp. 151–162, 1999.
[22] A. B. Abacha and P. Zweigenbaum, “Medical entity recognition: A comparison of semantic and statistical methods,” in Proceedings of BioNLP 2011 Workshop. Association for Computational Linguistics, 2011, pp. 56–64.
[23] E. K. Zavadskas and Z. Turskis, “Multiple criteria decision making (MCDM) methods in economics: an overview,” Technological and Economic Development of Economy, vol. 17, no. 2, pp. 397–427, 2011.
[24] E. Triantaphyllou, “Multi-criteria decision making methods,” in Multi-criteria Decision Making Methods: A Comparative Study. Springer, 2000, pp. 5–21.
[25] A. Guitouni and J.-M. Martel, “Tentative guidelines to help choosing an appropriate MCDA method,” European Journal of Operational Research, vol. 109, no. 2, pp. 501–521, 1998.
[26] E. Triantaphyllou, B. Shu, S. N. Sanchez, and T. Ray, “Multi-criteria decision making: an operations research approach,” Encyclopedia of electrical and electronics, vol. 15, no. 1998, pp. 175–186, 1998.
[27] F. Prasser, F. Kohlmayer, A. Kemper, and K. Kuhn, “A generic transformation of HL7 messages into the resource description framework data model,” 2012.
[28] Y. Kawazoe, T. Imai, and K. Ohe, “A querying method over RDF-ized health level seven v2.5 messages using life science knowledge resources,” JMIR Medical Informatics, vol. 4, no. 2, 2016.
[29] A. Jimeno, E. Jimenez-Ruiz, V. Lee, S. Gaudan, R. Berlanga, and D. Rebholz-Schuhmann, “Assessment of disease named entity recognition on a corpus of annotated sentences,” in BMC Bioinformatics, vol. 9, no. 3. BioMed Central, 2008, p. S3.
