Healthcare Data Analytics 1st Edition Chandan K. Reddy Download PDF
Healthcare Data Analytics
SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.
PUBLISHED TITLES
ADVANCES IN MACHINE LEARNING AND DATA MINING FOR ASTRONOMY
Michael J. Way, Jeffrey D. Scargle, Kamal M. Ali, and Ashok N. Srivastava
BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi
COMPUTATIONAL BUSINESS ANALYTICS
Subrata Das
COMPUTATIONAL INTELLIGENT DATA ANALYSIS FOR SUSTAINABLE
DEVELOPMENT
Ting Yu, Nitesh V. Chawla, and Simeon Simoff
COMPUTATIONAL METHODS OF FEATURE SELECTION
Huan Liu and Hiroshi Motoda
CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY,
AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff
CONTRAST DATA MINING: CONCEPTS, ALGORITHMS, AND APPLICATIONS
Guozhu Dong and James Bailey
DATA CLASSIFICATION: ALGORITHMS AND APPLICATIONS
Charu C. Aggarwal
DATA CLUSTERING: ALGORITHMS AND APPLICATIONS
Charu C. Aggarwal and Chandan K. Reddy
Edited by
Chandan K. Reddy
Wayne State University
Detroit, Michigan, USA
Charu C. Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, New York, USA
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmit-
ted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
Contents
Contributors
Preface
© 2015 Taylor & Francis Group, LLC
7 Natural Language Processing and Data Mining for Clinical Text
Kalpana Raja and Siddhartha R. Jonnalagadda
7.1 Introduction
7.2 Natural Language Processing
7.2.1 Description
7.2.2 Report Analyzer
7.2.3 Text Analyzer
7.2.4 Core NLP Components
7.2.4.1 Morphological Analysis
7.2.4.2 Lexical Analysis
7.2.4.3 Syntactic Analysis
7.2.4.4 Semantic Analysis
7.2.4.5 Data Encoding
7.3 Mining Information from Clinical Text
7.3.1 Information Extraction
7.3.1.1 Preprocessing
7.3.1.2 Context-Based Extraction
7.3.1.3 Extracting Codes
7.3.2 Current Methodologies
7.3.2.1 Rule-Based Approaches
7.3.2.2 Pattern-Based Algorithms
7.3.2.3 Machine Learning Algorithms
7.3.3 Clinical Text Corpora and Evaluation Metrics
7.3.4 Informatics for Integrating Biology and the Bedside (i2b2)
7.4 Challenges of Processing Clinical Reports
7.4.1 Domain Knowledge
7.4.2 Confidentiality of Clinical Text
7.4.3 Abbreviations
7.4.4 Diverse Formats
7.4.5 Expressiveness
7.4.6 Intra- and Interoperability
7.4.7 Interpreting Information
7.5 Clinical Applications
7.5.1 General Applications
7.5.2 EHR and Decision Support
7.5.3 Surveillance
7.6 Conclusions
Index
Contributors
Preface
Innovations in computing technologies have revolutionized healthcare in recent years. The analyt-
ical style of reasoning has not only changed the way in which information is collected and stored
but has also played an increasingly important role in the management and delivery of healthcare. In
particular, data analytics has emerged as a promising tool for solving problems in various healthcare-
related disciplines. This book will present a comprehensive review of data analytics in the field of
healthcare. The goal is to provide a platform for interdisciplinary researchers to learn about the
fundamental principles, algorithms, and applications of intelligent data acquisition, processing, and
analysis of healthcare data. This book will provide readers with an understanding of the vast num-
ber of analytical techniques for healthcare problems and their relationships with one another. This
understanding includes details of specific techniques and required combinations of tools to design
effective ways of handling, retrieving, analyzing, and making use of healthcare data. This book
will provide a unique perspective on healthcare-related opportunities for developing new computing
technologies.
From a researcher and practitioner perspective, a major challenge in healthcare is its interdis-
ciplinary nature. The field of healthcare has often seen advances coming from diverse disciplines
such as databases, data mining, information retrieval, image processing, medical research, and
healthcare practice. While this interdisciplinary nature adds to the richness of the field, it also
adds to the challenges in making significant advances. Computer scientists are usually not trained in
domain-specific medical concepts, whereas medical practitioners and researchers also have limited
exposure to the data analytics area. This has added to the difficulty in creating a coherent body of
work in this field. The result has often been independent lines of work from completely different
perspectives. This book is an attempt to bring together these diverse communities by carefully and
comprehensively discussing the most relevant contributions from each domain.
The book aims to provide a comprehensive overview of the healthcare data analytics field as it stands
today and to educate the community about future research challenges and opportunities. Even
though the book is structured as an edited collection of chapters, special care was taken during the
creation of the book to cover healthcare topics exhaustively by coordinating the contributions from
various authors. Focus was also placed on reviews and surveys rather than individual research results
in order to emphasize comprehensiveness in coverage. Each book chapter is written by prominent
researchers and experts working in the healthcare domain. The chapters in the book are divided into
three major categories:
• Healthcare Data Sources and Basic Analytics: These chapters discuss the details about
the various healthcare data sources and the analytical techniques that are widely used in the
processing and analysis of such data. The various forms of patient data include electronic
health records, biomedical images, sensor data, biomedical signals, genomic data, clinical
text, biomedical literature, and data gathered from social media.
• Advanced Data Analytics for Healthcare: These chapters deal with the advanced data ana-
lytical methods focused on healthcare. These include the clinical prediction models, temporal
pattern mining methods, and visual analytics. In addition, other advanced methods such as
data integration, information retrieval, and privacy-preserving data publishing will also be
discussed.
• Applications and Practical Systems for Healthcare: These chapters focus on the applications
of data analytics and the relevant practical systems. They cover the applications of data
analytics to pervasive healthcare, fraud detection, and drug discovery. In terms of practical
systems, they cover clinical decision support systems, computer-assisted medical imaging
systems, and mobile imaging systems.
It is hoped that this comprehensive book will serve as a compendium for students, researchers,
and practitioners. Each chapter is structured as a “survey-style” article discussing the prominent
research issues and the advances made on that research topic. Special effort was made to ensure
that each chapter is self-contained and that the background required from other chapters is minimal.
Finally, we hope that the topics discussed in this book will lead to further developments in the field
of healthcare data analytics that can help in improving the health and well-being of people. We be-
lieve that research in the field of healthcare data analytics will continue to grow in the years to come.
Acknowledgment: This work was supported in part by National Science Foundation grant
No. 1231742.
Chandan K. Reddy
Department of Computer Science
Wayne State University
Detroit, MI
Charu C. Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, NY
1.1 Introduction
1.2 Healthcare Data Sources and Basic Analytics
1.2.1 Electronic Health Records
1.2.2 Biomedical Image Analysis
1.2.3 Sensor Data Analysis
1.2.4 Biomedical Signal Analysis
1.2.5 Genomic Data Analysis
1.2.6 Clinical Text Mining
1.2.7 Mining Biomedical Literature
1.2.8 Social Media Analysis
1.3 Advanced Data Analytics for Healthcare
1.3.1 Clinical Prediction Models
1.3.2 Temporal Data Mining
1.3.3 Visual Analytics
1.3.4 Clinico–Genomic Data Integration
1.3.5 Information Retrieval
1.3.6 Privacy-Preserving Data Publishing
1.4 Applications and Practical Systems for Healthcare
1.4.1 Data Analytics for Pervasive Health
1.4.2 Healthcare Fraud Detection
1.4.3 Data Analytics for Pharmaceutical Discoveries
1.4.4 Clinical Decision Support Systems
1.4.5 Computer-Aided Diagnosis
1.4.6 Mobile Imaging for Biomedical Applications
1.5 Resources for Healthcare Data Analytics
1.6 Conclusions
Bibliography
1.1 Introduction
While healthcare costs have been constantly rising, the quality of care provided to patients
in the United States has not seen considerable improvements. Recently, several researchers
have conducted studies showing that incorporating current healthcare technologies can reduce
mortality rates, healthcare costs, and medical complications at various hospitals.
In 2009, the US government enacted the Health Information Technology for Economic and Clinical
Health Act (HITECH) that includes an incentive program (around $27 billion) for the adoption and
meaningful use of Electronic Health Records (EHRs).
The recent advances in information technology have led to an increasing ease in the ability to
collect various forms of healthcare data. In this digital world, data becomes an integral part of health-
care. A recent report on Big Data suggests that the overall potential of healthcare data will be around
$300 billion [12]. Due to the rapid advancements in the data sensing and acquisition technologies,
hospitals and healthcare institutions have started collecting vast amounts of healthcare data about
their patients. Effectively understanding and building knowledge from healthcare data requires de-
veloping advanced analytical techniques that can effectively transform data into meaningful and
actionable information. General computing technologies have started revolutionizing the manner in
which medical care is available to the patients. Data analytics, in particular, forms a critical com-
ponent of these computing technologies. The analytical solutions when applied to healthcare data
have an immense potential to transform healthcare delivery from being reactive to more proactive.
The impact of analytics in the healthcare domain is only going to grow in the next several
years. Typically, analyzing health data will allow us to understand the patterns that are hidden in
the data. It will also help clinicians build an individualized patient profile and accurately
compute the likelihood that an individual patient will suffer a medical complication in the near
future.
Healthcare data is particularly rich and it is derived from a wide variety of sources such as
sensors, images, text in the form of biomedical literature/clinical notes, and traditional electronic
records. This heterogeneity in the data collection and representation process leads to numerous
challenges in both the processing and analysis of the underlying data. There is a wide diversity in the
techniques that are required to analyze these different forms of data. In addition, the heterogeneity
of the data naturally creates various data integration and data analysis challenges. In many cases,
insights can be obtained by combining diverse data types that would not be possible from a single
source of data. Only recently has the vast potential of such integrated data analysis methods
begun to be realized.
From a researcher and practitioner perspective, a major challenge in healthcare is its interdisciplinary
nature. The field of healthcare has often seen advances coming from diverse disciplines such
as databases, data mining, information retrieval, medical research, and healthcare practice.
While this interdisciplinary nature adds to the richness of the field, it also adds to the challenges in
making significant advances. Computer scientists are usually not trained in domain-specific medical
concepts, whereas medical practitioners and researchers also have limited exposure to the mathe-
matical and statistical background required in the data analytics area. This has added to the difficulty
in creating a coherent body of work in this field even though it is evident that much of the available
data can benefit from such advanced analysis techniques. This diversity has often led
to independent lines of work from completely different perspectives. Researchers in the field of data
analytics are particularly susceptible to becoming isolated from real domain-specific problems, and
may often propose technically excellent problem formulations that have no practical use. This
book is an attempt to bring together these diverse communities by carefully and comprehensively
discussing the most relevant contributions from each domain. It is only by bringing together these
diverse communities that the vast potential of data analysis methods can be harnessed.
[Figure 1.1: Organization of the book's contents. Recoverable labels: Data Sources & Basic Analytics (Chapter 2: Electronic Health Records; Chapter 3: Images; Chapter 4: Sensors; Chapter 5: Signals; Chapter 6: Genomic; Chapter 7: Clinical Notes; Chapter 8: Biomedical Literature; Chapter 9: Social Media); Advanced (Chapter 10, Chapter 13; labels not recoverable); Systems (Chapter 16: Pervasive Health; Chapter 18: Drug Discovery; Chapter 19: Decision Support; Chapter 20: CAD Systems).]
Another major challenge that exists in the healthcare domain is the “data privacy gap” between
medical researchers and computer scientists. Healthcare data is obviously very sensitive because it
can reveal compromising information about individuals. Several laws in various countries, such as
the Health Insurance Portability and Accountability Act (HIPAA) in the United States, explicitly
forbid the release of medical information about individuals for any purpose, unless safeguards are
used to preserve privacy. Medical researchers have natural access to healthcare data because their
research is often paired with an actual medical practice. Furthermore, various mechanisms exist in
the medical domain to conduct research studies with voluntary participants. Such data collection is
almost always paired with anonymity and confidentiality agreements.
On the other hand, acquiring data is not quite as simple for computer scientists without a proper
collaboration with a medical practitioner. Even then, there are barriers in the acquisition of data.
Clearly, many of these challenges can be avoided if accepted protocols, privacy technologies, and
safeguards are in place. Therefore, this book will also address these issues. Figure 1.1 provides an
overview of the organization of the book’s contents. This book is organized into three parts:
1. Healthcare Data Sources and Basic Analytics: This part discusses the details of various
healthcare data sources and the basic analytical methods that are widely used in the processing
and analysis of such data. The various forms of patient data that are currently being
collected in both clinical and non-clinical environments will be studied. The clinical data
include structured electronic health records and biomedical images. Sensor data has been
receiving a lot of attention recently. Techniques for mining sensor data and biomedical signal
analysis will be presented. Personalized medicine has gained a lot of importance due to the
advancements in genomic data. Genomic data analysis involves several statistical techniques,
which will also be elaborated. Patients’ in-hospital clinical data also include a lot of
unstructured data in the form of clinical notes. In addition, the domain knowledge that can be
extracted by mining the biomedical literature will also be discussed. The fundamental data
mining, machine learning, information retrieval, and natural language processing techniques
for processing these data types will be extensively discussed. Finally, behavioral data captured
through social media will also be discussed.
2. Advanced Data Analytics for Healthcare: This part deals with the advanced analytical methods
focused on healthcare. These include clinical prediction models, temporal data mining
methods, and visual analytics. Integrating heterogeneous data such as clinical and genomic
data is essential for improving predictive power, and this will also be discussed.
Information retrieval techniques that can enhance the quality of biomedical search will be
presented. Data privacy is an extremely important concern in healthcare. Privacy-preserving
data publishing techniques will therefore be presented.
3. Applications and Practical Systems for Healthcare: This part focuses on the practical ap-
plications of data analytics and the systems developed using data analytics for healthcare
and clinical practice. Examples include applications of data analytics to pervasive healthcare,
fraud detection, and drug discovery. In terms of practical systems, we will discuss the details
of clinical decision support systems, computer-assisted medical imaging systems,
and mobile imaging systems.
These different aspects of healthcare are related to one another. Therefore, the chapters in each
of the aforementioned topics are interconnected. Where necessary, pointers are provided across
different chapters, depending on the underlying relevance. This chapter is organized as follows.
Section 1.2 discusses the main data sources that are commonly used and the basic techniques for
processing them. Section 1.3 discusses advanced techniques in the field of healthcare data analytics.
Section 1.4 discusses a number of applications of healthcare analysis techniques. An overview of
resources in the field of healthcare data analytics is presented in Section 1.5. Section 1.6 presents
the conclusions.
known to be a genetic disease; however, the full set of genetic markers that make an individual
prone to diabetes is unknown. In some other cases, such as the blindness caused by Stargardt
disease, the relevant genes are known but all the possible mutations have not been exhaustively
isolated. Clearly, a broader understanding of the relationships between various genetic markers,
mutations, and disease conditions has significant potential in assisting the development of various
gene therapies to cure these conditions. One will be mostly interested in understanding what kind
of health-related questions can be addressed through in-silico analysis of the genomic data through
typical data-driven studies. Moreover, translating genetic discoveries into personalized medicine
practice is a highly non-trivial task with a lot of unresolved challenges. For example, the genomic
landscapes in complex diseases such as cancers are overwhelmingly complicated, revealing a high
order of heterogeneity among different individuals. Solving these issues would put a major piece
of the puzzle in place and bring the concept of personalized medicine much closer to reality.
Recent advancements made in the biotechnologies have led to the rapid generation of large
volumes of biological and medical information and advanced genomic research. This has also led
to unprecedented opportunities and hopes for genome scale study of challenging problems in life
science. For example, advances in genomic technology have made it possible to study the complete
genomic landscape of healthy individuals for complex diseases [16]. Many of these research directions
have already shown promising results in terms of generating new insights into the biology of human
disease and predicting the personalized response of the individual to a particular treatment.
Also, genetic data are often modeled either as sequences or as networks. Therefore, the work in
this field requires a good understanding of sequence and network mining techniques. Various data
analytics-based solutions are being developed for tackling key research problems in medicine such
as identification of disease biomarkers and therapeutic targets and prediction of clinical outcome.
More details about the fundamental computational algorithms and bioinformatics tools for genomic
data analysis along with genomic data resources are discussed in Chapter 6.
analyzed using signal processing and time-series analysis techniques (e.g., wavelet transform and
independent component analysis) [37, 40]. Chapter 11 presents a detailed survey and summarizes
the literature on temporal data mining for healthcare data.
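As a rough illustration of the wavelet transform mentioned above, the following Python sketch applies one level of the Haar wavelet transform, the simplest wavelet, to a short series of readings. The sample values and the function names `haar_step` and `haar_inverse_step` are hypothetical and chosen only for illustration; they are not taken from Chapter 11 or from the cited works.

```python
import math


def haar_step(signal):
    """Apply one level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail) coefficient lists.
    The input length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    # Scaled pairwise sums capture the coarse trend of the series.
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    # Scaled pairwise differences capture local fluctuations.
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert haar_step, recovering the original samples."""
    scale = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / scale, (a - d) / scale])
    return signal


# Hypothetical heart-rate readings (beats per minute), for illustration only.
readings = [72.0, 74.0, 73.0, 75.0, 110.0, 112.0, 74.0, 72.0]
approx, detail = haar_step(readings)
reconstructed = haar_inverse_step(approx, detail)
```

Applying `haar_step` repeatedly to the approximation coefficients would give a multi-level decomposition; a single level is shown here to keep the sketch short.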
of fraud patterns and prioritization of suspicious cases [3]. Most such analysis is performed
with respect to an episode of care, which is essentially a collection of healthcare services provided to a
patient for the same health issue. Data-driven methods for healthcare fraud detection can be
employed to answer the following questions: Is a given episode of care fraudulent or unnecessary?
Is a given claim within an episode fraudulent or unnecessary? Is a provider or a network of providers
fraudulent? We discuss the problem of fraud in healthcare and existing data-driven methods for fraud
detection in Chapter 17.
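To make the notion of an episode of care concrete, the following Python sketch groups claims into episodes and flags episodes with unusually high total cost. It is a minimal illustration under stated assumptions, not the method presented in Chapter 17: the claim records, the field layout, and the function names are hypothetical, and the outlier rule (a modified z-score based on the median absolute deviation, with a cutoff of 3.5) is one common choice swapped in here for demonstration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical claim records: (patient_id, health_issue, provider_id, billed_amount)
claims = [
    ("p1", "knee_injury", "dr_a", 200.0),
    ("p1", "knee_injury", "dr_a", 250.0),
    ("p2", "knee_injury", "dr_b", 220.0),
    ("p2", "knee_injury", "dr_b", 210.0),
    ("p3", "knee_injury", "dr_c", 230.0),
    ("p3", "knee_injury", "dr_c", 240.0),
    ("p4", "knee_injury", "dr_d", 2500.0),
    ("p4", "knee_injury", "dr_d", 2600.0),
]


def group_into_episodes(claims):
    """Group claims by (patient, health issue); each group is one episode of care."""
    episodes = defaultdict(list)
    for patient, issue, provider, amount in claims:
        episodes[(patient, issue)].append((provider, amount))
    return dict(episodes)


def flag_suspicious_episodes(episodes, threshold=3.5):
    """Return episode keys whose total cost is far from the median episode cost.

    Uses a modified z-score built on the median absolute deviation (MAD).
    The threshold of 3.5 is an assumed, conventional cutoff.
    """
    totals = {key: sum(amount for _, amount in items) for key, items in episodes.items()}
    med = median(totals.values())
    mad = median(abs(total - med) for total in totals.values())
    if mad == 0:
        return []  # no spread in episode costs, so nothing can be ranked as unusual
    return sorted(
        key for key, total in totals.items()
        if 0.6745 * abs(total - med) / mad > threshold
    )


episodes = group_into_episodes(claims)
suspicious = flag_suspicious_episodes(episodes)
```

A flagged episode is only a candidate for review. In practice, episodes would be compared only against others for the same health issue, and the provider-level question would require aggregating episodes by provider.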
[51]. These organizations usually conduct annual conferences and meetings that are well attended
by researchers working in healthcare informatics. The meetings typically discuss new technologies
for capturing, processing, and analyzing medical data. They are a good meeting place for new researchers
who would like to start research in this area.
The following are some of the well-reputed journals that publish top-quality research works in
healthcare data analytics: Journal of the American Medical Informatics Association (JAMIA) [41],
Journal of Biomedical Informatics (JBI) [42], Journal of Medical Internet Research [43], IEEE
Journal of Biomedical and Health Informatics [44], Medical Decision Making [45], International
Journal of Medical Informatics (IJMI) [46], and Artificial Intelligence in Medicine [47]. A more
comprehensive list of journals in the field of healthcare and biomedical informatics, along with
details, is available in [48].
Because medical data typically contains highly sensitive patient information,
research in healthcare data analytics has been fragmented across various places.
Many researchers work with a specific hospital or healthcare facility that is usually not willing
to share its data due to obvious privacy concerns. However, there is a wide variety of public
repositories available for researchers to design and apply their own models and algorithms. Due
to the diversity in healthcare research, it will be a cumbersome task to compile all the healthcare
repositories at a single location. Specific health data repositories dealing with a particular healthcare
problem and data sources are listed in the corresponding chapters where the data is discussed. We
hope that these repositories will be useful for both existing and upcoming researchers who do not
have access to the health data from hospitals and healthcare facilities.
1.6 Conclusions
The field of healthcare data analytics has made significant strides in recent years because of advances in
hardware and software technologies, which have increased the ease of the data collection process. The
advancement of the field has, however, faced a number of challenges because of its interdisciplinary
nature, privacy constraints in data collection and dissemination mechanisms, and the inherently un-
structured nature of the data. In some cases, the data may have very high volume, which requires
real-time analysis and insights. In some cases, the data may be complex, which may require special-
ized retrieval and analytical techniques. The advances in data collection technologies, which have
enabled the field of analytics, also pose new challenges because of their efficiency in collecting
large amounts of data. The techniques used in the healthcare domain are also very diverse because
of the inherent variations in the underlying data type. This book provides a comprehensive overview
of these different aspects of healthcare data analytics, and the various research challenges that still
need to be addressed.
Bibliography
[1] Charu C. Aggarwal. Data Streams: Models and Algorithms. Springer, 2007.
[2] Charu C. Aggarwal. Managing and Mining Sensor Data. Springer, 2013.
[3] Charu C. Aggarwal. Outlier Analysis. Springer, 2013.
[4] Charu C. Aggarwal. Social Network Data Analytics. Springer, 2011.
[5] Charu C. Aggarwal and Philip S. Yu. Privacy-Preserving Data Mining: Models and Algorithms.
Springer, 2008.
[6] Eta S. Berner. Clinical Decision Support Systems. Springer, 2007.
[7] Richard J. Bolton and David J. Hand. Statistical fraud detection: A review. Statistical Science,
17(3):235–249, 2002.
[8] Charles P. Friedman. Evaluation Methods in Biomedical Informatics. Springer, 2006.
[9] Robert A. Greenes. Clinical Decision Support: The Road Ahead. Academic Press, 2011.
[10] William Hersh. Information Retrieval: A Health and Biomedical Perspective. Springer, 2008.
[11] Daniel A. Keim. Information visualization and visual data mining. IEEE Transactions on
Visualization and Computer Graphics, 8(1):1–8, 2002.
[12] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. H. Byers. Big data:
The next frontier for innovation, competition, and productivity. McKinsey Global Institute
Report, May 2011.
[13] Kunio Doi. Computer-aided diagnosis in medical imaging: Historical review, current status
and future potential. Computerized Medical Imaging and Graphics, 31, 2007.
[14] W. Hersh. Information Retrieval: A Health and Biomedical Perspective. Springer, 2009.
[15] R. B. Haynes, K. A. McKibbon, C. J. Walker, N. Ryan, D. Fitzgerald, and M. F. Ramsden.
Online access to MEDLINE in clinical settings: A study of use and usefulness. Annals of
Internal Medicine, 112(1):78–84, 1990.
[16] B. Vogelstein, N. Papadopoulos, V. E. Velculescu, S. Zhou, L. A. Diaz Jr., and K. W. Kinzler.
Cancer genome landscapes. Science, 339(6127):1546–1558, 2013.
[17] P. Edén, C. Ritz, C. Rose, M. Fernö, and C. Peterson. Good old clinical markers have similar
power in breast cancer prognosis as microarray gene expression profilers. European Journal
of Cancer, 40(12):1837–1841, 2004.
[18] Rashid Hussain Khokhar, Rui Chen, Benjamin C.M. Fung, and Siu Man Lui. Quantifying
the costs and benefits of privacy-preserving health data publishing. Journal of Biomedical
Informatics, 50:107–121, 2014.
[19] Adam Sadilek, Henry Kautz, and Vincent Silenzio. Modeling spread of disease from social
interactions. In Proceedings of the 6th International AAAI Conference on Weblogs and Social
Media (ICWSM’12), pages 322–329, 2012.
[20] L. Jensen, J. Saric, and P. Bork. Literature mining for the biologist: From information retrieval
to biological discovery. Nature Reviews Genetics, 7(2):119–129, 2006.
[21] P. Zweigenbaum, D. Demner-Fushman, H. Yu, and K. Cohen. Frontiers of biomedical text
mining: Current progress. Briefings in Bioinformatics, 8(5):358–375, 2007.
[22] S. M. Meystre, G. K. Savova, K. C. Kipper-Schuler, and J. F. Hurdle. Extracting information
from textual documents in the electronic health record: A review of recent research. Yearbook
of Medical Informatics, pages 128–144, 2008.
[23] Daniel Keim et al. Visual Analytics: Definition, Process, and Challenges. Springer Berlin
Heidelberg, 2008.
[45] http://mdm.sagepub.com/
[46] http://www.ijmijournal.com/
[47] http://www.journals.elsevier.com/artificial-intelligence-in-medicine/
[48] http://clinfowiki.org/wiki/index.php/Leading_Health_Informatics_and_Medical_Informatics_Journals
[49] http://www.amia.org/
[50] www.imia-medinfo.org/
[51] http://www.efmi.org/
Rajiur Rahman
Department of Computer Science
Wayne State University
Detroit, MI
[email protected]
Chandan K. Reddy
Department of Computer Science
Wayne State University
Detroit, MI
[email protected]
2.1 Introduction
2.2 History of EHR
2.3 Components of EHR
2.3.1 Administrative System Components
2.3.2 Laboratory System Components & Vital Signs
2.3.3 Radiology System Components
2.3.4 Pharmacy System Components
2.3.5 Computerized Physician Order Entry (CPOE)
2.3.6 Clinical Documentation
2.4 Coding Systems
2.4.1 International Classification of Diseases (ICD)
2.4.1.1 ICD-9
2.4.1.2 ICD-10
2.4.1.3 ICD-11
2.4.2 Current Procedural Terminology (CPT)
2.4.3 Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT)
2.4.4 Logical Observation Identifiers Names and Codes (LOINC)
2.4.5 RxNorm
2.4.6 International Classification of Functioning, Disability, and Health (ICF)
2.4.7 Diagnosis-Related Groups (DRG)
2.4.8 Unified Medical Language System (UMLS)
2.4.9 Digital Imaging and Communications in Medicine (DICOM)
2.5 Benefits of EHR
2.5.1 Enhanced Revenue
2.5.2 Averted Costs
2.5.3 Additional Benefits
2.6 Barriers to Adopting EHR
2.7 Challenges of Using EHR Data
2.8 Phenotyping Algorithms
2.9 Conclusions
Bibliography
© 2015 Taylor & Francis Group, LLC
2.1 Introduction
An Electronic Health Record (EHR) is a digital version of a patient’s medical history. It is a
longitudinal record of patient health information generated by one or several encounters in any
healthcare delivery setting. The term is often used interchangeably with EMR (Electronic Medical
Record) and CPR (Computer-based Patient Record). It encompasses a full range of data relevant to a
patient’s care, such as demographics, problems, medications, physician’s observations, vital signs,
medical history, immunizations, laboratory data, radiology reports, personal statistics, progress
notes, and billing data. The EHR system automates the data management process of complex clinical
environments and has the potential to streamline the clinician’s workflow. It can generate a
complete record of a patient’s clinical encounter and support other care-related activities such as
evidence-based decision support, quality management, and outcomes reporting. An EHR system
integrates data for different purposes. It enables the administrator to utilize the data for billing
purposes, the physician to analyze patient diagnostic information and treatment effectiveness, the
nurse to report adverse conditions, and the researcher to discover new knowledge.
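To make the idea of a longitudinal record concrete, the following is a minimal sketch in Python. It
is not drawn from any EHR standard or product: the class names (PatientRecord, Encounter), the field
names, and the two role-specific views are hypothetical choices made for illustration, and the
sample values are made up. It shows only how encounters accumulate over time in one record and how
different users might read different slices of the same data.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Encounter:
    """One visit. Field names are illustrative assumptions, not a standard schema."""
    when: date
    setting: str                                   # e.g. "clinic" or "hospital"
    vital_signs: dict[str, float] = field(default_factory=dict)
    medications: list[str] = field(default_factory=list)
    progress_note: str = ""
    billed_amount: float = 0.0


@dataclass
class PatientRecord:
    """Longitudinal record: demographics plus encounters accumulated over time."""
    patient_id: str
    birth_date: date
    encounters: list[Encounter] = field(default_factory=list)

    def add_encounter(self, encounter: Encounter) -> None:
        # Append the visit and keep the history in chronological order.
        self.encounters.append(encounter)
        self.encounters.sort(key=lambda e: e.when)

    def billing_view(self) -> list[tuple[date, float]]:
        """Hypothetical administrator view: visit dates and charges only."""
        return [(e.when, e.billed_amount) for e in self.encounters]

    def clinical_view(self) -> list[tuple[date, dict[str, float], list[str]]]:
        """Hypothetical physician view: vital signs and medications per visit."""
        return [(e.when, e.vital_signs, e.medications) for e in self.encounters]


# Made-up sample data, entered out of chronological order on purpose.
record = PatientRecord("p-001", date(1970, 5, 17))
record.add_encounter(
    Encounter(date(2014, 3, 2), "hospital", {"heart_rate": 88.0},
              ["metformin"], "Admitted for observation.", 1200.0)
)
record.add_encounter(
    Encounter(date(2013, 11, 20), "clinic", {"heart_rate": 72.0},
              [], "Routine check-up.", 150.0)
)
print(record.billing_view())
print(record.clinical_view())
```

Keeping a single list of encounters and deriving each view from it mirrors the point made above
that one integrated record serves several purposes. A real system would add coded diagnoses,
laboratory results, access control, and much more.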
EHRs have several advantages over paper-based systems. Storage and retrieval of data are more
efficient with EHRs. EHRs help to improve the quality and convenience of patient care, increase
patient participation in the healthcare process, improve the accuracy of diagnoses and health
outcomes, and improve care coordination. They also reduce cost by eliminating the need for paper
and other storage media, and they provide opportunities for research in different disciplines. In
2011, 54% of physicians had adopted an EHR system, and about three-quarters of adopters reported
that using an EHR system resulted in enhanced patient care [1].
Usually, an EHR is maintained within an institution, such as a hospital, clinic, or physician’s
office. An institution holds the longitudinal records of a particular patient that have been
collected at its own facilities; it does not hold the records of care provided to the patient at
other venues. Information regarding the general population may be kept in a nationwide or regional
health information system. Depending on the goal, service, venue, and role of the user, an EHR can
have different data formats, presentations, and levels of detail.
The remainder of this chapter is organized as follows. Section 2.2 gives a brief history of EHR
development, and Section 2.3 describes the components of EHRs. Section 2.4 presents a
comprehensive review of existing coding systems in EHR. The benefits of using EHRs are explained
in more detail in Section 2.5, while the barriers to the widespread adoption of EHRs are discussed
in Section 2.6. Section 2.7 briefly explains some of the challenges of using EHR data. The
prominent phenotyping algorithms are described in Section 2.8, and our discussion is concluded in
Section 2.9.