

Healthcare Data Analytics

© 2015 Taylor & Francis Group, LLC

Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series

Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.


This series aims to capture new developments and applications in data mining and knowledge
discovery, while summarizing the computational tools and techniques useful in data analysis. This
series encourages the integration of mathematical, statistical, and computational methods and
techniques through the publication of a broad range of textbooks, reference works, and handbooks.
The inclusion of concrete examples and applications is highly encouraged. The scope of the
series includes, but is not limited to, titles in the areas of data mining and knowledge discovery
methods and applications, modeling, algorithms, theory and foundations, data and knowledge
visualization, data mining systems and tools, and privacy and security issues.

Michael J. Way, Jeffrey D. Scargle, Kamal M. Ali, and Ashok N. Srivastava
Jake Y. Chen and Stefano Lonardi
Subrata Das
Ting Yu, Nitesh V. Chawla, and Simeon Simoff
Huan Liu and Hiroshi Motoda
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff
Guozhu Dong and James Bailey
Charu C. Aggarwal
Charu C. Aggarwal and Chandan K. Reddy


Guojun Gan
Yukio Ohsawa and Katsutoshi Yada
Luís Torgo
James Wu and Stephen Coggeshall
Harvey J. Miller and Jiawei Han
Cristóbal Romero, Sebastian Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker
Chandan K. Reddy and Charu C. Aggarwal
Vagelis Hristidis
Priti Srinivas Sajja and Rajendra Akerkar
Benjamin C. M. Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S. Yu
David Skillicorn
João Gama
Ashok N. Srivastava and Jiawei Han
David Lo, Siau-Cheng Khoo, Jiawei Han, and Chao Liu
Zhongfei Zhang and Ruofei Zhang
Tao Li, Mitsunori Ogihara, and George Tzanetakis
Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar


Markus Hofmann and Ralf Klinkenberg
Bo Long, Zhongfei Zhang, and Philip S. Yu
Domenico Talia and Paolo Trunfio
Zheng Alan Zhao and Huan Liu
George Fernandez
Naiyang Deng, Yingjie Tian, and Chunhua Zhang
Theophano Mitsa
Ashok N. Srivastava and Mehran Sahami
Xindong Wu and Vipin Kumar
David Skillicorn


Healthcare Data Analytics

Edited by
Chandan K. Reddy
Wayne State University
Detroit, Michigan, USA

Charu C. Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, New York, USA


CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2015 by Taylor & Francis Group, LLC

CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20150202

International Standard Book Number-13: 978-1-4822-3212-7 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at

and the CRC Press Web site at




Editor Biographies xxi

Contributors xxiii

Preface xxvii

1 An Introduction to Healthcare Data Analytics 1

Chandan K. Reddy and Charu C. Aggarwal
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Healthcare Data Sources and Basic Analytics . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Electronic Health Records . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Biomedical Image Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Sensor Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.4 Biomedical Signal Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.5 Genomic Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.6 Clinical Text Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.7 Mining Biomedical Literature . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.8 Social Media Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Advanced Data Analytics for Healthcare . . . . . . . . . . . . . . . . . . . . . . 9
1.3.1 Clinical Prediction Models . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.2 Temporal Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.3 Visual Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.4 Clinico–Genomic Data Integration . . . . . . . . . . . . . . . . . . . . . . 10
1.3.5 Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.6 Privacy-Preserving Data Publishing . . . . . . . . . . . . . . . . . . . . . 11
1.4 Applications and Practical Systems for Healthcare . . . . . . . . . . . . . . . . . 12
1.4.1 Data Analytics for Pervasive Health . . . . . . . . . . . . . . . . . . . . . 12
1.4.2 Healthcare Fraud Detection . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.3 Data Analytics for Pharmaceutical Discoveries . . . . . . . . . . . . . . . 13
1.4.4 Clinical Decision Support Systems . . . . . . . . . . . . . . . . . . . . . . 13
1.4.5 Computer-Aided Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.6 Mobile Imaging for Biomedical Applications . . . . . . . . . . . . . . . . 14
1.5 Resources for Healthcare Data Analytics . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

I Healthcare Data Sources and Basic Analytics 19

2 Electronic Health Records: A Survey 21
Rajiur Rahman and Chandan K. Reddy
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2 History of EHR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22


2.3 Components of EHR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.3.1 Administrative System Components . . . . . . . . . . . . . . . . . . . . . 24
2.3.2 Laboratory System Components & Vital Signs . . . . . . . . . . . . . . . 24
2.3.3 Radiology System Components . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.4 Pharmacy System Components . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.5 Computerized Physician Order Entry (CPOE) . . . . . . . . . . . . . . . . 26
2.3.6 Clinical Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Coding Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4.1 International Classification of Diseases (ICD) . . . . . . . . . . . . . . . . 28
    ICD-9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
    ICD-10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
    ICD-11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.4.2 Current Procedural Terminology (CPT) . . . . . . . . . . . . . . . . . . . 32
2.4.3 Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) . 32
2.4.4 Logical Observation Identifiers Names and Codes (LOINC) . . . . . . . . 33
2.4.5 RxNorm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.6 International Classification of Functioning, Disability, and Health (ICF) . . 35
2.4.7 Diagnosis-Related Groups (DRG) . . . . . . . . . . . . . . . . . . . . . . 37
2.4.8 Unified Medical Language System (UMLS) . . . . . . . . . . . . . . . . . 37
2.4.9 Digital Imaging and Communications in Medicine (DICOM) . . . . . . . . 38
2.5 Benefits of EHR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.5.1 Enhanced Revenue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.5.2 Averted Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5.3 Additional Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.6 Barriers to Adopting EHR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.7 Challenges of Using EHR Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.8 Phenotyping Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3 Biomedical Image Analysis 61

Dirk Padfield, Paulo Mendonca, and Sandeep Gupta
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.2 Biomedical Imaging Modalities . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2.1 Computed Tomography . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2.2 Positron Emission Tomography . . . . . . . . . . . . . . . . . . . . . . . 65
3.2.3 Magnetic Resonance Imaging . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2.4 Ultrasound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2.5 Microscopy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2.6 Biomedical Imaging Standards and Systems . . . . . . . . . . . . . . . . . 66
3.3 Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.3.1 Template Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.3.2 Model-Based Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.3.3 Data-Driven Detection Methods . . . . . . . . . . . . . . . . . . . . . . . 69
3.4 Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.4.1 Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.4.2 Watershed Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.4.3 Region Growing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.4.4 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.5 Image Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.5.1 Registration Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.5.2 Similarity and Distance Metrics . . . . . . . . . . . . . . . . . . . . . . . 79


3.5.3 Registration Optimizers . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

3.6 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.6.1 Object Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.6.2 Feature Selection and Dimensionality Reduction . . . . . . . . . . . . . . 83
3.6.3 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . 84
3.7 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

4 Mining of Sensor Data in Healthcare: A Survey 91

Daby Sow, Kiran K. Turaga, Deepak S. Turaga, and Michael Schmidt
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2 Mining Sensor Data in Medical Informatics: Scope and Challenges . . . . . . . . 93
4.2.1 Taxonomy of Sensors Used in Medical Informatics . . . . . . . . . . . . . 93
4.2.2 Challenges in Mining Medical Informatics Sensor Data . . . . . . . . . . . 94
4.3 Challenges in Healthcare Data Analysis . . . . . . . . . . . . . . . . . . . . . . . 95
4.3.1 Acquisition Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.3.2 Preprocessing Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.3.3 Transformation Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.3.4 Modeling Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.3.5 Evaluation and Interpretation Challenges . . . . . . . . . . . . . . . . . . 98
4.3.6 Generic Systems Challenges . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.4 Sensor Data Mining Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.4.1 Intensive Care Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . 100
    Systems for Data Mining in Intensive Care . . . . . . . . . . . . 100
    State-of-the-Art Analytics for Intensive Care Sensor Data Mining . . . . . . . . 101
4.4.2 Sensor Data Mining in Operating Rooms . . . . . . . . . . . . . . . . . . 103
4.4.3 General Mining of Clinical Sensor Data . . . . . . . . . . . . . . . . . . . 104
4.5 Nonclinical Healthcare Applications . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.5.1 Chronic Disease and Wellness Management . . . . . . . . . . . . . . . . . 108
4.5.2 Activity Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.5.3 Reality Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.6 Summary and Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . 117

5 Biomedical Signal Analysis 127

Abhijit Patil, Rajesh Langoju, Suresh Joel, Bhushan D. Patil, and Sahika Genc
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.2 Types of Biomedical Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.2.1 Action Potentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.2.2 Electroneurogram (ENG) . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.2.3 Electromyogram (EMG) . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.2.4 Electrocardiogram (ECG) . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.2.5 Electroencephalogram (EEG) . . . . . . . . . . . . . . . . . . . . . . . . 133
5.2.6 Electrogastrogram (EGG) . . . . . . . . . . . . . . . . . . . . . . . . . . 134
5.2.7 Phonocardiogram (PCG) . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.2.8 Other Biomedical Signals . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.3 ECG Signal Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.3.1 Power Line Interference . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
    Adaptive 60-Hz Notch Filter . . . . . . . . . . . . . . . . . . . 138
    Nonadaptive 60-Hz Notch Filter . . . . . . . . . . . . . . . . . 138
    Empirical Mode Decomposition . . . . . . . . . . . . . . . . . . 139


5.3.2 Electrode Contact Noise and Motion Artifacts . . . . . . . . . . . . . . . . 140
    The Least-Mean Squares (LMS) Algorithm . . . . . . . . . . . . 142
    The Adaptive Recurrent Filter (ARF) . . . . . . . . . . . . . . . 144
5.3.3 QRS Detection Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.4 Denoising of Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
5.4.1 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . 148
    Denoising for a Single-Channel ECG . . . . . . . . . . . . . . . 149
    Denoising for a Multichannel ECG . . . . . . . . . . . . . . . . 150
    Denoising Using Truncated Singular Value Decomposition . . . 151
5.4.2 Wavelet Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
5.4.3 Wavelet Wiener Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.4.4 Pilot Estimation Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.5 Multivariate Biomedical Signal Analysis . . . . . . . . . . . . . . . . . . . . . . 156
5.5.1 Non-Gaussianity through Kurtosis: FastICA . . . . . . . . . . . . . . . . . 159
5.5.2 Non-Gaussianity through Negentropy: Infomax . . . . . . . . . . . . . . . 159
5.5.3 Joint Approximate Diagonalization of Eigenmatrices: JADE . . . . . . . . 159
5.6 Cross-Correlation Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.6.1 Preprocessing of rs-fMRI . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
    Slice Acquisition Time Correction . . . . . . . . . . . . . . . . 163
    Motion Correction . . . . . . . . . . . . . . . . . . . . . . . . . 163
    Registration to High Resolution Image . . . . . . . . . . . . . . 164
    Registration to Atlas . . . . . . . . . . . . . . . . . . . . . . . . 165
    Physiological Noise Removal . . . . . . . . . . . . . . . . . . . 166
    Spatial Smoothing . . . . . . . . . . . . . . . . . . . . . . . . . 168
    Temporal Filtering . . . . . . . . . . . . . . . . . . . . . . . . . 168
5.6.2 Methods to Study Connectivity . . . . . . . . . . . . . . . . . . . . . . . 169
    Connectivity between Two Regions . . . . . . . . . . . . . . . . 170
    Functional Connectivity Maps . . . . . . . . . . . . . . . . . . . 171
    Graphs (Connectivity between Multiple Nodes) . . . . . . . . . 171
    Effective Connectivity . . . . . . . . . . . . . . . . . . . . . . . 172
    Parcellation (Clustering) . . . . . . . . . . . . . . . . . . . . . . 172
    Independent Component Analysis for rs-fMRI . . . . . . . . . . 173
5.6.3 Dynamics of Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
5.7 Recent Trends in Biomedical Signal Analysis . . . . . . . . . . . . . . . . . . . . 174
5.8 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

6 Genomic Data Analysis for Personalized Medicine 187

Juan Cui
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.2 Genomic Data Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.2.1 Microarray Data Era . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.2.2 Next-Generation Sequencing Era . . . . . . . . . . . . . . . . . . . . . . 189
6.2.3 Public Repositories for Genomic Data . . . . . . . . . . . . . . . . . . . . 190
6.3 Methods and Standards for Genomic Data Analysis . . . . . . . . . . . . . . . . . 192
6.3.1 Normalization and Quality Control . . . . . . . . . . . . . . . . . . . . . 193
6.3.2 Differential Expression Detection . . . . . . . . . . . . . . . . . . . . . . 195
6.3.3 Clustering and Classification . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.3.4 Pathway and Gene Set Enrichment Analysis . . . . . . . . . . . . . . . . . 196
6.3.5 Genome Sequencing Analysis . . . . . . . . . . . . . . . . . . . . . . . . 197
6.3.6 Public Tools for Genomic Data Analysis . . . . . . . . . . . . . . . . . . . 199


6.4 Types of Computational Genomics Studies towards Personalized Medicine . . . . . . . . 200
6.4.1 Discovery of Biomarker and Molecular Signatures . . . . . . . . . . . . . 201
6.4.2 Genome-Wide Association Study (GWAS) . . . . . . . . . . . . . . . . . 203
6.4.3 Discovery of Drug Targets . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.4.4 Discovery of Disease Relevant Gene Networks . . . . . . . . . . . . . . . 205
6.5 Genetic and Genomic Studies to the Bedside of Personalized Medicine . . . . . . 206
6.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

7 Natural Language Processing and Data Mining for Clinical Text 219
Kalpana Raja and Siddhartha R. Jonnalagadda
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
7.2 Natural Language Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
7.2.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
7.2.2 Report Analyzer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
7.2.3 Text Analyzer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
7.2.4 Core NLP Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
    Morphological Analysis . . . . . . . . . . . . . . . . . . . . . . 224
    Lexical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 224
    Syntactic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 224
    Semantic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 225
    Data Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
7.3 Mining Information from Clinical Text . . . . . . . . . . . . . . . . . . . . . . . 226
7.3.1 Information Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
    Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
    Context-Based Extraction . . . . . . . . . . . . . . . . . . . . . 230
    Extracting Codes . . . . . . . . . . . . . . . . . . . . . . . . . 233
7.3.2 Current Methodologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
    Rule-Based Approaches . . . . . . . . . . . . . . . . . . . . . . 234
    Pattern-Based Algorithms . . . . . . . . . . . . . . . . . . . . . 235
    Machine Learning Algorithms . . . . . . . . . . . . . . . . . . 235
7.3.3 Clinical Text Corpora and Evaluation Metrics . . . . . . . . . . . . . . . . 235
7.3.4 Informatics for Integrating Biology and the Bedside (i2b2) . . . . . . . . . 237
7.4 Challenges of Processing Clinical Reports . . . . . . . . . . . . . . . . . . . . . . 238
7.4.1 Domain Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
7.4.2 Confidentiality of Clinical Text . . . . . . . . . . . . . . . . . . . . . . . . 238
7.4.3 Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
7.4.4 Diverse Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
7.4.5 Expressiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
7.4.6 Intra- and Interoperability . . . . . . . . . . . . . . . . . . . . . . . . . . 240
7.4.7 Interpreting Information . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
7.5 Clinical Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
7.5.1 General Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
7.5.2 EHR and Decision Support . . . . . . . . . . . . . . . . . . . . . . . . . . 241
7.5.3 Surveillance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
7.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242


8 Mining the Biomedical Literature 251

Claudiu Mihăilă, Riza Batista-Navarro, Noha Alnazzawi, Georgios
Kontonatsios, Ioannis Korkontzelos, Rafal Rak, Paul Thompson, and Sophia
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
8.2 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
8.2.1 Corpora Types and Formats . . . . . . . . . . . . . . . . . . . . . . . . . 254
8.2.2 Annotation Methodologies . . . . . . . . . . . . . . . . . . . . . . . . . . 256
8.2.3 Reliability of Annotation . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
8.3 Terminology Acquisition and Management . . . . . . . . . . . . . . . . . . . . . 259
8.3.1 Term Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
8.3.2 Term Alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
8.4 Information Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
8.4.1 Named Entity Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 263
    Approaches to Named Entity Recognition . . . . . . . . . . . . 263
    Progress and Challenges . . . . . . . . . . . . . . . . . . . . . . 265
8.4.2 Coreference Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
    Biomedical Coreference-Annotated Corpora . . . . . . . . . . . 266
    Approaches to Biomedical Coreference Resolution . . . . . . . . 267
    Advancing Biomedical Coreference Resolution . . . . . . . . . 268
8.4.3 Relation and Event Extraction . . . . . . . . . . . . . . . . . . . . . . . . 269
8.5 Discourse Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
8.5.1 Discourse Relation Recognition . . . . . . . . . . . . . . . . . . . . . . . 273
8.5.2 Functional Discourse Annotation . . . . . . . . . . . . . . . . . . . . . . 274
    Annotation Schemes and Corpora . . . . . . . . . . . . . . . . . 275
    Discourse Cues . . . . . . . . . . . . . . . . . . . . . . . . . . 276
    Automated Recognition of Discourse Information . . . . . . . . 277
8.6 Text Mining Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
8.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
8.7.1 Semantic Search Engines . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
8.7.2 Statistical Machine Translation . . . . . . . . . . . . . . . . . . . . . . . . 281
8.7.3 Semi-Automatic Data Curation . . . . . . . . . . . . . . . . . . . . . . . . 282
8.8 Integration with Clinical Text Mining . . . . . . . . . . . . . . . . . . . . . . . . 283
8.9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284

9 Social Media Analytics for Healthcare 309

Alexander Kotov
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
9.2 Social Media Analysis for Detection and Tracking of Infectious Disease Outbreaks . . . . . . 311
9.2.1 Outbreak Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
    Using Search Query and Website Access Logs . . . . . . . . . . 313
    Using Twitter and Blogs . . . . . . . . . . . . . . . . . . . . . . 314
9.2.2 Analyzing and Tracking Outbreaks . . . . . . . . . . . . . . . . . . . . . 319
9.2.3 Syndromic Surveillance Systems Based on Social Media . . . . . . . . . . 320
9.3 Social Media Analysis for Public Health Research . . . . . . . . . . . . . . . . . 322
9.3.1 Topic Models for Analyzing Health-Related Content . . . . . . . . . . . . 323
9.3.2 Detecting Reports of Adverse Medical Events and Drug Reactions . . . . . 325
9.3.3 Characterizing Life Style and Well-Being . . . . . . . . . . . . . . . . . . 327
9.4 Analysis of Social Media Use in Healthcare . . . . . . . . . . . . . . . . . . . . . 328


9.4.1 Social Media as a Source of Public Health Information . . . . . . . . . . . 328

9.4.2 Analysis of Data from Online Doctor and Patient Communities . . . . . . 329
9.5 Conclusions and Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . 333

II Advanced Data Analytics for Healthcare 341

10 A Review of Clinical Prediction Models 343
Chandan K. Reddy and Yan Li
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
10.2 Basic Statistical Prediction Models . . . . . . . . . . . . . . . . . . . . . . . . . 345
10.2.1 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
10.2.2 Generalized Additive Model . . . . . . . . . . . . . . . . . . . . . . . . . 346
10.2.3 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
    Multiclass Logistic Regression . . . . . . . . . . . . . . . . . . 347
    Polytomous Logistic Regression . . . . . . . . . . . . . . . . . 347
    Ordered Logistic Regression . . . . . . . . . . . . . . . . . . . 348
10.2.4 Bayesian Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
    Naïve Bayes Classifier . . . . . . . . . . . . . . . . . . . . . . . 349
    Bayesian Network . . . . . . . . . . . . . . . . . . . . . . . . . 349
10.2.5 Markov Random Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
10.3 Alternative Clinical Prediction Models . . . . . . . . . . . . . . . . . . . . . . . 351
10.3.1 Decision Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
10.3.2 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 352
10.3.3 Cost-Sensitive Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
10.3.4 Advanced Prediction Models . . . . . . . . . . . . . . . . . . . . . . . . . 354
    Multiple Instance Learning . . . . . . . . . . . . . . . . . . . . 354
    Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . 354
    Sparse Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 355
    Kernel Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 355
10.4 Survival Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
10.4.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
    Survival Data and Censoring . . . . . . . . . . . . . . . . . . . 356
    Survival and Hazard Function . . . . . . . . . . . . . . . . . . . 357
10.4.2 Nonparametric Survival Analysis . . . . . . . . . . . . . . . . . . . . . . 359
    Kaplan–Meier Curve and Clinical Life Table . . . . . . . . . . . 359
    Mantel–Haenszel Test . . . . . . . . . . . . . . . . . . . . . . . 361
10.4.3 Cox Proportional Hazards Model . . . . . . . . . . . . . . . . . . . . . . 362
    The Basic Cox Model . . . . . . . . . . . . . . . . . . . . . . . 362
    Estimation of the Regression Parameters . . . . . . . . . . . . . 363
    Penalized Cox Models . . . . . . . . . . . . . . . . . . . . . . . 363
10.4.4 Survival Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
    Survival Tree Building Methods . . . . . . . . . . . . . . . . . . 365
    Ensemble Methods with Survival Trees . . . . . . . . . . . . . . 365
10.5 Evaluation and Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
10.5.1 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
    Brier Score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
    R² . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
    Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
    Other Evaluation Metrics Based on Confusion Matrix . . . . . . 367
    ROC Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
    C-index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369

10.5.2 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
    Internal Validation Methods . . . . . . . . . . 370
    External Validation Methods . . . . . . . . . . 371
10.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371

11 Temporal Data Mining for Healthcare Data 379

Iyad Batal
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
11.2 Association Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
11.2.1 Classical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
11.2.2 Temporal Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
11.3 Temporal Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
11.3.1 Sequential Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . . 383
    Concepts and Definitions . . . . . . . . . . 384
    Medical Applications . . . . . . . . . . 385
11.3.2 Time-Interval Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . 386
    Concepts and Definitions . . . . . . . . . . 386
    Medical Applications . . . . . . . . . . 388
11.4 Sensor Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
11.5 Other Temporal Modeling Methods . . . . . . . . . . . . . . . . . . . . . . . . . 393
11.5.1 Convolutional Event Pattern Discovery . . . . . . . . . . . . . . . . . . . 393
11.5.2 Patient Prognostic via Case-Based Reasoning . . . . . . . . . . . . . . . . 394
11.5.3 Disease Progression Modeling . . . . . . . . . . . . . . . . . . . . . . . . 395
11.6 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
11.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397

12 Visual Analytics for Healthcare 403

David Gotz, Jesus Caban, and Annie T. Chen
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
12.2 Introduction to Visual Analytics and Medical Data Visualization . . . . . . . . . . 404
12.2.1 Clinical Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
12.2.2 Standard Techniques to Visualize Medical Data . . . . . . . . . . . . . . . 405
12.2.3 High-Dimensional Data Visualization . . . . . . . . . . . . . . . . . . . . 409
12.2.4 Visualization of Imaging Data . . . . . . . . . . . . . . . . . . . . . . . . 411
12.3 Visual Analytics in Healthcare . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
12.3.1 Visual Analytics in Public Health and Population Research . . . . . . . . . 413
    Geospatial Analysis . . . . . . . . . . 413
    Temporal Analysis . . . . . . . . . . 415
    Beyond Spatio-Temporal Visualization . . . . . . . . . . 416
12.3.2 Visual Analytics for Clinical Workflow . . . . . . . . . . . . . . . . . . . 417
12.3.3 Visual Analytics for Clinicians . . . . . . . . . . . . . . . . . . . . . . . . 419
    Temporal Analysis . . . . . . . . . . 419
    Patient Progress and Guidelines . . . . . . . . . . 420
    Other Clinical Methods . . . . . . . . . . 420
12.3.4 Visual Analytics for Patients . . . . . . . . . . . . . . . . . . . . . . . . . 421
    Assisting Comprehension . . . . . . . . . . 422
    Condition Management . . . . . . . . . . 422
    Integration into Healthcare Contexts . . . . . . . . . . 423
12.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424

13 Predictive Models for Integrating Clinical and Genomic Data 433

Sanjoy Dey, Rohit Gupta, Michael Steinbach, and Vipin Kumar
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
13.1.1 What Is Clinicogenomic Integration? . . . . . . . . . . . . . . . . . . . . 435
13.1.2 Different Aspects of Clinicogenomic Studies . . . . . . . . . . . . . . . . 436
13.2 Issues and Challenges in Integrating Clinical and Genomic Data . . . . . . . . . . 436
13.3 Different Types of Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
13.3.1 Stages of Data Integration . . . . . . . . . . . . . . . . . . . . . . . . . . 438
    Early Integration . . . . . . . . . . 438
    Late Integration . . . . . . . . . . 439
    Intermediate Integration . . . . . . . . . . 440
13.3.2 Stage of Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . 441
    Two-Step Methods . . . . . . . . . . 441
    Combined Clinicogenomic Models . . . . . . . . . . 442
13.4 Different Goals of Integrative Studies . . . . . . . . . . . . . . . . . . . . . . . . 443
13.4.1 Improving the Prognostic Power Only . . . . . . . . . . . . . . . . . . . . 443
    Two-Step Linear Models . . . . . . . . . . 443
    Two-Step Nonlinear Models . . . . . . . . . . 444
    Single-Step Sparse Models . . . . . . . . . . 445
    Comparative Studies . . . . . . . . . . 445
13.4.2 Assessing the Additive Prognostic Effect of Clinical Variables over the Genomic Factors . . . . . . . . . . 446
    Developing Clinicogenomic Models Biased Towards Clinical Variables . . . . . . . . . . 447
    Hypothesis Testing Frameworks . . . . . . . . . . 447
    Incorporating Prior Knowledge . . . . . . . . . . 448
13.5 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
13.5.1 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
13.5.2 Validation Procedures for Predictive Models . . . . . . . . . . . . . . . . . 450
13.5.3 Assessing Additional Predictive Values . . . . . . . . . . . . . . . . . . . 451
13.5.4 Reliability of the Clinicogenomic Integrative Studies . . . . . . . . . . . . 452
13.6 Discussion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453

14 Information Retrieval for Healthcare 467

William R. Hersh
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
14.2 Knowledge-Based Information in Healthcare and Biomedicine . . . . . . . . . . . 468
14.2.1 Information Needs and Seeking . . . . . . . . . . . . . . . . . . . . . . . 469
14.2.2 Changes in Publishing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
14.3 Content of Knowledge-Based Information Resources . . . . . . . . . . . . . . . . 471
14.3.1 Bibliographic Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
14.3.2 Full-Text Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
14.3.3 Annotated Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
14.3.4 Aggregated Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
14.4 Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
14.4.1 Controlled Terminologies . . . . . . . . . . . . . . . . . . . . . . . . . . 476
14.4.2 Manual Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
14.4.3 Automated Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
14.5 Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
14.5.1 Exact-Match Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
14.5.2 Partial-Match Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486

14.5.3 Retrieval Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487

14.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
14.6.1 System-Oriented Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 490
14.6.2 User-Oriented Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 493
14.7 Research Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
14.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496

15 Privacy-Preserving Data Publishing Methods in Healthcare 507

Yubin Park and Joydeep Ghosh
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
15.2 Data Overview and Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . 509
15.3 Privacy-Preserving Publishing Methods . . . . . . . . . . . . . . . . . . . . . . . 511
15.3.1 Generalization and Suppression . . . . . . . . . . . . . . . . . . . . . . . 511
15.3.2 Synthetic Data Using Multiple Imputation . . . . . . . . . . . . . . . . . . 516
15.3.3 PeGS: Perturbed Gibbs Sampler . . . . . . . . . . . . . . . . . . . . . . . 517
15.3.4 Randomization Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
15.3.5 Data Swapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
15.4 Challenges with Health Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
15.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525

III Applications and Practical Systems for Healthcare 531

16 Data Analytics for Pervasive Health 533
Giovanni Acampora, Diane J. Cook, Parisa Rashidi, and Athanasios V. Vasilakos
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
16.2 Supporting Infrastructure and Technology . . . . . . . . . . . . . . . . . . . . . . 535
16.2.1 BANs: Body Area Networks . . . . . . . . . . . . . . . . . . . . . . . . . 535
16.2.2 Dense/Mesh Sensor Networks for Smart Living Environments . . . . . . . 537
16.2.3 Sensor Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
    Ambient Sensor Architecture . . . . . . . . . . 539
    BANs: Hardware and Devices . . . . . . . . . . 539
    Recent Trends in Sensor Technology . . . . . . . . . . 541
16.3 Basic Analytic Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542
16.3.1 Supervised Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
16.3.2 Unsupervised Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . 544
16.3.3 Example Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
16.4 Advanced Analytic Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
16.4.1 Activity Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
    Activity Models . . . . . . . . . . 546
    Activity Complexity . . . . . . . . . . 547
16.4.2 Behavioral Pattern Discovery . . . . . . . . . . . . . . . . . . . . . . . . 547
16.4.3 Anomaly Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
16.4.4 Planning and Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
16.4.5 Decision Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
16.4.6 Anonymization and Privacy Preserving Techniques . . . . . . . . . . . . . 549
16.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
16.5.1 Continuous Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
    Continuous Health Monitoring . . . . . . . . . . 551
    Continuous Behavioral Monitoring . . . . . . . . . . 551
    Monitoring for Emergency Detection . . . . . . . . . . 552
16.5.2 Assisted Living . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552

16.5.3 Therapy and Rehabilitation . . . . . . . . . . . . . . . . . . . . . . . . . . 554

16.5.4 Persuasive Well-Being Applications . . . . . . . . . . . . . . . . . . . . . 556
16.5.5 Emotional Well-Being . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
16.5.6 Smart Hospitals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
16.6 Conclusions and Future Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . 559

17 Fraud Detection in Healthcare 577

Varun Chandola, Jack Schryver, and Sreenivas Sukumar
17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
17.2 Understanding Fraud in the Healthcare System . . . . . . . . . . . . . . . . . . . 579
17.3 Definition and Types of Healthcare Fraud . . . . . . . . . . . . . . . . . . . . . . 580
17.4 Identifying Healthcare Fraud from Data . . . . . . . . . . . . . . . . . . . . . . . 582
17.4.1 Types of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
17.4.2 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584
17.5 Knowledge Discovery-Based Solutions for Identifying Fraud . . . . . . . . . . . . 585
17.5.1 Identifying Fraudulent Episodes . . . . . . . . . . . . . . . . . . . . . . . 585
17.5.2 Identifying Fraudulent Claims . . . . . . . . . . . . . . . . . . . . . . . . 586
    A Bayesian Approach to Identifying Fraudulent Claims . . . . . . . . . . 587
    Non-Bayesian Approaches . . . . . . . . . . 587
17.5.3 Identifying Fraudulent Providers . . . . . . . . . . . . . . . . . . . . . . . 588
    Analyzing Networks for Identifying Coordinated Frauds . . . . . . . . . . 588
    Constructing a Provider Social Network . . . . . . . . . . 589
    Relevance for Identifying Fraud . . . . . . . . . . 591
17.5.4 Temporal Modeling for Identifying Fraudulent Behavior . . . . . . . . . . 593
    Change-Point Detection with Statistical Process Control Techniques . . . . . . . . . . 593
    Anomaly Detection Using the CUSUM Statistic . . . . . . . . . . 594
    Supervised Learning for Classifying Provider Profiles . . . . . . . . . . 595
17.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596

18 Data Analytics for Pharmaceutical Discoveries 599

Shobeir Fakhraei, Eberechukwu Onukwugha, and Lise Getoor
18.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
18.1.1 Pre-marketing Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
18.1.2 Post-marketing Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
18.1.3 Data Sources and Other Applications . . . . . . . . . . . . . . . . . . . . 602
18.2 Chemical and Biological Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
18.2.1 Constructing a Network Representation . . . . . . . . . . . . . . . . . . . 603
18.2.2 Interaction Prediction Methods . . . . . . . . . . . . . . . . . . . . . . . . 605
    Single Similarity–Based Methods . . . . . . . . . . 605
    Multiple Similarity–Based Methods . . . . . . . . . . 607
18.3 Spontaneous Reporting Systems (SRSs) . . . . . . . . . . . . . . . . . . . . . . . 608
18.3.1 Disproportionality Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 609
18.3.2 Multivariate Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
18.4 Electronic Health Records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
18.5 Patient-Generated Data on the Internet . . . . . . . . . . . . . . . . . . . . . . . . 612
18.6 Biomedical Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614
18.7 Summary and Future Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . 615

19 Clinical Decision Support Systems 625

Martin Alther and Chandan K. Reddy
19.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
19.2 Historical Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627
19.2.1 Early CDSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627
19.2.2 CDSS Today . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 629
19.3 Various Types of CDSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 630
19.3.1 Knowledge-Based CDSS . . . . . . . . . . . . . . . . . . . . . . . . . . . 630
    Input . . . . . . . . . . 631
    Inference Engine . . . . . . . . . . 632
    Knowledge Base . . . . . . . . . . 633
    Output . . . . . . . . . . 634
19.3.2 Nonknowledge-Based CDSS . . . . . . . . . . . . . . . . . . . . . . . . . 634
    Artificial Neural Networks . . . . . . . . . . 634
    Genetic Algorithms . . . . . . . . . . 635
19.4 Decision Support during Care Provider Order Entry . . . . . . . . . . . . . . . . 635
19.5 Diagnostic Decision Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636
19.6 Human-Intensive Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 638
19.7 Challenges of CDSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
19.7.1 The Grand Challenges of CDSS . . . . . . . . . . . . . . . . . . . . . . . 640
    Need to Improve the Effectiveness of CDSS . . . . . . . . . . 640
    Need to Create New CDSS Interventions . . . . . . . . . . 641
    Disseminate Existing CDS Knowledge and Interventions . . . . . . . . . . 641
19.7.2 R.L. Engle’s Critical and Non-Critical CDS Challenges . . . . . . . . . . . 642
    Non-Critical Issues . . . . . . . . . . 642
    Critical Issues . . . . . . . . . . 643
19.7.3 Technical Design Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
    Adding Structure to Medical Knowledge . . . . . . . . . . 643
    Knowledge Representation Formats . . . . . . . . . . 644
    Data Representation . . . . . . . . . . 644
    Special Data Types . . . . . . . . . . 645
19.7.4 Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
    Rule-Based and Early Bayesian Systems . . . . . . . . . . 646
    Causal Reasoning . . . . . . . . . . 646
    Probabilistic Reasoning . . . . . . . . . . 647
    Case-Based Reasoning . . . . . . . . . . 647
19.7.5 Human–Computer Interaction . . . . . . . . . . . . . . . . . . . . . . . . 648
19.8 Legal and Ethical Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
19.8.1 Legal Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
19.8.2 Regulation of Decision Support Software . . . . . . . . . . . . . . . . . . 650
19.8.3 Ethical Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650
19.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652

20 Computer-Assisted Medical Image Analysis Systems 657

Shu Liao, Shipeng Yu, Matthias Wolf, Gerardo Hermosillo, Yiqiang Zhan,
Yoshihisa Shinagawa, Zhigang Peng, Xiang Sean Zhou, Luca Bogoni, and
Marcos Salganicoff
20.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658
20.2 Computer-Aided Diagnosis/Detection of Diseases . . . . . . . . . . . . . . . . . 660
20.2.1 Lung Cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661

20.2.2 Breast Cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661

20.2.3 Colon Cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661
20.2.4 Pulmonary Embolism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662
20.3 Medical Imaging Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662
20.3.1 Automatic Prostate T2 MRI Segmentation . . . . . . . . . . . . . . . . . . 662
20.3.2 Robust Spine Labeling for Spine Imaging Planning . . . . . . . . . . . . . 666
20.3.3 Joint Space Measurement in the Knee . . . . . . . . . . . . . . . . . . . . 671
20.3.4 Brain PET Attenuation Correction without CT . . . . . . . . . . . . . . . 673
20.3.5 Saliency-Based Rotation Invariant Descriptor for Wrist Detection in Whole-Body CT images . . . . . . . . . . 674
20.3.6 PET MR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
20.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678

21 Mobile Imaging and Analytics for Biomedical Data 685

Stephan M. Jonas and Thomas M. Deserno
21.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686
21.2 Image Formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688
21.2.1 Projection Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689
21.2.2 Cross-Sectional Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . 690
21.2.3 Functional Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
21.2.4 Mobile Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 692
21.3 Data Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693
21.3.1 Visualization Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 694
21.3.2 Output Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 694
21.3.3 2D Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696
21.3.4 3D Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696
21.3.5 Mobile Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697
21.4 Image Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
21.4.1 Preprocessing and Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . 700
21.4.2 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700
21.4.3 Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
21.4.4 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
21.4.5 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
21.4.6 Evaluation of Image Analysis . . . . . . . . . . . . . . . . . . . . . . . . 705
21.4.7 Mobile Image Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707
21.5 Image Management and Communication . . . . . . . . . . . . . . . . . . . . . . 709
21.5.1 Standards for Communication . . . . . . . . . . . . . . . . . . . . . . . . 709
21.5.2 Archiving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
21.5.3 Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711
21.5.4 Mobile Image Management . . . . . . . . . . . . . . . . . . . . . . . . . 711
21.6 Summary and Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . 713

Index 719

Editor Biographies

Chandan K. Reddy is an Associate Professor in the Department of Computer Science at Wayne
State University. He received his PhD from Cornell University and MS from Michigan State
University. His primary research interests are in the areas of data mining and machine learning
with applications to healthcare, bioinformatics, and social network analysis. His research is
funded by the National Science Foundation, the National Institutes of Health, Department of
Transportation, and the Susan G. Komen for the Cure Foundation. He has published over 50
peer-reviewed articles in leading conferences and journals. He received the Best Application
Paper Award at the ACM SIGKDD conference in 2010 and was a finalist of the INFORMS Franz
Edelman Award Competition in 2011. He is a senior member of IEEE and a life member of the ACM.

Charu C. Aggarwal is a Distinguished Research Staff Member (DRSM) at the IBM T. J. Watson
Research Center in Yorktown Heights, New York. He completed his BS from IIT Kanpur in 1993 and
his PhD from the Massachusetts Institute of Technology in 1996. He has published more than 250
papers in refereed conferences and journals, and has applied for or been granted more than 80
patents. He is an author or editor of 13 books, including the first comprehensive book on
outlier analysis. Because of the commercial value of his patents, he has thrice been designated
a Master Inventor at IBM. He is a recipient of an IBM Corporate Award (2003) for his work on
bioterrorist threat detection in data streams, a recipient of the IBM Outstanding Innovation
Award (2008) for his scientific contributions to privacy technology, a recipient of the IBM
Outstanding Technical Achievement Award (2009) for his work on data streams, and a recipient of
an IBM Research Division Award (2008) for his contributions to System S. He also received the
EDBT 2014 Test of Time Award for his work on condensation-based privacy-preserving data mining.
He has served as conference chair and associate editor at many reputed conferences and journals
in data mining, general co-chair of the IEEE Big Data Conference (2014), and is editor-in-chief
of the ACM SIGKDD Explorations. He is a fellow of the ACM, SIAM and the IEEE, for
“contributions to knowledge discovery and data mining algorithms.”


Contributors

Giovanni Acampora
Nottingham Trent University
Nottingham, UK

Charu C. Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, New York

Noha Alnazzawi
University of Manchester
Manchester, UK

Martin Alther
Wayne State University
Detroit, MI

Sophia Ananiadou
University of Manchester
Manchester, UK

Iyad Batal
General Electric Global Research
San Ramon, CA

Riza Batista-Navarro
University of Manchester
Manchester, UK

Luca Bogoni
Siemens Medical Solutions
Malvern, PA

Jesus Caban
Walter Reed National Military Medical Center
Bethesda, MD

Varun Chandola
State University of New York at Buffalo
Buffalo, NY

Annie T. Chen
University of North Carolina at Chapel Hill
Chapel Hill, NC

Diane J. Cook
Washington State University
Pullman, WA

Juan Cui
University of Nebraska-Lincoln
Lincoln, NE

Thomas M. Deserno
RWTH Aachen University
Aachen, Germany

Sanjoy Dey
University of Minnesota
Minneapolis, MN

Shobeir Fakhraei
University of Maryland
College Park, MD

Sahika Genc
GE Global Research
Niskayuna, NY

Lise Getoor
University of California
Santa Cruz, CA

Joydeep Ghosh
The University of Texas at Austin
Austin, TX

David Gotz
University of North Carolina at Chapel Hill
Chapel Hill, NC

Rohit Gupta
University of Minnesota
Minneapolis, MN

Sandeep Gupta
GE Global Research
Niskayuna, NY

Gerardo Hermosillo
Siemens Medical Solutions
Malvern, PA

William R. Hersh
Oregon Health & Science University
Portland, OR

Suresh Joel
GE Global Research
Bangalore, India

Stephan M. Jonas
RWTH Aachen University
Aachen, Germany

Siddhartha R. Jonnalagadda
Northwestern University
Chicago, IL

Georgios Kontonatsios
University of Manchester
Manchester, UK

Ioannis Korkontzelos
University of Manchester
Manchester, UK

Alexander Kotov
Wayne State University
Detroit, MI

Vipin Kumar
University of Minnesota
Minneapolis, MN

Rajesh Langoju
GE Global Research
Bangalore, India

Yan Li
Wayne State University
Detroit, MI

Shu Liao
Siemens Medical Solutions
Malvern, PA

Paulo Mendonca
GE Global Research
Niskayuna, NY

Claudiu Mihăilă
University of Manchester
Manchester, UK

Eberechukwu Onukwugha
University of Maryland
Baltimore, MD

Dirk Padfield
GE Global Research
Niskayuna, NY

Yubin Park
The University of Texas at Austin
Austin, TX

Abhijit Patil
GE Global Research
Bangalore, India

Bhushan D. Patil
GE Global Research
Bangalore, India

Zhigang Peng
Siemens Medical Solutions
Malvern, PA

Rajiur Rahman
Wayne State University
Detroit, MI

Kalpana Raja
Northwestern University
Chicago, IL

Rafal Rak
University of Manchester
Manchester, UK

Parisa Rashidi
University of Florida
Gainesville, FL

Chandan K. Reddy
Wayne State University
Detroit, MI

Marcos Salganicoff
Siemens Medical Solutions
Malvern, PA

Michael Schmidt
Columbia University Medical Center
New York, NY

Jack Schryver
Oak Ridge National Laboratory
Oak Ridge, TN

Xiang Sean Zhou
Siemens Medical Solutions
Malvern, PA

Yoshihisa Shinagawa
Siemens Medical Solutions
Malvern, PA

Daby Sow
IBM T. J. Watson Research Center
Yorktown Heights, NY

Michael Steinbach
University of Minnesota
Minneapolis, MN

Sreenivas Sukumar
Oak Ridge National Laboratory
Oak Ridge, TN

Paul Thompson
University of Manchester
Manchester, UK

Deepak S. Turaga
IBM T. J. Watson Research Center
Yorktown Heights, NY

Kiran K. Turaga
Medical College of Wisconsin
Milwaukee, WI

Athanasios V. Vasilakos
University of Western Macedonia
Kozani, Greece

Matthias Wolf
Siemens Medical Solutions
Malvern, PA

Shipeng Yu
Siemens Medical Solutions
Malvern, PA

Yiqiang Zhan
Siemens Medical Solutions
Malvern, PA

Preface

Innovations in computing technologies have revolutionized healthcare in recent years. The
analytical style of reasoning has not only changed the way in which information is collected
and stored but has also played an increasingly important role in the management and delivery of
healthcare. In particular, data analytics has emerged as a promising tool for solving problems
in various healthcare-related disciplines. This book presents a comprehensive review of data
analytics in the field of healthcare. The goal is to provide a platform for interdisciplinary
researchers to learn about the fundamental principles, algorithms, and applications of
intelligent data acquisition, processing, and analysis of healthcare data. This book will
provide readers with an understanding of the vast number of analytical techniques for
healthcare problems and their relationships with one another. This understanding includes
details of specific techniques and the combinations of tools required to design effective ways
of handling, retrieving, analyzing, and making use of healthcare data. This book will provide a
unique perspective on healthcare-related opportunities for developing new computing
technologies.

From a researcher and practitioner perspective, a major challenge in healthcare is its
interdisciplinary nature. The field of healthcare has often seen advances coming from diverse
disciplines such as databases, data mining, information retrieval, and image processing, as
well as from medical researchers and healthcare practitioners. While this interdisciplinary
nature adds to the richness of the field, it also adds to the challenges in making significant
advances. Computer scientists are usually not trained in domain-specific medical concepts,
whereas medical practitioners and researchers have limited exposure to the data analytics area.
This has added to the difficulty in creating a coherent body of work in this field. The result
has often been independent lines of work from completely different perspectives. This book is
an attempt to bring together these diverse communities by carefully and comprehensively
discussing the most relevant contributions from each domain.

The book aims to provide a comprehensive overview of the healthcare data analytics field as it
stands today and to educate the community about future research challenges and opportunities.
Even though the book is structured as an edited collection of chapters, special care was taken
during the creation of the book to cover healthcare topics exhaustively by coordinating the
contributions from various authors. Focus was also placed on reviews and surveys rather than
individual research results in order to emphasize comprehensiveness in coverage. Each book
chapter is written by prominent researchers and experts working in the healthcare domain. The
chapters in the book are divided into three major categories:
• Healthcare Data Sources and Basic Analytics: These chapters discuss the details about
the various healthcare data sources and the analytical techniques that are widely used in the
processing and analysis of such data. The various forms of patient data include electronic
health records, biomedical images, sensor data, biomedical signals, genomic data, clinical
text, biomedical literature, and data gathered from social media.
• Advanced Data Analytics for Healthcare: These chapters deal with the advanced data ana-
lytical methods focused on healthcare. These include the clinical prediction models, temporal
pattern mining methods, and visual analytics. In addition, other advanced methods such as
data integration, information retrieval, and privacy-preserving data publishing will also be


• Applications and Practical Systems for Healthcare: These chapters focus on the applications
of data analytics and the relevant practical systems. They cover the applications of data
analytics to pervasive healthcare, fraud detection, and drug discovery. In terms of practical
systems, they cover clinical decision support systems, computer-assisted medical imaging
systems, and mobile imaging systems.
It is hoped that this comprehensive book will serve as a compendium to students, researchers,
and practitioners. Each chapter is structured as a “survey-style” article discussing the prominent
research issues and the advances made on that research topic. Special effort was made to ensure
that each chapter is self-contained and the background required from other chapters is minimal.
Finally, we hope that the topics discussed in this book will lead to further developments in the field
of healthcare data analytics that can help in improving the health and well-being of people. We be-
lieve that research in the field of healthcare data analytics will continue to grow in the years to come.

Acknowledgment: This work was supported in part by National Science Foundation grant
No. 1231742.

© 2015 Taylor & Francis Group, LLC

Chapter 1
An Introduction to Healthcare Data Analytics

Chandan K. Reddy
Department of Computer Science
Wayne State University
Detroit, MI

Charu C. Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, NY

1.1 Introduction
1.2 Healthcare Data Sources and Basic Analytics
1.2.1 Electronic Health Records
1.2.2 Biomedical Image Analysis
1.2.3 Sensor Data Analysis
1.2.4 Biomedical Signal Analysis
1.2.5 Genomic Data Analysis
1.2.6 Clinical Text Mining
1.2.7 Mining Biomedical Literature
1.2.8 Social Media Analysis
1.3 Advanced Data Analytics for Healthcare
1.3.1 Clinical Prediction Models
1.3.2 Temporal Data Mining
1.3.3 Visual Analytics
1.3.4 Clinico–Genomic Data Integration
1.3.5 Information Retrieval
1.3.6 Privacy-Preserving Data Publishing
1.4 Applications and Practical Systems for Healthcare
1.4.1 Data Analytics for Pervasive Health
1.4.2 Healthcare Fraud Detection
1.4.3 Data Analytics for Pharmaceutical Discoveries
1.4.4 Clinical Decision Support Systems
1.4.5 Computer-Aided Diagnosis
1.4.6 Mobile Imaging for Biomedical Applications
1.5 Resources for Healthcare Data Analytics
1.6 Conclusions
Bibliography


1.1 Introduction
While healthcare costs have been constantly rising, the quality of care provided to
patients in the United States has not seen considerable improvement. Recently, several researchers
have conducted studies showing that incorporating current healthcare technologies can
reduce mortality rates, healthcare costs, and medical complications at various hospitals.
In 2009, the US government enacted the Health Information Technology for Economic and Clinical
Health Act (HITECH) that includes an incentive program (around $27 billion) for the adoption and
meaningful use of Electronic Health Records (EHRs).
The recent advances in information technology have led to an increasing ease in the ability to
collect various forms of healthcare data. In this digital world, data becomes an integral part of health-
care. A recent report on Big Data suggests that the overall potential of healthcare data will be around
$300 billion [12]. Due to the rapid advancements in the data sensing and acquisition technologies,
hospitals and healthcare institutions have started collecting vast amounts of healthcare data about
their patients. Effectively understanding and building knowledge from healthcare data requires de-
veloping advanced analytical techniques that can effectively transform data into meaningful and
actionable information. General computing technologies have started revolutionizing the manner in
which medical care is available to the patients. Data analytics, in particular, forms a critical com-
ponent of these computing technologies. The analytical solutions when applied to healthcare data
have an immense potential to transform healthcare delivery from being reactive to more proactive.
The impact of analytics in the healthcare domain is only going to grow more in the next several
years. Typically, analyzing health data will allow us to understand the patterns that are hidden in
the data. It will also help clinicians build an individualized patient profile and accurately
compute the likelihood that an individual patient will suffer a medical complication in the near
future.
Healthcare data is particularly rich and it is derived from a wide variety of sources such as
sensors, images, text in the form of biomedical literature/clinical notes, and traditional electronic
records. This heterogeneity in the data collection and representation process leads to numerous
challenges in both the processing and analysis of the underlying data. There is a wide diversity in the
techniques that are required to analyze these different forms of data. In addition, the heterogeneity
of the data naturally creates various data integration and data analysis challenges. In many cases,
insights can be obtained from diverse data types, which are otherwise not possible from a single
source of the data. It is only recently that the vast potential of such integrated data analysis methods
is being realized.
From a researcher and practitioner perspective, a major challenge in healthcare is its
interdisciplinary nature. The field of healthcare has often seen advances coming from diverse disciplines such
as databases, data mining, and information retrieval, as well as from medical researchers and healthcare practitioners.
While this interdisciplinary nature adds to the richness of the field, it also adds to the challenges in
making significant advances. Computer scientists are usually not trained in domain-specific medical
concepts, whereas medical practitioners and researchers also have limited exposure to the mathe-
matical and statistical background required in the data analytics area. This has added to the difficulty
in creating a coherent body of work in this field even though it is evident that much of the available
data can benefit from such advanced analysis techniques. This diversity has often led
to independent lines of work from completely different perspectives. Researchers in the field of data
analytics are particularly susceptible to becoming isolated from real domain-specific problems, and
may often propose problem formulations with excellent technique but with no practical use. This
book is an attempt to bring together these diverse communities by carefully and comprehensively
discussing the most relevant contributions from each domain. It is only by bringing together these
diverse communities that the vast potential of data analysis methods can be harnessed.


[Figure 1.1 diagram; legible labels only. Data Sources & Basic Analytics: Chapter 2: Electronic
Health Records; Chapter 3: Images; Chapter 4: Sensors; Chapter 5: Signals; Chapter 6: Genomic;
Chapter 7: Clinical Notes; Chapter 8: Biomedical Literature; Chapter 9: Social Media.
Chapters 10 to 15: Chapter 11: Temporal Data Mining; Chapter 15: Data Privacy.
Chapters 16 to 21: Chapter 16: Pervasive Health; Chapter 18: Drug Discovery;
Chapter 19: Decision Support; Chapter 20: CAD Systems.]

FIGURE 1.1: The overall organization of the book’s contents.


Another major challenge that exists in the healthcare domain is the “data privacy gap” between
medical researchers and computer scientists. Healthcare data is obviously very sensitive because it
can reveal compromising information about individuals. Several laws in various countries, such as
the Health Insurance Portability and Accountability Act (HIPAA) in the United States, explicitly
forbid the release of medical information about individuals for any purpose, unless safeguards are
used to preserve privacy. Medical researchers have natural access to healthcare data because their
research is often paired with an actual medical practice. Furthermore, various mechanisms exist in
the medical domain to conduct research studies with voluntary participants. Such data collection is
almost always paired with anonymity and confidentiality agreements.
On the other hand, acquiring data is not quite as simple for computer scientists without a proper
collaboration with a medical practitioner. Even then, there are barriers in the acquisition of data.
Clearly, many of these challenges can be avoided if accepted protocols, privacy technologies, and
safeguards are in place. Therefore, this book will also address these issues. Figure 1.1 provides an
overview of the organization of the book’s contents. This book is organized into three parts:
1. Healthcare Data Sources and Basic Analytics: This part discusses the details of various
healthcare data sources and the basic analytical methods that are widely used in the
processing and analysis of such data. The various forms of patient data that are currently being
collected in both clinical and non-clinical environments will be studied. The clinical data
include structured electronic health records and biomedical images. Sensor data has been
receiving a lot of attention recently, and techniques for mining sensor data and for biomedical
signal analysis will be presented. Personalized medicine has gained a lot of importance due to
advancements in genomic data; genomic data analysis involves several statistical techniques,
which will also be elaborated. Patients’ in-hospital clinical data also include a lot of
unstructured data in the form of clinical notes. In addition, the domain knowledge that can be
extracted by mining the biomedical literature will also be discussed. The fundamental data
mining, machine learning, information retrieval, and natural language processing techniques
for processing these data types will be extensively discussed. Finally, behavioral data captured
through social media will also be discussed.
2. Advanced Data Analytics for Healthcare: This part deals with the advanced analytical
methods focused on healthcare. These include clinical prediction models, temporal data mining
methods, and visual analytics. Integrating heterogeneous data, such as clinical and genomic
data, is essential for improving predictive power and will also be discussed.
Information retrieval techniques that can enhance the quality of biomedical search will be
presented. Data privacy is an extremely important concern in healthcare. Privacy-preserving
data publishing techniques will therefore be presented.
3. Applications and Practical Systems for Healthcare: This part focuses on the practical ap-
plications of data analytics and the systems developed using data analytics for healthcare
and clinical practice. Examples include applications of data analytics to pervasive healthcare,
fraud detection, and drug discovery. In terms of the practical systems, we will discuss the de-
tails about the clinical decision support systems, computer assisted medical imaging systems,
and mobile imaging systems.
These different aspects of healthcare are related to one another. Therefore, the chapters in each
of the aforementioned topics are interconnected. Where necessary, pointers are provided across
different chapters, depending on the underlying relevance. This chapter is organized as follows.
Section 1.2 discusses the main data sources that are commonly used and the basic techniques for
processing them. Section 1.3 discusses advanced techniques in the field of healthcare data analytics.
Section 1.4 discusses a number of applications of healthcare analysis techniques. An overview of
resources in the field of healthcare data analytics is presented in Section 1.5. Section 1.6 presents
the conclusions.


1.2 Healthcare Data Sources and Basic Analytics

In this section, the various data sources and their impact on analytical algorithms will be dis-
cussed. The heterogeneity of the sources for medical data mining is rather broad, and this creates
the need for a wide variety of techniques drawn from different domains of data analytics.

1.2.1 Electronic Health Records

Electronic health records (EHRs) contain a digitized version of a patient’s medical history. They
encompass a full range of data relevant to a patient’s care such as demographics, problems,
medications, physician’s observations, vital signs, medical history, laboratory data, radiology reports,
progress notes, and billing data. Many EHRs go beyond a patient’s medical or treatment history and
may contain additional broader perspectives of a patient’s care. An important property of EHRs is
that they provide an effective and efficient way for healthcare providers and organizations to share
information with one another. In this context, EHRs are inherently designed to be in real time and they can
instantly be accessed and edited by authorized users. This can be very useful in practical settings. For
example, a hospital or specialist may wish to access the medical records of the primary provider. An
electronic health record streamlines the workflow by allowing direct access to the updated records in
real time [30]. It can generate a complete record of a patient’s clinical encounter, and support other
care-related activities such as evidence-based decision support, quality management, and outcomes
reporting. The storage and retrieval of health-related data is more efficient using EHRs. EHRs help
to improve quality and convenience of patient care, increase patient participation in the healthcare
process, improve accuracy of diagnoses and health outcomes, and improve care coordination [29].
Various components of EHRs along with the advantages, barriers, and challenges of using EHRs
are discussed in Chapter 2.

1.2.2 Biomedical Image Analysis

Medical imaging plays an important role in modern-day healthcare due to its immense capability
in providing high-quality images of anatomical structures in human beings. Effectively analyzing
such images can be useful for clinicians and medical researchers since it can aid disease monitoring,
treatment planning, and prognosis [31]. The most popular imaging modalities used to acquire a
biomedical image are magnetic resonance imaging (MRI), computed tomography (CT), positron
emission tomography (PET), and ultrasound (U/S). Being able to look inside the body without
hurting the patient and to view the human organs has tremendous implications for human
health. Such capabilities allow the physicians to better understand the cause of an illness or other
adverse conditions without cutting open the patient.
However, merely viewing such organs with the help of images is just the first step of the pro-
cess. The final goal of biomedical image analysis is to be able to generate quantitative information
and make inferences from the images that can provide far more insights into a medical condition.
Such analysis has major societal significance since it is the key to understanding biological systems
and solving health problems. However, it includes many challenges since the images are varied,
complex, and can contain irregular shapes with noisy values. A number of general categories of
research problems that arise in analyzing images are object detection, image segmentation, image
registration, and feature extraction. All these challenges when resolved will enable the generation
of meaningful analytic measurements that can serve as inputs to other areas of healthcare data ana-
lytics. Chapter 3 discusses a broad overview of the main medical imaging modalities along with a
wide range of image analysis approaches.
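
As a minimal illustration of moving from viewing an image to generating quantitative information, the sketch below segments a toy grayscale image with a fixed intensity threshold and measures the area of the segmented region. The pixel values, the threshold, and the function names are hypothetical and chosen only for illustration; real modalities such as MRI or CT call for the far more robust methods surveyed in Chapter 3.

```python
from typing import List

Image = List[List[int]]  # grayscale intensities, row-major


def threshold_segment(image: Image, threshold: int) -> List[List[int]]:
    """Return a binary mask: 1 where intensity >= threshold, else 0."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]


def region_area(mask: List[List[int]]) -> int:
    """Count foreground pixels, a very simple quantitative feature."""
    return sum(sum(row) for row in mask)


# Toy 4x5 "scan": a bright structure on a dark, slightly noisy background.
toy_image = [
    [10, 12, 11, 9, 10],
    [11, 80, 85, 12, 10],
    [10, 82, 90, 88, 11],
    [9, 10, 12, 11, 10],
]

mask = threshold_segment(toy_image, threshold=50)  # threshold is an assumed value
area = region_area(mask)
print("segmented area in pixels:", area)
```

The area computed here is the kind of analytic measurement that could serve as an input feature to other analyses, for example a prediction model.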


1.2.3 Sensor Data Analysis

Sensor data [2] is ubiquitous in the medical domain both for real-time and for retrospective
analysis. Several forms of medical data collection instruments such as the electrocardiogram (ECG)
and the electroencephalogram (EEG) are essentially sensors that collect signals from various parts of the
human body [32]. The data collected by these instruments are sometimes used for retrospective analysis,
but more often for real-time analysis. Perhaps the most important use case of real-time analysis
is in the context of intensive care units (ICUs) and real-time remote monitoring of patients with
specific medical conditions. In all these cases, the volume of the data to be processed can be rather
large. For example, in an ICU, it is not uncommon for the sensor to receive input from hundreds of
data sources, and alarms need to be triggered in real time. Such applications necessitate the use of
big-data frameworks and specialized hardware platforms. In remote-monitoring applications, both
the real-time events and a long-term analysis of various trends and treatment alternatives is of great
While rapid growth in sensor data offers significant promise to impact healthcare, it also intro-
duces a data overload challenge. Hence, it becomes extremely important to develop novel data ana-
lytical tools that can process such large volumes of collected data into meaningful and interpretable
knowledge. Such analytical methods will not only allow for better observing patients’ physiological
signals and help provide situational awareness to the bedside, but also provide better insights into
the inefficiencies in the healthcare system that may be the root cause of surging costs. The research
challenges associated with the mining of sensor data in healthcare settings and the sensor mining
applications and systems in both clinical and non-clinical settings are discussed in Chapter 4.
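
To make the idea of real-time alarms on streaming sensor data concrete, here is a minimal sketch that processes readings one at a time and raises an alarm when the mean of the most recent window exceeds a limit. The window size, the limit, the heart-rate values, and the function name are all assumptions made for illustration; a clinical system would use validated alarm logic and many more data sources.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def window_alarms(
    readings: Iterable[float], window: int, upper_limit: float
) -> Iterator[Tuple[int, float]]:
    """Yield (index, window_mean) whenever the mean of the last `window`
    readings exceeds `upper_limit`. Readings are consumed one at a time,
    so the same code can run on a live stream."""
    recent: Deque[float] = deque(maxlen=window)
    running_sum = 0.0
    for index, value in enumerate(readings):
        if len(recent) == window:
            running_sum -= recent[0]  # oldest value is evicted by the append below
        recent.append(value)
        running_sum += value
        if len(recent) == window:
            mean = running_sum / window
            if mean > upper_limit:
                yield index, mean


# Hypothetical heart-rate stream in beats per minute.
heart_rate = [72, 75, 74, 110, 118, 121, 119, 80, 76]
alarms = list(window_alarms(heart_rate, window=3, upper_limit=100.0))
alarm_indices = [index for index, _ in alarms]
print("alarm raised at sample indices:", alarm_indices)
```

Keeping a running sum rather than recomputing the mean over the whole window keeps the work per reading constant, which matters when hundreds of sources are monitored at once.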

1.2.4 Biomedical Signal Analysis

Biomedical Signal Analysis consists of measuring signals from biological sources, the origin
of which lies in various physiological processes. Examples of such signals include the electroneu-
rogram (ENG), electromyogram (EMG), electrocardiogram (ECG), electroencephalogram (EEG),
electrogastrogram (EGG), phonocardiogram (PCG), and so on. The analysis of these signals is vital
in diagnosing the pathological conditions and in deciding an appropriate care pathway. The mea-
surement of physiological signals gives some form of quantitative or relative assessment of the state
of the human body. These signals are acquired from various kinds of sensors and transducers either
invasively or non-invasively.
These signals can be either discrete or continuous depending on the kind of care or severity
of a particular pathological condition. The processing and interpretation of physiological signals is
challenging due to the low signal-to-noise ratio (SNR) and the interdependency of the physiological
systems. The signal data obtained from the corresponding medical instruments can be copiously
noisy, and may sometimes require a significant amount of preprocessing. Several signal processing
algorithms have been developed that have significantly enhanced the understanding of the physi-
ological processes. A wide variety of methods are used for filtering, noise removal, and compact
methods [36]. More sophisticated analysis methods including dimensionality reduction techniques
such as Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and wavelet
transformation have also been widely investigated in the literature. A broader overview of many of
these techniques may also be found in [1, 2]. Time-series analysis methods are discussed in [37, 40].
Chapter 5 presents an overview of various signal processing techniques used for processing biomed-
ical signals.
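
The effect of filtering on the signal-to-noise ratio can be sketched with a synthetic example. The code below adds Gaussian noise to a slow sine wave, applies a centered moving-average filter, and compares the SNR (in decibels) before and after. The sine wave, the noise level, the filter width, and the function names are assumptions for illustration only; this is one simple filter, not a method prescribed by the chapter.

```python
import math
import random
from typing import List, Sequence


def moving_average(signal: Sequence[float], width: int) -> List[float]:
    """Centered moving average; the window shrinks at the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def snr_db(clean: Sequence[float], observed: Sequence[float]) -> float:
    """SNR in decibels: power of the clean signal over power of the residual."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((o - c) ** 2 for c, o in zip(clean, observed))
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
clean = [math.sin(2 * math.pi * t / 100) for t in range(500)]  # synthetic signal
noisy = [c + rng.gauss(0.0, 0.5) for c in clean]
filtered = moving_average(noisy, width=5)

snr_before = snr_db(clean, noisy)
snr_after = snr_db(clean, filtered)
print(f"SNR before: {snr_before:.1f} dB, after: {snr_after:.1f} dB")
```

Averaging helps here because the synthetic signal varies slowly relative to the filter width; a signal with sharp features, such as the QRS complex in an ECG, would be distorted by the same filter and needs more careful treatment.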

1.2.5 Genomic Data Analysis

A significant number of diseases are genetic in nature, but the nature of the causality between
the genetic markers and the diseases has not been fully established. For example, diabetes is well
known to be a genetic disease; however, the full set of genetic markers that make an individual
prone to diabetes are unknown. In some other cases, such as the blindness caused by Stargardt
disease, the relevant genes are known but all the possible mutations have not been exhaustively
isolated. Clearly, a broader understanding of the relationships between various genetic markers,
mutations, and disease conditions has significant potential in assisting the development of various
gene therapies to cure these conditions. One will be mostly interested in understanding what kind
of health-related questions can be addressed through in-silico analysis of the genomic data through
typical data-driven studies. Moreover, translating genetic discoveries into personalized medicine
practice is a highly non-trivial task with a lot of unresolved challenges. For example, the genomic
landscapes in complex diseases such as cancers are overwhelmingly complicated, revealing a high
order of heterogeneity among different individuals. Solving these issues would put a major piece
of the puzzle in place and bring the concept of personalized medicine much closer to reality.
Recent advancements made in the biotechnologies have led to the rapid generation of large
volumes of biological and medical information and advanced genomic research. This has also led
to unprecedented opportunities and hopes for genome scale study of challenging problems in life
science. For example, advances in genomic technology have made it possible to study the complete
genomic landscape of healthy individuals for complex diseases [16]. Many of these research directions
have already shown promising results in terms of generating new insights into the biology of
human disease and predicting the personalized response of the individual to a particular treatment.
Also, genetic data are often modeled either as sequences or as networks. Therefore, the work in
this field requires a good understanding of sequence and network mining techniques. Various data
analytics-based solutions are being developed for tackling key research problems in medicine such
as identification of disease biomarkers and therapeutic targets and prediction of clinical outcome.
More details about the fundamental computational algorithms and bioinformatics tools for genomic
data analysis along with genomic data resources are discussed in Chapter 6.
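
Since genetic data are often modeled as sequences, a very small example of sequence mining is counting k-mers, the overlapping substrings of length k in a sequence. The sketch below is illustrative only: the sequence is made up, and the function name and the choice of k are assumptions rather than anything taken from Chapter 6.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a DNA sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


toy_sequence = "ATGCGATGCA"  # made-up sequence, not real genomic data
kmer_counts = count_kmers(toy_sequence, k=3)

for kmer, count in sorted(kmer_counts.items()):
    print(kmer, count)
```

Frequency profiles of this kind are a common starting point for comparing sequences or building features for downstream models.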

1.2.6 Clinical Text Mining

Most of the information about patients is encoded in the form of clinical notes. These notes
are typically stored in an unstructured data format and are the backbone of much of healthcare data.
They contain the clinical information from the transcription of dictations, direct entry by providers,
or use of speech recognition applications, and are perhaps the richest source of unexploited
information. Needless to say, manually encoding this free text across a broad range of
clinical information is too costly and time consuming; in practice it is limited to primary and secondary
diagnoses and to procedures for billing purposes. Such notes are notoriously challenging to analyze
automatically due to the complexity involved in converting clinical text that is available in free-text
form to a structured format. The difficulty arises mainly from their unstructured nature, heterogeneity,
diverse formats, and varying context across different patients and practitioners.
Natural language processing (NLP) and entity extraction play an important part in inferring
useful knowledge from large volumes of clinical text and in automatically encoding clinical information
in a timely manner [22]. In general, data preprocessing methods are more important in these contexts
as compared to the actual mining techniques. The processing of clinical text using NLP methods is
more challenging when compared to the processing of other texts due to the ungrammatical nature
of short and telegraphic phrases, dictations, shorthand lexicons such as abbreviations and acronyms,
and often misspelled clinical terms. All these problems will have a direct impact on the various
standard NLP tasks such as shallow or full parsing, sentence segmentation, text categorization, etc.,
thus making the clinical text processing highly challenging. A wide range of NLP methods and data
mining techniques for extracting information from the clinical text are discussed in Chapter 7.
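
The preprocessing burden created by abbreviations and telegraphic phrasing can be illustrated with a small normalization step. The sketch below lowercases a note, strips trailing punctuation, and expands abbreviations from a tiny lexicon. The lexicon, the sample note, and the function name are hypothetical; real systems rely on curated clinical vocabularies and must resolve abbreviations that are ambiguous in context.

```python
from typing import Dict, List

# Hypothetical mini-lexicon, for illustration only.
ABBREVIATIONS: Dict[str, str] = {
    "pt": "patient",
    "c/o": "complains of",
    "sob": "shortness of breath",
    "hx": "history",
    "htn": "hypertension",
}


def normalize_note(note: str, lexicon: Dict[str, str]) -> List[str]:
    """Lowercase, split on whitespace, strip surrounding punctuation, and
    expand known abbreviations into their full forms."""
    tokens: List[str] = []
    for raw in note.lower().split():
        token = raw.strip(".,;:")
        if not token:
            continue
        expansion = lexicon.get(token, token)
        tokens.extend(expansion.split())
    return tokens


note = "Pt c/o SOB. Hx of HTN."
normalized = normalize_note(note, ABBREVIATIONS)
print(" ".join(normalized))
```

A plain dictionary lookup like this breaks down quickly on real notes, where the same abbreviation can carry different meanings across specialties, which is one reason the text stresses preprocessing over the mining step itself.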


1.2.7 Mining Biomedical Literature

A significant number of applications rely on evidence from the biomedical literature. The latter
is copious and has grown significantly over time. The use of text mining methods for the long-term
preservation, accessibility, and usability of digitally available resources is important in biomedical
applications relying on evidence from scientific literature. Text mining methods and tools offer novel
ways of applying new knowledge discovery methods in the biomedical field [20, 21]. Such tools
offer efficient ways to search, extract, combine, analyze and summarize textual data, thus supporting
researchers in knowledge discovery and generation. One of the major challenges in biomedical text
mining is the multidisciplinary nature of the field. For example, biologists describe chemical com-
pounds using brand names, while chemists often use less ambiguous IUPAC-compliant names or
unambiguous descriptors such as International Chemical Identifiers. While the latter can be handled
with cheminformatics tools, text mining techniques are required to extract less precisely defined
entities and their relations from the literature. In this context, entity and event extraction methods
play a key role in discovering useful knowledge from unstructured databases. Because the cost
of curating such databases is too high, text mining methods offer new opportunities for their ef-
fective population, update, and integration. Text mining brings about other benefits to biomedical
research by linking textual evidence to biomedical pathways, reducing the cost of expert knowledge
validation, and generating hypotheses. The approach provides a general methodology to discover
previously unknown links and enhance the way in which biomedical knowledge is organized. More
details about the challenges and algorithms for biomedical text mining are discussed in Chapter 8.

1.2.8 Social Media Analysis

The rapid emergence of various social media resources such as social networking sites,
blogs/microblogs, forums, question answering services, and online communities provides a wealth
of information about public opinion on various aspects of healthcare. Social media data can be
mined for patterns and knowledge that can be leveraged to make useful inferences about popula-
tion health and public health monitoring. A significant amount of public health information can
be gleaned from the inputs of various participants at social media sites. Although most individ-
ual social media posts and messages contain little informational value, aggregation of millions of
such messages can generate important knowledge [4, 19]. Effectively analyzing these vast pieces of
knowledge can significantly reduce the latency in collecting such complex information.
Previous research on social media analytics for healthcare has focused on capturing aggregate
health trends such as outbreaks of infectious diseases, detecting reports of adverse drug interactions,
and improving interventional capabilities for health-related activities. Disease outbreaks are
often strongly reflected in the content of social media, and an analysis of the history of the content
provides valuable insights about them. Topic models are frequently used for high-level
analysis of such health-related content. An additional source of information in social media sites
is obtained from online doctor and patient communities. Since medical conditions recur across
different individuals, the online communities provide a valuable source of knowledge about various
medical conditions. A major challenge in social media analysis is that the data is often unreliable,
and therefore the results must be interpreted with caution. More discussion about the impact of
social media analytics in improving healthcare is given in Chapter 9.
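
As a minimal sketch of the aggregation idea described above, the following Python fragment counts, per ISO week, the posts that mention at least one health-related term. The term list, the sample posts, and the function name are illustrative assumptions only; a practical system would rely on a curated lexicon or a topic model rather than exact keyword matching.

```python
from collections import Counter
from datetime import date

# Hypothetical keyword list (an assumption for illustration only).
FLU_TERMS = {"flu", "fever", "cough"}

def weekly_mention_counts(posts):
    """Count posts per ISO (year, week) that mention at least one term."""
    counts = Counter()
    for day, text in posts:
        # Lowercase each token and strip simple trailing punctuation.
        words = {w.strip(".,!?").lower() for w in text.split()}
        if words & FLU_TERMS:
            iso = day.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return counts

# Invented sample posts: (posting date, message text).
posts = [
    (date(2014, 1, 6), "Stuck in bed with the flu."),
    (date(2014, 1, 7), "Great game last night!"),
    (date(2014, 1, 8), "Fever and cough all day"),
    (date(2014, 1, 14), "This cough will not go away"),
]
counts = weekly_mention_counts(posts)
for week, n in sorted(counts.items()):
    print(week, n)
```

Individually, each post says little; the weekly totals are what would be compared against a historical baseline to flag a possible outbreak.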

1.3 Advanced Data Analytics for Healthcare

This section will discuss a number of advanced data analytics methods for healthcare. These
techniques include various data mining and machine learning models that need to be adapted to the
healthcare domain.

1.3.1 Clinical Prediction Models

Clinical prediction forms a critical component of modern-day healthcare. Several prediction
models have been extensively investigated and have been successfully deployed in clinical practice
[26]. Such models have made a tremendous impact in terms of diagnosis and treatment of diseases.
Most successful supervised learning methods that have been employed for clinical prediction tasks
fall into three categories: (i) Statistical methods such as linear regression, logistic regression, and
Bayesian models; (ii) Sophisticated methods in machine learning and data mining such as decision
trees and artificial neural networks; and (iii) Survival models that aim to predict survival outcomes.
All of these techniques focus on discovering the underlying relationship between covariate variables,
which are also known as attributes and features, and a dependent outcome variable.
The choice of the model to be used for a particular healthcare problem primarily depends on
the outcomes to be predicted. There are various kinds of prediction models that are proposed in the
literature for handling such a diverse variety of outcomes. Some of the most common outcomes in-
clude binary and continuous forms. Other less common forms are categorical and ordinal outcomes.
In addition, there are also different models proposed to handle survival outcomes where the goal
is to predict the time of occurrence of a particular event of interest. These survival models are also
widely studied in the context of clinical data analysis in terms of predicting the patient’s survival
time. There are different ways of evaluating and validating the performance of these prediction mod-
els. Different prediction models along with various kinds of evaluation mechanisms in the context
of healthcare data analytics will be discussed in Chapter 10.
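
To make the binary-outcome case concrete, the sketch below fits a logistic regression model by plain batch gradient descent. It is an illustration under stated assumptions, not the method of any particular study: the single covariate, the toy data, the learning rate, and the function names are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one patient: predicted risk minus outcome.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability of the positive outcome for covariates x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data: one standardized covariate and a binary outcome (invented).
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [1.5]), predict_proba(w, b, [-1.5]))
```

A continuous outcome would replace the sigmoid and log loss with a linear predictor and squared error, which is the sense in which the outcome type drives the choice of model.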

1.3.2 Temporal Data Mining

Healthcare data almost always contain time information, and it is inconceivable to reason about and
mine these data without incorporating the temporal dimension. There are two major sources of
temporal data generated in the healthcare domain. The first is the electronic health records (EHR)
data and the second is the sensor data. Mining the temporal dimension of EHR data is extremely
promising as it may reveal patterns that enable a more precise understanding of disease manifestation,
progression, and response to therapy. Some of the unique characteristics of EHR data (such as
heterogeneity, sparsity, high dimensionality, and irregular time intervals) make conventional methods
inadequate to handle them. Unlike EHR data, sensor data are usually represented as numeric time
series that are regularly measured in time at a high frequency. Examples of these data are phys-
iological data obtained by monitoring the patients on a regular basis and other electrical activity
recordings such as electrocardiogram (ECG), electroencephalogram (EEG), etc. Sensor data for a
specific subject are measured over a much shorter period of time (usually several minutes to several
days) compared to the longitudinal EHR data (usually collected across the entire lifespan of the
patient).
Given the different natures of EHR data and sensor data, the appropriate temporal data
mining methods for these types of data are often different. EHR data are usually mined using temporal
pattern mining methods, which represent data instances (e.g., patients’ records) as sequences
of discrete events (e.g., diagnosis codes, procedures, etc.) and then try to find and enumerate sta-
tistically relevant patterns that are embedded in the data. On the other hand, sensor data are often
analyzed using signal processing and time-series analysis techniques (e.g., wavelet transform, inde-
pendent component analysis, etc.) [37, 40]. Chapter 11 presents a detailed survey and summarizes
the literature on temporal data mining for healthcare data.
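
The event-sequence view of EHR data can be sketched as follows. The fragment counts, for every ordered pair of distinct events, the number of patient sequences in which the first event occurs before the second, and keeps the pairs that reach a minimum support. This is a deliberately simplified stand-in for full temporal pattern mining algorithms; the event labels, sequences, and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_ordered_pairs(sequences, min_support):
    """Return {(a, b): support} for pairs where a precedes b in a sequence."""
    support = Counter()
    for seq in sequences:
        seen = set()  # count each pair at most once per patient
        for i, j in combinations(range(len(seq)), 2):  # always i < j
            if seq[i] != seq[j]:
                seen.add((seq[i], seq[j]))
        support.update(seen)
    return {pair: n for pair, n in support.items() if n >= min_support}

# Invented patient records as time-ordered sequences of discrete events.
patients = [
    ["diabetes", "hypertension", "kidney_disease"],
    ["diabetes", "kidney_disease"],
    ["hypertension", "diabetes"],
]
patterns = frequent_ordered_pairs(patients, min_support=2)
print(patterns)
```

Note that the sketch uses only event order and ignores the irregular time gaps between events, which practical methods for EHR data typically must take into account.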

1.3.3 Visual Analytics

The ability to analyze and identify meaningful patterns in multimodal clinical data must be ad-
dressed in order to provide a better understanding of diseases and to identify patterns that could
be affecting the clinical workflow. Visual analytics provides a way to combine the strengths of hu-
man cognition with interactive interfaces and data analytics that can facilitate the exploration of
complex datasets. Visual analytics is a science that involves the integration of interactive visual
interfaces with analytical techniques to develop systems that facilitate reasoning over, and interpre-
tation of, complex data [23]. Visual analytics is popular in many aspects of healthcare data analysis
because of the wide variety of insights that such an analysis provides. Due to the rapid increase of
health-related information, it becomes critical to build effective ways of analyzing large amounts
of data by leveraging human–computer interaction and graphical interfaces. In general, providing
easily understandable summaries of complex healthcare data is useful for a human in gaining novel
insights.
In the evaluation of many diseases, clinicians are presented with datasets that often contain hun-
dreds of clinical variables. The multimodal, noisy, heterogeneous, and temporal characteristics of
the clinical data pose significant challenges to the users while synthesizing the information and ob-
taining insights from the data [24]. The amount of information being produced by healthcare organi-
zations opens up opportunities to design new interactive interfaces to explore large-scale databases,
to validate clinical data and coding techniques, and to increase transparency within different depart-
ments, hospitals, and organizations. While many of the visual methods can be directly adopted from
the data mining literature [11], a number of methods, which are specific to the healthcare domain,
have also been designed. The popular data visualization techniques used
in clinical settings and the areas in healthcare that benefit from visual analytics are discussed in
Chapter 12.

1.3.4 Clinico–Genomic Data Integration

Human diseases are inherently complex in nature and are usually governed by a complicated in-
terplay of several diverse underlying factors, including different genomic, clinical, behavioral, and
environmental factors. Clinico–pathological and genomic datasets capture the different effects of
these diverse factors in a complementary manner. It is essential to build integrative models consid-
ering both genomic and clinical variables simultaneously so that they can combine the vital infor-
mation that is present in both clinical and genomic data [27]. Such models can help in the design
of effective diagnostics, new therapeutics, and novel drugs, which will lead us one step closer to
personalized medicine [17].
This opportunity has led to an emerging area of integrative predictive models that can be built
by combining clinical and genomic data, which is called clinico–genomic data integration. Clinical
data refers to a broad category of a patient’s pathological, behavioral, demographic, familial, en-
vironmental and medication history, while genomic data refers to a patient’s genomic information
including SNPs, gene expression, protein and metabolite profiles. In most of the cases, the goal of
the integrative study is biomarker discovery which is to find the clinical and genomic factors related
to a particular disease phenotype such as cancer vs. no cancer, tumor vs. normal tissue samples, or
continuous variables such as the survival time after a particular treatment. Chapter 13 provides a
comprehensive survey of different challenges with clinico–genomic data integration along with the
different approaches that aim to address these challenges with an emphasis on biomarker discovery.

1.3.5 Information Retrieval

Although most work in healthcare data analytics focuses on mining and analyzing patient-related
data, additional information for use in this process includes scientific data and literature. The tech-
niques most commonly used to access this data include those from the field of information retrieval
(IR). IR is the field concerned with the acquisition, organization, and searching of knowledge-based
information, which is usually defined as information derived and organized from observational or
experimental research [14]. The use of IR systems has become essentially ubiquitous. It is estimated
that among individuals who use the Internet in the United States, over 80 percent have used it to
search for personal health information and virtually all physicians use the Internet.
Information retrieval models are closely related to the problems of clinical and biomedical text
mining. The basic objective of using information retrieval is to find the content that a user wants
based on their requirements. This typically begins with the posing of a query to the IR system. A
search engine matches the query to content items through metadata. The two key components of
IR are indexing, which is the process of assigning metadata to the content, and retrieval, which
is the process of the user entering the query and retrieving relevant content. The most well-known
data structure used for efficient information retrieval is the inverted index where each document
is associated with an identifier. Each word then points to a list of document identifiers. This kind
of representation is particularly useful for a keyword search. Furthermore, once a search has been
conducted, mechanisms are required to rank the possibly large number of results, which might have
been retrieved. A number of user-oriented evaluations have been performed over the years looking
at users of biomedical information and measuring the search performance in clinical settings [15].
Chapter 14 discusses a number of information retrieval models for healthcare along with evaluation
of such retrieval models.
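
The inverted index described above can be sketched in a few lines. Each word maps to the set of identifiers of the documents that contain it, and a keyword query is answered by intersecting those sets. The documents, the whitespace tokenization, and the function names are illustrative assumptions; ranking of the results is not shown.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Indexing step: map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Retrieval step: ids of documents containing every query word."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Invented document collection keyed by identifier.
documents = {
    1: "aspirin reduces fever",
    2: "fever and cough in influenza",
    3: "aspirin and stroke prevention",
}
index = build_inverted_index(documents)
print(search(index, "fever"))
print(search(index, "aspirin fever"))
```

Because lookups touch only the posting sets of the query words, the cost of a keyword search depends on how many documents contain those words rather than on the size of the whole collection.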

1.3.6 Privacy-Preserving Data Publishing

In the healthcare domain, the definition of privacy is commonly accepted as “a person’s right and
desire to control the disclosure of their personal health information” [25]. Patients’ health-related
data is highly sensitive because of the potentially compromising information about individual partic-
ipants. Various forms of data such as disease information or genomic information may be sensitive
for different reasons. To enable research in the field of medicine, it is often important for medical or-
ganizations to be able to share their data with statistical experts. Sharing personal health information
can bring enormous economic benefits. This naturally leads to concerns about the privacy of individuals
being compromised. The data privacy problem is one of the most important challenges in
the field of healthcare data analytics. Most privacy preservation methods reduce the representation
accuracy of the data so that the identification of sensitive attributes of an individual is compromised.
This can be achieved by either perturbing the sensitive attribute, perturbing attributes that serve as
identification mechanisms, or a combination of the two. Clearly, this process requires a reduction
in the accuracy of data representation, so privacy preservation almost always incurs the cost
of losing some data utility. The goal of privacy preservation methods is therefore to optimize the
trade-off between utility and privacy, so that the utility loss at a given level of
privacy is as small as possible.
The major steps in privacy-preserving data publication algorithms [5, 18] are the identification
of an appropriate privacy metric and level for a given access setting and data characteristics, ap-
plication of one or multiple privacy-preserving algorithm(s) to achieve the desired privacy level,
and postanalyzing the utility of the processed data. These three steps are repeated until the desired
utility and privacy levels are jointly met. Chapter 15 focuses on applying privacy-preserving algo-
rithms to healthcare data for secondary-use data publishing and interpretation of the usefulness and
implications of the processed data.
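
The iterative loop described above can be illustrated with a small sketch. Here the privacy metric is assumed to be k-anonymity over two quasi-identifiers (age and ZIP code), the privacy-preserving algorithm is simple generalization, and the generalization level serves as a crude proxy for utility loss. These choices, the generalization ladder, and the records are assumptions made for illustration, not the procedure of any specific publication.

```python
from collections import Counter

# Hypothetical generalization ladder: age bin width used at each level.
AGE_BIN_WIDTHS = [1, 5, 10, 20]

def generalize(record, level):
    """Coarsen age into a range and mask the last `level` ZIP digits."""
    age, zip_code, diagnosis = record
    width = AGE_BIN_WIDTHS[level]
    low = (age // width) * width
    masked_zip = zip_code[: len(zip_code) - level] + "*" * level
    return ((low, low + width - 1), masked_zip, diagnosis)

def smallest_group(records):
    """Privacy metric: size of the smallest quasi-identifier group."""
    groups = Counter((age, z) for age, z, _ in records)
    return min(groups.values())

def anonymize(records, k):
    """Raise the generalization level until every group has >= k records."""
    for level in range(len(AGE_BIN_WIDTHS)):
        candidate = [generalize(r, level) for r in records]
        if smallest_group(candidate) >= k:
            return level, candidate  # lower level means less utility lost
    raise ValueError("k is not reachable with this generalization ladder")

# Invented records: (age, ZIP code, sensitive diagnosis).
records = [
    (34, "48201", "flu"),
    (36, "48202", "asthma"),
    (33, "48205", "diabetes"),
    (38, "48207", "flu"),
]
level, released = anonymize(records, k=2)
print(level, released)
```

Stopping at the first level that satisfies k reflects the trade-off discussed above: the data are distorted only as much as the chosen privacy level requires.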

1.4 Applications and Practical Systems for Healthcare

In the final set of chapters in this book, we will discuss the practical healthcare applications and
systems that heavily utilize data analytics. These topics have evolved significantly in the past few
years and are continuing to gain a lot of momentum and interest. Some of these methods, such as
fraud detection, are not directly related to medical diagnosis, but are nevertheless important in this
field.

1.4.1 Data Analytics for Pervasive Health

Pervasive health refers to the process of tracking medical well-being and providing long-term
medical care with the use of advanced technologies such as wearable sensors. For example, wearable
monitors are often used for measuring the long-term effectiveness of various treatment mechanisms.
These methods face a number of challenges, such as knowledge extraction from the large
volumes of data collected and real-time processing. However, recent advances in both hardware
and software technologies (data analytics in particular) have made low-cost intelligent health
systems embedded within the home and living environments a reality [33].
A wide variety of sensor modalities can be used when developing intelligent health systems,
including wearable and ambient sensors [28]. In the case of wearable sensors, sensors are attached
to the body or woven into garments. For example, 3-axis accelerometers distributed over an individ-
ual’s body can provide information about the orientation and movement of the corresponding body
part. In addition to these advancements in sensing modalities, there has been an increasing interest
in applying analytics techniques to data collected from such equipment. Several practical healthcare
systems have started using analytical solutions. Some examples include cognitive health monitor-
ing systems based on activity recognition, persuasive systems for motivating users to change their
health and wellness habits, and abnormal health condition detection systems. A detailed discussion
on how various analytics can be used for supporting the development of intelligent health systems
along with supporting infrastructure and applications in different healthcare domains is presented in
Chapter 16.
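
As a small illustration of how a 3-axis accelerometer reading conveys orientation, the sketch below estimates pitch and roll from a single sample. It assumes the sensor is at rest, so that the reading reflects gravity only, and it uses one common axis convention; both are assumptions, and real wearable systems typically filter the signal and fuse it with other sensors.

```python
import math

def tilt_angles(ax, ay, az):
    """Return (pitch, roll) in degrees from one accelerometer sample.

    Assumes a static sensor, with readings expressed in units of g.
    """
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll

# Sensor lying flat: gravity falls entirely on the z axis.
print(tilt_angles(0.0, 0.0, 1.0))
# Sensor rolled onto its side: gravity falls entirely on the y axis.
print(tilt_angles(0.0, 1.0, 0.0))
```

A sequence of such angle estimates from several body locations is the kind of low-level feature on which activity recognition methods are commonly built.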

1.4.2 Healthcare Fraud Detection

Healthcare fraud has been one of the biggest problems faced by the United States and costs sev-
eral billions of dollars every year. With growing healthcare costs, the threat of healthcare fraud is
increasing at an alarming pace. Given the recent scrutiny of the inefficiencies in the US healthcare
system, identifying fraud has been at the forefront of the efforts towards reducing the healthcare
costs. One could analyze the healthcare claims data along different dimensions to identify fraud. The
complexity of the healthcare domain, which includes multiple sets of participants, including health-
care providers, beneficiaries (patients), and insurance companies, makes the problem of detecting
healthcare fraud equally challenging and makes it different from other domains such as credit card
fraud detection and auto insurance fraud detection. In these other domains, the methods rely on con-
structing profiles for the users based on the historical data and they typically monitor deviations in
the behavior of the user from the profile [7]. However, in healthcare fraud, such approaches are not
usually applicable, because the users in the healthcare setting are the beneficiaries, who typically are
not the fraud perpetrators. Hence, more sophisticated analysis is required in the healthcare sector to
identify fraud.
Several solutions based on data analytics have been investigated for solving the problem of
healthcare fraud. The primary advantages of data-driven fraud detection are automatic extraction
of fraud patterns and prioritization of suspicious cases [3]. Most such analysis is performed
with respect to an episode of care, which is essentially the collection of healthcare services provided to a
patient for the same health issue. Data-driven methods for healthcare fraud detection can be
employed to answer the following questions: Is a given episode of care fraudulent or unnecessary?
Is a given claim within an episode fraudulent or unnecessary? Is a provider or a network of providers
fraudulent? We discuss the problem of fraud in healthcare and existing data-driven methods for fraud
detection in Chapter 17.
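
One simple way to prioritize suspicious cases at the provider level can be sketched as follows. Claims are first grouped into episodes of care, and each provider is then scored by the ratio of its mean cost per episode to the median across providers. The scoring rule, the claim tuples, and the function name are hypothetical; the sketch is not one of the specific methods surveyed in the literature.

```python
from collections import defaultdict
from statistics import mean, median

def provider_scores(claims):
    """Rank providers by mean episode cost relative to the peer median.

    claims: iterable of (provider_id, episode_id, amount) tuples.
    """
    # Total billed amount per episode of care.
    per_episode = defaultdict(float)
    for provider, episode, amount in claims:
        per_episode[(provider, episode)] += amount
    # Collect episode totals for each provider.
    per_provider = defaultdict(list)
    for (provider, _), total in per_episode.items():
        per_provider[provider].append(total)
    average = {p: mean(totals) for p, totals in per_provider.items()}
    baseline = median(average.values())
    # Highest ratio first, so the most unusual provider is reviewed first.
    return sorted(((a / baseline, p) for p, a in average.items()), reverse=True)

# Invented claims data.
claims = [
    ("A", "e1", 100), ("A", "e1", 50), ("A", "e2", 150),
    ("B", "e3", 140), ("B", "e4", 160),
    ("C", "e5", 900), ("C", "e5", 300),
]
ranked = provider_scores(claims)
print(ranked)
```

A high score is only a signal for further review, since a provider may treat unusually complex cases; the value of such a ranking lies in directing limited audit effort.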

1.4.3 Data Analytics for Pharmaceutical Discoveries

The cost of successful novel chemistry-based drug development often reaches millions of dollars,
and the time to introduce the drug to market often comes close to a decade [34]. The high failure
rate of drugs during this process has led the trial phases to be known as the “valley of death.” Most new
compounds fail during the FDA approval process in clinical trials or cause adverse side effects.
Interdisciplinary computational approaches that combine statistics, computer science, medicine,
chemoinformatics, and biology are becoming highly valuable for drug discovery and development.
In the context of pharmaceutical discoveries, data analytics can potentially limit the search space
and provide recommendations to the domain experts for hypothesis generation and further analysis
and experiments.
Data analytics can be used in several stages of drug discovery and development to achieve dif-
ferent goals. In this domain, one way to categorize data analytical approaches is based on their
application to pre-marketing and post-marketing stages of the drug discovery and development pro-
cess. In the pre-marketing stage, data analytics focus on discovery activities such as finding signals
that indicate relations between drugs and targets, drugs and drugs, genes and diseases, protein and
diseases, and finding biomarkers. In the post-marketing stage an important application of data an-
alytics is to find indications of adverse side effects for approved drugs. These methods provide a
list of potential drug side effect associations that can be used for further studies. Chapter 18 pro-
vides more discussion of the applications of data analytics for pharmaceutical discoveries including
drug-target interaction prediction and pharmacovigilance.
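
For the post-marketing stage, one widely used disproportionality measure is the proportional reporting ratio (PRR), which compares how often an adverse event is reported for a drug of interest with how often it is reported for all other drugs. The text above does not name a specific measure, so the choice of PRR here is an assumption, and the report counts below are invented.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table of spontaneous adverse event reports.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    rate_drug = a / (a + b)
    rate_others = c / (c + d)
    return rate_drug / rate_others

# Invented counts: the event appears in 20% of the drug's reports
# but in only 1% of the reports for all other drugs.
prr = proportional_reporting_ratio(a=20, b=80, c=100, d=9900)
print(round(prr, 2))
```

A PRR well above 1 marks a drug and side effect pair as a candidate association for further study; it does not by itself establish that the drug causes the event.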

1.4.4 Clinical Decision Support Systems

Clinical Decision Support Systems (CDSS) are computer systems designed to assist clinicians
with patient-related decision making, such as diagnosis and treatment [6]. CDSS have become a
crucial component in the evaluation and improvement of patient treatment since they have been shown to
improve both patient outcomes and cost of care [35]. They can help in minimizing analytical errors
by notifying the physician of potentially harmful drug interactions, and their diagnostic procedures
have been shown to enable more accurate diagnoses. Some of the main advantages of CDSS are
their ability to support decision making and determine optimal treatment strategies, to aid general health
policies by estimating the clinical and economic outcomes of different treatment methods, and even
to estimate treatment outcomes under certain conditions. The main reasons for the success of CDSS
are their electronic nature, seamless integration with clinical workflows, and ability to provide decision support
at the appropriate time and location. Two particular fields of healthcare where CDSS have been extremely
influential are pharmacy and billing. CDSS can help pharmacies to look for negative drug
interactions and then report them to the corresponding patient’s ordering professional. In the billing
departments, CDSS have been used to devise treatment plans that provide an optimal balance of
patient care and financial expense [9]. A detailed survey of different aspects of CDSS along with
various challenges associated with their usage in clinical practice is presented in Chapter 19.

1.4.5 Computer-Aided Diagnosis

Computer-aided diagnosis/detection (CAD) is a procedure in radiology that supports radiolo-
gists in reading medical images [13]. CAD tools in general refer to fully automated second reader
tools designed to assist the radiologist in the detection of lesions. There is a growing consensus
among clinical experts that the use of CAD tools can improve the performance of the radiologist.
The radiologist first performs an interpretation of the images as usual, while the CAD algorithm
is running in the background or has already been precomputed. Structures identified by the CAD
algorithm are then highlighted as regions of interest to the radiologist. The principal value of CAD
tools is determined not by their stand-alone performance, but rather by carefully measuring the incremental
value of CAD in normal clinical practice, such as the number of additional lesions detected
using CAD. Secondly, CAD systems must not have a negative impact on patient management (for
instance, false positives that cause the radiologist to recommend unnecessary biopsies and follow-ups).
From the data analytics perspective, new CAD algorithms aim at extracting key quantitative
features, summarizing vast volumes of data, and/or enhancing the visualization of potentially ma-
lignant nodules, tumors, or lesions in medical images. The three important stages in the CAD data
processing are candidate generation (identifying suspicious regions of interest), feature extraction
(computing descriptive morphological or texture features), and classification (differentiating can-
didates that are true lesions from the rest of the candidates based on candidate feature vectors).
A detailed overview of some CAD approaches to different diseases emphasizing the specific chal-
lenges in diagnosis and detection, and a series of case studies that apply advanced data analytics in
medical imaging applications is presented in Chapter 20.
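
The three stages of CAD data processing can be sketched on a toy example. Below, candidate generation finds connected groups of bright pixels in a small intensity grid, feature extraction computes the area and mean intensity of each candidate, and classification applies a fixed area rule. The image, the threshold, and the rule are assumptions made for illustration; a real system would use a classifier trained on labeled candidates and far richer features.

```python
def candidate_regions(image, threshold):
    """Candidate generation: 4-connected groups of pixels above threshold."""
    rows, cols = len(image), len(image[0])
    seen, regions = set(), []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or (r, c) in seen:
                continue
            stack, region = [(r, c)], []
            seen.add((r, c))
            while stack:  # depth-first flood fill from the seed pixel
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and image[ny][nx] > threshold):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            regions.append(region)
    return regions

def extract_features(image, region):
    """Feature extraction: simple morphological and intensity descriptors."""
    values = [image[y][x] for y, x in region]
    return {"area": len(region), "mean_intensity": sum(values) / len(values)}

def classify(features, min_area=3):
    """Classification: hypothetical rule standing in for a trained model."""
    return features["area"] >= min_area

# Invented 4x5 intensity grid with one large and one single-pixel bright spot.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 9, 8, 0, 7],
    [0, 0, 0, 0, 0],
]
regions = candidate_regions(image, threshold=5)
features = [extract_features(image, region) for region in regions]
flags = [classify(f) for f in features]
print(features, flags)
```

Keeping the stages separate mirrors the pipeline described above: the candidate generator can be tuned for sensitivity, while the classifier is responsible for suppressing false positives.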

1.4.6 Mobile Imaging for Biomedical Applications

Mobile imaging refers to the application of portable computers such as smartphones or tablet
computers to store, visualize, and process images with and without connections to servers, the In-
ternet, or the cloud. Today, portable devices provide sufficient computational power for biomedical
image processing, and smart devices have been introduced in the operating theater. While many techniques
for biomedical image acquisition will always require special equipment, the regular camera
is one of the most widely used imaging modalities in hospitals. Mobile technology and smart devices,
especially smartphones, allow new ways of easier imaging at the patient’s bedside and have the
potential to be made into diagnostic tools that can be used by medical professionals. Smartphones
usually contain at least one high-resolution camera that can be used for image formation. Several
challenges arise during the acquisition, visualization, analysis, and management of images in mo-
bile environments. A more detailed discussion about mobile imaging and its challenges is given in
Chapter 21.

1.5 Resources for Healthcare Data Analytics

There are several resources available in this field. We will now discuss the various books, journals,
and organizations that provide further information on this exciting area of healthcare informatics.
A classic book in the field of healthcare informatics is [39]. There are several other books
that target a specific topic of work (in the context of healthcare) such as information retrieval [10],
statistical methods [38], evaluation methods [8], and clinical decision support systems [6, 9].
There are a few popular organizations that are primarily involved with medical informatics research.
They are the American Medical Informatics Association (AMIA) [49], the International Medical
Informatics Association (IMIA) [50], and the European Federation for Medical Informatics (EFMI)
[51]. These organizations usually conduct annual conferences and meetings that are well attended
by researchers working in healthcare informatics. The meetings typically discuss new technologies
for capturing, processing, and analyzing medical data. They are a good meeting place for new researchers
who would like to start research in this area.
The following are some of the well-reputed journals that publish top-quality research works in
healthcare data analytics: Journal of the American Medical Informatics Association (JAMIA) [41],
Journal of Biomedical Informatics (JBI) [42], Journal of Medical Internet Research [43], IEEE
Journal of Biomedical and Health Informatics [44], Medical Decision Making [45], International
Journal of Medical Informatics (IJMI) [46], and Artificial Intelligence in Medicine [47]. A more
comprehensive list of journals in the field of healthcare and biomedical informatics along with
details is available here [48].
Because medical data typically contain highly sensitive patient information,
research work in healthcare data analytics has been fragmented across many places.
Many researchers work with a specific hospital or healthcare facility that is usually not willing
to share its data due to obvious privacy concerns. However, there is a wide variety of public
repositories available for researchers to design and apply their own models and algorithms. Due
to the diversity in healthcare research, it will be a cumbersome task to compile all the healthcare
repositories at a single location. Specific health data repositories dealing with a particular healthcare
problem and data sources are listed in the corresponding chapters where the data is discussed. We
hope that these repositories will be useful for both existing and upcoming researchers who do not
have access to the health data from hospitals and healthcare facilities.

1.6 Conclusions

The field of healthcare data analytics has seen significant strides in recent years because of advances in
hardware and software technologies, which have increased the ease of the data collection process. The
advancement of the field has, however, faced a number of challenges because of its interdisciplinary
nature, privacy constraints in data collection and dissemination mechanisms, and the inherently un-
structured nature of the data. In some cases, the data may have very high volume, which requires
real-time analysis and insights. In some cases, the data may be complex, which may require special-
ized retrieval and analytical techniques. The advances in data collection technologies, which have
enabled the field of analytics, also pose new challenges because of their efficiency in collecting
large amounts of data. The techniques used in the healthcare domain are also very diverse because
of the inherent variations in the underlying data type. This book provides a comprehensive overview
of these different aspects of healthcare data analytics, and the various research challenges that still
need to be addressed.

[1] Charu C. Aggarwal. Data Streams: Models and Algorithms. Springer, 2007.
[2] Charu C. Aggarwal. Managing and Mining Sensor Data. Springer, 2013.
[3] Charu C. Aggarwal. Outlier Analysis. Springer, 2013.
[4] Charu C. Aggarwal. Social Network Data Analytics. Springer, 2011.
[5] Charu C. Aggarwal and Philip S. Yu. Privacy-Preserving Data Mining: Models and Algorithms.
Springer, 2008.
[6] Eta S. Berner. Clinical Decision Support Systems. Springer, 2007.
[7] Richard J. Bolton and David J. Hand. Statistical fraud detection: A review. Statistical Science,
17(3):235–249, 2002.
[8] Charles P. Friedman. Evaluation Methods in Biomedical Informatics. Springer, 2006.
[9] Robert A. Greenes. Clinical Decision Support: The Road Ahead. Academic Press, 2011.
[10] William Hersh. Information Retrieval: A Health and Biomedical Perspective. Springer, 2008.
[11] Daniel A. Keim. Information visualization and visual data mining. IEEE Transactions on Vi-
sualization and Computer Graphics, 8(1):1–8, 2002.
[12] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. H. Byers. Big data:
The next frontier for innovation, competition, and productivity. McKinsey Global Institute
Report, May 2011.
[13] Kunio Doi. Computer-aided diagnosis in medical imaging: Historical review, current status
and future potential. Computerized Medical Imaging and Graphics, 31:2007.
[14] W. Hersh. Information Retrieval: A Health and Biomedical Perspective. Springer, 2009.
[15] R. B. Haynes, K. A. McKibbon, C. J. Walker, N. Ryan, D. Fitzgerald, and M. F. Ramsden.
Online access to MEDLINE in clinical settings: A study of use and usefulness. Annals of
Internal Medicine, 112(1):78–84, 1990.
[16] B. Vogelstein, N. Papadopoulos, V. E. Velculescu, S. Zhou, L. A. Diaz Jr., and K. W. Kinzler.
Cancer genome landscapes. Science, 339(6127):1546–1558, 2013.
[17] P. Edén, C. Ritz, C. Rose, M. Fernö, and C. Peterson. Good old clinical markers have similar
power in breast cancer prognosis as microarray gene expression profilers. European Journal
of Cancer, 40(12):1837–1841, 2004.
[18] Rashid Hussain Khokhar, Rui Chen, Benjamin C.M. Fung, and Siu Man Lui. Quantifying
the costs and benefits of privacy-preserving health data publishing. Journal of Biomedical
Informatics, 50:107–121, 2014.
[19] Adam Sadilek, Henry Kautz, and Vincent Silenzio. Modeling spread of disease from social
interactions. In Proceedings of the 6th International AAAI Conference on Weblogs and Social
Media (ICWSM’12), pages 322–329, 2012.
[20] L. Jensen, J. Saric, and P. Bork. Literature mining for the biologist: From information retrieval
to biological discovery. Nature Reviews Genetics, 7(2):119–129, 2006.
[21] P. Zweigenbaum, D. Demner-Fushman, H. Yu, and K. Cohen. Frontiers of biomedical text
mining: Current progress. Briefings in Bioinformatics, 8(5):358–375, 2007.
[22] S. M. Meystre, G. K. Savova, K. C. Kipper-Schuler, and J. F. Hurdle. Extracting information
from textual documents in the electronic health record: A review of recent research. Yearbook
of Medical Informatics, pages 128–144, 2008.
[23] Daniel Keim et al. Visual Analytics: Definition, Process, and Challenges. Springer Berlin
Heidelberg, 2008.
[24] K. Wongsuphasawat, J. A. Guerra Gómez, C. Plaisant, T. D. Wang, M. Taieb-Maimon, and B.
Shneiderman. LifeFlow: Visualizing an overview of event sequences. In Proceedings of the
SIGCHI Conference on Human Factors in Computing Systems, pages 1747–1756. ACM, 2011.
[25] Thomas C. Rindfleisch. Privacy, information technology, and health care. Communications of
the ACM, 40(8):92–100, 1997.
[26] E. W. Steyerberg. Clinical Prediction Models. Springer, 2009.
[27] E. E. Schadt. Molecular networks as sensors and drivers of common human diseases. Nature,
461(7261):218–223, 2009.
[28] Min Chen, Sergio Gonzalez, Athanasios Vasilakos, Huasong Cao, and Victor C. Leung. Body
area networks: A survey. Mobile Networks and Applications, 16(2):171–193, April 2011.
[29] Catherine M. DesRoches et al. Electronic health records in ambulatory care: A national survey
of physicians. New England Journal of Medicine, 359(1):50–60, 2008.
[30] Richard Hillestad et al. Can electronic medical record systems transform health care? Potential
health benefits, savings, and costs. Health Affairs, 24(5):1103–1117, 2005.
[31] Stanley R. Sternberg. Biomedical image processing. Computer, 16(1):22–34, 1983.
[32] G. Acampora, D. J. Cook, P. Rashidi, and A. V. Vasilakos. A survey on ambient intelligence in
healthcare. Proceedings of the IEEE, 101(12):2470–2494, Dec. 2013.
[33] U. Varshney. Pervasive healthcare and wireless health monitoring. Mobile Networks and
Applications, 12(2–3):113–127, 2007.
[34] Steven M. Paul, Daniel S. Mytelka, Christopher T. Dunwiddie, Charles C. Persinger,
Bernard H. Munos, Stacy R. Lindborg, and Aaron L. Schacht. How to improve R&D
productivity: The pharmaceutical industry’s grand challenge. Nature Reviews Drug Discovery,
9(3):203–214, 2010.
[35] R. Amarasingham, L. Plantinga, M. Diener-West, D. Gaskin, and N. Powe. Clinical
information technologies and inpatient outcomes: A multiple hospital study. Archives of Internal
Medicine, 169(2):108–114, 2009.
[36] Athanasios Papoulis. Signal Analysis. McGraw-Hill: New York, 1978.
[37] Robert H. Shumway and David S. Stoffer. Time-Series Analysis and Its Applications: With R
Examples. Springer: New York, 2011.
[38] Robert F. Woolson and William R. Clarke. Statistical Methods for the Analysis of Biomedical
Data, Volume 371. John Wiley & Sons, 2011.
[39] Edward H. Shortliffe and James J. Cimino. Biomedical Informatics. Springer, 2006.
[40] Theophano Mitsa. Temporal Data Mining. Chapman and Hall/CRC Press, 2010.
[41] http://jamia.bmj.com/
[42] http://www.journals.elsevier.com/journal-of-biomedical-informatics/
[43] http://www.jmir.org/
[44] http://jbhi.embs.org/
[45] http://mdm.sagepub.com/
[46] http://www.ijmijournal.com/
[47] http://www.journals.elsevier.com/artificial-intelligence-in-medicine/
[48] http://clinfowiki.org/wiki/index.php/Leading_Health_Informatics_and_
[49] http://www.amia.org/
[50] www.imia-medinfo.org/
[51] http://www.efmi.org/

Part I

Healthcare Data Sources and Basic Analytics

Chapter 2
Electronic Health Records: A Survey

Rajiur Rahman
Department of Computer Science
Wayne State University
Detroit, MI

Chandan K. Reddy
Department of Computer Science
Wayne State University
Detroit, MI

2.1 Introduction
2.2 History of EHR
2.3 Components of EHR
    2.3.1 Administrative System Components
    2.3.2 Laboratory System Components & Vital Signs
    2.3.3 Radiology System Components
    2.3.4 Pharmacy System Components
    2.3.5 Computerized Physician Order Entry (CPOE)
    2.3.6 Clinical Documentation
2.4 Coding Systems
    2.4.1 International Classification of Diseases (ICD)
        ICD-9
        ICD-10
        ICD-11
    2.4.2 Current Procedural Terminology (CPT)
    2.4.3 Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT)
    2.4.4 Logical Observation Identifiers Names and Codes (LOINC)
    2.4.5 RxNorm
    2.4.6 International Classification of Functioning, Disability, and Health (ICF)
    2.4.7 Diagnosis-Related Groups (DRG)
    2.4.8 Unified Medical Language System (UMLS)
    2.4.9 Digital Imaging and Communications in Medicine (DICOM)
2.5 Benefits of EHR
    2.5.1 Enhanced Revenue
    2.5.2 Averted Costs
    2.5.3 Additional Benefits
2.6 Barriers to Adopting EHR
2.7 Challenges of Using EHR Data
2.8 Phenotyping Algorithms
2.9 Conclusions
Bibliography

2.1 Introduction
An Electronic Health Record (EHR) is a digital version of a patient’s medical history. It is a
longitudinal record of patient health information generated by one or more encounters in any
care delivery setting. The term is often used interchangeably with EMR (Electronic Medical
Record) and CPR (Computer-based Patient Record). An EHR encompasses the full range of data
relevant to a patient’s care, such as demographics, problems, medications, physician’s observations,
vital signs, medical history, immunizations, laboratory data, radiology reports, personal statistics,
progress notes, and billing data. The EHR system automates data management in complex
clinical environments and has the potential to streamline the clinician’s workflow. It can generate
a complete record of a patient’s clinical encounter and support other care-related activities such
as evidence-based decision support, quality management, and outcomes reporting. An EHR system
also integrates data for different purposes: it enables the administrator to use the data for billing,
the physician to analyze patient diagnostic information and treatment effectiveness, the
nurse to report adverse conditions, and the researcher to discover new knowledge.
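To make the idea of a longitudinal record concrete, the following is a minimal sketch in Python. It is only an illustration: the class names, field names, and sample values are hypothetical and do not correspond to any particular EHR product or interchange standard. It assumes that a record is simply a patient identifier plus a chronologically ordered list of encounters.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Encounter:
    """One visit to a care setting; all field names are illustrative."""
    when: date
    setting: str                                   # e.g. "clinic", "hospital"
    vital_signs: dict[str, float] = field(default_factory=dict)
    medications: list[str] = field(default_factory=list)
    progress_note: str = ""


@dataclass
class PatientRecord:
    """A longitudinal record: basic demographics plus a list of encounters."""
    patient_id: str
    birth_date: date
    encounters: list[Encounter] = field(default_factory=list)

    def add_encounter(self, encounter: Encounter) -> None:
        # Keep encounters in chronological order so the record stays longitudinal.
        self.encounters.append(encounter)
        self.encounters.sort(key=lambda e: e.when)

    def medication_history(self) -> list[str]:
        """Distinct medications, in order of first appearance over time."""
        seen: list[str] = []
        for enc in self.encounters:
            for med in enc.medications:
                if med not in seen:
                    seen.append(med)
        return seen


# Hypothetical sample data, entered out of chronological order on purpose.
record = PatientRecord("p-001", date(1970, 5, 1))
record.add_encounter(Encounter(date(2014, 3, 2), "hospital", medications=["metformin"]))
record.add_encounter(
    Encounter(date(2013, 11, 20), "clinic", {"systolic_bp": 128.0}, ["lisinopril"])
)
print([e.setting for e in record.encounters])
print(record.medication_history())
```

The same record can then serve different users: a billing view would read the encounter list, while a clinical view would read the medication history and vital signs.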
EHRs have several advantages over paper-based systems. Storage and retrieval of data are
more efficient with EHRs. EHRs help to improve the quality and convenience of patient care, increase
patient participation in the healthcare process, improve the accuracy of diagnoses and health outcomes,
and improve care coordination. They also reduce cost by eliminating the need for paper and other
storage media, and they provide opportunities for research in different disciplines. In 2011, 54% of
physicians had adopted an EHR system, and about three-quarters of adopters reported that using an
EHR system resulted in enhanced patient care [1].
Usually, an EHR is maintained within a single institution, such as a hospital, clinic, or physician’s
office. An institution holds the longitudinal records of a particular patient that were collected
at that institution; it does not hold records of the care provided to the patient at
other venues. Information regarding the general population may be kept in a nationwide or regional
health information system. Depending on the goal, service, venue, and role of the user, an EHR can
have different data formats, presentations, and levels of detail.
The remainder of this chapter is organized as follows. Section 2.2 gives a brief history
of EHR development and Section 2.3 describes the components of EHRs. Section 2.4 presents a
comprehensive review of existing coding systems in EHR. The benefits of using EHRs are explained
in more detail in Section 2.5, while the barriers to the widespread adoption of EHRs are discussed
in Section 2.6. Section 2.7 briefly explains some of the challenges of using EHR data. The prominent
phenotyping algorithms are described in Section 2.8, and Section 2.9 concludes the discussion.

2.2 History of EHR

The first known medical record can be traced back to the fifth century B.C. when Hippocrates
prescribed two goals for medical records [2]:
• A medical record should accurately reflect the course of disease.
• A medical record should indicate the probable cause of disease.
Although these two goals are still appropriate, EHR has a lot more to offer. Modern EHR can
provide additional functionalities that could not be performed using paper-based systems.

[49] Cong. Globe, p. 3, 1st Sess. 35th Cong., p. 2,977.
[50] “The American Irish,” pp. 70-1-3.
[51] Cong. Globe, Appendix, 1st Sess., 35th Cong., p. 430.
[52] “Father Curley tells me that John C. Calhoun used to come
to the College to talk philosophy with old Father Dzierozynsky.”
Extract from a letter of the late Father J. S. Sumner, of
Georgetown College, to the author.
On Christmas Day, 1858, having been elected Sheriff of the City
and County of New York, November 2d of that year, Mr. Kelly
resigned his seat in the Thirty-fifth Congress. He remained in
Washington at his post until it was necessary to go to New York to
enter upon his new office; but in refreshing contrast to those
Representatives in a subsequent Congress, the Forty-second, who
voted themselves back-pay, he declined, after his election as Sheriff,
to draw any salary at all for his service as a member of Congress.
The total number of votes cast at the election for Sheriff was
69,088, of which John Kelly received 39,090, and William H.
Albertson received 29,837, scattering 161. Kelly was the regular
nominee of the Democratic party of the city. His majority was 9,092.
He entered with characteristic energy upon the duties of Sheriff,
that most ancient of county officers known to the common law, Vice-
comes to the Earl, as Blackstone calls him. The difficulties and
responsibilities of this office in New York are peculiarly great. The
reported cases upon Sheriff’s law in that city indicate the immense
number of statutes applicable to the office, and the subtleties,
refinements, and nice legal distinctions, together with the liabilities,
which constantly press upon the Sheriff in the discharge of his
duties. As laymen nearly always have been elected to the office, it
was the rule, before Kelly’s term, for incumbents to rely for guidance
upon legal advisers and prompters behind the scenes, whose special
knowledge of business was supplemented by professional knowledge
of law, and by training and experience in the office. But John Kelly
set resolutely to work with his law books, for it is one of the leading
traits of his character to perform conscientiously whatever duties are
imposed upon him, and he was determined to delegate to no one
else a labor which the people had elected him to do himself. While
he was in the office the Under-Sheriff ceased to be the High-Sheriff.
After reading one or two good elementary books, he next applied
himself to the Code of Procedure, the Revised Statutes, and
Reported Cases, and wrote out a syllabus, or private digest for
himself, of opinions delivered in the lower Courts and the Court of
Appeals in relation to Sheriff’s law. To master such questions he
worked with unflagging zeal, not only by day but far into the night,
during the greater part of his term. In the meantime he acquired
familiarity with the routine and usages of the office. Thus equipped,
he was perhaps the first Sheriff who thoroughly understood the
duties of the office, and discharged them in person. He became a
favorite among the members of the bar, and was an authority,
theoretically and practically, upon disputed questions of Sheriff’s law.
In the Sheriff’s Court Mr. Kelly himself presided over the intelligent
juries there empanelled. He heard arguments of counsel, passed
upon authorities cited, was conversant in the law applicable to
cases, and in the opinion of leading members of the profession he
displayed a judicial mind of high order.
The best body of jurors in the United States is undoubtedly the
Sheriff’s Jury in New York city. The members of this jury are chosen
annually by an eminent Commission of judicial and other high
officers, and are selected from among the foremost citizens in the
community, whose wealth, intelligence, and established character
afford a guarantee of their freedom from improper influences. Large
fines for absence are imposed, and cheerfully paid. An annual
banquet, known of all men, ubique gentium, as the Sheriff’s Jury’s
Dinner, is provided for with the ample sum thus accumulated.
Delmonico’s choicest menu is laid under requisition, and a
distinguished and brilliant company is always brought together.
That accomplished and discerning gentleman, Mr. Rosewell G.
Rolston, President of the Farmers’ Loan and Trust Company of New
York, was one of the members of the Sheriff’s Jury during Mr. Kelly’s
term. He once expressed to the writer of these pages his high
respect for the Sheriff, and descanted upon his sturdy qualities,
saying, that while he was a stern and austere man to look at, he
was, nevertheless, brimful of kindly human nature. After mentioning
some occurrences which had come under his own observation, he
said, with no little earnestness, “John Kelly is a love of a man, a
grand fellow undoubtedly.”
Under-Sheriffs had presided at the trial of Sheriff’s cases before
Mr. Kelly’s entry into the office. The Jury was surprised now to see
the usual rule broken, and the new Sheriff going upon the bench
himself. The more experienced members gave each other a smile of
astonishment and a knowing wink, for they suspected that Kelly was
led away by zeal, and by ignorance of the mysteries of the law, into
whose knotty labyrinths he would be plunged presently by wrangling
lawyers. But Mr. Rolston and his fellow-jurors quickly discovered that
the imperturbable Sheriff behaved like a veteran under legal fire,
and the lawyers themselves were surprised to find him not only
familiar with questions at issue, both of traverse and demurrer, but
practically master of the situation. He had broken the precedent, and
what had been before a fiction was now a fact, a Sheriff of New York
who knew more about his office than any of his subordinates. John
Kelly made a reputation for honesty and capacity as Sheriff, which in
the whole history of the office has never been excelled by any man
who has occupied it. The best evidence of this is found in the fact
that at the earliest moment when he was eligible under the
Constitution of the State, namely, at the expiration of the term of
Sheriff Lynch, his immediate successor, John Kelly was renominated
and re-elected Sheriff of New York. He is the only man since the
foundation of the Government who has been elected twice to this
important office. In the early day, before the Hamiltonian or
monarchical features of the State Constitution had been abolished,
and the Jeffersonian or elective principle had been substituted for
them by constitutional amendment, the Governor and Council held
the appointment, not only of judicial and other great officers, a most
fruitful source of corruption and centralization, but they were
likewise clothed with the power to appoint Sheriffs and County
Clerks in the several counties of the State. But twice only, in the
early history of the State, did the Council of Appointment at Albany
select the same men to fill a second term as Sheriff of the city and
county of New York. Marinus Willett was appointed Sheriff of New
York in 1784, and served until 1787. He was re-appointed in 1791,
and held until 1795. Benjamin Ferris also held the office by
appointment from 1808 to 1810, and again from 1811 to 1813. On
the 6th of November, 1864, John Kelly, who had filled the office so
faithfully from 1859 to 1861, was re-elected Sheriff of New York, an
unprecedented honor, as well as endorsement of his official integrity,
now bestowed for the first time in the history of the city, by the
people themselves, upon any individual.
At this election there were three candidates in the field, two
Democrats and a Republican, but after an exciting canvass John
Kelly led the poll by a plurality of nearly 6,000, his Republican
competitor coming next. The whole number of votes for Sheriff was
106,707, of which Kelly received 42,022, John W. Farmer 36,477, and
Michael Connolly, commonly called the “Big Judge,” 28,099. The
number of scattering votes was 109. Mr. Kelly’s second term expired
December 31, 1867. That it was a repetition of the first one in his
fidelity to the important interests and duties confided to his charge,
was universally declared at the time, without one whisper of dissent.
In the fierce conflicts of party fifteen years after his first term as
Sheriff, and seven years after the second, when his talents and
commanding position in the community had made him a formidable
antagonist, John Kelly’s official integrity as Sheriff was called in
question for the first time by certain political opponents, whose
misconduct he had exposed, and whose arbitrary acts he had
resisted. These tardy shafts of malice fell harmless at his feet.
In the year 1868, eleven months after he had ceased to be Sheriff
a second time, a still handsomer testimonial to the stainlessness of
his character was tendered to him than that implied in his re-election
as Sheriff; an emphatic endorsement of his qualifications for the
highest civic preferment was received by him when the Democratic
Union of New York nominated him for Mayor of the city against A.
Oakey Hall, the candidate of the Tweed Ring. In a laudable and
patriotic attempt to drive the Ring from power at the Charter
election of November, 1868, New York’s best citizens,—merchants,
bankers, tradesmen, mechanics, and members of the various
professions, turned to John Kelly to lead them, to the man whose
admirable administration of the trusts he had previously held as
Alderman, Congressman, and Sheriff, afforded satisfactory proof of
his fitness to grapple with the Ring, and if elected, to crush it, and
restore honesty and economy in the various municipal offices.
Among those who looked to Mr. Kelly at this interesting and critical
hour in the history of New York, as a safe leader against the
notorious triumvirate of Tweed, Sweeny and Connolly, were Samuel
J. Tilden, Andrew H. Green, Augustus Schell, and still another—tell it
not in Gath! mention it not in the streets of Ascalon! for it is
surprising to relate—Nelson J. Waterbury himself. Yes, in the very
next year after John Kelly had ceased to be Sheriff, this gentleman,
who has since lavished so much savage abuse upon him for mythical
misdeeds as Sheriff, the self-same Nelson J. Waterbury was an
enthusiastic supporter of John Kelly for Mayor of New York.
The support which Mr. Tilden was disposed to bestow upon Mr.
Kelly was a more important incident of that eventful campaign. For a
long time they had been intimate acquaintances, and Tilden not only
looked upon Kelly as a man of invincible honesty, but recognized in
him a born leader of men. It was a most unfortunate thing that Mr.
Kelly’s health, at this particular juncture, was so much impaired that
it was not possible for him to stand the strain of such a contest, or,
indeed, of any contest at all. The blackest chapter in the history of
New York was about to be written. He felt the magnitude of the
occasion, and rose from a sick bed to go meet the people half way,
when they called him to lead them in the fight. No personal sacrifice
could be too great, not even life itself, when the stakes were the
reformation of the public service, and the rescue of a million people
from the corrupt domination of such a Ring. “You will never live to
reach the army,” said Voltaire to the feeble and emaciated Mareschal
de Saxe, as the leader was setting out for Fontenoy. “The object
now,” replied the fiery commander, “is not to live, but to go.” But Mr.
Kelly, however willing to act his part, soon found that nature’s
barriers are not to be overcome. The hand which had rejoiced in its
strength was relaxed and powerless under wasting illness, and like
that of Old Priam, telumque imbelle, no longer could strike an
effectual blow. He was, indeed, destined to smite the Tweed Ring a
death-blow, but not now, nor until four years had come and gone,
when, with health restored, and energies all on fire, he drove them
from Tammany Hall, and inscribed his name among the benefactors
of New York. He lived, like Saxe, to fight and win his Fontenoy.
From early life Mr. Kelly had suffered from bronchial troubles,
which always were increased by public speaking. His mind is
intensely active. “I must be occupied in some way,” he once said to a
friend, “and I can’t sit still five minutes without doing something. I
cannot be an idler.”[53] Whatever he undertook to do, his faculties
became concentrated upon the task until it was accomplished. His
occupations for a long time had been engrossing and laborious, and
his health had suffered under the strain. “For twenty years,” to
repeat the remark of the editor of the Utica Observer, quoted in a
preceding chapter of this volume, “he had devoted several hours of
every day to the pursuit of literature and science,” and at length his
constitution was seriously impaired. Domestic afflictions also came
upon him about this period, and his physical maladies were
increased fourfold.
John Kelly had entered into wedlock when a very young man, and
for twenty years his circle of domesticity was unclouded by a single
shadow. His wife, née McIlhargy, was the daughter of an Irish
adopted citizen of New York, and an interesting family, a son and
two daughters, grew up to the verge of manhood and womanhood
about him. Mrs. Kelly, whom the present writer knew well, and
greatly respected for the excellent but unostentatious qualities of her
character, was a good wife, a devoted mother and a pious Christian
woman. In the year 1866 she fell a victim to consumption. Her son
Hugh, a bright and winning young man, just as he had turned his
twenty-first year, succumbed to the same disease, and followed his
mother to the grave. Symptoms of consumption also appeared in the
daughters, and it was evident that death had marked them both for
its early victims. To a man of John Kelly’s strongly affectionate
nature, wrapped up in his home and family, these visitations falling
upon him like unmerciful disasters, one after another in quick
succession, proved well nigh irreparable. His health already
impaired, gave way entirely, and his friends were seriously
apprehensive of his own early demise.
It was in the midst of these afflictions that he was nominated for
Mayor against A. Oakey Hall. He was placed in nomination by the
Democratic Union, which held its convention at Masonic Hall,
November 18, 1868, and he received on the first ballot 240 votes, to
51 for John W. Chanler, and 1 each for John McKeon and Fernando
Wood. On the second ballot John Kelly received every vote in the
convention, and was declared the unanimous nominee for Mayor. A
committee was appointed by the chair, Mr. Roswell D. Hatch, to
notify Mr. Kelly of his nomination, and to invite him before the
convention. The chairman of this committee was Mr. Nelson J.
Waterbury. After some time Mr. Kelly entered the hall escorted by Mr.
Waterbury, by whom he was presented to the convention in
appropriate terms, as the reform candidate for Mayor.
He was warmly received, and made a brief speech, vigorously
denouncing the Tweed and Sweeny Ring, which had usurped control
of Tammany Hall. He referred in terms of praise to those honest
Democrats, many of whom he saw before him, who formerly like
himself had been identified with the Wigwam, but who had retired
from it in disgust, as he himself had done when the Ring obtained
control. “I see many gentlemen in this convention,” said Mr. Kelly,
“who formerly were associated with me in Tammany Hall, and who
felt the same grievances there which I myself have experienced. I
have no desire for this nomination, but while I have not sought it, I
will only say this, I shall stand by those who have so generously
nominated me for Mayor, and if elected, I will discharge the duties of
the office honestly and faithfully. In accepting your nomination I fully
realize that both yourselves and myself will have to work strenuously
against the corrupt men opposing us, if we expect to secure victory.
But by working together in good faith we can succeed, for the
people of New York feel the importance of the contest, and the
necessity of putting down the bad men who have obtained control of
the city government. I accept your nomination, and if elected will do
the best in my power to realize all your legitimate expectations.”[54]
Abram R. Lawrence was nominated for Corporation Counsel. The
candidacy of Mr. Kelly greatly alarmed the Ring leaders and their
Republican allies. The latter sought to control the Republican
convention which was held the next day, and force through a
straight Republican ticket for Mayor and Corporation Counsel, as the
most effective way to secure the election of A. Oakey Hall. But
fortunately there was a reform element among the Republicans, as
well as among the Democrats, and the opponents of the Ring were
in a majority in the Republican city convention. That excellent
citizen, Mr. Sinclair Tousey, was President of this convention. The
main struggle was between those who favored the endorsement of
John Kelly for Mayor, and, therefore, wished the convention to
adjourn over, and those who advocated the prompt nomination of a
straight Republican ticket. The latter class was led by Charles S.
Spencer, who vehemently demanded immediate action. But the
opponents of Spencer prevailed, and secured an adjournment to the
following Monday. “It was understood,” remarked the Herald of
November 20th, “that the party of compromise was engaged in
fixing up quite a neat little arrangement, by which the Republicans
would endorse the nomination of John Kelly for Mayor, in
consideration of having Mr. Shaw substituted for Mr. Lawrence as
candidate for Corporation Counsel. The compromisers gave out that
Spencer and the party of action were simply acting in the interest of
Tammany Hall in endeavoring to have the Republican convention
make regular nominations.”
In this campaign the Herald opposed John Kelly, and championed
A. Oakey Hall for Mayor. This was not evidence of any complicity on
the part of that paper in the misconduct of the Ring, for in 1868
there was no positive proof in possession of the public of the
criminality of the Ring, and hence the Herald or any other journal
was not justly obnoxious to unfavorable criticism at that early day in
the history of the plunderers for advocating the election of Hall. “The
Ring,” says Mr. Tilden in his history of its overthrow, “became
completely organized and matured on the 1st of January, 1869,
when Mr. A. Oakey Hall became Mayor. Its duration was through
1869, 1870 and 1871.”[55]
The morning after Mr. Kelly’s nomination the Herald declared for A.
Oakey Hall and against Kelly, in one of those plausible leading
articles by which it has so long and so remarkably influenced public
opinion for or against men and measures. The reference to Mr. Kelly
as a nabob was an adroit campaign stroke, and although he was
living quite unostentatiously in a modest three-story brick house at
the corner of 38th Street and Lexington Avenue, an impression was
created that he was surrounded by princely opulence, in the
fashionable quarter among the millionaires. The Herald editorial was
as follows:
“John Kelly is a good citizen and a respectable man; but he has
already been elected by the Tammany Democracy, to which he owes
all his past political favors, to the offices of Councilman, Alderman,
member of Congress, and twice to the valuable position of Sheriff of
New York, being the only man, we believe, who has held that
lucrative office a second term. John Kelly was brought up a lad in
the Herald office, when he first came to New York, and was well
brought up; but he went into politics in spite of his early training. We
supported him for office while he was poor and lived in the locality
of the Fourteenth Ward. Now that he has made himself a millionaire,
and lives like a nabob in the high locality of one of the most
fashionable avenues of uppertendom, we think he should be
satisfied, and give place to others who have not enjoyed such good
“If the Democrats nominate A. Oakey Hall, as it is said they will,
as their candidate for Mayor, he will no doubt be elected by a large
majority. He will suit those who take a pride in the dignity of the city,
because he is a man of superior ability, a profound thinker, an
eloquent talker, and understands thoroughly the details of the
municipal government.”[56]
The Ring men got thoroughly frightened after the adjournment of
the Republican City Convention without a nomination, for it was
becoming quite clear that independent citizens, both outside and
inside of the respective political parties, meant to support Mr. Kelly
for Mayor against the Ring candidate. This state of things caused the
Herald to discard special pleading respecting the “nabobs of
uppertendom,” and to redouble its attacks on Kelly. He was now
denounced as a deserter for having retired from Tammany Hall, and
joined the opponents of William M. Tweed. “The fight,” said the
Herald, “is to be made against the Democratic organization with the
object of breaking down Tammany, and thus giving the death-blow
to the regular Democracy in its stronghold. The Tribune, Times and
World are co-laborers in this work—the two former openly, and the
latter in an underhanded but not less vindictive manner. They are
preparing to unite on John Kelly, who has deserted the Democratic
organization for the purpose of leading the Republican forces in the
battle. District Attorney A. Oakey Hall will be the Democratic
nominee, and will no doubt be elected; but it will be one of the
greatest fights we have ever had over a Charter election, as the
breaking down of the Democratic organization at this end of the
State would be the death-blow of the party, and is therefore a stake
worth playing for by the Republicans, who feel the loss of power in
New York very severely.”[57]
Against this pretended but sham regularity, not only Mr. Kelly, but
Mr. Tilden also revolted. “Weighty pressure,” says Tilden, “was
brought on me from powerful men all over the State to ‘save the
party.’ I denied that the system of organization then in use in the city
had any moral right to be considered regular, or to bind the
Democratic masses. I told the State Convention that I felt it to be
my duty to oppose any man who would not go for making the
government of this city what it ought to be, at whatever cost, at
whatever sacrifice. If they did not deem that ‘regular,’ I would resign
as chairman of the State Committee.”[58]
The exertion made by Mr. Kelly in leaving a sick bed to go before
the Democratic Union City Convention to accept its nomination for
Mayor, increased the illness from which he suffered. His physician
called eminent doctors into consultation, and it was the opinion of
them all that his continuance in active political movements would
have a fatal result. This professional decision was communicated to
Mr. Kelly by that eminent physician, the late Dr. Marion Sims. Thus
admonished that the excitement of the campaign would kill him, Mr.
Kelly, on the 27th of November, reluctantly sent in his withdrawal
from the Mayoralty contest to the Executive Committee of the
Democratic Union, and the vacancy was filled by the nomination of
Mr. Frederick A. Conkling.
Mr. Kelly, who was a sufferer from insomnia, soon after sailed with
his two daughters for Europe. He made an extended tour in Europe,
Asia and Africa, visiting, among other places, the Holy Land. He first
went to Ireland as a pilgrim would return to the home of his fathers,
spending some time in the beautiful Island of Saints, where
Christianity made its only bloodless conquest in the world. During
fourteen hundred years, while other Christian nations have rushed
back into infidelity and again become Christian, Ireland has never
lapsed into infidelity, nor into a scoffing, Godless philosophy, the
invariable accompaniment of unbelief and paganism. After visiting
the various capitals of Europe,—London, Paris, Vienna, Berlin,
Madrid, St. Petersburg, and other places, he repaired to Rome, the
city of the soul, the Niobe of nations, shrine of saints and martyrs, of
doctors and confessors, where he spent a considerable period in rest
and retirement, and in viewing its wonderful ruins, monuments, and
churches. Repairing to the Holy Land, Mr. Kelly remained for some time
at Jerusalem, the cradle of Christianity; which Titus, in fulfilment of
prophecy, left not a stone upon a stone of; where Christ had walked
about among the people, and where He died upon Calvary.
In contemplating scenes associated with the earthly life and death
of the Redeemer, the traveler no doubt derived comfort in his own
bereavements, dignified by such a fellowship of suffering as was
there. What a lesson of humility the ignominious Cross must have
preached to his reflective mind. He was leading a contemplative life,
and his letters at this period dwell much upon the Mount of Olives,
the Way of the Cross, and the Holy Sepulchre. He had read
somewhere in allegory of the contest in which the trees of the forest
are represented as debating among themselves who should be their
king. Had the contest occurred in the days of the Redeemer, small
chance the ignoble tree of the Cross would have had to win the
crown. Mr. Kelly had read Cardinal Wiseman’s beautiful thoughts on
the subject. “Apply the allegory,” said he once in a circle of his
friends, “and let us enter some forest of Judea filled with stately
trees, lofty, tapering pine, and royal cedar, and hear the proud
possessor give orders as to how their worth should be realized into
wealth. He says to the forester: ‘See that elegant and towering tree
which has reached the maturity of its growth, how nobly will it rise
above the splendid galley and bear itself in the fell fury of the wind,
without breaking or bending, and carry the riches of the earth from
one flourishing port to another. Cut it down and destine it for this
noble work. And this magnificent cedar, overcasting all around it with
the solemnity of its shade, worthy to have been built by Solomon
into the temple of God, such that David might have sung its praises
on his inspired lyre; let it be carefully and brilliantly polished, and
embarked to send to the imperial city, there to adorn those
magnificent halls, in which all the splendor of Rome is gathered; and
there, richly gilded and adorned, it shall be an object of admiration
for ages to come.’ ‘It is well, my lord,’ replies his servant, ‘but this
strange, this worthless tree, which seems presumptuously to spring
up, beneath the shadow of those splendid shafts, what shall we do
with it? it is fitted for no great, no noble work.’ ‘Cut it down, and, if
of no other use, why, it will make a cross for the first malefactor!’”
Strange counsels of men! The soaring pine dashed the freight that
it bore against the rocks, and rolled a wreck upon the beach. The
noble cedar witnessed the revels of imperial Rome, and fell by the
earthquake, or in the fire kindled by the barbarians, charred into
ashes. But that ignoble tree, spurned by proud man and put to the
most ignominious of uses, bore the price of the world’s redemption
upon Calvary, its every fragment has been gathered up, and
treasured and enshrined, and in every age it has been considered
worth all that the world dotes on, and sets its heart on. An Empress
crossed the seas and searched among the tombs of the dead for
that material wood of the Cross of Christ. For that holy rood was
built a magnificent church on Mount Sion. For it the Emperor
Heraclius made war on the King of Persia; and when he had
recovered it, bore it as his Master had borne it before, barefoot and
in humble garb to Calvary. For that tree Constantine the Great built a
noble church, yet standing among the ruins of the palaces of Rome,
and brought the very earth from the Savior’s own land, as though
none were worthy to be there save that upon which had first fallen
the precious blood of redemption. For eighteen hundred years this
relic has been the most priceless treasure of Christians. Its smallest
fragment has been enshrined and vestured in gold and precious
stones, and housed and sheltered in magnificent temples piled up
with the richest materials and noblest productions of art. The
ignoble tree which the world despised has conquered the world.
Mr. Kelly’s correspondence at this time made it apparent that he
had ceased to feel interest in the busy trifles of politicians, and that
his thoughts were directed to problems of the moral world, to
reveries upon the mysteries of redemption, like that outlined in the
preceding allegory upon the Cross, and to the works of mercy, both
spiritual and corporal. He brought back from Palestine souvenirs and
patristic relics of much interest. He had familiarized himself with the
topography of the hallowed scenes of the Holy Land, and those who
have heard him describe them and relate the history and traditions
connected with them, have been struck with his reverence as a
narrator, as well as with his closeness as an observer of manners,
customs and places. While he was abroad unfounded rumors
reached New York that John Kelly had withdrawn from the world, in
order to spend the remainder of his days in monastic retirement.
Perhaps this story originated from the circumstance that he travelled
much in the company of clergymen in Europe. Vicar-General Quinn
of New York was his companion on the Continent. The late Bishop
McGill of Richmond, Virginia, a man of ascetic tastes and profound
learning, often shared Mr. Kelly’s carriage in the latter’s drives about
Rome. Another thing which may have given color to the rumor was
the fact that Mr. Kelly had educated, and was still educating, many
young men for the ecclesiastical state, not only American youths, but
those of Irish and German and Swiss nationalities. While he was in
Switzerland his attention was directed by his daughters to a pious
little boy, the son of a poor gardener, who with another boy of
wealthy parentage, served at the altar every morning. The wealthy
man’s son soon departed for the University, when Mr. Kelly sent for
the son of the gardener, and finding that he wished to become a
religious, told him that he would afford him the means to carry out
his purpose, and amid the grateful tears and prayers of the boy’s
parents, he sent him to a renowned German University, and defrayed
all his expenses until he was graduated. That boy has since become
a learned scholar and minister at the altar. While Mr. Kelly was in
Rome he became warmly interested in the American College, a noble
seat of learning in that city for the training of young ecclesiastics for
the American Missions, and he generously established a bursary in
the College. He gave to its President, Dr. Chatard, who since has
been raised to the Episcopate, five thousand dollars for the
maintenance of this charitable Kelly foundation. It reflected no credit
upon the managers of the New York Cooper Institute meeting, held
in 1884, to denounce the spoliation of the Propaganda, of which the
American College at Rome is a part, to have omitted one of its
benefactors, and so prominent a representative man as John Kelly,
from the list of the officers and speakers of that meeting. Those
managers were then burning incense to Monsignor Capel, a clerical
gentleman of know—ledge, not knowledge, who thinks American
Catholics are too illiterate yet awhile to aspire to a University.
The beautiful pictures in stained glass, which adorn the windows
of St. Patrick’s Cathedral in New York, are, with the exception of the
examples in the French Cathedral in Chartres, perhaps unsurpassed
in modern times, as figured scenes from the Scriptures and lives of
the saints. In this pictorial religious epic is a beautiful window placed
there by John Kelly in memory of his lost ones, or more correctly of
those members of his family who have been called to the better life.
“Before quitting the Sanctuary,” says the writer of a pamphlet
descriptive of the exterior and interior of the Cathedral, “we will
bend our steps towards the Lady Chapel. The window in the first bay
represents the Presentation of the Blessed Virgin in the Temple. The
high priest, in gorgeous vesture, advances to receive the child, while
St. Joachim and St. Anne modestly remain standing behind. The
friends of the family are assembled to witness the ceremony. This
bears the inscription, ‘John Kelly—in memoriam.’”[59]
Some years before the completion of the new Cathedral, and while
Mr. Kelly was in Rome, he gave an order to a celebrated artist in that
city of art treasures to execute for him four great oil paintings
representing the Baptism of our Lord, the Marriage feast of Cana,
the Return of the Prodigal Son, and St. Patrick preaching at Tara. He
afterwards embraced two additional scenes from sacred history in
his scheme, the Ascension of Our Lord, and the Assumption of the
Blessed Virgin. The artist, Galliardi, produced a noble work after the
best masters. These six magnificent paintings were sent from Rome
to America as a present from Mr. Kelly to St. Patrick’s Cathedral, and
are the only paintings in canvas upon the walls of that grand church.
When he was in England he visited a region inhabited almost
entirely by miners—English, Irish and Welsh. Those people were, to
a great extent, ignorant of the truths of Christianity, and there were
no facilities in the wild mountain region they inhabited to improve
their moral condition. Working in the mines day and night, and
constantly exposed to death in the midst of their subterranean toil,
these poor people appealed to friends at a distance to send them a
clergyman to minister to their spiritual wants. The appeal was
answered, and the Reverend Mr. Dealy arrived there to open a
mission a short time before Mr. Kelly visited that part of England.
The clergyman found himself destitute of every worldly appliance for
a proper ministration of the functions of his spiritual office, no
church, no school-house, no charitable home or asylum for the sick
and helpless, all things, in a word, wanting, and no adequate means
to provide them. He was an excellent and zealous man, and he
stated his situation, and the necessities of the people to Mr. Kelly. He
told him that if he had the money to build a church and school-
house, incalculable good might be done. He poured his story into
sympathetic ears. Help was promised, and faithfully was the promise
kept. Mr. Dealy some time after, upon Mr. Kelly’s invitation, set sail
for America, and took up his residence in the latter’s house. When
Mr. Kelly reached home he organized a movement among those of
his immediate friends, whose opulence and charity admitted of the
appeal, and in the course of a few months Mr. Dealy, as he informed
the writer of these pages, was the fortunate possessor of a purse of
over twelve thousand dollars, inclusive of Mr. Kelly’s own handsome
donation. Those poor miners in England soon had their church, and
a school for their children, and their pastor had reason to bless the
day when he first made the acquaintance of the subject of this
After John Kelly had re-entered the field of politics, and even
when immersed in public affairs, his charity and philanthropy
continued to be the controlling principles of his conduct. During the
past five or six years he has been a frequent lecturer in various cities
of the Union. His lectures, respectively upon the Sisters of Charity,
the Early Jesuit Missionaries in North America, and upon the Irish
Settlers in North and South America, were replete with historical
information and sound practical instruction, and wherever he
appeared on the platform as a lecturer he always drew crowded
houses. Mr. Kelly realized from his lectures, which he delivered
repeatedly in the North, South and West, over fifty thousand dollars,
and this immense sum he gave in charity to educate and clothe the
poor, to build schools, or to lift the burden of debt from charitable
institutions. His heart was in his work. He would not allow one penny
of the proceeds of his lectures to be diverted from the sweet uses of
charity for his traveling expenses, but in every instance, wherever he
went to lecture, he insisted on paying his railroad fare, and hotel
bills, out of his own pocket.
Bagenal, the London traducer of the American Irish, with
unblushing mendacity, classes John Kelly as a leader of “shoulder-
hitters and ballot-stuffers,” and ignorantly accuses him of being an
enemy of Irish colonization in the West. The simple truth is that Kelly
is one of the originators and prime leaders in the movement to get
poor emigrants out of the overcrowded Eastern cities, and has
contributed thousands of dollars to make their colonization in the
West a success.
Dr. Ireland, Bishop of St. Paul, Minnesota, one of the great
pioneers in this benign scheme, while speaking kindly of Mr. Bagenal
in a letter to the present writer, still shows how erroneous he is in
his strictures upon Mr. Kelly. The Bishop’s comment upon Bagenal is
as follows: “He is mistaken, of course, in his remarks about Mr. John
Kelly. But I do not think he will be sorry to be set right. He mixes up
Mr. Kelly with the average politicians of New York—not knowing, as I
know, Mr. Kelly’s exceptional qualities, his sterling honesty, his true
love for his fellow-Irishmen, and his general nobility of character.”[60]
When he retired from politics in 1868, Mr. Kelly had resolved to
enter upon that field no more. Chastened by domestic affliction, and
loss of health, the plan of his life was changed. Public station had
lost its charm for him. To feed the hungry, clothe the naked, and
open the doors of colleges, or advanced schools, to those whose
talents were good, but who were too poor to gain admittance, these
things afforded to him his greatest pleasure. He sought out the
companionship of holy men, and of holy books. Thomas à Kempis
became his vade mecum. He took more delight in the pages of the
Following of Christ than he had ever known in the conflicts of
politics, either in the halls of Congress or the city of New York. It
was not altogether surprising, therefore, that people’s conjectures
should consign him to the prospective seclusion of a monastery, and
that rumors to that effect should have gained circulation. The New
York Times, on one occasion, shortly after Mr. Kelly’s second
marriage, made editorial reference to these rumors, and spoke of
him as that remarkable individual who had escaped being a monk at
Rome, in order to become the nephew of a Cardinal in America.
These revelations of the inner life of John Kelly are not laid before
the public without a great deal of reluctance. Some may think it
were better to keep them back until after his death, and the writer
knows perfectly well that no one else would prohibit their publication
at any time, or under any conceivable circumstances more sternly
than John Kelly himself. But these pages have been written without
consultation with any human being in the world, and recollecting the
unparalleled and shameful abuse which this man has been subjected
to for doing his duty as God has given him to see it, the writer is
resolved to tell the truth about him, and let the unprejudiced reader
know something of his real character. Indeed hardly a tithe of those
charities and good works of John Kelly which are within the personal
knowledge of the present writer, have been mentioned in these
pages. During the war for the Union, especially, were the kindly
impulses of his nature displayed. He went about among the hospitals
visiting and cheering the sick and despondent, supplying articles for
their relief and money for their wants, and doing what he could for
the wounded. He did not confine these ministrations to the hospitals
in New York, but went to Washington and got a pass from Edwin M.
Stanton, Secretary of War, whom he had known well in former years,
to visit the Army of the Potomac, and particularly the camp
hospitals. Thither he repaired, and extended his aid not only to New
York soldiers but to those of other States, with characteristic zeal
and liberality. A letter was published in the New York World,
November 1st, 1875, from Mr. James Murphy, in which reference is
made to one of Mr. Kelly’s visits to the army in Virginia.
“I well recollect,” said the writer, “that thirteen years ago, when I
was a soldier in the Second Army Corps of the Army of the Potomac,
and stationed at Stafford Heights, Virginia, opposite Fredericksburg,
I had the pleasure of meeting Mr. John Kelly. His mission was one of
the noblest that man ever followed. He was going round from
hospital to hospital, and from tent to tent, visiting the sick and
wounded of the poor and neglected soldiers of the New York
regiments, to see to their wants, and alleviate their sufferings as
much as lay within his power, and questioning them as to their
treatment as compared with the treatment of the soldiers of other
States.” Many persons in the border States, as those adjoining the
scene of military operations were called, who were guilty of no
disloyal acts, were nevertheless made victims of spies and
detectives, and they and their families suffered great hardships. One
of these was Mr. John Henry Waring, a prominent and wealthy
citizen of Prince George’s County, Maryland, whose property was
confiscated, whose large family, mostly ladies, were banished, and
who was himself imprisoned for the war in Fort Delaware. This was
the work of Baker, the notorious detective, and a more cruel
persecution hardly occurred during the war. Mr. Kelly was appealed
to on behalf of Mr. Waring, and after he was satisfied that injustice
had been done to that excellent citizen, he went to Washington and
saw Mr. Lincoln, and Secretaries Stanton and Montgomery Blair, on
behalf of the Waring family and estate. But Baker had poisoned the
mind of Stanton against the Warings, and, notwithstanding the
Secretary’s regard for Mr. Kelly, he refused the clemency that was
asked. Mr. Kelly returned to New York, and enlisted in Mr. Waring’s
favor the powerful co-operation of Governor Morgan, Archbishop
Hughes, Thurlow Weed, James T. Brady, and about fifty other
leading men, and, thus strengthened, he renewed the appeal for
justice and executive clemency. Postmaster General Blair had
become warmly interested in the case, and to him Mr. Kelly confided
the petition of the citizens of New York named above, and Mr. Blair
in conjunction with Mr. Kelly ceased not to press the case until Mr.
Waring was liberated, his family were recalled from banishment, and
his beautiful home and plantation on the Patuxent river were
restored to him.
Mr. Kelly returned from Europe in the fall of 1871, much improved
in health, but not yet restored to his old vigor. The present writer
gave to Mr. J. E. Mallet, of Washington, D. C., who was going to
Europe, a letter of introduction to Mr. Kelly, while the latter was
abroad. Although they were near each other several times in Europe,
Mr. Mallet did not become acquainted with Mr. Kelly until they
accidentally met on the same steamship, the Republic, in returning
to America. In a letter published in the Baltimore Catholic Mirror, Mr.
Mallet gave an interesting account of this voyage, and of the
amusements improvised on shipboard. “One evening,” said he, “we
organized a musical and literary entertainment. The chairman made
a speech, a lady played a fine musical composition, a gentleman
gave a recitation, a young bride sang a beautiful ballad, Hon. John
Kelly, of New York, sang in excellent style an amusing Irish song,
then a duet was sung by two ladies, some one sang a French song,
Father Sheehy sang an Irish ballad on St. Patrick, and the
entertainment concluded, and the assemblage dispersed during the
reading by the Rev. Dr. Arnot, of one of his old sermons.”
“A valued friend had given me a letter of introduction to Mr. Kelly,
to present in France or Switzerland, but I met that gentleman only
on the wharf at Liverpool, and then almost accidentally. Mr. Kelly has
travelled throughout Europe and the Holy Land, and is one of the
most interesting travelling companions whom I have ever met. I was
particularly pleased with his manner of presenting the true history
of, and reasons for certain religious and national practices in Ireland
and Italy, in opposition to the theories and suppositions of certain of
our fellow-voyagers, who ignorantly calumniated the one, and
ridiculed the other.”
During the three years of Mr. Kelly’s absence in Europe, New York
had been given over to every form of official rascality and plunder.
No sooner had he reached the city than he was besieged by leading
citizens, such as Mr. Tilden, Mr. Schell, Mr. Hewitt, Mr. Belmont, Mr.
Chanler, Mr. Clark, Mr. Green and others, all of whom urged him to
take the lead in a movement for the overthrow of the Tweed Ring.
To each one of these gentlemen he said that it was not in accord
with the plan of life which he had marked out for himself for the
future, to re-enter the field of active politics. But his friends
redoubled their importunities. They told him there was no other man
in New York, scarcely one in the United States, so well fitted as
himself to head such a movement, and that in the lifetime of but
very few persons did so grand an opportunity offer itself to serve the
people as that which now awaited him. His friends finally prevailed,
his private plans were changed, and his memorable reappearance in
New York politics occurred in the year 1872. “My health remains
about the same as when I saw you,” said Mr. Kelly, in 1872, in a
letter to the present writer. “I was compelled to take part, for the
reason that my old associates would not take No for answer. My
active participation has not helped me much in point of health, nor
does it seem possible for me to live in New York without being more
or less mixed up in politics.” In an interview published in the New
York World, October 18, 1875, Mr. Kelly explained more fully how he
was induced to return to politics. Details omitted, the salient points
of that interview were as follows: “When I returned from Europe in
the fall of 1871, it was my intention to have nothing to do with
politics at all. I had been sorely afflicted by the loss of my family,
and I wanted to spend the rest of my life as a private business man.
I was met by a number of leading men, who told me that during my
absence the Democratic party in the city had become utterly
demoralized, and that the Grant Republicans, taking advantage of
this state of affairs, had come into full possession in this great
Democratic city, and they begged me to assume an active part. I
had hundreds of the leading men in the city here at my house,
asking me to take hold and help them up. After much importunity, I
consented, and threw my whole heart into the work. I suppose I
have some foresight. I think I generally see things pretty clearly, and
this is probably why they trust to my judgment. Whenever I fail to
win their confidence it will be an easy matter for them to dispense
with me. I am not commissioned as a leader by any constituted
authority. But as what power and influence I have depend entirely
upon the good will and confidence of the people who choose to
recognize me as a leader, and listen to my advice, I am wholly in
their hands, and they can keep me or reject me any day.”
Mr. Kelly’s part in public affairs prior to 1872 had been creditable
and marked by ability, but there were other public men who, in like
circumstances, had attained equal or greater distinction. In the year
1872 he was called upon to prove whether he was endowed with
that highest of all the gifts of Heaven, the capacity to lead men in a
supreme emergency, and it is not the language of eulogy to say that
he displayed consummate ability as such a leader; and that his
courage, coolness and good judgment enabled him to achieve
results which no other citizen of New York, with similar resources at
command, and similar obstacles in his way, could have
[Signature: Yours truly, John Kelly]
In a city of a million inhabitants, where a Government had
prevailed for years, such as disgraced Rome in the days of Caligula,
when the tyrant made his horse a Roman Consul; or in the epoch
from Tiberius to Nero, when folly, crime and profligacy ran riot in all
departments of the Empire, such as Tacitus describes so vividly in
the Annals, and in the immortal Life of Agricola; in such a state of
affairs it was an enormous task for John Kelly to head a successful
movement against a Ring intrenched in office, with millions of stolen
money at command, and backed up by a purchased Legislature. This
task he undertook and accomplished, and history will record the fact
on its imperishable page that the gallant attack upon the Ring in the
Courts and Legislature, by Charles O’Conor and Samuel J. Tilden,
was not crowned with final success until John Kelly carried the war
into Tammany Hall, and drove the Ring politicians from its portals.
O’Conor and Tilden scotched the snake in 1871, and John Kelly killed
it in 1872. Tammany Hall, the cradle of American Democracy, whose
patriotic Sachems in the year 1819 were addressed in a speech by
Andrew Jackson,[61] and in long friendly letters at the same period
by Thomas Jefferson, the elder Adams, and James Madison,[62] was
rescued from disgrace and placed again in control of honest men in
1872 by John Kelly. Not only the political organization, but the
Tammany Society was wrested from the control of the Ring. No
political contest in the history of the city of New York was more
stubbornly fought on both sides, or has been followed by happier
results to the people at large. If great public service entitles a man
to rank among the worthies of the Republic, John Kelly won that title
when he succeeded in expelling the Ring men from Tammany Hall.
His victory marked an epoch. The Board of Sachems of the
Tammany Society for 1871, and the Board for 1872 tell the story of
this great revolution:
1871.                     1872.
-----                     -----
Grand Sachem:             Grand Sachem:
William M. Tweed.         Augustus Schell.
-----                     -----
Sachems:                  Sachems:
Richard B. Connolly,      Charles O’Conor,
Peter B. Sweeny,          Samuel J. Tilden,
A. Oakey Hall,            John Kelly,
Joseph Dowling,           Horatio Seymour,
Samuel B. Garvin,         Sanford E. Church,
etc.                      August Belmont,
                          Abram S. Hewitt,

On the retirement of Mr. Belmont from the Chairmanship of the
National Democratic Committee, in 1872, that distinguished position
was tendered to Mr. Kelly at the meeting of the National Convention
in Baltimore. But domestic affliction had again visited him about that
time, in the death in New York of his only surviving daughter, his
elder daughter having died some time before in a city in Spain,
where her father had taken her in a vain pursuit of health. Cast
down by these afflictions, Mr. Kelly declined the Chairmanship of the
National Committee of his party, but suggested his old friend Mr.
Schell, who was elected Chairman. “Who is John Kelly?” asked some
of the younger delegates at Baltimore, when they heard his name
mentioned as their first choice by the New York delegation. They
were informed by Mr. Schell that Mr. Kelly was detained at home in
the house of mourning, but that he was a great leader in New York
politics, and a true patriot in public life; and that he had sat in
Congress before many of those young men were well out of the
It was about this time that the Committee of Seventy set out to
reform the city government, but those worthy old gentlemen soon
became engaged in an amusing scramble for office, and beyond
putting their chairman, General Dix, in the Governor’s chair, and
another of their number, Mr. Havemeyer, in that of Mayor, they did
not set the river on fire, nor perform any of the twelve labors of
Hercules. As soon as the Committee of Seventy became known as
office-seekers, their usefulness was at an end. John Kelly sought no
office, for he had to fight a battle with office-holders, then a
synonym for corruptionists, and he appreciated the magnitude of the
struggle more correctly than to leave it in anybody’s power to say
that the Ring men and the Reform element, the latter marshalled by
Tilden and himself, were fighting over the offices. A mere scramble
for office between the Ins and Outs is always a vulgar thing. When
they became place-hunters, the Committee of Seventy ceased to be
reformers. Kelly, with better statesmanship, sought no office, and
would accept none. When every other event in his life has been
forgotten, his memorable battle in the County Convention of 1872
will still be remembered. A fiercer one was never fought in American
politics. To employ the words of Mr. Tilden, in his history of the
overthrow of the Tammany Ring, Kelly had to confront on that
occasion, “an organization which held the influence growing out of
the employment of twelve thousand persons, and the disbursement
of thirty millions a year; which had possession of all the machinery
of local government, dominated the judiciary and police, and swayed
the officers of election.”[63]
Harry Genet was leader of the Ring men in the Convention. Prize-
fighters and heelers swarmed upon the floor; and when Samuel B.
Garvin was again placed in nomination for District-Attorney, the
fighters and heelers roared themselves hoarse with applause. Mr.
Kelly took the floor to oppose Garvin, when he was interrupted by
Genet. He replied to the latter in scathing language, arraigned him
and Garvin with the utmost severity, and although hissed by the
hirelings of the Ring, and interrupted by volleys of oaths, John Kelly
kept the mob in sufficient restraint until he caught the eye of the
chairman, and moved an adjournment to 3 o’clock the next day. Mr.
Schell, who was in the chair, put the motion to adjourn, and it was
carried, in spite of the protests of the mob.
The next day the same emissaries of the Ring were there to
overwhelm the Convention again, but this time Kelly was prepared
for them. He had a force stationed at the doors of Tammany Hall,
and no man, not a delegate to the Convention, and not provided
with a delegate’s ticket, was allowed to enter the building. The police
and city authorities were on the side of the desperadoes, but no
policeman was allowed inside the premises. This bold stand of Mr.
Kelly had the desired effect. By his personal intrepidity, and
readiness to resist attack, he cowed the rowdies, and no others but
delegates got into the Convention. Garvin was defeated, and Charles
Donohue was nominated for District-Attorney. Abram R. Lawrence
was nominated for Mayor. It was in that day’s struggle that the
backbone of the Ring was broken, and it ceased to be a compact
organization, and melted away after that day’s defeat. Havemeyer of
the Committee of Seventy was elected Mayor, with Lawrence a close
second, and O’Brien a bad third. Phelps beat Donohue for District-
Attorney. But Reformed Tammany, in spite of predictions to the
contrary, polled a surprisingly large vote, and although it did not
elect, it was a vote of confidence in John Kelly, and discerning men
saw that the future belonged to the old organization. Mr. Havemeyer,
who had been an excellent Mayor in early life, now proved a failure.
His defiance of the Supreme Court in the case of Police
Commissioners Charlick and Gardner raised a storm of indignation
about his head, and led to his reprimand by Governor Dix, who
threatened his removal from office. Charlick and Gardner had been
indicted for a violation of the election laws, and Mr. Kelly was very
active in bringing on their trial. They were convicted by the Jury, and
sentenced by Judge Brady to pay a fine of $250 each, but conviction
carried with it a still severer penalty, forfeiture of their offices and
disability to fill them by reappointment. The Mayor’s attempt to
reappoint them was an act of surprising folly, but when the
Governor’s reprimand reached him, with the statement that his age,
and near completion of his term of office, alone saved him from
removal for contumacy, Mayor Havemeyer’s rage vented itself in an
extravagantly abusive attack on John Kelly. He held Mr. Kelly
responsible for the trial of Charlick and Gardner, and after
astounding the community by defying the Supreme Court with a vain
attempt to re-instate the guilty officials, he brought the matter to an
impotent conclusion by pouring out a torrent of abuse upon John
Kelly, and assailing his record for honesty when he was Sheriff of
New York. During all the long years which had elapsed since Mr.
Kelly had held that office, not one syllable had ever been uttered
derogatory to his exalted character for honesty as Sheriff, until
Mayor Havemeyer made his reckless charges. Smarting under a
sense of humiliation after the Gardner-Charlick fiasco, the Mayor
allowed bad temper to get the mastery of his judgment, and the
explosion of wrath against Mr. Kelly followed. The animus of the
attack was perfectly apparent on its face, and the good sense of the
people was not imposed upon by the revengeful ebullitions of the
angry old gentleman. Mr. Kelly promptly instituted a suit for
damages, but on the very day the trial began, by a remarkable
coincidence, Mayor Havemeyer, stricken by apoplexy, fell dead in his
office. The passionate events of the moment were forgotten, and a
sense of sorrow pervaded the community. Mr. Havemeyer’s long and
honorable career was remembered, and the unfortunate passage in
his last days was generally and justly imputed to the misguided
counsels of his friends.
The Tammany Democrats were completely victorious at the
election of 1873. Those able lawyers, Charles Donohue and Abram
R. Lawrence, were elected to the Supreme Court. The late William
Walsh and the late Wm. C. Connor, both excellent men, were elected
County Clerk and Sheriff. Again, in 1874, victory perched on the
standards of Mr. Kelly. This time its dimensions were larger. In
addition to a Mayor (Mr. Wickham), and other city officers, a
Governor (Mr. Tilden), and other State officers, were chosen by
overwhelming Democratic majorities.
Mr. Kelly had been the first man to suggest Mr. Tilden’s nomination
for Governor. His splendid services in the war on the Ring pointed
him out as the fit candidate of his party. Tired out, after his long
labors, Mr. Tilden, in 1874, went to Europe to enjoy the first holiday
he had allowed himself for years. But such was his confidence in the
judgment of Mr. Kelly, that a cable message from that friend was
sufficient to cause him to cancel his engagements in Europe, give up
his tour, and take passage in the first steamer for New York. The
Canal Ring was in motion against Tilden’s nomination, and Kelly, who
had found this out, thought there was no time for delay. Tilden at
first expressed disinclination for the office, but the Tammany Chief
had set his heart on his nomination, and the author of these pages
has heard Mr. Tilden say that Mr. Kelly’s persistency finally controlled
his decision, and won his acquiescence. One of the leading delegates
to the Convention of 1874 was Mr. William Purcell, editor of the
Rochester Union. “To John Kelly,” said Purcell editorially, shortly after
the election, “more than any other man does Governor Tilden owe
his nomination and his majority at the election. Governor Tilden was
personally present at the nominating convention, in close counsel
with Mr. Kelly, than whom he lauded no man higher for his personal
honesty, his political integrity, and his purity of purpose.”
Mr. Tilden was a constant visitor at Mr. Kelly’s house during this
period, and no two men could have evinced more respect and
friendship for each other. The last time Mr. Tilden attended a
meeting in Tammany Hall was at the election of Sachems on the
third Monday of April, 1874. The late Matthew T. Brennan and others
ran an opposition or anti-Kelly ticket, and so anxious was Mr. Tilden
for the defeat of this movement that he came down to the Wigwam,
and took an active part in favor of the regular ticket. He sat with Mr.
Kelly, and when the result was announced warmly congratulated him
upon the victory.
In the latter part of January, 1875, a few weeks after Mr. Tilden’s
inauguration as Governor, the author spent a morning at his
residence in Gramercy Park, and there met ex-Governor Seymour
and Mr. Kelly, in company with Governor Tilden. The conversation of
these three distinguished men, in the abandon of social intercourse
around the hearthstone of Gramercy Park, was very agreeable and
entertaining. The author was an attentive listener and observer, and
afterwards, on the same day, wrote out in his diary his impressions
of these three celebrated New Yorkers. Although ten years have
