
Machine Learning in Medicine - a Complete Overview

The book 'Machine Learning in Medicine - a Complete Overview' provides a comprehensive guide to machine learning methodologies applicable in the medical field, addressing the challenges faced by clinicians in analyzing large datasets. It covers various machine learning techniques, including cluster models, (log)linear models, and rules models, with practical examples and step-by-step analyses. This resource is designed for medical professionals and students, emphasizing the importance of machine learning in improving health care outcomes.


Ton J. Cleophas • Aeilko H. Zwinderman

Machine Learning in Medicine - a Complete Overview

With the help from HENNY I. CLEOPHAS-ALLERS, BChem

Ton J. Cleophas
Department Medicine
Albert Schweitzer Hospital
Sliedrecht, The Netherlands

Aeilko H. Zwinderman
Department Biostatistics and Epidemiology
Academic Medical Center
Amsterdam, The Netherlands

Additional material to this book can be downloaded from http://extras.springer.com.

ISBN 978-3-319-15194-6 ISBN 978-3-319-15195-3 (eBook)


DOI 10.1007/978-3-319-15195-3

Library of Congress Control Number: 2015930334

Springer Cham Heidelberg New York Dordrecht London


© Springer International Publishing Switzerland 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material contained herein or for any errors
or omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)

Preface

The amount of data stored in the world’s databases doubles every 20 months, as
estimated by Usama Fayyad, one of the founders of machine learning and co-author
of the book Advances in Knowledge Discovery and Data Mining (ed. by the
American Association for Artificial Intelligence, Menlo Park, CA, USA, 1996), and
clinicians, familiar with traditional statistical methods, are at a loss to analyze them.
Traditional methods, indeed, have difficulty identifying outliers in large datasets,
and finding patterns in big data and in data with multiple exposure/outcome variables.
In addition, analysis rules for surveys and questionnaires, which are currently
common methods of data collection, are essentially missing. Fortunately, the new
discipline of machine learning is able to address all of these limitations.
So far, medical professionals have been rather reluctant to use machine learning.
Ravindra Khattree, co-author of the book Computational Methods in Biomedical
Research (ed. by Chapman & Hall, Boca Raton, FL, USA, 2007) suggests that
there may be historical reasons: technological (doctors are better than computers
(?)), legal, cultural (doctors are better trusted). Also, in the field of diagnosis
making, few doctors may want a computer checking them, or be interested in
collaborating with a computer or with computer engineers.
Adequate health and health care will, however, soon be impossible without
proper data supervision from modern machine learning methodologies like cluster
models, neural networks, and other data mining methodologies. The current book is
the first publication of a complete overview of machine learning methodologies for
the medical and health sector, and it was written as a training companion, and as a
must-read, not only for physicians and students, but also for anyone involved in the
process and progress of health and health care.
Some of the 80 chapters have already appeared in Springer’s Cookbook Briefs,
but they have been rewritten and updated. All of the chapters have two core
characteristics. First, they are intended for current usage, and they are particularly
concerned with improving that usage. Second, they try to tell what readers need to
know in order to understand the methods.

In a nonmathematical way, stepwise analyses of the following three most important
classes of machine learning methods will be reviewed:
Cluster and classification models (Chaps. 1 to 18),
(Log)linear models (Chaps. 19 to 49),
Rules models (Chaps. 50 to 80).
The book will include basic methodologies like typology of medical data,
quantile-quantile plots for making a start with your data, rate analysis and trend
analysis as more powerful alternatives to risk analysis and traditional tests, probit
models for binary effects on treatment frequencies, higher order polynomials for
circadian phenomena, and contingency tables and their myriad applications. Particularly,
Chaps. 9, 14, 15, 18, 45, 48, 49, 79, and 80 will review these methodologies.
Chapter 7 describes the use of visualization processes instead of calculus meth-
ods for data mining. Chapter 8 describes the use of trained clusters, a scientifically
more appropriate alternative to traditional cluster analysis. Chapter 69 describes
evolutionary operations (evops), and the evop calculators, already widely used for
chemical and technical process improvement.
Various automated analyses and simulation models are in Chaps. 4, 29, 31, and
32. Chapters 67, 70, and 71 review spectral plots, Bayesian networks, and support
vector machines. A first description of several methods already employed by technical
and market scientists, and of their suitability for clinical research, is given in
Chaps. 37, 38, 39, and 56 (ordinal scalings for inconsistent intervals, loglinear
models for varying incident risks, and iteration methods for cross-validations).
Modern methodologies like interval censored analyses, exploratory analyses
using pivoting trays, repeated measures logistic regression, doubly multivariate
analyses for health assessments, and gamma regression for best fit prediction of
health parameters are reviewed in Chaps. 10, 11, 12, 13, 16, 17, 42, 46, and 47.
In order for the readers to perform their own analyses, SPSS data files of the
examples are given in extras.springer.com, as well as XML (eXtensible Markup
Language), SPS (Syntax), and ZIP (compressed) files for outcome predictions in
future patients. Furthermore, four CSV-type Excel files are available for data analysis
in the Konstanz information miner (Knime) and Weka (Waikato University New
Zealand) miner, widely approved free machine learning software packages on the
internet since 2006. Also a first introduction is given to SPSS modeler (SPSS' data
mining workbench, Chaps. 61, 64, 65), and to SPSS Amos, the graphical and
nongraphical data analyzer for the identification of cause-effect relationships as
principal goal of research (Chaps. 48 and 49). The free Davidwees polynomial grapher
is used in Chap. 79.
This book will demonstrate that machine learning sometimes performs better
than traditional statistics does. For example, if the data perfectly fit the cut-offs
for node splitting, because, e.g., ages > 55 years give an exponential rise in
infarctions, then decision trees, optimal binning, and optimal scaling will be better
analysis methods than traditional regression methods with age as continuous
predictor. Machine learning may have few options for adjusting confounding and
interaction, but you can add propensity scores and interaction variables to almost
any machine learning method.
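
The cut-off argument above can be illustrated with a minimal sketch in Python, which is not the SPSS workflow used in this book. Everything in it is an assumption made for illustration: the data are fabricated (20 patients per year of age from 30 to 80, with an infarction rate that jumps from 5% to 40% above age 55, a simple step rather than the exponential rise mentioned above), and the helper names fit_line and fit_stump are hypothetical. It compares a straight-line fit with age as continuous predictor against a single-split tree (a "stump") that searches for the best age cut-off.

```python
# Fabricated illustration data: 20 patients per year of age, 30..80.
# Assumption: 1 of 20 has an infarction up to age 55, 8 of 20 above it.
ages, events = [], []
for age in range(30, 81):
    n_events = 8 if age > 55 else 1
    for i in range(20):
        ages.append(age)
        events.append(1 if i < n_events else 0)


def mse(pred, obs):
    """Mean squared error between predictions and observed outcomes."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def fit_line(x, y):
    """Ordinary least squares with one continuous predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_stump(x, y):
    """Single-split tree: pick the cut-off that minimises squared error."""
    best = None
    for cut in sorted(set(x))[:-1]:  # the last value would leave the right side empty
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - ml) ** 2 for yi in left) + sum((yi - mr) ** 2 for yi in right)
        if best is None or err < best[0]:
            best = (err, cut, ml, mr)
    return best[1], best[2], best[3]


intercept, slope = fit_line(ages, events)
line_mse = mse([intercept + slope * a for a in ages], events)

cut, low_rate, high_rate = fit_stump(ages, events)
stump_mse = mse([low_rate if a <= cut else high_rate for a in ages], events)

print("best cut-off (years):", cut)
print("rate at or below cut-off:", low_rate, "rate above:", high_rate)
print("MSE straight line:", line_mse)
print("MSE single split: ", stump_mse)
```

On these fabricated data the single split fits better than the straight line, because the outcome rate changes at a threshold rather than gradually. With real data the comparison should be made on patients not used for fitting.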
Each chapter will start with purposes and scientific questions. Then, step-by-step
analyses, using both real data and simulated data examples, will be given. Finally, a
paragraph with conclusion, and references to the corresponding sites of three intro-
ductory textbooks previously written by the same authors, is given.

Lyon, France, December 2015
Ton J. Cleophas
Aeilko H. Zwinderman

Contents

Part I Cluster and Classification Models

1 Hierarchical Clustering and K-Means Clustering to Identify Subgroups in Surveys (50 Patients)
  General Purpose
  Specific Scientific Question
  Hierarchical Cluster Analysis
  K-Means Cluster Analysis
  Conclusion
  Note
2 Density-Based Clustering to Identify Outlier Groups in Otherwise Homogeneous Data (50 Patients)
  General Purpose
  Specific Scientific Question
  Density-Based Cluster Analysis
  Conclusion
  Note
3 Two Step Clustering to Identify Subgroups and Predict Subgroup Memberships in Individual Future Patients (120 Patients)
  General Purpose
  Specific Scientific Question
  The Computer Teaches Itself to Make Predictions
  Conclusion
  Note
4 Nearest Neighbors for Classifying New Medicines (2 New and 25 Old Opioids)
  General Purpose
  Specific Scientific Question
  Example
  Conclusion
  Note
5 Predicting High-Risk-Bin Memberships (1,445 Families)
  General Purpose
  Specific Scientific Question
  Example
  Optimal Binning
  Conclusion
  Note
6 Predicting Outlier Memberships (2,000 Patients)
  General Purpose
  Specific Scientific Question
  Example
  Conclusion
  Note
7 Data Mining for Visualization of Health Processes (150 Patients)
  General Purpose
  Primary Scientific Question
  Example
  Knime Data Miner
  Knime Workflow
  Box and Whiskers Plots
  Lift Chart
  Histogram
  Line Plot
  Matrix of Scatter Plots
  Parallel Coordinates
  Hierarchical Cluster Analysis with SOTA (Self Organizing Tree Algorithm)
  Conclusion
  Note
8 Trained Decision Trees for a More Meaningful Accuracy (150 Patients)
  General Purpose
  Primary Scientific Question
  Example
  Downloading the Knime Data Miner
  Knime Workflow
  Conclusion
  Note

9 Typology of Medical Data (51 Patients)
  General Purpose
  Primary Scientific Question
  Example
    Nominal Variable
    Ordinal Variable
    Scale Variable
  Conclusion
  Note
10 Predictions from Nominal Clinical Data (450 Patients)
  General Purpose
  Primary Scientific Question
  Example
  Conclusion
  Note
11 Predictions from Ordinal Clinical Data (450 Patients)
  General Purpose
  Primary Scientific Question
  Example
  Conclusion
  Note
12 Assessing Relative Health Risks (3,000 Subjects)
  General Purpose
  Primary Scientific Question
  Example
  Conclusion
  Note
13 Measuring Agreement (30 Patients)
  General Purpose
  Primary Scientific Question
  Example
  Conclusion
  Note
14 Column Proportions for Testing Differences Between Outcome Scores (450 Patients)
  General Purpose
  Specific Scientific Question
  Example
  Conclusion
  Note

15 Pivoting Trays and Tables for Improved Analysis of Multidimensional Data (450 Patients)
  General Purpose
  Primary Scientific Question
  Example
  Conclusion
  Note
16 Online Analytical Procedure Cubes, a More Rapid Approach to Analyzing Frequencies (450 Patients)
  General Purpose
  Primary Scientific Question
  Example
  Conclusion
  Note
17 Restructure Data Wizard for Data Classified the Wrong Way (20 Patients)
  General Purpose
  Primary Scientific Question
  Example
  Conclusion
  Note
18 Control Charts for Quality Control of Medicines (164 Tablet Disintegration Times)
  General Purpose
  Primary Scientific Question
  Example
  Conclusion
  Note

Part II (Log) Linear Models

19 Linear, Logistic, and Cox Regression for Outcome Prediction with Unpaired Data (20, 55, and 60 Patients)
  General Purpose
  Specific Scientific Question
  Linear Regression, the Computer Teaches Itself to Make Predictions
  Conclusion
  Note
  Logistic Regression, the Computer Teaches Itself to Make Predictions
  Conclusion
  Note
  Cox Regression, the Computer Teaches Itself to Make Predictions
  Conclusion
  Note

20 Generalized Linear Models for Outcome Prediction with Paired Data (100 Patients and 139 Physicians)
  General Purpose
  Specific Scientific Question
  Generalized Linear Modeling, the Computer Teaches Itself to Make Predictions
  Conclusion
  Generalized Estimation Equations, the Computer Teaches Itself to Make Predictions
  Conclusion
  Note
21 Generalized Linear Models Event-Rates (50 Patients)
  General Purpose
  Specific Scientific Question
  Example
  The Computer Teaches Itself to Make Predictions
  Conclusion
  Note
22 Factor Analysis and Partial Least Squares (PLS) for Complex-Data Reduction (250 Patients)
  General Purpose
  Specific Scientific Question
  Factor Analysis
  Partial Least Squares Analysis (PLS)
  Traditional Linear Regression
  Conclusion
  Note
23 Optimal Scaling of High-Sensitivity Analysis of Health Predictors (250 Patients)
  General Purpose
  Specific Scientific Question
  Traditional Multiple Linear Regression
  Optimal Scaling Without Regularization
  Optimal Scaling With Ridge Regression
  Optimal Scaling With Lasso Regression
  Optimal Scaling With Elastic Net Regression
  Conclusion
  Note
24 Discriminant Analysis for Making a Diagnosis from Multiple Outcomes (45 Patients)
  General Purpose
  Specific Scientific Question
  The Computer Teaches Itself to Make Predictions
  Conclusion
  Note

25 Weighted Least Squares for Adjusting Efficacy Data


with Inconsistent Spread (78 Patients) .................................................. 155
General Purpose ........................................................................................ 155
Specific Scientific Question ...................................................................... 155
Weighted Least Squares ............................................................................ 156
Conclusion................................................................................................. 158
Note ........................................................................................................... 158
26 Partial Correlations for Removing Interaction Effects
from Efficacy Data (64 Patients) ............................................................ 159
General Purpose ........................................................................................ 159
Specific Scientific Question ...................................................................... 159
Partial Correlations.................................................................................... 160
Conclusion................................................................................................. 162
Note ........................................................................................................... 163
27 Canonical Regression for Overall Statistics
of Multivariate Data (250 Patients) ....................................................... 165
General Purpose ........................................................................................ 165
Specific Scientific Question ...................................................................... 165
Canonical Regression ................................................................................ 166
Conclusion................................................................................................. 169
Note ........................................................................................................... 169
28 Multinomial Regression for Outcome Categories (55 Patients).......... 171
General Purpose ........................................................................................ 171
Specific Scientific Question ...................................................................... 171
The Computer Teaches Itself to Make Predictions ................................... 172
Conclusion................................................................................................. 174
Note ........................................................................................................... 174
29 Various Methods for Analyzing Predictor Categories
(60 and 30 Patients) ................................................................................. 175
General Purpose ........................................................................................ 175
Specific Scientific Questions ..................................................................... 175
Example 1.................................................................................................. 175
Example 2.................................................................................................. 179
Conclusion................................................................................................. 182
Note ........................................................................................................... 182
30 Random Intercept Models for Both Outcome
and Predictor Categories (55 Patients) .................................................. 183
General Purpose ........................................................................................ 183
Specific Scientific Question ...................................................................... 184
Example..................................................................................................... 184
Conclusion................................................................................................. 187
Note ........................................................................................................... 187
31 Automatic Regression for Maximizing Linear Relationships
(55 Patients) ............................................................................................. 189
General Purpose ........................................................................................ 189
Specific Scientific Question ...................................................................... 189
Data Example ........................................................................................... 189
The Computer Teaches Itself to Make Predictions ................................... 192
Conclusion................................................................................................. 193
Note ........................................................................................................... 194
32 Simulation Models for Varying Predictors (9,000 Patients) ................ 195
General Purpose ........................................................................................ 195
Specific Scientific Question ...................................................................... 195
Instead of Traditional Means and Standard Deviations, Monte
Carlo Simulations of the Input and Outcome Variables are Used
to Model the Data. This Enhances Precision, Particularly,
With non-Normal Data ............................................................................. 196
Conclusion................................................................................................. 200
Note ........................................................................................................... 201
33 Generalized Linear Mixed Models for Outcome Prediction
from Mixed Data (20 Patients) ............................................................... 203
General Purpose ........................................................................................ 203
Specific Scientific Question ...................................................................... 203
Example..................................................................................................... 203
Conclusion................................................................................................. 206
Note ........................................................................................................... 206
34 Two-Stage Least Squares (35 Patients) ................................................. 207
General Purpose ........................................................................................ 207
Primary Scientific Question ...................................................................... 207
Example..................................................................................................... 208
Conclusion................................................................................................. 210
Note ........................................................................................................... 210
35 Autoregressive Models for Longitudinal Data
(120 Mean Monthly Population Records) ............................................. 211
General Purpose ........................................................................................ 211
Specific Scientific Question ...................................................................... 211
Example..................................................................................................... 212
Conclusion................................................................................................. 216
Note ........................................................................................................... 217
36 Variance Components for Assessing the Magnitude
of Random Effects (40 Patients)............................................................. 219
General Purpose ........................................................................................ 219
Primary Scientific Question ...................................................................... 219
Example..................................................................................................... 220
Conclusion................................................................................................. 222
Note ........................................................................................................... 222
