Predictive Analytic Techniques and Big Data For Improved Health Outcomes Final Report
February 2021
This research was commissioned via LSE
Consulting which was set up by the London School
of Economics and Political Science to enable and
facilitate the application of its academic expertise
and intellectual resources.
LSE Consulting
LSE Enterprise Limited
London School of Economics and Political Science
Houghton Street
London
WC2A 2AE
Executive Summary
Background
In recent years, predictive analytics tools have been increasingly investigated and adopted
in healthcare. When effective, these tools can identify and stratify patients based
on their individual risk of experiencing a specific health outcome. In this sense, predictive analytics
tools differ from traditional descriptive analytics, which try to explain already existing
processes and functions. This innovation can potentially reshape risk management in healthcare,
since it allows clinicians, patients, administrative staff and health care decision makers to
predict potential events and therefore change traditional decision-making processes. The
development of these predictive tools is complemented by advances in the use and
availability of big data, which form the basis of a new and more accurate data infrastructure.
The aim of this scoping review is to investigate whether predictive analytics tools are effective in
identifying patients at risk of poor health outcomes, and whether their development is actually
improving patient outcomes for providers and supporting the transition to different
reimbursement models, such as value-based ones. The review also sets out a
taxonomy of the available predictive analytics tools based on the identified literature, and lists
the techniques and sources of data used to develop them.
Methods
A scoping review has been conducted to gather evidence in favour or in opposition to the broad
research hypothesis, which is the following: “The use of predictive modelling to proactively
identify patients who are at highest risk of poor health outcomes and will benefit most from
intervention (also assuming that this intervention is happening early) is one solution believed to
improve an efficient resource allocation and patient outcomes.” The over-arching theme is
therefore the following: implementation of predictive algorithms/analytics and/or artificial
intelligence in health care can support population health management, predict and improve
health outcomes, optimise care delivery, develop precision medicine and new therapies, help
structure value-based agreements between payers and suppliers, reduce unnecessary
expenditure and improve efficiency in resource allocation across the value-based care continuum.
The review has been developed following the PRISMA guidelines for scoping reviews. Four
databases have been used (Ovid Medline, PubMed, Web of Science and Scopus), and a grey
literature search has been performed using a similar search strategy. The search included all the
relevant publications from January 1992 to April 2019, limited to the English language. Included
studies were categorized based on disease area of interest, type of predictive tool(s), clinical
treatment outcome and disease stage. The taxonomy of predictive tools reported in this research
is based on the included studies and does not intend to be a univocal classification of these tools.
Based on the 198 included articles, the review summarises the predictive analytics tools adopted
over the years, what techniques are used to develop them, and how they are commonly adopted
by healthcare providers.
Seven predictive tools categories are identified: 1. Scoring systems; 2. Risk index/scores; 3.
Staging/Grading systems; 4. Algorithms; 5. Modelling (i.e., single tools); 6. Machine Learning;
7. Deep Learning. Also, 11 techniques were found: 1. Algorithms; 2. Association rules learning;
3. Convolutional Neural Networks; Decision Trees; 4. Deep Belief Network; 5. Deep Neural
Network; 6. Hazard models (e.g., Cox proportional hazard model); 7. Linear/Logistic regression;
8. Naïve Bayes; 9. Neural Networks; 10. Nomograms; 11. Random Forests. Common big data
sources are the following: Administrative claims, Clinical Trial Data, Electronic Health Records
(EHR), Personal Genome Services, Smartphone applications, Social media, Wearable devices.
Scoring systems are among the first predictive tools, as they started to be developed at the end
of the 1960s for trauma patients. They also predominate in the literature of the 1990s and early
2000s (34 of the 48 included studies published between 1992 and 2009 relate to scoring systems).
Together with scoring systems, risk indices, risk scores, and staging and grading systems can be
considered traditional tools, as they fall outside the machine learning spectrum or sit only at its
margins. They are mostly supervised tools, while many of the recent predictive modelling and
artificial intelligence tools are unsupervised. Supervised tools are trained on a known set of input
data paired with a known output, with the goal of predicting the output for new input
data. Unsupervised tools are instead developed without a specified output, and have the goal of
exploring the input data so that the tool can find previously unknown correlations.
Based on the included literature, it is not possible to rank the efficacy of different predictive
tools, as research so far has only compared specific sets of predictive tools, and
mostly only for certain disease areas. The scoping review shows that there has been
no broader attempt to compare all the available predictive tools or techniques in the
literature, which is a symptom of the prevalent fragmentation of interests within the stakeholder
community. However, artificial intelligence emerges as the area where most of the
ambitions of developing more accurate, generalisable and reliable tools are concentrated.
Scoring systems are by far the most commonly investigated predictive tools (113 out of 198
articles). This is likely because they are relatively easy to develop, do not
necessarily need a large amount of data and can be easily understood by patients and healthcare
providers. Of the remaining articles, 24 relate to algorithms, 17 to artificial intelligence (9 to
machine learning tools and 8 to deep learning tools), 18 to risk indices and risk
scores, 17 to generic predictive modelling, and only 9 to staging and grading
systems.
Oncology is the disease area where most predictive analytics tools are adopted (73 out of 198
articles), followed by cardiovascular diseases (38 articles), liver diseases (17 articles) and kidney
diseases (11 articles). Other areas which are investigated to a lesser extent are surgical
techniques (9 articles), digestive system (9 articles), haematology (6 articles), infectious
diseases (6 articles), neurodegenerative diseases (5 articles) and orthopaedics (5 articles). Only
14 articles do not focus on a specific disease area.
With regard to disease stage, most of the studies (119 articles) relate to secondary
care, while 59 are based on prevention or primary care. The most commonly investigated clinical
treatment outcome is related to surgery (75 out of 198 articles), followed by survival (63
articles) and occurrence of disease (28 articles).
Challenges
Overall, predictive analytics tools have been found to be a useful resource for key stakeholders, as
most of them have proved to be useful complementary tools for healthcare providers and patients in
predicting health outcomes and comparing the risks of different treatments at an individual level.
However, different challenges emerge from the included literature. The review identified
challenges related to 1) Predictive tools external validation and data quality; 2) Governance and
regulation; 3) Data infrastructure, exchange and interoperability; 4) Healthcare workforce
education and adaptation; 5) Predictive tools and healthcare financing; 6) Data privacy and
ethics; 7) Patient safety.
1. Predictive tools external validation and data quality. Regarding the efficacy and reliability
of predictive analytics tools, the first challenge relates to the generalisability of their performance.
External validation is required to demonstrate scientifically
that a specific tool is effective and reliable in a heterogeneous population, and this
is lacking for most predictive tools. Data quality can also limit the ability of
models to infer correct relationships or generalise results. This can have many causes,
including the way data is collected (patients’ or providers’ biases), the readability of the data for
the model (which can be addressed with natural language processing (NLP) tools), and the time
frame over which the model predicts an event.
2. Governance and regulation. As of today, there is no scientific consensus over which tools
or techniques should be recommended for a given disease area or type of diagnosis/treatment.
However, national agencies are currently attempting to close this gap. The US FDA is one
example: it has developed and updated its regulatory framework for artificial intelligence and
machine learning based software as a medical device (SaMD) to adapt regulation to new
innovations, improve transparency and strengthen tool validation.
4. Healthcare workforce education and adaptation. Digital maturity amongst health care
employees needs to be a priority to enable health care systems to manage genomic and AI tools.
According to the Wachter Review, all Trusts within the English NHS are expected to
achieve a high level of digital maturity by 2023. This means that local Trusts will have to be able
to develop and manage the infrastructure in which new digital technologies will be implemented. The
Topol Review notes that, by 2040, at least 80% of the health workforce will have to be
able to understand and manage genomics and AI tools. It will also be a challenge to find good
quality expertise in data analysis and data science, both in clinical organisations and in other
organisations.
5. Predictive tools and healthcare financing. Out of the 198 included studies, hardly any
directly examine how provider risk management based on predictive tools could support
a transition to value-based payments. However, a few examples
are available, such as the Buurtzorg Neighbourhood Care insurers in the Netherlands, which
are trying to simultaneously collect behavioural, demographic, health, and engagement data to
provide an opportunity for machine learning and the development of novel AI tools. This
infrastructure could help them advance the development of patient-centred and value-
based systems.
6. Data privacy and ethics. Legislation has not everywhere kept pace with the most recent
predictive analytics tools. The GDPR (General Data Protection Regulation) in Europe and the
California Consumer Privacy Act are two good examples of a data privacy regulatory
framework; however, the high cost of regulatory compliance could still limit the growth of small
organisations in this sector. Ethical challenges can arise when the treatments suggested by
a predictive tool conflict with physicians’ ethical obligations or patients’ preferences.
7. Patient safety. The safety and efficacy of predictive analytics tools, particularly AI-driven ones,
are closely tied to how up to date the regulatory framework is. The FDA reforms mentioned above
are an example of updating regulatory standards for safety and efficacy
assessments.
Flat Iron. Flat Iron is one of the fastest growing companies in the sector, and has set the goal
of building an automated system that gathers data on millions of patients in a format readable by AI
tools. Intergovernmental organizations and policymakers could look to such initiatives to build
collaborations.
Case studies from national organizations. NHS Digital in England is an example of
how a national healthcare provider and insurer can set up a dedicated area of work for the
implementation of AI and predictive tools. Its approach of creating sustainable infrastructure in local trusts
and promoting development at all levels, from the national management
organization to local trusts, shows how national health insurers and providers could
collaborate at a national level.
In the USA, the Food and Drug Administration (FDA) published in 2019 a guide to evaluating
AI tools and a discussion paper, “Proposed Regulatory Framework for Modifications to AI/ML-Based
Software as a Medical Device (SaMD)”, which looks at how changing algorithms can be more
efficiently assessed in premarket development and post-market performance assessment. The
discussion led in 2021 to an updated framework and an action plan that encourages data
harmonisation, transparency and safety.
Big Data for Better Outcomes. This project sees the collaboration of a large number of
universities, national and local insurers, regulators and pharmaceutical companies. It is an
example of creating a vast collaboration with key stakeholders and of breadth in analysis, since
it looks at all the main disease areas. Results from this project can lead to tangible progress in
big data management and therefore in how it could be implemented in AI tools.
INF-ACT. Similarly, this project, promoted by the European Commission, involves 40 partners
in 28 countries, and represents an example of how intergovernmental organizations could
promote international collaborations on big data research.
Maccabi Biobank. Maccabi Health Services and TIPA Biobank established in 2017 the TIPA
Biobank Research Initiative, with the goal of collecting biological samples that can be used for
research. The project can draw on a solid set of data as it is linked to Maccabi Health
Services, which is one of the main insurer organisations in Israel. So far, samples have been
collected from 2.5 million members across 350 different labs, making it possible to use
this data for longitudinal studies of a wide range of clinical conditions.
To the authors’ knowledge, this study provides the first comprehensive taxonomy of predictive
tools focusing on health outcomes. Predictive analytics tools have been found to be a useful resource
in healthcare; however, several challenges, particularly for the most recent analytics tools, still
have to be addressed. A massive increase in the implementation of AI, genomic
and robotic tools in healthcare is expected over the next two decades, so much remains to be
developed to meet the ambitions and objectives of the various stakeholders.
1. Integration of predictive tools is required. It is unlikely that a one size fits all tool will be
developed for every disease area. However, a major concerted research effort (academia,
healthcare providers, national regulators and the private sector) would bring benefits in creating
more effective predictive tools and would provide a clearer framework.
3. Regulation relating to data issues needs to take a pro-active stance and advance
faster. The FDA’s work on improving national regulatory frameworks to enhance pre-market
development and post-market performance assessment is an innovative and
fundamental example of how regulators and companies can create a framework that
speeds up R&D and improves safety. Inter-governmental organizations, companies and other
national regulators need to follow this still on-going process to make progress on this issue.
Positive examples exist from individual health insurers who have resolved data issues, including
obtaining prospective consent from members on data usage.
4. Creation of common platforms that could help enhance data-pooling and the
predictive power across settings. Cross-country collaboration or collaboration across settings
could have beneficial effects (e.g., Inf-Act). Private sector initiatives could help advance
technology development and methods, although benefits also need to be more widely diffused.
Abstract
Background: The recent development of and access to large samples of digital data, together
with increasing research and adoption of technological tools (such as predictive analytics
techniques), are constantly changing healthcare at all the stages of medical practice.
Objective: To develop a taxonomy of predictive tools used for diagnosis or for the evaluation
of disease progression and health outcomes using big data sources.
Methods: A scoping review was performed to identify relevant peer-reviewed and grey
literature. The research design was guided by the following hypothesis: predictive modelling can
proactively identify patients who are at highest risk of poor health outcomes and will benefit
most from intervention. Articles and grey literature were included in the review if they provided
evidence in favour of or in opposition to the hypothesis. Development and current use of each
technique, tool and data source is discussed and analysed based on the collected information.
Also, 6 case studies related to research and regulation from governments, academia and
companies are reported and discussed in a separate section.
Results: The review included 198 studies, which were categorized by predictive tool type,
disease area, clinical treatment outcome and disease stage. A taxonomy of predictive
techniques, tools, and big data sources was created with classification based on key features.
The review identified 7 predictive tools categories (i.e. 1) scoring systems; 2) risk index and risk
scores; 3) staging and grading systems; 4) algorithms; 5) modelling; 6) machine learning; 7)
deep learning). Each tool’s development and performance has been analysed based on the
included literature. Also, the review identified 8 challenge areas related to the further
development and implementation of predictive tools: 1) Predictive tools external validation and
data quality; 2) Governance and regulation; 3) Data infrastructure, exchange and
interoperability; 4) Healthcare workforce education and adaptation; 5) Predictive tools and
healthcare financing; 6) Data privacy and ethics; 7) Ethical challenges; 8) Patient safety.
Conclusion: Most predictive analytics tools report good performance levels in improving treatment
management and in forecasting health outcomes. It is expected that their predictive value will
increase with new technology advancements and further availability of big data. In order to
realise the full potential of predictive analytics in healthcare, challenges around regulation,
data quality, infrastructure, exchange and interoperability, data privacy, health workforce
education and patient safety will need to be overcome.
Keywords: Predictive analytics; predictive techniques; predictive tools; big data analytics;
artificial intelligence
1. Background
In recent years, big data analytics capabilities in healthcare organisations have developed
significantly, leading to a switch from basic descriptive analytics to predictive analytics (Galetsi
and Katsaliaki, 2018). The innovation of predictive analytics lies in the use of statistical methods
to identify predictive patterns, while descriptive analytics tries to explain already existing
processes and functions (ibidem). Predictive analytics allow all healthcare stakeholders,
including clinicians, patients, administrative staff, health policy decision makers and financial
experts, to more accurately foresee potential events and optimise decision-making. The
importance of being able to predict events is most clearly seen in the realms of intensive care,
surgery, emergency care, and pharmaceutical use. The correct positioning of pharmaceutical
products, amongst other types of intervention, within a patient’s disease pathway can have
considerable effects on patient outcomes. An accurate predictive pathway for the disease and
fine-tuned sense of when something is going wrong can help optimise patient outcomes.
Provider and payer organizations can apply predictive analytics tools to help address a range of
challenges (financial, administrative, and healthcare provision). Successful implementation can
improve health outcomes, efficiency in resource allocation, health system financial sustainability
and user/patient satisfaction. Countries have started setting national plans and strategies to
encourage development, implementation, and harmonisation of these new technologies and big
data sources. Big data generation initiatives in healthcare are increasingly being promoted, such
as the Cancer Genome Atlas (TCGA), Pan-Cancer Analysis of Whole Genomes (PCAWG), and
neuropsychiatric diseases (PsychENCODE) (Agrawal and Prabakaran, 2020). The UK launched a
personalised health and care 2020 strategy, with the goal of explaining how new technologies
and new sources of data will be used to develop personalized treatments (NHS, 2020). At an
international level, the WHO released a global digital health strategy for the 2020-2025 period,
with the goal of harmonising the uptake of digital health infrastructure across countries, and to
coordinate innovation, knowledge transfer and vision on the topic (WHO, 2020).
In light of the above, the objective of this research is to analyse existing predictive analytics
tools and data infrastructure, in order to: a) develop a taxonomy of the available existing tools;
b) assess the strengths and weaknesses of different types of tools; and c) identify ability to
detect high-risk patient groups. This research is performed with the view of producing
recommendations for the improvement of potential future models, for setting up adequate
systems and to enable optimization of outcomes. Having conducted a selective literature review,
in the following section, we outline a number of areas where predictive algorithms and analytics
have been applied and define potentially actionable hypotheses to be tested.
How are healthcare organizations deploying predictive capabilities to extract actionable, forward-
looking insights from their growing data assets? Based on a selective literature review, we
investigate how predictive algorithms can accurately and reliably predict health outcomes, to
improve disease management and population health.
Across all reimbursement models, the identification, stratification, and management of high-risk
patients is central to improving quality and cost outcomes. Organizations that can identify
individuals with elevated risks of developing chronic conditions as early in disease progression
as possible have the best chance of helping patients avoid long-term health problems that are
costly and difficult to treat. Creating predictive tools based on lab testing, biometric data, claims
data, patient-generated health data, and the social determinants of health can give healthcare
providers insight into which individuals might benefit from enhanced services or wellness
activities.
The actionable hypothesis to study in this context is: The use of predictive modelling to
proactively identify patients who are at highest risk of poor health outcomes and will benefit
most from early intervention improves patient outcomes and results in a more efficient resource
allocation.
2. Methods
2.1. Hypothesis
In light of the area of interest described above, we want to test the following hypothesis:
• The use of predictive modelling to proactively identify patients who are at highest risk of
poor health outcomes and will benefit most from intervention (also assuming that this
intervention is happening early) is one solution believed to improve an efficient resource
allocation and patient outcomes.
Based on this hypothesis, the research question is the following: are there predictive tools
enabling early identification of high-risk patients? The over-arching hypothesis/theme, therefore,
is as follows: implementation of predictive algorithms/analytics and/or artificial intelligence in
health care can support population health management, predict and improve health outcomes,
optimise care delivery, develop precision medicine and new therapies, help structure value-
based agreements between payers and suppliers, reduce unnecessary expenditure and improve
efficiency in resource allocation across the value-based care continuum.
A scoping review has been conducted in order to identify materials, reports and case studies
providing evidence in favour of or in opposition to the research hypothesis. This enables
identification of the volume of evidence available on the implementation of predictive
algorithms/predictive analytics/artificial intelligence/machine learning in the areas identified
above.
The scoping review has been developed following the PRISMA guidelines for scoping reviews
(PRISMA, 2018). Notably, the goals of the research are to map the identified evidence by
therapeutic area and over time, to underline the advantages and disadvantages of the main
tools, and to assess their overall performance in terms of reliability and accuracy.
A search for peer-reviewed literature was performed on Ovid Medline in April 2019 with the
following search strategy:
Step Search
3 #1 and #2
The search included all the relevant publications from January 1992 to April 2019. This was
accompanied by manual searches for specific topics on Ovid Medline and other platforms, i.e.,
PubMed, Web of Science, Scopus, and grey literature databases using a similar search strategy.
Search results from Ovid Medline were exported to EndNote for title and abstract screening.
Selected studies were then supplemented with studies identified through manual searches. An
Excel template was created to facilitate full text screening. The included studies were categorized
based on disease area of interest, type of predictive tool(s), clinical treatment outcome and
disease stage. The taxonomy of predictive tools reported in this research is based on the included
studies and does not intend to be a univocal classification of these tools.
3. Results
393 studies were identified through Ovid Medline. Of these, 198 were included after title and
abstract screening. 33 additional studies were identified through manual searches.
Results have been categorised by type of predictive tool(s) (Table 2).
Table 2 - Study results from the Ovid Medline search categorised per predictive tool type, disease
area, clinical treatment outcome and disease stage (only part of the table survives)
Disease area: Oncology 73; Haematology 6; Infectious 6; Generic studies 14
Clinical treatment outcome: Surgery 75
Disease stage: Prevention/Primary 59
(row label missing): 12
The table above summarises the number of included studies by disease area, clinical treatment
outcome and disease stage. Publication dates range from 1992 to 2019. Oncology and
cardiovascular diseases are the most frequently investigated disease areas. The most frequently
studied clinical treatment outcomes for predictive tools are surgery and survival, covering more
than half of the included studies. The most frequent disease stage studied in the context of
predictive tools was later stage disease involving secondary care (hospital care setting).
The most common measure used in the literature to assess the performance of predictive tools
is the area under the receiver operating characteristic curve (AUC-ROC), a metric for binary
outcomes that assesses the discriminative ability
of a tool. It ranges from 0.5, where there is no discriminative ability, to 1, where there is perfect
discrimination (Cantor et al. 2000). A tool with a higher AUC score performs better at
predicting a health outcome.
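As a minimal sketch of how this metric can be obtained, the AUC can be computed as the proportion of (event, non-event) patient pairs in which the patient who experienced the event received the higher predicted risk, with ties counted as one half. The function name and the example risk values below are illustrative assumptions and are not taken from any of the included studies.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 if the outcome occurred, 0 otherwise.
    scores: predicted risk for each patient (higher means higher risk).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one event and one non-event")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # event patient ranked above non-event patient
            elif p == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted risks for six patients (illustrative values only).
outcomes = [1, 1, 1, 0, 0, 0]
risks = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = auc_roc(outcomes, risks)   # 8 of the 9 pairs are ranked correctly
```

In this toy example one patient with the event (risk 0.4) is ranked below one patient without it (risk 0.5), so the AUC falls just short of 1. A tool that ranks patients worse than chance would produce a value below 0.5.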
Predictive analytics are defined as methods of analysis adopted to face and manage the
challenges related to big data sources (Hernandez et al. 2017). These methods use past and
current data with the goal of predicting future or unknown events.
Predictive analytics have evolved considerably over the past 50 years. Scoring, staging and
grading systems emerged in the 1970s and represent the first types of predictive techniques.
More sophisticated techniques have increasingly been developed from the 2000s onwards to
improve the use of big data and enhance prediction accuracy. This includes predictive modelling
(e.g. logistic regression), mathematical methods (e.g. nomograms) and Artificial Intelligence
(AI), through the use of machine learning or data mining.
Table 3 - Predictive techniques, tools and big data sources lists. Numbers in the predictive tools
column reflect the number of articles that are related to each tool.
Predictive tools (articles) | Predictive techniques | Big data sources
Modelling (17) | Decision Trees; Nomograms | Personal Genome Services
Algorithms (24) | Deep Belief Network; Random Forests | Smartphone applications
Based on the 198 included studies, this scoping review identified 7 predictive tools categories
(1. Scoring systems; 2. Risk index/scores; 3. Staging/Grading systems; 4. Algorithms; 5.
Modelling (i.e. single tools); 6. Machine Learning; 7. Deep Learning). These tools can be
developed with the use of different techniques. Scoring systems are the most investigated
predictive tools, particularly in the 1990s and early 2000s; 34 of the 48 included studies
published between 1992 and 2009 are related to scoring systems. Predictive tools can also be
categorised as supervised or unsupervised tools. Supervised tools use a known set of input data
to generate a specific predicted output. Unsupervised tools are adopted when there is no
specific output and involve exploring input data to find potential unknown correlations.
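The distinction between supervised and unsupervised tools can be illustrated with a minimal sketch. The supervised function below learns a cut-off from inputs paired with known outcomes, while the unsupervised function groups the same inputs without seeing any outcome. Both functions, the single lab variable and all values are hypothetical simplifications chosen for illustration; they do not reproduce any tool from the included studies.

```python
def fit_threshold(values, labels):
    """Supervised: learn the cut-off on one variable that best reproduces known labels."""
    best_cut, best_correct = None, -1
    for cut in sorted(set(values)):
        correct = sum((v >= cut) == bool(y) for v, y in zip(values, labels))
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


def two_means(values, iterations=20):
    """Unsupervised: split unlabelled values into two groups (1-D k-means, k=2)."""
    low, high = min(values), max(values)   # initial cluster centres
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:                  # all values identical, nothing to split
            break
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high


# Hypothetical lab values; labels mark whether the outcome occurred (illustrative only).
lab_values = [1.0, 1.2, 1.1, 3.9, 4.2, 4.0]
outcome = [0, 0, 0, 1, 1, 1]

cut = fit_threshold(lab_values, outcome)            # uses inputs AND known outputs
predicted = [int(v >= cut) for v in [0.9, 4.5]]     # predictions for two new patients

centres = two_means(lab_values)                     # uses inputs only
```

The supervised step returns a rule that can be applied to new patients, whereas the unsupervised step only reports that the data fall into two groups; deciding what those groups mean clinically is left to the analyst.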
This review also identified the following predictive techniques: 1. Algorithms; 2. Association rules
learning; 3. Convolutional Neural Networks; Decision Trees; 4. Deep Belief Network; 5. Deep
Neural Network; 6. Hazard models (e.g. Cox proportional hazard model); 7. Linear/Logistic
regression; 8. Naïve Bayes; 9. Neural Networks; 10. Nomograms; 11. Random Forests. Each
category of predictive tool type may encompass a range of different predictive techniques. The
following sections of the results will group the included literature based on the identified
predictive tools categories.
Big data refers to large datasets with a varied and complex structure
(Sagiroglu et al. 2013). These datasets are characterized by high variable specificity for each
endpoint, by long observation timelines, and by data originating from many sources. In
healthcare, these data are mainly collected via Electronic Health Records (EHR), administrative
claims, clinical trials, genomic services, social media, and common personal tools such
as smartphone applications and wearable devices. Big data datasets have three distinguishing
features: large sample sizes, high heterogeneity and high dimensionality (i.e. many variables for
each endpoint) (Hernandez et al. 2017). However, the evolution and adoption of big data generate
challenges for predictive analysis. Errors in variables can accumulate from various sources, leading
to noise accumulation and poor predictions or classifications. Even partially biased sources can
contribute to noise accumulation. Another issue that requires human monitoring and correction is
spurious correlation. Unsupervised predictive tools in particular have the potential to show high
correlation between variables that are not genuinely related, leading to incorrect inferences and
false predictions.
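The effect of high dimensionality on spurious correlation can be illustrated with a small simulation. The sketch below is not taken from any of the included studies: the cohort size, the number of variables and the function names are assumptions chosen for illustration. It generates a random outcome and a growing number of unrelated noise variables, and reports the largest absolute correlation found by chance.

```python
import random
import statistics


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (var_x * var_y) ** 0.5


def max_abs_correlation(n_patients, n_variables, seed=0):
    """Largest |correlation| between a random outcome and unrelated noise variables."""
    rng = random.Random(seed)
    # Hypothetical outcome: pure noise, so no variable is truly related to it.
    outcome = [rng.gauss(0, 1) for _ in range(n_patients)]
    best = 0.0
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_patients)]
        best = max(best, abs(pearson(outcome, noise)))
    return best


# With a fixed cohort size, screening more variables can only raise the maximum.
for n_variables in (10, 100, 1000):
    print(n_variables, round(max_abs_correlation(50, n_variables), 3))
```

Because every variable here is noise by construction, any apparently strong correlation is spurious, which is why screening many variables against one outcome generally calls for human review or a correction for multiple comparisons.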
Scoring systems (SS) are among the first predictive tools implemented. They were initially developed
to analyse clinically relevant surrogate outcome measures in intensive care units, in order to
evaluate the effectiveness of treatment practices (Rapsang et al. 2014). The first scoring
systems were created at the end of the 1960s for trauma patients (Gunning et al. 1999).
Predictive tool: Scoring systems
Predictive techniques: Algorithms; Decision Trees; Linear/Logistic regression; Naïve Bayes; Neural Networks; Nomograms; Random forests
SS are made up of two components: the score, which represents disease severity, and the
probability model, which can match groups of patients and support a quantitative comparison
between them.
Logistic regression was initially adopted to create models looking at probability of death. The
ideal probability model should be based on three factors: validity, calibration and discrimination
(Rapsang et al. 2014). Validity refers to the quality of the model performance, based on a test
assessment. Calibration relates to how accurate the model's estimates are. An example of calibration
could be to assess the gap between the actual mortality and the probability of mortality
estimated by the model. Discrimination refers to the ability to distinguish between patients who
die and patients who survive, based on the model estimation. Discrimination can be
measured with metrics such as "sensitivity, specificity, false positive rate, false negative rate,
positive predictive power, misclassification rate, area under the receiver operating characteristic
curve and concordance" (Champion 2002).
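The distinction between calibration and discrimination can be made concrete with a small worked example. The sketch below uses a hypothetical cohort of eight patients with invented predicted probabilities of death; the function names are illustrative and do not come from the cited studies. Discrimination is summarised by the AUC, sensitivity and specificity, while calibration is summarised by the gap between observed and mean predicted mortality.

```python
def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    deaths = [p for y, p in zip(y_true, y_prob) if y == 1]
    survivors = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if d > s else 0.5 if d == s else 0.0
        for d in deaths
        for s in survivors
    )
    return wins / (len(deaths) * len(survivors))


def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Sensitivity and specificity when risks >= threshold are flagged as deaths."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def calibration_gap(y_true, y_prob):
    """Observed mortality minus mean predicted mortality (0 = calibrated on average)."""
    return sum(y_true) / len(y_true) - sum(y_prob) / len(y_prob)


# Hypothetical cohort: 1 = died, 0 = survived, with model-estimated risks of death.
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1]

print("AUC:", round(auc(observed, predicted), 3))
print("Sensitivity, specificity:", sensitivity_specificity(observed, predicted))
print("Calibration gap:", round(calibration_gap(observed, predicted), 4))
```

In this invented cohort the model ranks patients well (14 of the 15 death/survivor pairs are ordered correctly) while slightly overestimating mortality on average, which shows that the two properties need to be assessed separately.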
Scoring systems
Clinical management
The first SS adopted were specific anatomical models. Specific SS look to analyse only a certain
group of patients, while generic SS aim to provide generalised results. Anatomical SS evaluate
the extent of injury, providing fixed results, and depend on an accurate measurement or
description of the disease. Physiological SS stem from observation and
measurement of vital signs and look at the impact of injury on function, leading to results
that vary as the response to injury changes.
The roles of SS include comparing predicted and actual health outcomes (comparative audit),
improving datasets from observational studies, non-randomised studies or RCTs (evaluative
research) and supporting healthcare professionals' decision-making (clinical management).
As SS are amongst the most traditional available predictive tools, there is a vast number of
studies that explain their development and provide an assessment of their effectiveness. One of
the most widely adopted SS is APACHE (Acute Physiology And Chronic Health Evaluation). The
first version was developed in 1981 and was revised a number of times, leading to the
introduction of APACHE IV in 2006 (Zimmerman et al. 2006). This SS has been used in different
disease areas, including cardiovascular disease and oncology (Hu et al. 2013), and is generally
utilised to predict clinical outcomes such as survival rate and length of hospitalisation. Many
studies have compared APACHE with other SS. Hu et al. compared APACHE with MELD (Model for
End-stage Liver Disease) to predict the risk of mortality after orthotopic liver transplantation,
highlighting how the former SS showed a higher prognostic value (APACHE area under the curve
(AUC) was 0.937 while MELD AUC was 0.694). Another study compared APACHE IV with SOFA
(Sequential Organ Failure Assessment) and SAPS II (Simplified Acute Physiology Score) to
predict short-term mortality in patients with acute myocarditis. SAPS II had a slightly higher
prognostic value (AUC: SOFA 0.920, APACHE IV 0.934, SAPS II 0.942) (Hu et al. 2013). Another
study focusing on cardiovascular diseases assessed the NCDR-RESCUE (Real-World Estimator of
Survival in Catheterized STEMI Patients Following Unsuccessful Earlier Fibrinolysis) scoring
system and reported that this SS can successfully be used to assess the risk of mortality after
percutaneous coronary intervention (Burjonroppa et al. 2011).
Together with cardiovascular diseases, oncology is the main area in which SS are developed and
applied. Prostascore, a prognostic model, allows providers to predict health outcomes of patients
with advanced prostate cancer who must choose a therapeutic path (e.g.
chemotherapy or surgery) (Abdel-Rahman et al. 2017). This is just one of the SS available for
this specific disease, and it is hard to assess which predictive tool is the most effective. Issues
relating to the depth and width of data, study follow-ups and generalisability limit the external
validity of Prostascore. The American Joint Committee on Cancer (AJCC), which proposed a
staging system for the same disease, has conducted comparisons of the available predictive
tools. The AJCC has reported limitations in the generalisability of most SS, as they are frequently
adopted for small cohorts of patients in specific areas with unique socio-economic features.
The AAAP scoring system is used to predict overall survival rates of patients with unresectable
metastatic colon cancer who underwent a primary tumour resection. A study tested the prognostic
scoring system based on four clinical risk factors (age, alkaline phosphatase (ALP), ascites, and
platelet/lymphocyte ratio (PLR)) on a cohort of 110 patients, divided into three risk groups (low,
medium and high risk). The overall survival rate varied significantly across risk groups (low risk:
57.1%; medium risk: 10.7%; high risk: 0.0%; P < 0.001). This prognostic scoring system has
proven to be a reliable tool for predicting the outcome of primary tumour resection, and can
therefore help providers and patients with metastatic colon cancer to choose the best treatment path.
One study examines an SS that predicts health outcomes of patients with Crohn's disease taking
vedolizumab (Dulai et al. 2018). The SS was able to identify patients in clinical remission after
vedolizumab therapy with an AUC of 0.67 (92% sensitivity), patients with mucosal healing with
an AUC of 0.72 (98% sensitivity), patients in corticosteroid-free remission with an AUC of 0.66
(94% sensitivity), patients with both mucosal healing and clinical remission with an AUC of 0.75
(100% sensitivity), and patients with corticosteroid-free clinical remission with mucosal healing
with an AUC of 0.75 (100% sensitivity). Another SS for Crohn's disease, the PROSPECT model,
was developed through univariable and multivariable Cox proportional hazards models to
build a web-based tool that helps providers and patients to predict the risk of
Crohn's disease complications, based on genetic, clinical and serologic variables (Siegel et al.
2016). A total of 243 patients were involved in the validation study of the web-based tool, and the
model proved to be reliable in predicting Crohn's disease complications over time. The model was also
tested for external validity on two cohorts (adult and paediatric patients), which reported a
concordance index of 0.73 and 0.75 respectively. A strength of the PROSPECT model is the
generation of individualised risk prediction based on accessible and easy-to-collect data.
Another study developed a nomogram and online tool to predict the severity of postoperative bowel
dysfunction in patients who received a restorative anterior resection for rectal cancer, based on an
international patient-reported outcome measure, LARS (Low Anterior Resection Syndrome)
(Battersby et al. 2018). The tool, POLARS (Pre-Operative LARS), was tested on two different
national datasets of patients undergoing a restorative anterior resection, in terms of its
capacity to predict long-term bowel dysfunction (mean LARS scores of 26 and 24 with a standard
deviation of 11 in the two cohorts). The study also assessed how some factors (e.g. age or sex)
relate to disease progression, but was unable to control for other factors such as socioeconomic
status, comorbidities, social support, and self-management. The European Society of
Coloproctology also reports an overview of studies (8 in the last update in January 2018) in
different national settings that tested and validated scoring systems (European Society of
Coloproctology, 2019), with the aim of harmonising research on SS for colorectal cancer across
different settings.
Overall, early SS were limited to making comparisons between observed and predicted health
outcomes within a small subset of patients, while more advanced SS rely on larger datasets
to assist healthcare providers in care and treatment choices. Based on the evidence collected,
there is still considerable room for improvement in the ability of SS to manage and leverage big
data. Scoring systems are by far the most investigated predictive tools in this scoping
review. However, they do not represent the latest predictive analytics techniques developed,
and a comparison with other types of predictive tools is needed in order to identify the optimal
use of SS.
Staging and grading systems are usually pooled in the same category as scoring systems. These
tools are mostly adopted in cancer diagnosis, as decision-making support tools for clinicians or
patients and as classification criteria in clinical trials.
Predictive tool: Staging/grading system
Predictive techniques: Algorithms; Decision Trees; Linear/Logistic regression; Naïve Bayes; Neural Networks; Nomograms; Random forests
Few studies included in the scoping review addressed grading and staging systems. Focus was
limited to evaluation of individual systems or a small number of systems in highly specific disease
areas. No evidence on the broader rationale of these systems was identified. Grading and staging
systems are generally related to disease severity. The most widely adopted staging system in the US
is the TNM system (primary tumor (T), regional lymph nodes (N) and distant metastases (M)), which
groups three disease features together in one staging system. The Lugano and the Ann Arbor
systems were compared to TNM in predicting the overall survival of patients with primary
gastrointestinal lymphoma (PGL) (Chang et al. 2015). TNM had the best performance in predicting
5-year overall survival rates in aggressive and indolent PGL (TNM stages: I 100%, II 87.18%,
III 75.17% and IV 16.69%; p<0.0001) compared to Lugano (stages: I 100%, II 80%, IIE
64.96%, IV 49.90%) and Ann Arbor (IE 95.83%, IIE 55.34%, 66.67%, IV 0%).
Another study integrated the TNM system with a gene signature analysis to predict tumor
relapse within 3 years for patients with colorectal cancer (Peng et al. 2010). The integrated
model proved to be more effective than a predictive tool utilising only TNM (AUC of 0.664
vs 0.647). In addition, survival analysis showed that the 3-year relapse-free survival was 100% in
low-risk, 74% in medium-risk and 52.4% in high-risk groups. The growing availability of big data
could help to create integrated systems of predictive tools including staging systems (Edge et
al. 2010).
Risk stratification tools (risk indexes and risk scores) are also often pooled in the same category
as scoring systems. Risk models assess individual patient risk by entering individual patient
data into a multivariable risk prediction model (Moonesinghe et al. 2013).
Predictive tool: Risk index/scores
Predictive techniques: Algorithms; Decision Trees; Linear/Logistic regression; Naïve Bayes; Neural Networks; Nomograms; Random forests
Cardiovascular and oncological diseases are the most common disease areas in which risk indexes
have been implemented. One of the most widely adopted tools in this category is the Framingham risk
score. This tool builds on a long history of research that started at the end of the 1940s. The
Framingham Heart Study was a long-term investigation which aimed to improve preventive and treatment
research for cardiovascular diseases (Mahmood et al. 2014). The Framingham risk score was
first published in 1998, and is widely utilised to predict the risk of developing cardiovascular
diseases. It is also utilised to assess the impact of cardiovascular risk factors on other diseases,
such as multiple sclerosis (Moccia et al. 2015). Another study uses the Framingham risk score
to explore the link between breast cancer and cardiovascular diseases, showing that
women with breast cancer have a 1.77 times higher risk of developing cardiovascular diseases
than women who have never had breast cancer (Geernat et al. 2018). One research team
integrated a 70-gene signature, a clinical tool, with different risk prediction algorithms to predict
outcomes in early-stage breast cancer (Drukker et al. 2014). PREDICT integrated with the 70-gene
signature (AUC: 0.662) was the best predictive tool compared to AOL, NPI, St. Gallen, CBO
and NABON. The authors report that integration of the 70-gene signature and risk prediction
algorithms can improve risk estimation and help providers improve management of early-stage
breast cancer.
Another study assessed PREDICT 2.0 as a prognostic tool in 8834 breast cancer patients. The
tool reported an AUC of 0.80 for 5-year overall survival (OS) and an AUC of 0.78 for 10-year OS
(Van Maaren et al. 2017). A subgroup analysis of the cohort was performed based on age and
on oestrogen receptor (ER) subtype. The tool was less accurate in some subgroups (patients
older than 75 years and ER-negative patients) but was reported to be an overall reliable
predictive tool. Despite promising results, low adoption of risk indexes remains an issue. Lack of
use can be caused by poor clinician awareness, limited evidence on robustness, and
concerns over tool complexity and accuracy (Moonesinghe et al. 2013). Other examples of risk
stratification tools implemented in broader contexts include four programmes implemented in
England, Wales and Scotland to reduce emergency hospitalisation rates: PARR
(Patients-at-risk-of-hospitalization) and CPM (Combined Predictive Model) in England; PRISM
(Predictive Risk Stratification Model) in Wales; and SPARRA (Scottish Patients at Risk of
Readmission and Admission) in Scotland. These programmes were adopted between 2006 and 2010 and
consisted of risk stratification models based on linear and logistic regressions that could identify
individuals at high risk of hospitalisation (Hutchings et al. 2013).
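A regression-based risk stratification model of this kind can be sketched in a few lines. The example below is purely illustrative and does not reproduce PARR, CPM, PRISM or SPARRA: the predictor names, coefficients and tier thresholds are all hypothetical. It converts a patient's predictors into a predicted probability of hospitalisation with a logistic function and then assigns a risk tier.

```python
import math

# Hypothetical intercept and coefficients, chosen only for illustration.
INTERCEPT = -4.0
COEFFICIENTS = {
    "age_over_75": 1.0,         # 1 if the patient is older than 75, else 0
    "prior_admissions": 0.6,    # emergency admissions in the previous year
    "chronic_conditions": 0.4,  # number of recorded long-term conditions
}


def predicted_risk(patient):
    """Logistic regression: probability = 1 / (1 + exp(-linear predictor))."""
    linear = INTERCEPT + sum(
        weight * patient[name] for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def risk_tier(risk, high=0.5, medium=0.2):
    """Map a predicted probability to a tier using hypothetical cut-offs."""
    if risk >= high:
        return "high"
    if risk >= medium:
        return "medium"
    return "low"


patients = [
    {"age_over_75": 0, "prior_admissions": 0, "chronic_conditions": 0},
    {"age_over_75": 1, "prior_admissions": 2, "chronic_conditions": 2},
    {"age_over_75": 1, "prior_admissions": 4, "chronic_conditions": 3},
]
for patient in patients:
    risk = predicted_risk(patient)
    print(round(risk, 3), risk_tier(risk))
```

In practice the coefficients would be estimated from historical admissions data and the cut-offs would be set according to the capacity of the intervention being targeted, so the numbers above should not be read as clinically meaningful.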
Predictive modelling and algorithms are the most broadly defined category of predictive tools. A
wide range of tools and techniques is captured under these terms within the literature.
Table 8 - Predictive techniques that can be used to develop predictive tools based on algorithms
Predictive tool: Algorithms
Predictive techniques: Algorithms; Decision Trees; Linear/Logistic regression; Naïve Bayes; Neural Networks; Nomograms; Random forests
Predictive modelling techniques include both older techniques captured under scoring systems
and more recent ones related to AI. Generally, predictive modelling projects do not refer to
a specific analytics technique, but rather to a broader project that can involve more than one
tool (e.g. national programmes for health prevention). Predictive modelling can be broadly analysed
in four ways: through the event that it is predicting; through the set of patient predictor variables
available; through the time frame considered to make a prediction; or through the type of
statistical technique adopted (Panattoni et al. 2011). The accuracy of predictive modelling
evidence is related to the patient predictor variables adopted, including socio-demographic data,
diagnostic data, prior utilisation or costs, pharmacy data, health status and functionality, and
clinical data (Panattoni et al. 2011). Some literature categorises predictive analytics or
prescriptive analytics as predictive modelling tools. These tools focus mainly on cancer and
cardiovascular diseases.
One study developed a mathematical method (radial basis functions and particle swarm
optimization, RBF-PSO) to predict the final height of patients with growth hormone deficiency
(Migliaretti et al. 2018). The tool was found to be reliable in predicting patients' final height.
Another study examined whether the clinical PWV (pulse wave velocity) score can be a prognostic tool
for detecting major adverse cardiac events (MACE) in patients after percutaneous coronary
intervention (Chen et al. 2015). The tool was reliable in predicting 3-year MACE (AUC 0.72). A
comparative study focusing on breast cancer assessed different mathematical methods (logistic
regression, decision trees, and random forests) to identify the best predictive tool for detecting
adverse events (Lindsay et al. 2019). The study reports that ensemble methods (random forests)
are more effective than single-model methods (decision trees, logistic regression). The reported
average AUC values were 0.053 for ensemble methods and 0.034 for single-model methods.
Another study adopted a model based on logistic regression to predict which children affected
by asthma can be treated with inhaled corticosteroids (ICS) (Wu et al. 2017). The scaled Brier
score was used to evaluate the overall prognostic value of the model, while the AUC was used to
assess the model's predictive responsiveness. Tool validation was performed on a cohort of 158
children, reporting an AUC of 0.763 and a scaled Brier score of 0.23 (where 0 indicates no
predictive value and 1 indicates perfect prediction). The study provides an example of how specific
techniques can be used to develop a model which is difficult to classify within a taxonomy.
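For readers unfamiliar with the scaled Brier score, the sketch below shows one common way of computing it; it is an assumption that the cited study used exactly this definition. The Brier score is the mean squared difference between predicted probabilities and observed outcomes, and the scaled version compares it with the Brier score of a non-informative model that predicts the observed prevalence for every patient. The data are hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def scaled_brier_score(y_true, y_prob):
    """1 - Brier / Brier_reference; 0 = no better than prevalence, 1 = perfect."""
    prevalence = sum(y_true) / len(y_true)
    # Brier score obtained by predicting the prevalence for every patient.
    reference = prevalence * (1 - prevalence)
    return 1 - brier_score(y_true, y_prob) / reference


# Hypothetical outcomes (1 = responded to treatment) and predicted probabilities.
outcomes = [1, 0, 1, 0]
probabilities = [0.8, 0.2, 0.6, 0.4]

print("Brier score:", round(brier_score(outcomes, probabilities), 3))
print("Scaled Brier score:", round(scaled_brier_score(outcomes, probabilities), 3))
```

Under this definition a model that always predicts the prevalence scores 0 and a model that predicts every outcome with certainty scores 1, which matches the interpretation of the scale given above.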
Algorithms for predictive tool development and implementation can be utilised as single
predictive techniques and can be implemented in both non-AI and AI tools. Algorithm
development and implementation involves five stages: 1) acquiring data; 2) building and validating
the model; 3) applying it in a real-world setting; 4) testing it in practice; and 5) scaling the
model to generalize implications (Amarasingham et al. 2014). Recent examples of algorithm utilisation
as a predictive tool can be found in Martinez-Gimenez et al. (2018), Andres et al. (2018), and Zhu
et al. (2018). The first study investigated how algorithms could predict treatment modalities
based on temperature differences in burn wounds, analysed with thermographic scans
(Martinez-Gimenez et al. 2018). The algorithm was reliable, correctly predicting the best treatment
option with an accuracy of 85.35%. The second study developed a software tool, PSSP, based on a
learning algorithm, to predict individual survival after liver transplantation for primary sclerosing
cholangitis (Andres et al. 2018). The authors also developed an evaluation measure called
D-calibration to assess tool effectiveness. PSSP is a reliable tool for estimating survival
probability over time. The study also compares this algorithm-based tool with risk scores and
other models, such as the Cox proportional hazard model, arguing that algorithm-based tools
are more effective for screening tasks and more accurate in prospective cohort analysis. The
final study developed an algorithm-based tool (ALR, ALP-to-lymphocyte ratio) that predicts
survival and microvascular invasion in patients with hepatocellular carcinoma (Zhu et al. 2018).
Based on a cohort of 165 patients, ALR had an AUC of 0.73 in predicting microvascular invasion
and had the highest accuracy when compared with three other tools (PLR, AUC: 0.632; APRI, AUC:
0.554; Fib-4, AUC: 0.572). ALR also proved to be a reliable independent predictor of survival
for patients with hepatocellular carcinoma.
In the last few years, the scientific literature has increasingly focused on AI tools for predictive
analysis in health (Jiang et al. 2017). The two predominant types of AI are machine learning (ML)
and natural language processing (NLP) (Figure 2).
AI analytics rely on algorithms to create tools that can “learn” features from larger healthcare
datasets, while older predictive tools involve a process where the input, output and outcome are
specifically set by humans (Panesar 2010). The two main AI subgroups, ML and NLP, use
algorithms that are trained on large volumes of data to find associations
between subject features and outcomes of interest (Jiang et al. 2017). There is no rigid
threshold that distinguishes which tools can be considered AI. Rather, tools can be
positioned on a spectrum in terms of the level of human specification versus the level of features
learned from available data (Beam et al. 2017). AI tools tend to have less human involvement and
more independent learning from data. Machine learning is the broadest AI subgroup and includes
supervised ML, unsupervised ML and deep learning. NLP involves converting non-machine-readable
information into a language that can be understood by AI tools (i.e. the extraction of
information from unstructured data, such as clinical notes or medical journal contents) (Jiang et
al. 2017). Overall, NLP plays an ancillary role in enabling ML to function properly.
Table 9 - Predictive techniques that can be used to develop Machine Learning predictive tools
Predictive tool: Machine Learning
Predictive techniques: Algorithms; Decision Trees; Linear/Logistic regression; Naïve Bayes; Neural Networks; Nomograms; Random forests
The older predictive analytics tools (scoring systems, grading systems, nomograms, risk indexes) are
all considered supervised techniques, and do not fit or only very marginally fit within the AI
spectrum. A general advantage of machine learning is that it can handle greater volumes of
data and tends to produce more generalizable results through both supervised and
unsupervised ML tools. Classic machine learning techniques are supervised. Common supervised
ML tools include decision trees, association rule learning, linear and logistic regression, naïve
Bayes, random forests, discriminant analysis, support vector machines (SVM) and neural networks
(Jiang et al. 2017; Gianfrancesco et al. 2018). SVM and neural networks are the most frequently
used supervised ML techniques and rely on imaging, genetic and electrophysiological data (Jiang
et al. 2017). These tools are categorised as supervised because researchers must introduce input
data and a specific set of outputs or outcomes of interest. The aim is to infer ex ante the
probability of a specific outcome based on a clustered dataset (patients’ traits). Generally, inputs
are composed of baseline data (i.e. patients’ age, gender, disease history) and the health
outcomes are disease indicators, survival times, and quantitative disease levels (Jiang et al.
2017).
Unsupervised ML techniques do not include any outcome of interest in their algorithm. The
rationale is to use a tool that can learn features from data and autonomously infer associations
between similar groups of subjects. The most common unsupervised techniques adopted as
predictive tools are clustering and principal component analysis (PCA).
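Clustering can be illustrated with a minimal k-means sketch. The code below is not drawn from any included study: the patient data (age and systolic blood pressure), the naive initialisation and the function names are assumptions made for illustration. No outcome is supplied; the algorithm groups patients only by similarity of their features.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Basic k-means; the first k points are used as a naive deterministic start."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        updated = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest_centroid(point, centroids) for point in points]
    return labels, centroids


# Hypothetical patients described by (age, systolic blood pressure); no outcome given.
patients = [(30, 110), (32, 115), (35, 112), (70, 150), (72, 155), (68, 148)]
labels, centroids = kmeans(patients, k=2)
print(labels)
print(centroids)
```

In real applications the features would normally be standardised first and the initial centroids chosen more carefully (for example with several random restarts), since k-means results depend on both choices.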
Table 10 - Predictive techniques that can be adopted to develop deep learning predictive tools
Predictive tool: Deep Learning
Predictive techniques: Algorithms; Decision Trees; Linear/Logistic regression; Naïve Bayes; Neural Networks; Nomograms; Random forests
Deep learning (DL) can be considered a statistical extension of classical supervised and
unsupervised ML techniques. In particular, DL extends the capacity of classical neural networks
by developing more layers of representation from which a learning algorithm can discover and
develop new patterns (Esteva et al. 2019). DL can operate without supervision (i.e. without
including specific outcomes of interest). The automatic composition of multiple layers allows DL
tools to handle even greater volumes of data compared to a classic machine learning technique.
Unsupervised and DL techniques are the most powerful tools when there is a need to reduce
data dimensionality and to identify unknown subgroups (Jiang et al. 2017). The most commonly
used deep learning technique is the convolutional neural network (CNN), which is mainly used to
handle and reduce high dimensionality in imaging data (Lecun et al. 2009). Other DL techniques
for predictive analytics development are recurrent neural networks, deep belief networks and deep
neural networks. Deep learning techniques can be integrated with NLP to create automatic tools
that can constantly generate sources of raw data, clean the data and use them for the required
purpose.
As with the previously discussed tools, cancer and cardiovascular diseases are the most common
disease areas of interest for AI application. These recently developed techniques can greatly
support providers in prescriptive decisions (diagnosis and treatment) and in predicting risk in
health outcomes. In some cases, ML tools have exceeded providers’ ability in prediction or
prescriptive decision-making. One study of DL use in cardiovascular disease utilised
artificial neural networks (multilayer perceptron, MLP, and radial basis function networks, RBF)
and Bayesian networks to assess tool accuracy, sensitivity and specificity in the prediction of
hospital mortality in patients with abdominal aortic aneurysm (Monsalve-Torra et al. 2016). ANN
tools had the highest overall accuracy (95.1% for MLP and 92.9% for RBF) but had low
sensitivity rates (MLP: 65.5%, RBF: 69.5%), while Bayesian networks had a sensitivity of 86.8%.
A combination of all three methods led to a higher sensitivity (87%). Overall, there is still no
agreement over the best predictive tool or technique for this disease. While the authors report
that Bayesian network algorithms are the technique with the best overall results, other studies
claim that ANN and multiple regression are the best DL predictive techniques in the context of
this disease area.
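High overall accuracy combined with low sensitivity, as reported above for the ANN tools, typically arises when the predicted event is rare. The worked example below uses an entirely hypothetical confusion matrix (1000 patients, 50 in-hospital deaths); the counts are not taken from the cited study and serve only to show the arithmetic.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all patients whose outcome was predicted correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Share of patients who died that the model flagged as high risk."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of survivors that the model correctly left unflagged."""
    return tn / (tn + fp)


# Hypothetical cohort: 50 deaths among 1000 patients.
tp, fn = 33, 17    # deaths correctly flagged / deaths missed
tn, fp = 920, 30   # survivors correctly cleared / survivors wrongly flagged

print("Accuracy:", accuracy(tp, fp, tn, fn))
print("Sensitivity:", sensitivity(tp, fn))
print("Specificity:", round(specificity(tn, fp), 3))
```

Because survivors make up 95% of this invented cohort, correct predictions among survivors dominate the accuracy figure even though a third of the deaths are missed, which is why sensitivity and specificity are usually reported alongside accuracy.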
Artificial Neural Networks have also been used in the context of kidney diseases. One study
developed an ANN system to predict postoperative outcomes after percutaneous
nephrolithotomy (PCNL). Based on a cohort of 254 patients, the system was able to accurately
predict stone free rates complications (AUC: 0.861), with predictive accuracy and sensitivity of
postoperative variables between 81% and 98.2% (Aminsharifi et al. 2017). Previous research
suggests that this machine learning system is at least as accurate as
previously implemented statistical models (e.g. regression analyses) in this disease area
(Aminsharifi et al. 2017). Despite this, there is still no comparison of the prognostic accuracy of
different tools in the literature.
An artificial neural network and SVM based tool was developed to predict brain arteriovenous
malformation caused by a surgical technique with 97.5% accuracy. This was a considerable
improvement over standard regression analysis in the same setting, which yielded an accuracy
of 43% (Asadi et al. 2014). Another study adopted logistic regression to successfully predict the
outcome of a 3-month treatment after a stroke (Zhang 2013). In the context of cancer research,
IBM developed Watson, which combines ML and NLP techniques to assist providers in treatment
decisions. Watson's suggestions matched provider decisions in 99% of cases. This tool also
supports clinical research through identification of genetic associations in different types of
cancer (Jiang et al. 2017).
4. Case studies
The NHS in the UK is actively promoting research and implementation of predictive tools for both
patients and providers. By 2040, more than 80% of the NHS workforce will have to be able to
understand and manage genomic, AI and robotics tools (Topol 2019). The ability to read
genomes, use speech recognition and NLP programmes and manage predictive analytics tools
will be a critical component of healthcare delivery. Currently NICE has approved the following
data-derived tools used for population risk stratification within the NHS: QCancer, adopted to
calculate the absolute risk of a patient having an undiagnosed cancer; QRisk, which evaluates a
patient’s risk of developing cardiovascular disease over their lifetime; electronic frailty index
(eFI) scores, which predict the risk of adverse outcomes in primary care based on a patient’s
underlying vulnerability; and QAdmissions, which predicts the risk of an emergency hospital
admission. Many other projects are promoted at a local level. Royal Free NHS Trust and Google
DeepMind are in partnership for a project on real-time EHR data, to develop apps for healthcare
professionals that can predict patient deterioration. In Berkshire (Connected Care in Berkshire),
17 health and social care organisations are sharing EHR records to promote data exchange and
to enhance predictive tools management within local hospitals.
NHS Digital in England represents an exciting example of how a national healthcare provider and
insurer can set up and support a project for AI and predictive tools implementation. NHS Digital
has aspired to create sustainable infrastructures in local trusts and to promote development at
all levels of the organisation, from national management to local trusts, providing
a good example of how health insurers and providers can collaborate on a national level in the
context of digital health and predictive tools.
Flatiron Health was acquired by Roche in 2018. One of the main goals of this partnership is to
link new methods in gathering, analysing and providing continuous complex data (mainly
through the EHR design) to more effective and comprehensive AI predictive tools. These efforts
help to generate AI tools with enhanced prognostic effectiveness in order to better support
provider and researcher decision-making. Flatiron’s ambitious approach to data integration, with
a goal of having an automatic system that gathers data from millions of patients in a format
readable by AI tools, could be a ground-breaking development in the field of predictive analytics.
While still at an early stage, Flatiron is a great example of a company adopting a long-term
perspective. Collaborations with intergovernmental organizations and policymakers could help to
further promote this initiative.
4.3. Inf-act.eu
The Inf-act project is an ambitious joint collaboration between European governmental and
research institutions to address fragmentation, lack of comprehensiveness and limited access to
health data in Europe. The Inf-act project consists of 10 work packages which aim to: a) set
up a framework of the health information systems in Europe; b) provide guidance on how to
integrate these systems with national policies; c) review the reliability and volume of evidence
available on innovation in health information; d) determine the level of interoperability of
available health data; and e) provide recommendations on tools and methods to support health
information systems. The overarching objective of the 10 work packages is to create a unique
framework of business cases, analyses and proposals that provides a common and sustainable
research infrastructure, reduces health information inequalities and improves interoperability of
health information within the European Union. This collaboration plans to give political support
to countries in developing and implementing best practices, build capacity within and
across countries and provide common health information tools. Inf-act, in collaboration with the
European Research Infrastructure Consortium on Health Information for Research and Evidence-based
Policy (HIREP-ERIC), represents one of the most ambitious projects in predictive analytics
given its wide-ranging approach to creating a harmonised and coordinated data infrastructure in
health information and data generation within Europe. The project was launched in 2018 and will
end in March 2021.
Big Data for Better Outcomes (BD4BO) is another European research programme involving
national governmental bodies, academia, research institutions and companies involved in the
pharmaceutical sector. The aim of this comprehensive programme is to foster the development
of platforms for big data, to enhance interoperability and improve the level of analysis in big
data. The programme has four disease specific projects: ROADMAP on Alzheimer’s disease;
Harmony on hematologic malignancies; PIONEER on prostate cancer and BigData@Heart on
cardiovascular diseases such as atrial fibrillation, acute coronary syndrome and heart failure.
The overall programme is managed by DO-IT, a coordination platform within the Innovative
Medicines Initiative 2, which is a joint programme promoted by the EU and European Federation
of Pharmaceutical Industries and Associations (EFPIA). This project sees the collaboration of a
large number of universities, national and local insurers and regulators and pharmaceutical
companies. It is a great example of creating a vast collaboration with key stakeholders and of
breadth in analysis, given a focus on multiple key disease areas. Results from this project can
lead to tangible progress in big data management and, by extension, in how it could be
implemented in AI tools.
Maccabi Health Services and TIPA Biobank represent another interesting case of how health
insurers’ organizations can deal with data collection and quality. In 2017, Maccabi established
the TIPA Biobank Research Initiative (Maccabi, 2019) with the goal of collecting biological
samples that can be used for research. The project strives to collect a comprehensive set of
biologic data linked to Maccabi Health Services, which is one of the main insurer organizations
in Israel. So far they have collected multiple samples from 2.5 million members across 350
different labs. This creates the possibility of using this data for longitudinal studies for a wide
range of clinical conditions. The TIPA Biobank can provide start-up companies with digital,
genetic and biological data to improve research and to help support the development and
validation of new tools, such as predictive analytics. The collaboration between a national
healthcare organization and local research centres improves the quantity and quality of data
collection, and represents a crucial first step for promoting the development of innovative and
complex predictive tools. The TIPA biobank serves as a case study which could enable an
effective subsequent collaboration on an international level to improve data exchange.
In 2019 the U.S. Food and Drug Administration (FDA) published a guide to evaluate AI tools and
a discussion paper called ‘Proposed regulatory framework for modifications to AI/ML-based
Software as a Medical Device (SaMD)’ (FDA, 2019), which looks at how changing algorithms can be
more efficiently assessed in premarket development and postmarket performance assessment.
The FDA is already a leading regulatory body in dealing with the innovations
brought by AI in healthcare. It currently proposes flexible and tailored premarket authorisations
for these tools (premarket clearance (510(k)) or De Novo classification).
Machine learning tools present challenges to regulators, as the scope and performance of the
algorithm are dynamic and change as more data is analysed. The FDA discussion paper called
on all the interested stakeholders to join and support a debate on proposed reforms in AI/ML
software regulation. In January 2021, the FDA published an Action Plan based on the discussion
paper and subsequent stakeholder discussion. The FDA outlined 5 actions: 1. Tailored regulatory
framework for AI/ML-based SaMD; 2. Good Machine Learning Practice (GMLP); 3. Patient-
Centered Approach Incorporating Transparency to Users; 4. Regulatory Science Methods Related
to Algorithm Bias & Robustness; 5. Real-World Performance (RWP).
The updated regulatory framework outlined in the action plan is based on the SaMD Pre-Specifications
(SPS), which describe the aspects the manufacturer intends to change through
learning, and the Algorithm Change Protocol (ACP), which sets out how the algorithm will learn
and change (FDA, 2021). The GMLP encourages harmonization of best practices in data
management, training, interpretability, documentation and evaluation. A patient-centered
approach aims to increase transparency for users by holding public workshops to discuss device
labelling features. Also, the FDA commits to support new methodologies for the evaluation and
improvement of machine learning algorithms, and to collaborate with stakeholders who are
piloting real-world performance processes for AI/ML-based SaMD (FDA, 2021).
5. Discussion
The results of this scoping review suggest that predictive tools are a useful resource for patients,
providers and insurers. To the authors’ knowledge this is the first study that has established a
taxonomy of predictive tools used to inform research and decision-making in healthcare. Most
of the predictive tools analysed in the included literature were shown to be useful ancillary tools
for healthcare providers and patients in predicting the risk of specific consequences
of certain treatments, or in predicting the overall survival rate in a cohort of patients.
Among the articles analysed through the Ovid Medline search, the vast majority of the studies
focus on older generation tools, mainly used for oncology and cardiovascular diseases. While AI
driven predictive tools show tremendous potential, they are a relatively recent development and their
implementation in healthcare thus far is limited. There is an on-going debate on how AI driven
tools can reshape healthcare management and on identifying the current challenges, but
relatively few articles were identified that validate the performance of AI driven tools.
In many disease areas, scoring systems or application of single techniques already provide
significant predictive value, potentially limiting the need for AI driven tools. Nevertheless,
researchers are increasingly looking at ways of integrating different techniques or tools, including
traditional ones, in order to maximise the prognostic performance. Innovation in the field of
predictive analytics remains complex. While researchers aim to develop tools with high accuracy,
sensitivity, and precision, limitations are still present in terms of the level of data complexity that can
be managed by a tool. Both supervised and unsupervised predictive techniques have been
successfully developed and implemented, yet both approaches have strengths and weaknesses.
Further, it is still hard to generalise the validity of a specific technique or tool above others.
There is still no scientific consensus over which techniques perform best in a specific disease
area or type of diagnosis/treatment and a broader attempt to compare all available predictive
tools or techniques in the literature is lacking. Fragmentation of interests from the stakeholder
community remains a barrier to developing consensus on predictive tools.
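The performance measures mentioned above (accuracy, sensitivity and precision) can be made concrete with a short worked example. The sketch below is illustrative only: the counts are hypothetical and are not taken from any study in this review, and the names `ConfusionCounts`, `accuracy`, `sensitivity` and `precision` are introduced here purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts obtained by comparing a tool's predictions with observed outcomes."""
    true_pos: int   # event predicted and event occurred
    false_pos: int  # event predicted but did not occur
    true_neg: int   # no event predicted and none occurred
    false_neg: int  # no event predicted but event occurred


def accuracy(c: ConfusionCounts) -> float:
    """Share of all patients classified correctly."""
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    return (c.true_pos + c.true_neg) / total


def sensitivity(c: ConfusionCounts) -> float:
    """Share of patients with the event that the tool flagged."""
    return c.true_pos / (c.true_pos + c.false_neg)


def precision(c: ConfusionCounts) -> float:
    """Share of flagged patients who actually had the event."""
    return c.true_pos / (c.true_pos + c.false_pos)


# Hypothetical validation cohort of 200 patients, 50 of whom had the event.
counts = ConfusionCounts(true_pos=40, false_pos=10, true_neg=140, false_neg=10)

print(f"accuracy:    {accuracy(counts):.2f}")
print(f"sensitivity: {sensitivity(counts):.2f}")
print(f"precision:   {precision(counts):.2f}")
```

In this hypothetical cohort the tool is correct for 180 of 200 patients, yet it misses 10 of the 50 patients who had the event. This is one reason why a single measure is rarely sufficient when comparing tools.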
The scoping review provided only partial answers to our main hypothesis. While some evidence
is present on how predictive modelling can stratify patients based on the risk, very little is
available on how these tools could support the transition to value-based payments. One of the
few authors who link these two issues is Panesar (2019). In the chapter ‘Shifting from volume to value’,
the author reports the case of Buurtzorg Neighbourhood Care in the Netherlands as an
example of this paradigm shift. The simultaneous collection of behavioural, demographic, health,
and engagement data can provide an opportunity for machine learning and development of novel
AI, to rapidly improve, and learn from, user behaviour and outcomes. This in turn could enhance
the development of patient-centered and value-based systems. However, further research is
needed in assessing the potential links between predictive analytics tools and healthcare
financing issues.
With the currently available information it is not possible to rank all tools or techniques in a
unique classification. Nevertheless, a number of pros and cons of individual tools emerge from
the research and it is possible to make inferences about the use of one tool in lieu of most others.
Scoring systems are the most investigated tools not only because they were among the first
tools developed, but also because they are relatively easy to set up, do not require large
amounts of data and can be easily understood by patients and healthcare providers. However,
due to their simplicity they cannot manage complex and large data sets and have frequently
faced limitations in demonstrating generalisability and external validity. The same applies for
other old-generation tools (i.e. risk scores, staging and grading systems). As a result, current
ambitions for developing more accurate, generalizable and reliable tools are largely concentrated
on machine learning driven tools. Machine learning driven tools can handle an impressive amount
of data, can be developed as unsupervised tools, and have the capacity to adapt their analysis
as more data becomes available. Outstanding challenges in the context of predictive analytics
tools, particularly for the most recent analytics techniques, are related to governance, regulation,
data quality, data exchange and interoperability, privacy and ethics, health workforce education
and patient safety (He et al. 2019, Panch et al. 2018, Parikh et al. 2019).
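To illustrate why scoring systems are described above as easy to set up and easy for patients and providers to understand, the following is a minimal sketch of an additive point score. The risk factors, point weights and band thresholds are entirely hypothetical and do not correspond to any published score discussed in this review.

```python
from typing import Mapping

# Hypothetical point weights, chosen only for illustration.
POINTS = {
    "age_over_75": 2,
    "diabetes": 1,
    "prior_event": 2,
    "hypertension": 1,
}


def risk_score(patient: Mapping[str, bool]) -> int:
    """Sum the points for every risk factor the patient has."""
    return sum(pts for factor, pts in POINTS.items() if patient.get(factor, False))


def risk_band(score: int) -> str:
    """Map a total score to a coarse risk band (thresholds are hypothetical)."""
    if score <= 1:
        return "low"
    if score <= 3:
        return "intermediate"
    return "high"


patient = {
    "age_over_75": True,
    "diabetes": True,
    "prior_event": False,
    "hypertension": False,
}
score = risk_score(patient)
print(score, risk_band(score))
```

The whole model fits in a small lookup table, which is what makes such tools transparent. The same property also limits them: a fixed set of hand-picked factors cannot capture the interactions present in large, complex data sets.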
Governance and regulation. As reported in the FDA case study, efforts are currently being
made by regulators to keep pace with the level of innovation in predictive tools and deliver
effective regulatory frameworks. A number of debates and proposals have emerged in the past
few years, particularly in the context of algorithm-based tools. One proposal outlines five criteria
to update the regulatory framework for algorithm-based predictive tools (Parikh et al. 2019).
The five criteria are related to the performance endpoints, benchmarks, interventions,
specification and audit mechanisms. The applicability of these criteria was demonstrated with
the WAVE Clinical platform, the first surveillance system to receive FDA clearance for clinical
practice (Parikh et al. 2019).
Data quality. Poor data quality directly reduces the accuracy of a model and limits
generalisability of results. Without accurate and correct data, predictive tools cannot perform
well. Issues in data quality arise from many factors, including the way data is collected (patients’
or providers’ biases), data readability for the model (NLP tools may offer a solution in this
context), or the timing/duration of data collection. Typically, models that attempt to predict
events further in the future have lower predictive accuracy given limitations in available data
(Mukamel et al. 1997). A predictive tool generates more accurate results with a time frame of
less than one year compared to multiannual time frames. However, short time periods may be
less relevant for risk prediction or patient identification in some disease areas (Panattoni et al.
2011). These are relevant limitations for predictive tools and can significantly hamper the
development of a one-size-fits-all tool.
Data infrastructure, exchange and interoperability. The lack of free exchange of data
presents another limitation to the development of predictive tools. Currently, incentives are
lacking to promote a major and stable data exchange. This affects AI tools in particular, given
their need to be continuously fed with new data from clinical studies to learn and improve
performance (Jiang et al. 2017). Further debate on this issue is needed across all stakeholders
in order to improve accuracy and robustness of predictive techniques, specifically unsupervised
techniques. The development of national platforms to improve data collection (including the
reported cases in the UK and Israel) bodes well for future developments in data exchange at the
international level. Without coordinated efforts between national health insurers, healthcare
providers and patient organizations, it is hard to foresee resolution of issues in data fragmentation
and interoperability. Examples of data aggregation amongst organisations are limited. Within
the USA, data aggregation in Intensive Care Units and the Veterans Administration has helped to
accelerate AI development in healthcare (Panch et al. 2019).
Interoperability and data exchange also underlie the broader issue of data property, data
responsibility and utilisation. Data property rights composition can significantly impact the
process of promoting interoperability. Possible solutions to deal with data infrastructure issues
include: a) creating generalized data infrastructures based on already existing cases, such as
the STRIDES initiative promoted by the National Institutes of Health (NIH, 2019) or the MIMIC
initiative from the Massachusetts Institute of Technology (Johnson et al. 2016); or b) convincing
all the healthcare companies, through legislation, to commercialize their clinical data in
accessible clouds (Panch et al. 2019). Despite continued efforts by regulatory bodies and healthcare
organizations, data infrastructure is likely to remain a key barrier to promoting free exchange
and interoperability of data.
Data privacy. The use of individual patient data for personalised medicine presents another
challenge to regulators, providers and healthcare companies. Data privacy issues have always
been present, but the issue has come under increased scrutiny with the advent of machine
learning and big data sources. In many settings, data infrastructure legislation does not
adequately reflect the most recent developments in predictive analytics tools. The GDPR
(General Data Protection Regulation) in Europe and the California Consumer Privacy Act are
two good examples of effective data privacy regulation frameworks. However, high costs for
regulatory compliance could limit the growth of small organisations in this sector (Panch et al.
2019).
Ethical challenges. Ethical challenges can arise when the suggested treatments from a
predictive tool conflict with a physician’s ethical obligations or a patient’s preferences. A
comprehensive conceptual framework of the legal and ethical challenges in managing predictive
analytics tools has been developed, including data collection, development, validation in real
world settings and final implementation, underlining how data infrastructure regulation and policy
decisions shape development and implementation of predictive tools (Cohen et al. 2014).
Health workforce education. According to the Wachter Review, all Trusts within the English
NHS are expected to achieve a high level of digital maturity by 2023. Local Trusts will have to
be able to develop and manage infrastructures where new digital technologies will be
implemented. The Topol Review forecasts that by 2040 at least 80% of the health workforce
will have to be able to understand and manage genomics and AI tools. As big data sources
accumulate, it will be a challenge to find good quality expertise in data analysis and science,
both in clinical organisations and in other organizations. Some argue that it will not be reasonable
to expect that physicians will be able to reach this level of understanding, but that it will be
inevitable that medical schools will have to provide informatics programs and adequate training
for the future student cohorts (He et al. 2019). Others stress how the developments of AI,
particularly in the area of diagnostic image analysis, will lead to a demise of radiologists and a
likely merging into a single specialty called ‘information specialist’ (Panch et al. 2018). This
new specialty would focus predominantly on managing AI tools results and tailoring them to
individual patients, rather than diagnostic image analysis.
Patient safety. Safety and efficacy of predictive analytics, particularly for the AI driven
technologies, requires frequent updating of regulatory frameworks. The FDA reforms in
regulatory frameworks for AI predictive tools present an excellent example of updating
regulatory standards for safety and efficacy assessments. In the US, predictive analytics fall
under the SaMD label, defined by the International Medical Device Regulators Forum (IMDRF).
This differentiation from other medical devices allows a more tailored and rapid premarketing
authorisation process. Regulation should balance the need for patient safety and efficacy with
facilitating quick access to new techniques.
6. Conclusion
Predictive analytics can be very useful tools for detecting health outcomes. Evidence shows that
most tools are accurate enough to help providers and patients with treatment management and
with forecasting health outcomes. Innovations in recent years suggest that they will be
increasingly important in the shift to personalised treatments. Little evidence is available in
assessing the relationship between predictive analytics tools and the transition to value-based
payment systems. Substantial increases in the implementation of AI, genomic and robotic tools in
healthcare are expected over the next two decades. Undoubtedly, addressing the ambitions and
objectives of all stakeholders for implementation of predictive tools will require a coordinated
and collaborative international effort.
7. References
1. Abd-El-Gawad WM, Abou-Hashem RM, El Maraghy MO, Amin GE. The validity of Geriatric Nutrition Risk Index: simple
tool for prediction of nutritional-related complication of hospitalized elderly patients. Comparison with Mini Nutritional
Assessment. Clin Nutr. 2014 Dec;33(6):1108-16. doi: 10.1016/j.clnu.2013.12.005. Epub 2013 Dec 28. PMID:
24418116
2. Abdel-Rahman O. Prostascore: A Simplified Tool for Predicting Outcomes among Patients with Treatment-naive
Advanced Prostate Cancer. Clin Oncol (R Coll Radiol). 2017 Nov;29(11):732-738. doi: 10.1016/j.clon.2017.08.003.
Epub 2017 Sep 1. PMID: 28867136.
3. Adachi K, Kawase T, Yoshida K, Yazaki T, Onozuka S. ABC Surgical Risk Scale for skull base meningioma: a new
scoring system for predicting the extent of tumor removal and neurological outcome. Clinical article. J Neurosurg. 2009
Nov;111(5):1053-61. doi: 10.3171/2007.11.17446. PMID: 19119879.
4. Agrawal R. and Prabakaran S. (2020) Big data in digital healthcare: lessons learnt and recommendations for general
practice. Heredity, 124, 525-534. https://doi.org/10.1038/s41437-020-0303-2
5. Aguilar JA, Paley D, Paley J, Santpure S, Patel M, Herzenberg JE, Bhave A. Clinical validation of the multiplier method
for predicting limb length discrepancy and outcome of epiphysiodesis, part II. J Pediatr Orthop. 2005 Mar-
Apr;25(2):192-6. doi: 10.1097/01.bpo.0000150808.90052.7c. PMID: 15718900.
6. Akçay M, Tosun M, Gevher F, Kalkan S, Ersöz C, Kayalı Y, Tepeler A. Comparison of Scoring Systems in Predicting
Success of Percutaneous Nephrolithotomy. Balkan Med J. 2019 Jan 1;36(1):32-36. doi:
10.4274/balkanmedj.2017.1631. Epub 2018 Sep 11. PMID: 30203780; PMCID: PMC6335940.
7. Alessandrino G, Chevalier B, Lefèvre T, Sanguineti F, Garot P, Unterseeh T, Hovasse T, Morice MC, Louvard Y. A
Clinical and Angiographic Scoring System to Predict the Probability of Successful First-Attempt Percutaneous
Coronary Intervention in Patients With Total Chronic Coronary Occlusion. JACC Cardiovasc Interv. 2015
Oct;8(12):1540-8. doi: 10.1016/j.jcin.2015.07.009. PMID: 26493246.
8. Amarasingham, R., Patzer, R. E., Huesch, M., Nguyen, N. Q., & Xie, B. (2014). Implementing Electronic Health Care
Predictive Analytics: Considerations And Challenges. Health Affairs, 33(7), 1148–1154. doi:10.1377/hlthaff.2014.0352
9. Aminsharifi, A., Irani, D., Pooyesh, S., Parvin, H., Dehghani, S., Yousofi, K., … Zibaie, F. (2017). Artificial Neural
Network System to Predict the Postoperative Outcome of Percutaneous Nephrolithotomy. Journal of Endourology,
31(5), 461–467. doi:10.1089/end.2016.0791
10. Andrade-Souza YM, Zadeh G, Ramani M, Scora D, Tsao MN, Schwartz ML. Testing the radiosurgery-based
arteriovenous malformation score and the modified Spetzler-Martin grading system to predict radiosurgical outcome.
J Neurosurg. 2005 Oct;103(4):642-8. doi: 10.3171/jns.2005.103.4.0642. PMID: 16266046.
11. Andres, A., Montano-Loza, A., Greiner, R., Uhlich, M., Jin, P., Hoehn, B., … Kneteman, N. M. (2018). A novel learning
algorithm to predict individual survival after liver transplantation for primary sclerosing cholangitis. PLOS ONE, 13(3),
e0193523. doi:10.1371/journal.pone.0193523
12. Angus, L., et al. (2019) The genomic landscape of metastatic breast cancer highlights changes in mutation and
signature frequencies. Nat Genet 51, 1450–1458.
13. Arai T, Lefèvre T, Hayashida K, Watanabe Y, O'Connor SA, Hovasse T, Romano M, Garot P, Bouvier E, Chevalier B,
Morice MC. Usefulness of a Simple Clinical Risk Prediction Method, Modified ACEF Score, for Transcatheter Aortic
Valve Implantation. Circ J. 2015;79(7):1496-503. doi: 10.1253/circj.CJ-14-1242. Epub 2015 May 1. PMID: 25947002.
14. Arena R, Myers J, Abella J, Peberdy MA, Bensimhon D, Chase P, Guazzi M. The ventilatory classification system
effectively predicts hospitalization in patients with heart failure. J Cardiopulm Rehabil Prev. 2008 May-Jun;28(3):195-
8. doi: 10.1097/01.HCR.0000320071.89093.d6. PMID: 18496319.
15. Asadi, H., Dowling, R., Yan, B., & Mitchell, P. (2014). Machine Learning for Outcome Prediction of Acute Ischemic
Stroke Post Intra-Arterial Therapy. PLoS ONE, 9(2), e88225. doi:10.1371/journal.pone.0088225
16. Baas AF, Janssen KJ, Prinssen M, Buskens E, Blankensteijn JD. The Glasgow Aneurysm Score as a tool to predict
30-day and 2-year mortality in the patients from the Dutch Randomized Endovascular Aneurysm Management trial. J
Vasc Surg. 2008 Feb;47(2):277-81. doi: 10.1016/j.jvs.2007.10.018. PMID: 18241749.
17. Barlow T, Dunbar M, Sprowson A, Parsons N, Griffin D. Development of an outcome prediction tool for patients
considering a total knee replacement--the Knee Outcome Prediction Study (KOPS). BMC Musculoskelet Disord. 2014
Dec 23;15:451. doi: 10.1186/1471-2474-15-451. PMID: 25539734; PMCID: PMC4364581.
18. Barmettler A, Wang J, Heo M, Gladstone GJ. Upper Eyelid Blepharoplasty: A Novel Method to Predict and Improve
Outcomes. Aesthet Surg J. 2018 Oct 15;38(11):NP156-NP164. doi: 10.1093/asj/sjy167. PMID: 30007317.
19. Barrie A, Homburg R, McDowell G, Brown J, Kingsland C, Troup S. Examining the efficacy of six published time-lapse
imaging embryo selection algorithms to predict implantation to demonstrate the need for the development of specific,
in-house morphokinetic selection algorithms. Fertil Steril. 2017 Mar;107(3):613-621. doi:
10.1016/j.fertnstert.2016.11.014. Epub 2017 Jan 6. PMID: 28069186.
20. Battersby, N. J., Bouliotis, G., Emmertsen, K. J., Juul, T., Glynne-Jones, R., Branagan, G., … Moran, B. J. (2017).
Development and external validation of a nomogram and online tool to predict bowel dysfunction following restorative
rectal cancer resection: the POLARS score. Gut, gutjnl–2016–312695. doi:10.1136/gutjnl-2016-312695
21. Beam, A. L., Kartoun, U., Pai, J. K., Chatterjee, A. K., Fitzgerald, T. P., Shaw, S. Y., & Kohane, I. S. (2017). Predictive
Modeling of Physician-Patient Dynamics That Influence Sleep Medication Prescriptions and Clinical Decision-Making.
Scientific Reports, 7(1). doi:10.1038/srep42282
22. Behan L, Dimitrov BD, Kuehni CE, Hogg C, Carroll M, Evans HJ, Goutaki M, Harris A, Packham S, Walker WT, Lucas
JS. PICADAR: a diagnostic predictive tool for primary ciliary dyskinesia. Eur Respir J. 2016 Apr;47(4):1103-12. doi:
10.1183/13993003.01551-2015. Epub 2016 Feb 25. PMID: 26917608; PMCID: PMC4819882.
23. Behari S, Giri PJ, Shukla D, Jain VK, Banerji D. Surgical strategies for giant medial sphenoid wing meningiomas: a
new scoring system for predicting extent of resection. Acta Neurochir (Wien). 2008 Sep;150(9):865-77; discussion
877. doi: 10.1007/s00701-008-0006-6. Epub 2008 Aug 27. PMID: 18754074.
24. Bernau C, Riester M, Boulesteix AL, Parmigiani G, Huttenhower C, Waldron L, Trippa L. Cross-study validation for
the assessment of prediction algorithms. Bioinformatics. 2014 Jun 15;30(12):i105-12. doi:
10.1093/bioinformatics/btu279. PMID: 24931973; PMCID: PMC4058929.
25. Biancari F, Salenius JP, Heikkinen M, Luther M, Ylönen K, Lepäntalo M. Risk-scoring method for prediction of 30-day
postoperative outcome after infrainguinal surgical revascularization for critical lower-limb ischemia: a Finnvasc registry
study. World J Surg. 2007 Jan;31(1):217-25; discussion 226-7. doi: 10.1007/s00268-006-0242-y. PMID: 17171494.
26. Big Data for Better Outcomes [Website] (2019). Available at: https://bd4bo.eu/
27. Biss TT, Hanley JP. Recombinant activated factor VII (rFVIIa/NovoSeven) in intractable haemorrhage: use of a clinical
scoring system to predict outcome. Vox Sang. 2006 Jan;90(1):45-52. doi: 10.1111/j.1423-0410.2005.00711.x. PMID:
16359355.
28. Bodea R, Hajjar NA, Bartos A, Zaharie F, Graur F, Iancu C. Evaluation of P-POSSUM Risk Scoring System in
Prediction of Morbidity and Mortality after Pancreaticoduodenectomy. Chirurgia (Bucur). 2018 May-Jun;113(3):399-
404. doi: 10.21614/chirurgia.113.3.399. PMID: 29981671.
29. Bozkurt IH, Aydogdu O, Yonguc T, Yarimoglu S, Sen V, Gunlusoy B, Degirmenci T. Comparison of Guy and Clinical
Research Office of the Endourological Society Nephrolithometry Scoring Systems for Predicting Stone-Free Status
and Complication Rates After Percutaneous Nephrolithotomy: A Single Center Study with 437 Cases. J Endourol.
2015 Sep;29(9):1006-10. doi: 10.1089/end.2015.0199. Epub 2015 Jul 8. PMID: 26153844.
30. Buethe DD, Moussly S, Lin HY, Yue B, Rodriguez AR, Spiess PE, Sexton WJ. Is the R.E.N.A.L. nephrometry scoring
system predictive of the functional efficacy of nephron sparing surgery in the solitary kidney? J Urol. 2012
Sep;188(3):729-35. doi: 10.1016/j.juro.2012.04.115. Epub 2012 Jul 20. PMID: 22819418.
31. Burjonroppa, S. C., Varosy, P. D., Rao, S. V., Ou, F.-S., Roe, M., Peterson, E., … Shunk, K. A. (2011). Survival of
Patients Undergoing Rescue Percutaneous Coronary Intervention. JACC: Cardiovascular Interventions, 4(1), 42–50.
doi:10.1016/j.jcin.2010.09.020
32. Cadili A, Dabbs K, Scolyer RA, Brown PT, Thompson JF. Re-evaluation of a scoring system to predict nonsentinel-
node metastasis and prognosis in melanoma patients. J Am Coll Surg. 2010 Oct;211(4):522-5. doi:
10.1016/j.jamcollsurg.2010.06.016. Epub 2010 Aug 21. PMID: 20729103.
33. Cantor, S. B., & Kattan, M. W. (2000). Determining the Area under the ROC Curve for a Binary Diagnostic Test.
Medical Decision Making, 20(4), 468–470. doi:10.1177/0272989x0002000410
34. Chaichana KL, Pendleton C, Chambless L, Camara-Quintana J, Nathan JK, Hassam-Malani L, Li G, Harsh GR 4th,
Thompson RC, Lim M, Quinones-Hinojosa A. Multi-institutional validation of a preoperative scoring system which
predicts survival for patients with glioblastoma. J Clin Neurosci. 2013 Oct;20(10):1422-6. doi:
10.1016/j.jocn.2013.02.007. Epub 2013 Aug 6. PMID: 23928040; PMCID: PMC4086640.
35. Champion, H. R. (2002). Trauma Scoring. Scandinavian Journal of Surgery, 91(1), 12–22.
doi:10.1177/145749690209100104
36. Chang GJ. Can Prognostic Scoring Tools Predict Treatment Outcomes? Dis Colon Rectum. 2017 Sep;60(9):875-876.
doi: 10.1097/DCR.0000000000000820. PMID: 28796723.
37. Chang, S., Shi, X., Xu, Z., Liu, Q., (2015) TNM staging system may be superior to Lugano and Ann Arbor systems in
predicting the overall survival of patients with primary gastrointestinal lymphoma. JBuon. Available at:
https://pdfs.semanticscholar.org/0803/5befcc91405cb0a4721b6e3a4ad0290aa0e8.pdf
38. Chen, B.W. et al (2015) Combination of pulse wave velocity with clinical factors as a promising tool to predict major
adverse cardiac events after percutaneous coronary intervention. Journal of Cardiology, Vol. 65(4), 318-323
39. Chen JY, Feng J, Wang XQ, Cai SW, Dong JH, Chen YL. Risk scoring system and predictor for clinically relevant
pancreatic fistula after pancreaticoduodenectomy. World J Gastroenterol. 2015 May 21;21(19):5926-33. doi:
10.3748/wjg.v21.i19.5926. PMID: 26019457; PMCID: PMC4438027.
40. Cohen, I. G., Amarasingham, R., Shah, A., Xie, B., & Lo, B. (2014). The Legal And Ethical Concerns That Arise From
Using Complex Predictive Analytics In Health Care. Health Affairs, 33(7), 1139–1147. doi:10.1377/hlthaff.2014.0048
41. Chen S, Ling Q, Yu K, Huang C, Li N, Zheng J, Bao S, Cheng Q, Zhu M, Chen M. Dual oxidase 1: A predictive tool
for the prognosis of hepatocellular carcinoma patients. Oncol Rep. 2016 Jun;35(6):3198-208. doi:
10.3892/or.2016.4745. Epub 2016 Apr 14. PMID: 27108801; PMCID: PMC4869938.
42. Chiappetta S, Stier C, Squillante S, Theodoridou S, Weiner RA. The importance of the Edmonton Obesity Staging
System in predicting postoperative outcome and 30-day mortality after metabolic surgery. Surg Obes Relat Dis. 2016
Dec;12(10):1847-1855. doi: 10.1016/j.soard.2016.02.042. Epub 2016 Mar 2. PMID: 27317606.
43. Christopoulos G, Kandzari DE, Yeh RW, Jaffer FA, Karmpaliotis D, Wyman MR, Alaswad K, Lombardi W, Grantham
JA, Moses J, Christakopoulos G, Tarar MNJ, Rangan BV, Lembo N, Garcia S, Cipher D, Thompson CA, Banerjee S,
Brilakis ES. Development and Validation of a Novel Scoring System for Predicting Technical Success of Chronic Total
Occlusion Percutaneous Coronary Interventions: The PROGRESS CTO (Prospective Global Registry for the Study of
Chronic Total Occlusion Intervention) Score. JACC Cardiovasc Interv. 2016 Jan 11;9(1):1-9. doi:
10.1016/j.jcin.2015.09.022. PMID: 26762904.
44. Chua SK, Shyu KG, Lu MJ, Lien LM, Lin CH, Chao HH, Lo HM. Clinical utility of CHADS2 and CHA2DS2-VASc scoring
systems for predicting postoperative atrial fibrillation after cardiac surgery. J Thorac Cardiovasc Surg. 2013
Oct;146(4):919-926.e1. doi: 10.1016/j.jtcvs.2013.03.040. Epub 2013 Apr 26. PMID: 23628495.
45. Chung WJ, Chen CY, Lee FY, Wu CC, Hsueh SK, Lin CJ, Hang CL, Wu CJ, Cheng CI. Validation of Scoring Systems
That Predict Outcomes in Patients With Coronary Artery Disease Undergoing Coronary Artery Bypass Grafting
Surgery. Medicine (Baltimore). 2015 Jun;94(23):e927. doi: 10.1097/MD.0000000000000927. PMID: 26061316;
PMCID: PMC4616463.
46. Convery PA, Cantrell LA, Di Santo N, Broadwater G, Modesitt SC, Secord AA, Havrilesky LJ. Retrospective review of
an intraoperative algorithm to predict lymph node metastasis in low-grade endometrial adenocarcinoma. Gynecol
Oncol. 2011 Oct;123(1):65-70. doi: 10.1016/j.ygyno.2011.06.025. Epub 2011 Jul 13. PMID: 21742369.
47. Cooperberg MR, Davicioni E, Crisan A, Jenkins RB, Ghadessi M, Karnes RJ. Combined value of validated clinical and
genomic risk stratification tools for predicting prostate cancer mortality in a high-risk prostatectomy cohort. Eur Urol.
2015 Feb;67(2):326-33. doi: 10.1016/j.eururo.2014.05.039. Epub 2014 Jul 2. PMID: 24998118; PMCID:
PMC4282620.
48. Corso A, Galli M, Mangiacavalli S, Rossini F, Nozza A, Pascutto C, Montefusco V, Baldini L, Cafro AM, Crippa C,
Cazzola M, Corradini P. Response-adjusted ISS (RaISS) is a simple and reliable prognostic scoring system for
predicting progression-free survival in transplanted patients with multiple myeloma. Am J Hematol. 2012
Feb;87(2):150-4. doi: 10.1002/ajh.22220. Epub 2011 Dec 21. PMID: 22189759.
49. Critsinelis A, Kurihara C, Volkovicher N, Kawabori M, Sugiura T, Manon M 2nd, Wang S, Civitello AB, Morgan JA.
Model of End-Stage Liver Disease-eXcluding International Normalized Ratio (MELD-XI) Scoring System to Predict
Outcomes in Patients Who Undergo Left Ventricular Assist Device Implantation. Ann Thorac Surg. 2018
Aug;106(2):513-519. doi: 10.1016/j.athoracsur.2018.02.082. Epub 2018 Apr 4. PMID: 29626453.
50. D'Avanzo A, Ituarte P, Treseler P, Kebebew E, Wu J, Wong M, Duh QY, Siperstein AE, Clark OH. Prognostic scoring
systems in patients with follicular thyroid cancer: a comparison of different staging systems in predicting the patient
outcome. Thyroid. 2004 Jun;14(6):453-8. doi: 10.1089/105072504323150778. PMID: 15242573.
51. Davis B, Marin D, Hurwitz LM, Ronald J, Ellis MJ, Ravindra KV, Collins BH, Kim CY. Application of a Novel CT-Based
Iliac Artery Calcification Scoring System for Predicting Renal Transplant Outcomes. AJR Am J Roentgenol. 2016
Feb;206(2):436-41. doi: 10.2214/AJR.15.14794. PMID: 26797375.
52. De Maria GL, Fahrni G, Alkhalil M, Cuculi F, Dawkins S, Wolfrum M, Choudhury RP, Forfar JC, Prendergast BD,
Yetgin T, van Geuns RJ, Tebaldi M, Channon KM, Kharbanda RK, Rothwell PM, Valgimigli M, Banning AP. A tool for
predicting the outcome of reperfusion in ST-elevation myocardial infarction using age, thrombotic burden and index of
microcirculatory resistance (ATI score). EuroIntervention. 2016 Nov 20;12(10):1223-1230. doi:
10.4244/EIJV12I10A202. PMID: 27866132.
53. Dehghani SM, Gholami S, Bahador A, Haghighat M, Imanieh MH, Nikeghbalian S, Salahi H, Davari HR, Mehrabani
D, Malek-Hosseini SA. Comparison of Child-Turcotte-Pugh and pediatric end-stage liver disease scoring systems to
predict morbidity and mortality of children awaiting liver transplantation. Transplant Proc. 2007 Dec;39(10):3175-7.
doi: 10.1016/j.transproceed.2007.07.080. PMID: 18089346.
54. Dewey TM, Brown D, Ryan WH, Herbert MA, Prince SL, Mack MJ. Reliability of risk algorithms in predicting early and
late operative outcomes in high-risk patients undergoing aortic valve replacement. J Thorac Cardiovasc Surg. 2008
Jan;135(1):180-7. doi: 10.1016/j.jtcvs.2007.09.011. Epub 2007 Nov 26. PMID: 18179938.
55. Dou D, Yang S, Lin Y, Zhang J. An eight-miRNA signature expression-based risk scoring system for prediction of
survival in pancreatic adenocarcinoma. Cancer Biomark. 2018;23(1):79-93. doi: 10.3233/CBM-181420. PMID:
29991127.
56. Dou K, Zhang D, Xu B, Yang Y, Yin D, Qiao S, Wu Y, Yan H, You S, Wang Y, Wu Z, Gao R, Kirtane AJ. An angiographic
tool for risk prediction of side branch occlusion in coronary bifurcation intervention: the RESOLVE score system (Risk
prEdiction of Side branch OccLusion in coronary bifurcation interVEntion). JACC Cardiovasc Interv. 2015 Jan;8(1 Pt
A):39-46. doi: 10.1016/j.jcin.2014.08.011. PMID: 25616815.
57. Dowsett M, Salter J, Zabaglo L, Mallon E, Howell A, Buzdar AU, Forbes J, Pineda S, Cuzick J. Predictive algorithms
for adjuvant therapy: TransATAC. Steroids. 2011 Jul;76(8):777-80. doi: 10.1016/j.steroids.2011.02.032. Epub 2011
Apr 4. PMID: 21470560.
58. Drukker, C. A., Nijenhuis, M. V., Bueno-de-Mesquita, J. M., Retèl, V. P., van Harten, W. H., van Tinteren, H., … Linn,
S. C. (2014). Optimized outcome prediction in breast cancer by combining the 70-gene signature with clinical risk
prediction algorithms. Breast Cancer Research and Treatment, 145(3), 697–705. doi:10.1007/s10549-014-2954-2
59. Dulai, P. S., Boland, B. S., Singh, S., Chaudrey, K., Koliani-Pace, J. L., Kochhar, G., … Cao, C. (2018). Development
and Validation of a Scoring System to Predict Outcomes of Vedolizumab Treatment in Patients With Crohn’s Disease.
Gastroenterology. doi:10.1053/j.gastro.2018.05.039
60. Dunn, B.K., Steele, V.E., Fagerstrom, R.M., Topp, C.F., Ransohoff, D., Cunningham, C., Lubet, R., Ford, L.G., Kramer,
B.S. (2015) Predictive Value Tools as an Aid in Chemopreventive Agent Development, JNCI: Journal of the National
Cancer Institute, Volume 107, Issue 12, December 2015, djv259
61. Edge, S., Byrd, D.R., Compton, C.C., Fritz, A.G., Greene, F., Trotti, A. (2010) AJCC Cancer Staging Handbook. 7th
Edition, Springer. ISBN: 978-0-387-88442-4
62. Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., … Dean, J. (2019). A guide to deep
learning in healthcare. Nature Medicine, 25(1), 24–29. doi:10.1038/s41591-018-0316-z
63. European Society of Coloproctology. (2019) The LARS Score, Overview of LARS Score translations and validation
studies. PDF available at the website page: https://www.escp.eu.com/news/focus-on/beyond-colorectal-cancer/1579-lars-score
64. Evers D, Kerkhoffs JL, Van Egmond L, Schipperus MR, Wijermans PW. The efficiency of therapeutic
erythrocytapheresis compared to phlebotomy: a mathematical tool for predicting response in hereditary
hemochromatosis, polycythemia vera, and secondary erythrocytosis. J Clin Apher. 2014 Jun;29(3):133-8. doi:
10.1002/jca.21303. Epub 2013 Oct 15. PMID: 24130064.
65. Favrat B, Rao S, O'Connor PG, Schottenfeld R. A staging system to predict prognosis among methadone maintenance
patients, based on admission characteristics. Subst Abus. 2002 Dec;23(4):233-44. doi: 10.1080/08897070209511496.
PMID: 12438836.
66. Food and Drug Administration (2021) Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical
Device (SaMD) Action Plan [Website]. Available at: https://www.fda.gov/media/145022/download
67. Food And Drug Administration. (2019) Proposed regulatory framework for modifications to AI/ML based software as
Medical Device (SaMD) [Website]. Available at: https://www.fda.gov/media/122535/download
68. Fujino A, Mintz GS, Matsumura M, Lee T, Kim SY, Hoshino M, Usui E, Yonetsu T, Haag ES, Shlofmitz RA, Kakuta T,
Maehara A. A new optical coherence tomography-based calcium scoring system to predict stent underexpansion.
EuroIntervention. 2018 Apr 6;13(18):e2182-e2189. doi: 10.4244/EIJ-D-17-00962. PMID: 29400655.
69. Gaba RC, Couture PM, Bui JT, Knuttinen MG, Walzer NM, Kallwitz ER, Berkes JL, Cotler SJ. Prognostic capability of
different liver disease scoring systems for prediction of early mortality after transjugular intrahepatic portosystemic
shunt creation. J Vasc Interv Radiol. 2013 Mar;24(3):411-20, 420.e1-4; quiz 421. doi: 10.1016/j.jvir.2012.10.026. Epub
2013 Jan 9. PMID: 23312989.
70. Galetsi P. and Katsaliaki K. (2018) A review of the literature on big data analytics in healthcare. Journal of the
Operational Research Society. 71(10), 1511-1529. https://doi.org/10.1080/01605682.2019.1630328
71. Garcia Gracia C, Yardi R, Kattan MW, Nair D, Gupta A, Najm I, Bingaman W, Gonzalez-Martinez J, Jehi L. Seizure
freedom score: a new simple method to predict success of epilepsy surgery. Epilepsia. 2015 Mar;56(3):359-65. doi:
10.1111/epi.12892. Epub 2014 Dec 20. PMID: 25530458.
72. Gatti G, Barbati G, Luzzati R, Sinagra G, Pappalardo A. Prospective validation of a predictive scoring system for deep
sternal wound infection after routine bilateral internal thoracic artery grafting. Interact Cardiovasc Thorac Surg. 2016
May;22(5):606-11. doi: 10.1093/icvts/ivw016. Epub 2016 Feb 17. PMID: 26892193; PMCID: PMC4892156.
73. Gatti G, Perrotti A, Obadia JF, Duval X, Iung B, Alla F, Chirouze C, Selton-Suty C, Hoen B, Sinagra G, Delahaye F,
Tattevin P, Le Moing V, Pappalardo A, Chocron S; Association for the Study and Prevention of Infective Endocarditis
Study Group–Association pour l'Étude et la Prévention de l'Endocadite Infectieuse (AEPEI). Simple Scoring System
to Predict In-Hospital Mortality After Surgery for Infective Endocarditis. J Am Heart Assoc. 2017 Jul 20;6(7):e004806.
doi: 10.1161/JAHA.116.004806. PMID: 28729412; PMCID: PMC5586260.
74. Gernaat, S.A.M., Boer, J.M.A., van den Bongard, D.H.J. et al. The risk of cardiovascular disease following breast
cancer by Framingham risk score (2018) 170: 119. https://doi.org/10.1007/s10549-018-4723-0
75. Gianfrancesco, M. A., Tamang, S., Yazdany, J., & Schmajuk, G. (2018). Potential Biases in Machine Learning
Algorithms Using Electronic Health Record Data. JAMA Internal Medicine. doi:10.1001/jamainternmed.2018.3763
76. Gockel I, Niebisch S, Campbell LK, Sgourakis G, Junginger T. Prognostic scoring system predictive of survival after
surgical resection of esophageal carcinoma. Thorac Cardiovasc Surg. 2013 Sep;61(6):470-8. doi:
10.1055/s-0032-1331843. Epub 2013 Mar 8. PMID: 23475799.
77. Goyal MK, Chakravarthi S, Modi M, Bhalla A, Lal V. Status epilepticus severity score (STESS): A useful tool to predict
outcome of status epilepticus. Clin Neurol Neurosurg. 2015 Dec;139:96-9. doi: 10.1016/j.clineuro.2015.09.010. Epub
2015 Sep 15. PMID: 26409183.
78. Green DA, Osterberg EC, Xylinas E, Rink M, Karakiewicz PI, Scherr DS, Shariat SF. Predictive tools for prostate
cancer staging, treatment response and outcomes. Arch Esp Urol. 2012 Nov;65(9):787-807. English, Spanish. PMID:
23154603.
79. Gunning, K., & Rowan, K. (1999). ABC of intensive care: Outcome data and scoring systems. BMJ, 319(7204), 241–
244. doi:10.1136/bmj.319.7204.241
80. Gupta N, Ranjan G, Arora MP, Goswami B, Chaudhary P, Kapur A, Kumar R, Chand T. Validation of a scoring system
to predict difficult laparoscopic cholecystectomy. Int J Surg. 2013;11(9):1002-6. doi: 10.1016/j.ijsu.2013.05.037. Epub
2013 Jun 8. PMID: 23751733.
81. Gupta P, Chakraborty A, Gossett JM, Rettiganti M. A prognostic tool to predict outcomes in children undergoing the
Norwood operation. J Thorac Cardiovasc Surg. 2017 Dec;154(6):2030-2037.e2. doi: 10.1016/j.jtcvs.2017.08.034.
Epub 2017 Aug 30. PMID: 28941736.
82. Gupta P, Rettiganti M, Gossett JM, Daufeldt J, Rice TB, Wetzel RC. Development and Validation of an Empiric Tool
to Predict Favorable Neurologic Outcomes Among PICU Patients. Crit Care Med. 2018 Jan;46(1):108-115. doi:
10.1097/CCM.0000000000002753. PMID: 28991830.
83. Gutiérrez-García G, Cardesa-Salzmann T, Climent F, González-Barca E, Mercadal S, Mate JL, Sancho JM, Arenillas
L, Serrano S, Escoda L, Martínez S, Valera A, Martínez A, Jares P, Pinyol M, García-Herrera A, Martínez-Trillos A,
Giné E, Villamor N, Campo E, Colomo L, López-Guillermo A; Grup per l'Estudi dels Limfomes de Catalunya I Balears
(GELCAB). Gene-expression profiling and not immunophenotypic algorithms predicts prognosis in patients with diffuse
large B-cell lymphoma treated with immunochemotherapy. Blood. 2011 May 5;117(18):4836-43. doi:
10.1182/blood-2010-12-322362. Epub 2011 Mar 25. PMID: 21441466.
84. Ham WS, Chalfin HJ, Feng Z, Trock BJ, Epstein JI, Cheung C, Humphreys E, Partin AW, Han M. New Prostate Cancer
Grading System Predicts Long-term Survival Following Surgery for Gleason Score 8-10 Prostate Cancer. Eur Urol.
2017 Jun;71(6):907-912. doi: 10.1016/j.eururo.2016.11.006. Epub 2016 Nov 19. PMID: 27876305.
85. Hashimoto D, Takamori H, Sakamoto Y, Tanaka H, Hirota M, Baba H. Can the physiologic ability and surgical stress
(E-PASS) scoring system predict operative morbidity after distal pancreatectomy? Surg Today. 2010 Jul;40(7):632-7.
doi: 10.1007/s00595-009-4112-8. Epub 2010 Jun 26. PMID: 20582514.
86. He, J., Baxter, S. L., Xu, J., Xu, J., Zhou, X., & Zhang, K. (2019). The practical implementation of artificial intelligence
technologies in medicine. Nature Medicine, 25(1), 30–36. doi:10.1038/s41591-018-0307-0
87. Hernandez, I., & Zhang, Y. (2017). Using predictive analytics and big data to optimize pharmaceutical outcomes.
American Journal of Health-System Pharmacy, 74(18), 1494–1500. doi:10.2146/ajhp161011
88. Hiroyasu T, Miyabe Y, Yokouchi H. Training data selection method for prediction of anticancer drug effects using a
genetic algorithm with local search. Annu Int Conf IEEE Eng Med Biol Soc. 2011;2011:124-8. doi:
10.1109/IEMBS.2011.6089868. PMID: 22254266.
89. Hu, Y., Zhang, X., Liu, Y., Yan, J., Li, T., & Hu, A. (2013). APACHE IV Is Superior to MELD Scoring System in
Predicting Prognosis in Patients after Orthotopic Liver Transplantation. Clinical and Developmental Immunology, 2013,
1–5. doi:10.1155/2013/809847
90. Huang Y, Xie T, Cao Y, Wu M, Yu L, Lu S, Xu G, Hu J, Ruan H. Comparison of two classification systems in predicting
the outcome of diabetic foot ulcers: the Wagner grade and the Saint Elian Wound score systems. Wound Repair
Regen. 2015 May-Jun;23(3):379-85. doi: 10.1111/wrr.12289. PMID: 25817047.
91. Huang YS, Lin HJ, Fang YR, Wang K, Chang FY, Lee SD. Development and validation of a scoring system predicting
failure of endoscopic epinephrine injection therapy in Taiwanese patients with bleeding peptic ulcers. Zhonghua Yi
Xue Za Zhi (Taipei). 2002 Apr;65(4):144-50. PMID: 12135192.
92. Hur H, Kim NK, Min BS, Baik SH, Lee KY, Koom WS, Ahn JB, Kim H. Can a biomarker-based scoring system predict
pathologic complete response after preoperative chemoradiotherapy for rectal cancer? Dis Colon Rectum. 2014
May;57(5):592-601. doi: 10.1097/DCR.0000000000000109. PMID: 24819099.
93. Hur, H., Tulina, I., Cho, M.S., Min, B.S., Koom, W.S., Lim, J.S., Ahn, J.B., Kim, N.K. (2016) Biomarker-Based Scoring
System for Prediction of Tumor Response After Preoperative Chemoradiotherapy in Rectal Cancer by Reverse
Transcriptase Polymerase Chain Reaction Analysis. Diseases of the Colon and Rectum, 01 Dec 2016, 59(12):1174-
1182 PMID: 27824703
94. Hutchings HA, Evans BA, Fitzsimmons D, Harrison J, Heaven M, Huxley P, Kingston MR, Lewis L, Phillips C, Porter
A, Russell IT, Sewell B, Warm D, Watkins A, Snooks HA. Predictive risk stratification model: a progressive cluster-
randomised trial in chronic conditions management (PRISMATIC) research protocol. Trials. 2013 Sep 18;14:301. doi:
10.1186/1745-6215-14-301. PMID: 24330749; PMCID: PMC3848373.
95. Inamoto Y, Kurahashi S, Imahashi N, Fukushima N, Adachi T, Kinoshita T, Tsushita K, Miyamura K, Naoe T, Sugiura
I. Combinations of cytogenetics and international scoring system can predict poor prognosis in multiple myeloma after
high-dose chemotherapy and autologous stem cell transplantation. Am J Hematol. 2009 May;84(5):283-6. doi:
10.1002/ajh.21390. PMID: 19338045.
96. Isariyawongse BK, Kattan MW. Prediction tools in surgical oncology. Surg Oncol Clin N Am. 2012 Jul;21(3):439-47,
viii-ix. doi: 10.1016/j.soc.2012.03.007. Epub 2012 Apr 17. PMID: 22583992.
97. Jacobson SM, Slain D. Evaluation of a bedside scoring system for predicting clinical cure and recurrence of Clostridium
difficile infections. Am J Health Syst Pharm. 2015 Nov 1;72(21):1871-5. doi: 10.2146/ajhp150076. PMID: 26490821.
98. Jahr G, Broi MD, Holte H Jr, Beiske K, Meling TR. Evaluation of Memorial Sloan-Kettering Cancer Center and
International Extranodal Lymphoma Study Group prognostic scoring systems to predict Overall Survival in intracranial
Primary CNS lymphoma. Brain Behav. 2018 Feb 5;8(3):e00928. doi: 10.1002/brb3.928. PMID: 29541540; PMCID:
PMC5840438.
99. Jeffery AD. Methodological Challenges in Examining the Impact of Healthcare Predictive Analytics on Nursing-
Sensitive Patient Outcomes. Comput Inform Nurs. 2015 Jun;33(6):258-64. doi: 10.1097/CIN.0000000000000154.
PMID: 25899442.
100. Jeong GK, Kaplan FT, Liporace F, Paksima N, Koval KJ. An evaluation of two scoring systems to predict instability in
fractures of the distal radius. J Trauma. 2004 Nov;57(5):1043-7. doi: 10.1097/01.ta.0000105886.89776.82. PMID:
15580030.
101. Jiang CB, Lee HC, Yeung CY, Sheu JC, Chang PY, Wang NL, Yeh CY. A scoring system to predict the need for liver
transplantation for biliary atresia after Kasai portoenterostomy. Eur J Pediatr. 2003 Sep;162(9):603-6. doi:
10.1007/s00431-003-1268-x. Epub 2003 Jul 3. PMID: 12844260.
102. Jiang, F., Jiang, Y., Zhi, H., Dong, Y., Li, H., Ma, S., … Wang, Y. (2017). Artificial intelligence in healthcare: past,
present and future. Stroke and Vascular Neurology, 2(4), 230–243. doi:10.1136/svn-2017-000101
103. Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database (2016). Sci. Data. 3, 160035.
104. Joint Action on Health Information [Website] (2019). Available at: https://Inf-act.eu/
105. Julien M, Wild JL, Blansfield J, Shabahang M, Halm K, Meade P, Dove J, Fluck M, Hunsinger M, Leonard D. Severe
complicated Clostridium difficile infection: Can the UPMC proposed scoring system predict the need for surgery? J
Trauma Acute Care Surg. 2016 Aug;81(2):221-8. doi: 10.1097/TA.0000000000001112. PMID: 27257702.
106. Kang Y, Cheng L, Cui J, Li L, Qin S, Su Y, Mao J, Gong X, Chen H, Pan C, Shen X, He B, Shu X. A new score system
for predicting response to cardiac resynchronization therapy. Cardiol J. 2015;22(2):179-87. doi:
10.5603/CJ.a2014.0089. Epub 2014 Nov 27. PMID: 25428735.
107. Kattan MW. Comparing prediction tools. Eur Urol. 2010 Apr;57(4):569-70; discussion 574. doi:
10.1016/j.eururo.2010.01.006. Epub 2010 Jan 14. PMID: 20080334.
108. Kattan MW. The many uses of cancer prediction tools. Semin Oncol. 2010 Feb;37(1):20-2. doi:
10.1053/j.seminoncol.2009.12.010. PMID: 20172359.
109. Khaw AV, Angermaier A, Kirsch M, Kessler C, Hosten N, Langner S. Comparing perfusion CT evaluation algorithms
for predicting outcome after endovascular treatment in anterior circulation ischaemic stroke. Clin Radiol. 2015
May;70(5):e41-50. doi: 10.1016/j.crad.2015.02.001. Epub 2015 Mar 9. PMID: 25766967.
110. Khera S, Kolte D, Deo S, Kalra A, Gupta T, Abbott D, Kleiman N, Bhatt DL, Fonarow GC, Khalique OK, Kodali S, Leon
MB, Elmariah S. Derivation and external validation of a simple risk tool to predict 30-day hospital readmissions after
transcatheter aortic valve replacement. EuroIntervention. 2019 Jun 20;15(2):155-163. doi: 10.4244/EIJ-D-18-00954.
PMID: 30803938.
111. Kim, H., Sohn, H.J., Kim, S., Kim, K., Lee, J.H., Bang, S.M., Kim, D.H., Sohn, S.K., Lee, J.L., Suh, C. (2006) New
Staging Systems Can Predict Prognosis of Multiple Myeloma Patients Undergoing Autologous Peripheral Blood Stem
Cell Transplantation as First-Line Therapy, Biology of Blood and Marrow Transplantation, Volume 12, Issue 8, 837-
844, ISSN 1083-8791
112. Kim HS, Ju CI. Spinal Instability Predictive Scoring System for Subsequent Fracture After Bone Cement Augmentation
in Patients with Osteoporotic Vertebral Compression Fracture. World Neurosurg. 2017 Oct;106:736-745. doi:
10.1016/j.wneu.2017.07.049. Epub 2017 Jul 19. PMID: 28735136.
113. Kim SH, Hwang HK, Lee WJ, Kang CM. Identification of an N staging system that predicts oncologic outcome in
resected left-sided pancreatic cancer. Medicine (Baltimore). 2016 Jun;95(26):e4035. doi:
10.1097/MD.0000000000004035. PMID: 27368029; PMCID: PMC4937943.
114. Kirsch AJ, Arlen AM, Leong T, Merriman LS, Herrel LA, Scherz HC, Smith EA, Srinivasan AK. Vesicoureteral reflux
index (VURx): a novel tool to predict primary reflux improvement and resolution in children less than 2 years of age. J
Pediatr Urol. 2014 Dec;10(6):1249-54. doi: 10.1016/j.jpurol.2014.06.019. Epub 2014 Jul 24. PMID: 25511573.
115. Kluth LA, Black PC, Bochner BH, Catto J, Lerner SP, Stenzl A, Sylvester R, Vickers AJ, Xylinas E, Shariat SF.
Prognostic and Prediction Tools in Bladder Cancer: A Comprehensive Review of the Literature. Eur Urol. 2015
Aug;68(2):238-53. doi: 10.1016/j.eururo.2015.01.032. Epub 2015 Feb 21. PMID: 25709027.
116. Kobayashi N, Hirano K, Nakano M, Muramatsu T, Tsukahara R, Ito Y, Ishimori H, Yamawaki M, Araki M, Takimura H,
Sakamoto Y. Development and validation of a new scoring system to predict wound healing after endovascular therapy
in critical limb ischemia with tissue loss. J Endovasc Ther. 2015 Feb;22(1):48-56. doi: 10.1177/1526602814564370.
PMID: 25775680.
117. Kobe AR, Meyer A, Elmubarak H, Kempfert J, Pavicevic J, Maisano F, Walther T, Falk V, Sündermann SH. Frailty
Assessed by the FORECAST Is a Valid Tool to Predict Short-Term Outcome After Transcatheter Aortic Valve
Replacement. Innovations (Phila). 2016 Nov/Dec;11(6):407-413. doi: 10.1097/IMI.0000000000000321. PMID:
27926626.
118. Kocaaslan R, Tepeler A, Buldu I, Tosun M, Utangac MM, Karakan T, Ozyuvali E, Hatipoglu NK, Unsal A, Sarica K. Do
the urolithiasis scoring systems predict the success of percutaneous nephrolithotomy in cases with anatomical
abnormalities? Urolithiasis. 2017 Jun;45(3):305-310. doi: 10.1007/s00240-016-0903-8. Epub 2016 Jul 12. PMID:
27406306.
119. Kodama M, Okura Y, Hirono S, Hanawa H, Ogawa Y, Itoh M, Izumi T, Aizawa Y. A new scoring system to predict the
efficacy of steroid therapy for patients with active myocarditis--a retrospective study. Jpn Circ J. 1998 Oct;62(10):715-
20. doi: 10.1253/jcj.62.715. PMID: 9805250.
120. Kogo M, Suzuki A, Sunaga T, Kaneko K, Imawari M, Kiuchi Y. Scoring system for predicting recurrence after
chemoradiotherapy including 5-fluorouracil and platinum for patients with esophageal cancer.
Hepatogastroenterology. 2013 Nov-Dec;60(128):1979-84. doi: 10.5754/hge13131. PMID: 24088316.
121. Kress MA, Collins BT, Collins SP, Dritschilo A, Gagnon G, Unger K. Scoring system predictive of survival for patients
undergoing stereotactic body radiation therapy for liver tumors. Radiat Oncol. 2012 Sep 5;7:148. doi:
10.1186/1748-717X-7-148. PMID: 22950606; PMCID: PMC3493308.
122. Kumar S, Sreenivas J, Karthikeyan VS, Mallya A, Keshavamurthy R. Evaluation of CROES Nephrolithometry
Nomogram as a Preoperative Predictive System for Percutaneous Nephrolithotomy Outcomes. J Endourol. 2016
Oct;30(10):1079-1083. doi: 10.1089/end.2016.0340. Epub 2016 Sep 22. PMID: 27550775.
123. Kuroda J, Shimura Y, Ohta K, Tanaka H, Shibayama H, Kosugi S, Fuchida S, Kobayashi M, Kaneko H, Uoshima N,
Ishii K, Nomura S, Taniwaki M, Takaori-Kondo A, Shimazaki C, Tsudo M, Hino M, Matsumura I, Kanakura Y; Kansai
Myeloma Forum Investigators. Limited value of the international staging system for predicting long-term outcome of
transplant-ineligible, newly diagnosed, symptomatic multiple myeloma in the era of novel agents. Int J Hematol. 2014
Apr;99(4):441-9. doi: 10.1007/s12185-014-1539-5. Epub 2014 Mar 1. PMID: 24584872.
124. Laguna Sanz AJ, Mulla CM, Fowler KM, Cloutier E, Goldfine AB, Newswanger B, Cummins M, Deshpande S,
Prestrelski SJ, Strange P, Zisser H, Doyle FJ 3rd, Dassau E, Patti ME. Design and Clinical Evaluation of a Novel Low-
Glucose Prediction Algorithm with Mini-Dose Stable Glucagon Delivery in Post-Bariatric Hypoglycemia. Diabetes
Technol Ther. 2018 Feb;20(2):127-139. doi: 10.1089/dia.2017.0298. PMID: 29355439; PMCID: PMC5771550.
125. Lammers WJ, Hirschfield GM, Corpechot C, Nevens F, Lindor KD, Janssen HL, Floreani A, Ponsioen CY, Mayo MJ,
Invernizzi P, Battezzati PM, Parés A, Burroughs AK, Mason AL, Kowdley KV, Kumagi T, Harms MH, Trivedi PJ,
Poupon R, Cheung A, Lleo A, Caballeria L, Hansen BE, van Buuren HR; Global PBC Study Group. Development and
Validation of a Scoring System to Predict Outcomes of Patients With Primary Biliary Cirrhosis Receiving
Ursodeoxycholic Acid Therapy. Gastroenterology. 2015 Dec;149(7):1804-1812.e4. doi: 10.1053/j.gastro.2015.07.061.
Epub 2015 Aug 7. PMID: 26261009.
126. Lau L, Kankanige Y, Rubinstein B, Jones R, Christophi C, Muralidharan V, Bailey J. Machine-Learning Algorithms
Predict Graft Failure After Liver Transplantation. Transplantation. 2017 Apr;101(4):e125-e132. doi:
10.1097/TP.0000000000001600. PMID: 27941428; PMCID: PMC7228574.
127. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. (2009). Gradient Based Learning Applied to Document Recognition.
Intelligent Signal Processing. doi:10.1109/9780470544976.ch9
128. Lee JH, Yoon JH, Cho EJ, Yang HJ, Jang ES, Kwak MS, Hwang SY, Yu SJ, Lee CH, Kim YJ, Kim CY, Lee HS. Simple
scoring system predicting genotypic resistance during rescue therapy for Lamivudine-resistant chronic hepatitis B. J
Clin Gastroenterol. 2012 Mar;46(3):243-50. doi: 10.1097/MCG.0b013e318225f559. PMID: 21716122.
129. Lee Y, Ragguett RM, Mansur RB, Boutilier JJ, Rosenblat JD, Trevizol A, Brietzke E, Lin K, Pan Z, Subramaniapillai
M, Chan TCY, Fus D, Park C, Musial N, Zuckerman H, Chen VC, Ho R, Rong C, McIntyre RS. Applications of machine
learning algorithms to predict therapeutic outcomes in depression: A meta-analysis and systematic review. J Affect
Disord. 2018 Dec 1;241:519-532. doi: 10.1016/j.jad.2018.08.073. Epub 2018 Aug 14. Erratum in: J Affect Disord. 2020
Sep 1;274:1211-1215. PMID: 30153635.
130. Lee, Y., Ragguett, R.-M., Mansur, R. B., Boutilier, J. J., Rosenblat, J. D., Trevizol, A., … McIntyre, R. S. (2018).
Applications of machine learning algorithms to predict therapeutic outcomes in depression: a meta-analysis and
systematic review. Journal of Affective Disorders. doi:10.1016/j.jad.2018.08.073
131. Leibovich BC, Han KR, Bui MH, Pantuck AJ, Dorey FJ, Figlin RA, Belldegrun A. Scoring algorithm to predict survival
after nephrectomy and immunotherapy in patients with metastatic renal cell carcinoma: a stratification tool for
prospective clinical trials. Cancer. 2003 Dec 15;98(12):2566-75. doi: 10.1002/cncr.11851. PMID: 14669275.
132. Leung E, Ferjani AM, Kitchen A, Griffin D, Stellard N, Wong LS. Risk-adjusted scoring systems can predict surgeons'
performance in colorectal surgery. Surgeon. 2011 Feb;9(1):3-7. doi: 10.1016/j.surge.2010.07.008. Epub 2010 Aug 21.
PMID: 21195323.
133. Levine ZT, Buchanan RI, Sekhar LN, Rosen CL, Wright DC. Proposed grading system to predict the extent of resection
and outcomes for cranial base meningiomas. Neurosurgery. 1999 Aug;45(2):221-30. doi:
10.1097/00006123-199908000-00003. PMID: 10449065.
134. Lewis SE, O'Connell M, Stevenson M, Thompson-Cree L, McClure N. An algorithm to predict pregnancy in assisted
reproduction. Hum Reprod. 2004 Jun;19(6):1385-94. doi: 10.1093/humrep/deh227. Epub 2004 Apr 29. PMID:
15117906.
135. Li A, Khalighi PR, Wu Q, Garcia DA. External validation of the PLASMIC score: a clinical prediction tool for thrombotic
thrombocytopenic purpura diagnosis and treatment. J Thromb Haemost. 2018 Jan;16(1):164-169. doi:
10.1111/jth.13882. Epub 2017 Nov 16. PMID: 29064619; PMCID: PMC5760324.
136. Li JL, Lin XY, Zhuang LJ, He JY, Peng QQ, Dong YP, Wu JX. Establishment of a risk scoring system for predicting
locoregional recurrence in T1 to T2 node-negative breast cancer patients treated with mastectomy: Implications for
156. Molteni A, Riva M, Greco R, Nichelatti M, Ravano E, Marbello L, Nosari A, Morra E. Verifying Hellström-Lindberg score
as predictive tool for response to erythropoietin therapy according to the "International Working Group" criteria, in
anemic patients affected by myelodysplastic syndrome: a monocentric experience. Int J Hematol. 2013 Apr;97(4):472-
9. doi: 10.1007/s12185-013-1305-0. Epub 2013 Mar 19. PMID: 23508542.
157. Monsalve-Torra, A., Ruiz-Fernandez, D., Marin-Alonso, O., Soriano-Payá, A., Camacho-Mackenzie, J., & Carreño-
Jaimes, M. (2016). Using machine learning methods for predicting inhospital mortality in patients undergoing open
repair of abdominal aortic aneurysm. Journal of Biomedical Informatics, 62, 195–201. doi:10.1016/j.jbi.2016.07.007
158. Moon SH, Kim DY, Park JW, Oh JH, Chang HJ, Kim SY, Kim TH, Park HC, Choi DH, Chun HK, Kim JH, Park JH, Yu
CS. Can the new American Joint Committee on Cancer staging system predict survival in rectal cancer patients treated
with curative surgery following preoperative chemoradiotherapy? Cancer. 2012 Oct 15;118(20):4961-8. doi:
10.1002/cncr.27507. Epub 2012 Mar 13. PMID: 22415662.
159. Moonesinghe, S.R., Mythen, M.G., Das P., Rowan K.M. & Grocott, M.P.W. (2013). Risk Stratification Tools for
Predicting Morbidity and Mortality in Adult Patients Undergoing Major Surgery: Qualitative Systematic Review.
Anesthesiology 10 2013, Vol.119, 959-981. doi:10.1097/ALN.0b013e3182a4e94d
160. Morales-Gisbert SM, Zaragozá García JM, Plaza Martínez Á, Gómez Palonés FJ, Ortiz-Monzón E. Development of
an individualized scoring system to predict mid-term survival after carotid endarterectomy. J Cardiovasc Surg (Torino).
2017 Aug;58(4):535-542. doi: 10.23736/S0021-9509.16.08198-2. Epub 2014 Jul 30. PMID: 25073889.
161. Mould RF, Lederman M, Tai P, Wong JK. Methodology to predict long-term cancer survival from short-term data using
Tobacco Cancer Risk and Absolute Cancer Cure models. Phys Med Biol. 2002 Nov 21;47(22):3893-924. doi:
10.1088/0031-9155/47/22/301. PMID: 12476973.
162. Mukamel, D. B., Chou, C.-C., Zimmer, J. G., & Rothenberg, B. M. (1997). The Effect of Accurate Patient Screening
on the Cost-Effectiveness of Case Management Programs. The Gerontologist, 37(6), 777–784.
doi:10.1093/geront/37.6.777
163. Munivenkatappa RB, Schweitzer EJ, Papadimitriou JC, Drachenberg CB, Thom KA, Perencevich EN, Haririan A,
Rasetto F, Cooper M, Campos L, Barth RN, Bartlett ST, Philosophe B. The Maryland aggregate pathology index: a
deceased donor kidney biopsy scoring system for predicting graft failure. Am J Transplant. 2008 Nov;8(11):2316-24.
doi: 10.1111/j.1600-6143.2008.02370.x. Epub 2008 Sep 17. PMID: 18801024.
164. Murphy KC, Kay D, Davenport DL, Bernard A. Decision Tool for Predicting Outcomes in Geriatric Acute Mesenteric
Ischemia. Am Surg. 2018 Aug 1;84(8):1247-1251. PMID: 30185294.
165. National Institutes of Health. STRIDES (2019) [Website]. Available at: https://datascience.nih.gov/strides
166. Neidert MC, Lawton MT, Mader M, Seifert B, Valavanis A, Regli L, Bozinov O, Burkhardt JK. The AVICH Score: A
Novel Grading System to Predict Clinical Outcome in Arteriovenous Malformation-Related Intracerebral Hemorrhage.
World Neurosurg. 2016 Aug;92:292-297. doi: 10.1016/j.wneu.2016.04.080. Epub 2016 May 2. PMID: 27150647.
167. NHS Digital (2020) Personalised Health and Care 2020 strategy. [Website] Available at:
https://www.gov.uk/government/publications/personalised-health-and-care-2020
168. Nishida T, Sonoda H, Oishi Y, Tanoue Y, Nakashima A, Shiokawa Y, Tominaga R. The novel EuroSCORE II algorithm
predicts the hospital mortality of thoracic aortic surgery in 461 consecutive Japanese patients better than both the
original additive and logistic EuroSCORE algorithms. Interact Cardiovasc Thorac Surg. 2014 Apr;18(4):446-50. doi:
10.1093/icvts/ivt524. Epub 2013 Dec 23. PMID: 24368550; PMCID: PMC3957283.
169. Noren DP, Long BL, Norel R, Rrhissorrakrai K, Hess K, Hu CW, Bisberg AJ, Schultz A, Engquist E, Liu L, Lin X, Chen
GM, Xie H, Hunter GA, Boutros PC, Stepanov O; DREAM 9 AML-OPC Consortium, Norman T, Friend SH, Stolovitzky
G, Kornblau S, Qutub AA. A Crowdsourcing Approach to Developing and Assessing Prediction Algorithms for AML
Prognosis. PLoS Comput Biol. 2016 Jun 28;12(6):e1004890. doi: 10.1371/journal.pcbi.1004890. PMID: 27351836;
PMCID: PMC4924788.
170. Noureldin YA, Elkoushy MA, Andonian S. Which is better? Guy's versus S.T.O.N.E. nephrolithometry scoring systems
in predicting stone-free status post-percutaneous nephrolithotomy. World J Urol. 2015 Nov;33(11):1821-5. doi:
10.1007/s00345-015-1508-5. Epub 2015 Feb 13. PMID: 25678344.
171. Onal B, Tansu N, Demirkesen O, Yalcin V, Huang L, Nguyen HT, Cilento BG, Erozenci A. Nomogram and scoring
system for predicting stone-free status after extracorporeal shock wave lithotripsy in children with urolithiasis. BJU Int.
2013 Feb;111(2):344-52. doi: 10.1111/j.1464-410X.2012.11281.x. Epub 2012 Jun 6. PMID: 22672514.
172. Onoe S, Maeda A, Takayama Y, Fukami Y, Kaneoka Y. A preoperative predictive scoring system to predict the ability
to achieve the critical view of safety during laparoscopic cholecystectomy for acute cholecystitis. HPB (Oxford). 2017
May;19(5):406-410. doi: 10.1016/j.hpb.2016.12.013. Epub 2017 Jan 20. PMID: 28117229.
173. Oosterveld M, Suciu S, Muus P, Germing U, Delforge M, Belhabri A, Aul C, Selleslag D, Ferrant A, Marie JP, Amadori
S, Jehn U, Mandelli F, Hess U, Hellström-Lindberg E, Cakmak-Wollgast S, Vignetti M, Labar B, Willemze R, de Witte
T. Specific scoring systems to predict survival of patients with high-risk myelodysplastic syndrome (MDS) and de novo
acute myeloid leukemia (AML) after intensive antileukemic treatment based on results of the EORTC-GIMEMA AML-
10 and intergroup CRIANT studies. Ann Hematol. 2015 Jan;94(1):23-34. doi: 10.1007/s00277-014-2177-y. Epub 2014
Aug 7. PMID: 25096636.
174. Ozgor F, Yanaral F, Savun M, Ozdemir H, Sarilar O, Binbay M. Comparison of STONE, CROES and Guy's
nephrolithometry scoring systems for predicting stone-free status and complication rates after percutaneous
nephrolithotomy in obese patients. Urolithiasis. 2018 Oct;46(5):471-477. doi: 10.1007/s00240-017-1003-0. Epub 2017
Jul 29. PMID: 28756459.
175. Panattoni, L. E., Vaithianathan, R., Ashton, T., & Lewis, G. H. (2011). Predictive risk modelling in health: options for
New Zealand and Australia. Australian Health Review, 35(1), 45. doi:10.1071/ah09845
176. Panayiotopoulos YP, Edmondson RA, Reidy JF, Taylor PR. A scoring system to predict the outcome of long
femorodistal arterial bypass grafts to single calf or pedal vessels. Eur J Vasc Endovasc Surg. 1998 May;15(5):380-6.
doi: 10.1016/s1078-5884(98)80197-4. PMID: 9633491.
177. Panch, T., Szolovits, P., & Atun, R. (2018). Artificial intelligence, machine learning and health systems. Journal of
Global Health, 8(2). doi:10.7189/jogh.08.020303
178. Panesar, A. (2019). Machine Learning and AI for Healthcare. doi:10.1007/978-1-4842-3799-1
179. Parikh, R.B., Obermeyer, Z., Navathe, A.S. (2019). Regulation of predictive analytics in medicine. Science 363 (6429),
810-812. DOI: 10.1126/science.aaw0029
180. Park JY, Moon KS, Lee KH, Lim SH, Jang WY, Lee H, Jung TY, Kim IY, Jung S. Gamma knife radiosurgery for elderly
patients with brain metastases: evaluation of scoring systems that predict survival. BMC Cancer. 2015 Feb 14;15:54.
doi: 10.1186/s12885-015-1070-y. PMID: 25885321; PMCID: PMC4333254.
181. Patterson BO, Holt PJ, Hinchliffe R, Nordon IM, Loftus IM, Thompson MM. Existing risk prediction methods for elective
abdominal aortic aneurysm repair do not predict short-term outcome following endovascular repair. J Vasc Surg. 2010
Jul;52(1):25-30. doi: 10.1016/j.jvs.2010.01.084. PMID: 20434296.
182. Peleg N, Sneh Arbib O, Issachar A, Cohen-Naftaly M, Braun M, Shlomai A. Noninvasive scoring systems predict
hepatic and extra-hepatic cancers in patients with nonalcoholic fatty liver disease. PLoS One. 2018 Aug
14;13(8):e0202393. doi: 10.1371/journal.pone.0202393. PMID: 30106985; PMCID: PMC6091950.
183. Peng, J., Wang, Z., Chen, W., Ding, Y., Wang, H., Huang, H., … Cai, S. (2010). Integration of genetic signature and
TNM staging system for predicting the relapse of locally advanced colorectal cancer. International Journal of Colorectal
Disease, 25(11), 1277–1285. doi:10.1007/s00384-010-1043-1
184. Perrotti A, Gatti G, Dorigo E, Sinagra G, Pappalardo A, Chocron S. Validation of a Predictive Scoring System for Deep
Sternal Wound Infection after Bilateral Internal Thoracic Artery Grafting in a Cohort of French Patients. Surg Infect
(Larchmt). 2017 Feb/Mar;18(2):181-188. doi: 10.1089/sur.2016.150. Epub 2016 Dec 8. PMID: 27929930.
185. PRISMA (2018) PRISMA for Scoping Reviews. Available at:
http://www.prisma-statement.org/Extensions/ScopingReviews
186. Putz C, Wiedenhöfer B, Gerner HJ, Fürstenberg CH. Tokuhashi prognosis score: an important tool in prediction of the
neurological outcome in metastatic spinal cord compression: a retrospective clinical study. Spine (Phila Pa 1976).
2008 Nov 15;33(24):2669-74. doi: 10.1097/BRS.0b013e318188b98f. PMID: 18981960.
187. Qi X, Zhang X, Li Z, Hui J, Xiang Y, Chen J, Zhao J, Li J, Qi FZ, Xu Y. HVPG signature: A prognostic and predictive
tool in hepatocellular carcinoma. Oncotarget. 2016 Sep 20;7(38):62789-62796. doi: 10.18632/oncotarget.11558.
PMID: 27566593; PMCID: PMC5308766.
188. Qian ZY, Hou XF, Xu DJ, Yang B, Chen ML, Chen C, Zhang FX, Shan QJ, Cao KJ, Zou JG. An algorithm to predict
the site of origin of focal atrial tachycardia. Pacing Clin Electrophysiol. 2011 Apr;34(4):414-21. doi: 10.1111/j.1540-
8159.2010.02980.x. Epub 2010 Nov 22. PMID: 21091746.
189. Quinn DI, Henshall SM, Haynes AM, Brenner PC, Kooner R, Golovsky D, Mathews J, O'Neill GF, Turner JJ, Delprado
W, Finlayson JF, Sutherland RL, Grygiel JJ, Stricker PD. Prognostic significance of pathologic features in localized
prostate cancer treated with radical prostatectomy: implications for staging systems and predictive models. J Clin
Oncol. 2001 Aug 15;19(16):3692-705. doi: 10.1200/JCO.2001.19.16.3692. PMID: 11504751.
190. Qureshi MA, Safian RD, Grines CL, Goldstein JA, Westveer DC, Glazier S, Balasubramanian M, O'Neill WW.
Simplified scoring system for predicting mortality after percutaneous coronary intervention. J Am Coll Cardiol. 2003
Dec 3;42(11):1890-5. doi: 10.1016/j.jacc.2003.06.014. PMID: 14662247.
191. Rades D, Conde-Moreno AJ, Cacicedo J, Veninga T, Gebauer N, Bartscht T, Schild SE. A predictive tool particularly
designed for elderly myeloma patients presenting with spinal cord compression. BMC Cancer. 2016 Apr 25;16:292.
doi: 10.1186/s12885-016-2325-y. PMID: 27112210; PMCID: PMC4845505.
192. Rapsang, A., & Shyam, D. (2014). Scoring systems in the intensive care unit: A compendium. Indian Journal of Critical
Care Medicine, 18(4), 220. doi:10.4103/0972-5229.130573
193. Roumen RM, Schers TJ, de Boer HH, Goris RJ. Scoring systems for predicting outcome in acute hemorrhagic
necrotizing pancreatitis. Eur J Surg. 1992 Mar;158(3):167-71. PMID: 1356457.
194. Rowan KM, Kerr JH, Major E, McPherson K, Short A, Vessey MP. Intensive Care Society's Acute Physiology and
Chronic Health Evaluation (APACHE II) study in Britain and Ireland: a prospective, multicenter, cohort study comparing
two methods for predicting outcome for adult intensive care patients. Crit Care Med. 1994 Sep;22(9):1392-401. doi:
10.1097/00003246-199409000-00007. PMID: 8062560.
195. Sabatine MS, Januzzi JL, Snapinn S, Théroux P, Jang IK. A risk score system for predicting adverse outcomes and
magnitude of benefit with glycoprotein IIb/IIIa inhibitor therapy in patients with unstable angina pectoris. Am J Cardiol.
2001 Sep 1;88(5):488-92. doi: 10.1016/s0002-9149(01)01724-6. PMID: 11524055.
196. Sagiroglu, S., & Sinanc, D. (2013). Big data: A review. 2013 International Conference on Collaboration Technologies
and Systems (CTS). doi:10.1109/cts.2013.6567202
197. Sakurai M, Karigane D, Kasahara H, Tanigawa T, Ishida A, Murakami H, Kikuchi M, Kohashi S. Geriatric screening
tools predict survival outcomes in older patients with diffuse large B cell lymphoma. Ann Hematol. 2019 Mar;98(3):669-
678. doi: 10.1007/s00277-018-3551-y. Epub 2018 Nov 15. PMID: 30443764.
198. Sarzaeem MR, Mandegar MH, Roshanali F, Vedadian A, Saidi B, Alaeddini F, Tabarestani N. Scoring system for
predicting saphenous vein graft patency in coronary artery bypass grafting. Tex Heart Inst J. 2010;37(5):525-30. PMID:
20978562; PMCID: PMC2953219.
199. Scruth EA, Page K, Cheng E, Campbell M, Worrall-Carter L. Risk determination after an acute myocardial infarction:
review of 3 clinical risk prediction tools. Clin Nurse Spec. 2012 Jan-Feb;26(1):35-41. doi:
10.1097/NUR.0b013e31823bfafc. PMID: 22146272.
200. Senagore AJ, Warmuth AJ, Delaney CP, Tekkis PP, Fazio VW. POSSUM, p-POSSUM, and Cr-POSSUM:
implementation issues in a United States health care system for prediction of outcome for colon cancer resection. Dis
Colon Rectum. 2004 Sep;47(9):1435-41. doi: 10.1007/s10350-004-0604-1. Epub 2004 Jul 15. PMID: 15486738.
201. Sgarbura O, Tomulescu V, Popescu I. Robotic oncologic complexity score - a new tool for predicting complications in
computer-enhanced oncologic surgery. Int J Med Robot. 2016 Jun;12(2):296-302. doi: 10.1002/rcs.1664. Epub 2015
May 5. PMID: 25943703.
202. Shariat SF, Karakiewicz PI, Godoy G, Lerner SP. Use of nomograms for predictions of outcome in patients with
advanced bladder cancer. Ther Adv Urol. 2009 Apr;1(1):13-26. doi: 10.1177/1756287209103923. PMID: 21789050;
PMCID: PMC3126044.
203. Shariat SF, Karakiewicz PI, Suardi N, Kattan MW. Comparison of nomograms with other methods for predicting
outcomes in prostate cancer: a critical analysis of the literature. Clin Cancer Res. 2008 Jul 15;14(14):4400-7. doi:
10.1158/1078-0432.CCR-07-4713. PMID: 18628454.
204. Shen JY, Li C, Wen TF, Yan LN, Li B, Wang WT, Yang JY, Xu MQ. A simple prognostic score system predicts the
prognosis of solitary large hepatocellular carcinoma following hepatectomy. Medicine (Baltimore). 2016
Aug;95(31):e4296. doi: 10.1097/MD.0000000000004296. PMID: 27495033; PMCID: PMC4979787.
205. Siegel, C. A., Horton, H., Siegel, L. S., Thompson, K. D., Mackenzie, T., Stewart, S. K., … McGovern, D. P. (2015). A
validated web-based tool to display individualised Crohn’s disease predicted outcomes based on clinical, serologic
and genetic variables. Alimentary Pharmacology & Therapeutics, 43(2), 262–271. doi:10.1111/apt.13460
206. Smaniotto D, D'Agostino G, Luzi S, Valentini V, Macchia G, Mantini G, Margariti PA, Ferrandina G, Scambia G.
Concurrent 5-fluorouracil, mitomycin C and radiation with or without brachytherapy in recurrent cervical cancer: a
scoring system to predict clinical response and outcome. Tumori. 2005 Jul-Aug;91(4):295-301. PMID: 16277092.
207. Sotiropoulos GC, Lang H. Clinical scoring systems for predicting outcome after surgery for colorectal liver metastases:
towards a better multidisciplinary approach. Liver Int. 2009 Jan;29(1):6-9. doi: 10.1111/j.1478-3231.2008.01923.x.
Erratum in: Liver Int. 2009 Apr;29(4):617. PMID: 19120939.
208. Sotiropoulos GC, Miyazaki M, Konstadoulakis MM, Paul A, Molmenti EP, Gomatos IP, Radtke A, Baba HA,
Beckebaum S, Brokalaki EI, Ohtsuka M, Schwartz ME, Broelsch CE, Sgourakis G. Multicentric evaluation of a clinical
and prognostic scoring system predictive of survival after resection of intrahepatic cholangiocarcinomas. Liver Int.
2010 Aug;30(7):996-1002. doi: 10.1111/j.1478-3231.2010.02203.x. Epub 2010 Feb 5. PMID: 20141593.
209. Sprenger, M., & Mettler, T. (2016). On the utility of E-health business model design patterns. Twenty-Fourth European
Conference on Information Systems (ECIS), İstanbul, Turkey, 2016. Available at:
https://www.alexandria.unisg.ch/248256/1/ECIS2016.pdf
210. Srinivas TR, Taber DJ, Su Z, Zhang J, Mour G, Northrup D, Tripathi A, Marsden JE, Moran WP, Mauldin PD. Big Data,
Predictive Analytics, and Quality Improvement in Kidney Transplantation: A Proof of Concept. Am J Transplant. 2017
Mar;17(3):671-681. doi: 10.1111/ajt.14099. Epub 2017 Jan 4. PMID: 27804279.
211. Stec S, Gorecki A, Zaborska B, Kulakowski P. A simple point score system for predicting the efficacy of external
rectilinear biphasic cardioversion for persistent atrial fibrillation. Europace. 2006 Apr;8(4):297-301. doi:
10.1093/europace/eul010. Epub 2006 Mar 16. PMID: 16627458.
212. Subramanyam R, Yeramaneni S, Hossain MM, Anneken AM, Varughese AM. Perioperative Respiratory Adverse
Events in Pediatric Ambulatory Anesthesia: Development and Validation of a Risk Prediction Tool. Anesth Analg. 2016
May;122(5):1578-85. doi: 10.1213/ANE.0000000000001216. PMID: 27101501.
213. Szövérfi Z, Lazary A, Bozsódi Á, Klemencsics I, Éltes PE, Varga PP. Primary Spinal Tumor Mortality Score (PSTMS):
a novel scoring system for predicting poor survival. Spine J. 2014 Nov 1;14(11):2691-700. doi:
10.1016/j.spinee.2014.03.009. Epub 2014 Mar 17. PMID: 24650850.
214. Tailly TO, Okhunov Z, Nadeau BR, Huynh MJ, Labadie K, Akhavein A, Violette PD, Olvera-Posada D, Alenezi H,
Amann J, Bird VG, Landman J, Smith AD, Denstedt JD, Razvi H. Multicenter External Validation and Comparison of
Stone Scoring Systems in Predicting Outcomes After Percutaneous Nephrolithotomy. J Endourol. 2016
May;30(5):594-601. doi: 10.1089/end.2015.0700. Epub 2016 Feb 5. PMID: 26728427.
215. Takaoka K, Nannya Y, Shinohara A, Arai S, Nakamura F, Kurokawa M. A novel scoring system to predict the incidence
of invasive fungal disease in salvage chemotherapies for malignant lymphoma. Ann Hematol. 2014 Oct;93(10):1637-
44. doi: 10.1007/s00277-014-2093-1. Epub 2014 Jun 8. PMID: 24908330.
216. Tanaskovic S, Radak D, Aleksic N, Calija B, Maravic-Stojkovic V, Nenezic D, Ilijevski N, Popov P, Vucurevic G, Babic
S, Matic P, Gajin P, Vasic D, Rancic Z. Scoring system to predict early carotid restenosis after eversion
236. Zhen C, Guoliang Q, Lishuang M, Zhen Z, Chen W, Jun Z, Shuli L, Kaoping G, Chao L, Xuan Y, Long L. Design and
validation of an early scoring system for predicting early outcomes of type III biliary atresia after Kasai's operation.
Pediatr Surg Int. 2015 Jun;31(6):535-42. doi: 10.1007/s00383-015-3710-3. Epub 2015 Apr 18. PMID: 25895075.
237. Zhu, Y., Xu, D., Zhang, Z., Dong, J., Zhou, Y., Zhang, W.-W., … Zhu, W.-W. (2018). A new laboratory-based algorithm
to predict microvascular invasion and survival in patients with hepatocellular carcinoma. International Journal of
Surgery, 57, 45–53. doi:10.1016/j.ijsu.2018.07.011
238. Zhuang J, Lian H, Zhao X, Zhang G, Gan W, Li X, Guo H. The application of PADUA scoring system for predicting
complications of laparoscopic renal cryoablation. Int Urol Nephrol. 2015 May;47(5):781-8. doi: 10.1007/s11255-015-
0943-y. Epub 2015 Mar 18. PMID: 25782623.
239. Zisman A, Pantuck AJ, Wieder J, Chao DH, Dorey F, Said JW, deKernion JB, Figlin RA, Belldegrun AS. Risk group
assessment and clinical outcome algorithm to predict the natural history of patients with surgically resected renal cell
carcinoma. J Clin Oncol. 2002 Dec 1;20(23):4559-66. doi: 10.1200/JCO.2002.05.111. PMID: 12454113.
240. Zimmerman, J. E., Kramer, A. A., McNair, D. S., & Malila, F. M. (2006). Acute Physiology and Chronic Health
Evaluation (APACHE) IV: Hospital mortality assessment for today’s critically ill patients*. Critical Care Medicine, 34(5),
1297–1310. doi:10.1097/01.ccm.0000215112.84523.f0
241. Zollo MB, Moskop JC, Kahn CE Jr. Knowing the score: using predictive scoring systems in clinical practice. Am J Crit
Care. 1996 Mar;5(2):147-51. PMID: 8653166.