Impact of Big Data Analytics on People's Health: Overview of Systematic Reviews and Recommendations for Future Studies
Review
Israel Júnior Borges do Nascimento1,2, ClinPath, PharmB; Milena Soriano Marcolino3,4, MD, MSc, PhD; Hebatullah
Mohamed Abdulazeem5, MBBS; Ishanka Weerasekara6,7, PhD; Natasha Azzopardi-Muscat8, MD, MPH, MSc, PhD;
Marcos André Gonçalves9, PhD; David Novillo-Ortiz8, MLIS, MSc, PhD
1 School of Medicine, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
2 Department of Medicine, School of Medicine, Medical College of Wisconsin, Wauwatosa, WI, United States
3 Department of Internal Medicine, University Hospital, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
4 School of Medicine and Telehealth Center, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
5 Department of Sport and Health Sciences, Technical University Munich, Munich, Germany
6 School of Health Sciences, Faculty of Health and Medicine, The University of Newcastle, Callaghan, Australia
7 Department of Physiotherapy, Faculty of Allied Health Sciences, University of Peradeniya, Peradeniya, Sri Lanka
8 Division of Country Health Policies and Systems, World Health Organization, Regional Office for Europe, Copenhagen, Denmark
9 Department of Computer Science, Institute of Exact Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Corresponding Author:
David Novillo-Ortiz, MLIS, MSc, PhD
Division of Country Health Policies and Systems
World Health Organization, Regional Office for Europe
Marmorvej 51
Copenhagen, 2100
Denmark
Phone: 45 61614868
Email: [email protected]
Abstract
Background: Although the potential of big data analytics for health care is well recognized, evidence is lacking on its effects
on public health.
Objective: The aim of this study was to assess the impact of the use of big data analytics on people’s health based on the health
indicators and core priorities in the World Health Organization (WHO) General Programme of Work 2019/2023 and the European
Programme of Work (EPW), approved and adopted by its Member States, in addition to SARS-CoV-2–related studies. Furthermore,
we sought to identify the most relevant challenges and opportunities of these tools with respect to people’s health.
Methods: Six databases (MEDLINE, Embase, Cochrane Database of Systematic Reviews via Cochrane Library, Web of Science,
Scopus, and Epistemonikos) were searched from the inception date to September 21, 2020. Systematic reviews assessing the
effects of big data analytics on health indicators were included. Two authors independently performed screening, selection, data
extraction, and quality assessment using the AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews 2) checklist.
Results: The literature search initially yielded 185 records, 35 of which met the inclusion criteria, involving more than 5,000,000
patients. Most of the included studies used patient data collected from electronic health records, hospital information systems,
private patient databases, and imaging datasets, and involved the use of big data analytics for noncommunicable diseases.
“Probability of dying from any of cardiovascular, cancer, diabetes or chronic renal disease” and “suicide mortality rate” were the
most commonly assessed health indicators and core priorities within the WHO General Programme of Work 2019/2023 and the
EPW 2020/2025. Big data analytics have shown moderate to high accuracy for the diagnosis and prediction of complications of
diabetes mellitus as well as for the diagnosis and classification of mental disorders; prediction of suicide attempts and behaviors;
and the diagnosis, treatment, and prediction of important clinical outcomes of several chronic diseases. Confidence in the results
was rated as “critically low” for 25 reviews, as “low” for 7 reviews, and as “moderate” for 3 reviews. The most frequently
identified challenges were the establishment of well-designed and structured data sources and of secure, transparent, and standardized
databases for patient data.
Conclusions: Although the overall quality of included studies was limited, big data analytics has shown moderate to high
accuracy for the diagnosis of certain diseases, improvement in managing chronic diseases, and support for prompt and real-time
analyses of large sets of varied input data to diagnose and predict disease outcomes.
Trial Registration: International Prospective Register of Systematic Reviews (PROSPERO) CRD42020214048;
https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=214048
KEYWORDS
public health; big data; health status; evidence-based medicine; big data analytics; secondary data analysis; machine learning;
systematic review; overview; World Health Organization
Textbox 1. List of 46 World Health Organization health indicators defined at the Thirteenth General Programme of Work.
• Hepatitis B incidence (measured by surface antigen [HBsAg] prevalence among children under 5 years)
• Probability of dying from any of cardiovascular disease (CVD), cancer, diabetes, chronic renal disease (CRD) (aged 30-70 years) (%)
• Total alcohol per capita consumption in adults aged >15 years (liters of pure alcohol)
• Proportion of women (aged 15-49 years) having need for family planning satisfied with modern methods (%)
• Population with household expenditures on health >10% of total household expenditure or income (%)
• Mortality rate attributed to exposure to unsafe water, sanitation, and hygiene (WASH) services (per 100,000 population)
• Proportion of population covered by all vaccines included in national programs (diphtheria-tetanus-pertussis vaccine, measles-containing-vaccine second dose, pneumococcal conjugated vaccine) (%)
• Proportion of health facilities with essential medicines available and affordable on a sustainable basis (%)
• Density of health workers (doctors, nurses and midwives, pharmacists, dentists per 10,000 population)
• Proportion of children under 5 years developmentally on track (health, learning, and psychosocial well-being) (%)
• Proportion of women (aged 15-49 years) subjected to violence by current or former intimate partner (%)
• Proportion of women (aged 15-49 years) who make their own decisions regarding sexual relations, contraceptive use, and reproductive health care (%)
• Proportion of population using safely managed sanitation services and hand-washing facilities (%)
• Annual mean concentrations of fine particulate matter (PM2.5) in urban areas (μg/m3)
• Proportion of children (aged 1-17 years) experiencing physical or psychological aggression (%)
• Proportion of vulnerable people in fragile settings provided with essential health services (%)
The purposes of the reviews varied broadly. Generally, they (1) outlined AI applications in different medical specialties; (2) analyzed features for the detection, prediction, or diagnosis of multiple diseases or conditions; or (3) pinpointed challenges and opportunities.

WHO Indicators and Core Priorities

Most of the studies assessed the effects of big data analytics on noncommunicable diseases [12-15,17,21,22,24,27,31,32,34,36,38,40-44]. Furthermore, three reviews covered mental health, associated with the indicator "suicide mortality rate" [19,25,45]; three studies were related to the indicator "probability of dying from any of cardiovascular, cancer, diabetes, or chronic renal disease" [16,18,20,28,29]; and two studies were related to the indicator "proportion of bloodstream infections due to antimicrobial-resistant organisms" [26,33]. One study described technology use in disaster management and preparedness, covering the "number of persons affected by disasters" indicator [11], and one study was associated with the indicator "maternal mortality ratio" [30]. Overlap made precise classification into WHO health indicators challenging, and four studies could not be categorized because they mainly described challenges or opportunities in big data analytics [23,39] or because they were related to the COVID-19 pandemic [35,37].

Diseases or Conditions Assessed

Diabetes Mellitus

AI tools associated with big data analytics in the care of patients with diabetes mellitus (DM) were assessed in six reviews that included 345 primary studies [15,20,32,38,40]. Three studies reviewed AI in screening and diagnosing type 1 or type 2 DM, providing varied ranges of accuracy, sensitivity, and specificity [20,32,40]. Variables included systolic blood pressure, body mass index, triglyceride levels, and others. Two reviews covered DM control and the clinical management of DM patients [32,40]. One noted that techniques for diabetes self-management varied among the tools evaluated and reported mean values for its robust metrics [18]. The other evaluated the use of data-driven tools for predicting blood glucose dynamics and the impact of ML and data mining [20], describing the input parameters used among data-driven analysis models. However, the authors of these reviews concluded that achieving a methodologically precise predictive model is challenging and must consider multiple parameters.

Various studies assessed the ability of big data analytics to predict individual DM complications such as hypoglycemia, nephropathy, and others [15,32,38]. Supervised ML methods, decision trees, deep neural networks, random forest (RF) learning, and support vector machines (SVM) reportedly had the best outcomes for assessing complications. One review assessed deep learning–based algorithms in screening patients for diabetic retinopathy. Of 11 studies, 8 reported sensitivity and specificity of 80.3% to 100% and 84% to 99%, respectively; two reported accuracies of 78.7% and 81%; and one reported an area under the receiver operating curve (AUC) of 0.955 [15].

Mental Health

Five reviews reported on AI, data mining, and ML in psychiatry/psychology [12,14,19,25,45], most commonly assessing these techniques in the diagnosis of mental disorders. Two reviews assessed the use of ML algorithms for predicting suicidal behaviors. High levels of risk classification accuracy (typically higher than 90%) were reported in two reviews, either for adult primary care patients or teenagers [19,25]. Although the review authors stated the potential of ML techniques in daily clinical practice, limitations were highlighted, including no external validation and reporting inconsistencies.

The use of ML algorithms for early detection of psychiatric conditions was also reported [12,45]. ML was used to develop prediagnosis algorithms for constructing risk models to signal a patient's predisposition or risk for a psychiatric/psychological health issue, for predicting a diagnosis of newly identified patients, and to differentiate mental conditions with overlapping symptomatology. For studies using structural neuroimaging to classify bipolar diseases and other diagnoses, the accuracy ranged from 52.13% to 100%, whereas studies using serum biomarkers reported an accuracy ranging from 72.5% to 77.5%.

Only one review used social media to generate analyzable data on the prevention, recognition, and support for severe mental illnesses [14]. The study included broad descriptions of ML techniques and data types for detection, diagnosis, prognosis, treatment, support, and resulting public health implications. The authors highlighted the potential for monitoring well-being, and for providing an ecologically and cost-efficient evaluation of community mental health through social media and electronic records.

COVID-19

Two reviews reported the application of big data analytics and ML to better understand the current novel coronavirus pandemic [35,37]. One assessed data mining and ML techniques in diagnosing COVID-19 cases. Although the study did not define the best methodology to evaluate and detect potential cases, the authors noted an elevated frequency of decision tree models, naïve Bayes classifiers, and SVM algorithms used during previous pandemics.

Another review focused on SARS-CoV-2 immunization, and proposed that AI could expedite vaccine discovery through studying the virus's capabilities, virulence, and genome using genetic databanks. That study merged discussions of deep learning–based drug screening for predicting the interaction between proteins and ligands, and of using imaging results linked to AI tools for detecting SARS-CoV-2 infections.

Oncology

Four studies described the utility of ML, computerized clinical decision systems, and deep learning in oncology [24,28,29,31]. Using computerized clinical decision support systems (DSS) significantly improves process outcomes in oncology [24]. A compelling example shows that initial decisions were modified in 31% of cases after consultation of clinical DSS, which consistently resulted in improved patient management. Furthermore, implementing clinical DSS led to an average cost
Textbox 2. Current challenges to using big data tools for people's health, and future perspectives and opportunities.
Current Challenges
1. Data structure: issues with fragmented data and incompatible or heterogeneous data formats
2. Data security: problems with privacy, lack of transparency, integrity, and inherent data structure
3. Data standardization: concerns with limited interoperability, data obtention, mining, and sharing, along with language barriers
4. Inaccuracy: issues with inconsistencies, lack of precision, and data timeliness
5. Limited awareness of big data analytics capabilities among health managers and health care professionals
6. Lack of evidence on the impact of big data analytics on clinical outcomes for people's health
7. Lack of skills and training among professionals to collect, process, or extract data
8. Managerial issues: ownership and governance dilemmas, along with data management, organizational, and financial issues
9. Regulatory, political, and legal concerns
10. Expenses with data storage and transfer
Future Perspectives and Opportunities
1. To improve the decision-making process with real-time analytics
2. To improve patient-centric health care and to enhance personalized medicine
3. To support early detection of diseases and prognostic assessment by predicting epidemics and pandemics, improving disease monitoring, implementing and tracking health behaviors, and predicting patients' vulnerabilities
4. To improve data quality, structure, and accessibility by enabling rapid and transparent acquisition of large volumes and types of data, and by improving data error detection
5. To enable potential health care cost reduction
6. To improve quality of care by improving efficient health outcomes, reducing the waste of resources, increasing productivity and performance,
promoting risk reduction, and optimizing process management
7. To provide better ways to manage population health, either through early detection of diseases or by establishing ways to support health policy makers.
8. To enhance fraud detection
9. To enhance health-threat detection plans by governmental entities
10. To support the creation of new research hypotheses
Many systematic reviews reported simple or inappropriate evaluation measures for the task at hand. The most common metric used to evaluate the performance of a classification predictive model is accuracy, which is calculated as the number of correct predictions divided by the total number of predictions made on the test set. This metric is easy to use and to interpret, as a single number summarizes the model capability. However, accuracy values and the error rate, which is simply the complement of accuracy, are not adequate for skewed or imbalanced classification tasks (ie, when the distribution of observations in the training dataset across the classes is not equal), because of the bias toward the majority class. When the distribution is slightly skewed, accuracy can still be a useful metric; however, when the distribution is severely skewed, accuracy becomes an unreliable measure of model performance.

For instance, in a binary classification task with a distribution of (95%, 5%) for the classes (eg, healthy vs sick), a "dumb classifier" that simply chooses the class "healthy" for all instances will have 95% accuracy in this task, although the most important issue would be correctly classifying the "sick" class. Precision (also called the positive predictive value), which captures the fraction of correctly classified instances among the instances predicted for a given class (eg, "sick"); recall or sensitivity, which captures the fraction of instances of a class (eg, "sick") that were correctly classified; and the F-measure, the harmonic mean of precision and recall calculated per class of interest, are more robust metrics for several practical situations. The proper choice of an evaluation metric should be carefully determined, as these indices ought to be used by regulatory bodies for screening tests and not for diagnostic reasoning [52]. The most important issue is to choose the appropriate (most robust) performance metric given the particularities of each case.
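To make the worked example above concrete, the following minimal Python sketch (assuming scikit-learn is installed; the 95/5 class split and the "dumb classifier" are the hypothetical scenario described above, not data from any included review) shows how accuracy can look excellent while precision, recall, and the F-measure expose the failure on the minority class.

# Hypothetical illustration of the (95%, 5%) "healthy vs sick" scenario described above
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [0] * 95 + [1] * 5   # 95 "healthy" (0) and 5 "sick" (1) test instances
y_pred = [0] * 100            # a "dumb classifier" that predicts "healthy" for everyone

print(accuracy_score(y_true, y_pred))                    # 0.95: looks excellent
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0: no "sick" case was predicted at all
print(recall_score(y_true, y_pred, zero_division=0))     # 0.0: none of the 5 "sick" cases was found
print(f1_score(y_true, y_pred, zero_division=0))         # 0.0: harmonic mean of precision and recall

On severely imbalanced tasks such as this one, reporting precision, recall, and the F-measure per class of interest (or the area under the receiver operating characteristic curve) therefore gives a far more faithful picture than accuracy alone.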
Another pitfall identified among the included reviews was the lack of reporting of the precise experimental protocols used for testing ML algorithms and the specific type of replication performed.

There is no formal tool for assessing quality and risk of bias in big data studies. This is an area that is ripe for development. In Textbox 3, we summarize our recommendations for systematic reviews on the application of big data and ML for people's health, based on our experience and the findings of this systematic review, and inspired by Cunha et al [53].

High variability in the results was evident across different ML techniques and approaches among the 35 reviews, even for those assessing the same disease or condition. Indeed, designing big data analysis and ML experiments involves elevated model complexity and commonly requires testing of several modeling algorithms [54]. The diversity of big data tools and ML algorithms requires proper standardization of protocols and comparative approaches. Additionally, the process of tuning the hyperparameters of the algorithms is not uniformly reported. Important characteristics essential for replicability and external validation were not frequently available. Lastly, most of the studies provide little guidance to explain the results. Without knowing how and why the models achieve their results, the applicability and trustworthiness of the models in real-world scenarios are severely compromised. Therefore, we urge the testing and assessment of supervised, unsupervised, and semisupervised methodologies, with explanation and interpretation to justify the results. Moreover, we encourage hyperparameter optimization to achieve adjusted improvement of models, enhance model generalization for untrained data, and avoid overfitting to increase predictive accuracy.

Only two published systematic reviews evaluated the impact of big data analytics on the COVID-19 pandemic. Primary studies on COVID-19 are lacking, which indicates an opportunity to apply big data and ML to this and future epidemics/pandemics [35,37]. As of November 30, 2020, many published protocols were retrieved through a standard search on PROSPERO. The titles of these review protocols showed an intention to evaluate ML tools in diagnosis and prediction, the impact of telemedicine using ML techniques, and the use of AI-based disease surveillance [55].

Although DSS are an important application of big data analytics and may benefit patient care [56-58], only two reviews assessed such systems [16,24]. One focused on predictive analytics for identifying patients at risk of drug-induced QTc interval prolongation, discussing the efficacy of a DSS that has shown evidence of reduced prescriptions for QT interval–prolonging drugs. Similarly, one study exploring the impact of DSS on quality care in oncology showed that implementing these systems might positively impact physician-prescribing behaviors, health care costs, and clinician workload.

This overview of systematic reviews updates the available evidence from multiple primary studies intersecting computer science, engineering, medicine, and public health. We used a comprehensive search strategy (performed by an information specialist) with a predefined published protocol, precise inclusion criteria, rigorous data extraction, and quality assessment of retrieved records. We avoided reporting bias through the dual and blinded examination of systematic reviews and by having one review author standardize the extracted data.
Textbox 3. Recommendations for systematic reviews on the application of big data and machine learning for people’s health.
• Choose an appropriate evaluation measure for the task and data characteristics, and justify your choice
Different evaluation measures such as accuracy, area under the receiver operating characteristic curve, precision, recall, and F-measure capture different
aspects of the task and are influenced by data characteristics such as skewness (ie, imbalance), sampling bias, etc. Choose your measures wisely and
justify your choice based on the aforementioned aspects of the task and the data.
• Ensure the employment of appropriate experimental protocols/design to guarantee generalization of the results
Authors should use experimental protocols based on cross-validation or multiple training/validation/test splits of the employed datasets with more
than one repetition of the experimental procedure. The objective of this criterion is to analyze whether the study assesses the capacity of generalization
of each method compared in the experiments. The use of a single default split of the input dataset with only one training/test split does not fit this
requirement. Repetitions are essential to demonstrate the generalization of the investigated methods for multiple training and test sets, and to avoid
any suspicion of a “lucky” (single) partition that favors the authors’ method.
• Properly tune, and explicitly report the tuning process and values of the hyperparameters of all compared methods
The effectiveness of big data solutions and machine-learning methods is highly affected by the choice of the parameters of these methods (ie, parameter
tuning). The wrong or improper choice of parameters may make a highly effective method exhibit very poor behavior in a given task. Ideally, the
parameters should be chosen for each specific task and dataset using a partition of the training set (ie, validation), which is different from the dataset
used to train and to test the model. This procedure is known as cross-validation on the training set or nested cross-validation.
Even if the tuning of all methods is properly executed, this should be explicitly reported in the paper, with the exact values (or range of values) used
for each parameter and the best choices used. When the tuning information is missing or absent, it is impossible to determine whether the methods
have been implemented appropriately and if they have achieved their maximum potential in a given task. It is also impossible to assess whether the
comparison is fair, as some methods may have been used at their maximum capacity and others not.
• Employ statistical significance tests to contrast the compared strategies in the experimental evaluation
Statistical tests are essential to assess whether the performance of the analyzed methods in the sample (ie, the considered datasets) is likely to reflect, with certain confidence, their actual performance in the whole population. As such, they are key to support any claim of superiority of a particular method over others. Without such tests, the relative performance observed in the sample cannot, by any means, be extrapolated to the population. The choice of the tests should also reflect the characteristics of the data (ie, determining whether the data follow a normal distribution).
• Make the data and code freely available with proper documentation
One of the issues that hampers reproducibility of studies, and therefore scientific progress, is the lack of original implementation (with proper
documentation) of the methods and techniques, and the unavailability of the original data used to test the methods. Therefore, it is important to make
all data, models, code, documentation, and other digital artifacts used in the research available for others to reuse. The artifacts made available must be
sufficient to ensure that published results can be accurately reproduced.
• Report other dimensions of the problem such as model costs (time) and potential for explainability
Effectiveness of the solutions, as captured by accuracy-oriented measures, is not the only dimension that should be evaluated. Indeed, if the effectiveness
of the studied models is similar and sufficient for a given health-related application, other dimensions such as time efficiency (or the costs) to train
and deploy (test) the models are essential to evaluate the practical applicability of such solutions. Another dimension that may influence the decision
for the practical use of a big data or a machine-learning method in a real practical situation is the ability to understand why the model has produced
certain outputs (ie, explainability). Solutions such as those based on neural networks may be highly effective when presented with huge amounts of
data, but their training and deployment costs as well as their opaqueness may not make them the best choice for a given health-related application.
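As a companion to the recommendation on experimental protocols in Textbox 3, the sketch below illustrates one way to run repeated stratified cross-validation instead of a single train/test split. It is a minimal example assuming scikit-learn, with a synthetic placeholder dataset rather than data from any included review.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic, imbalanced placeholder data (hypothetical; stands in for a real clinical dataset)
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# 5 folds repeated 10 times = 50 train/test evaluations instead of one "lucky" partition
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, scoring="f1", cv=cv)

# Report the distribution of scores across repetitions, not a single split
print(f"F1 = {scores.mean():.3f} (SD {scores.std():.3f}) over {len(scores)} evaluations")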
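For the recommendation on hyperparameter tuning, a minimal nested cross-validation sketch is shown below (again assuming scikit-learn; the grid of C and gamma values is a hypothetical example that a real study would report explicitly): the inner loop selects hyperparameters on validation folds, while the outer loop estimates generalization performance.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)  # placeholder dataset

# The exact values (or ranges) tried for each hyperparameter should be reported in the paper
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

inner = GridSearchCV(SVC(), param_grid, cv=5)        # inner loop: tuning on validation folds
outer_scores = cross_val_score(inner, X, y, cv=5)    # outer loop: unbiased performance estimate
print(f"Nested cross-validation accuracy = {outer_scores.mean():.3f}")

inner.fit(X, y)                                      # refit on all data to report the chosen values
print("Selected hyperparameters:", inner.best_params_)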
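The recommendation on statistical testing can be operationalized as in the sketch below: the two compared methods are evaluated on the same cross-validation folds, and their paired per-fold scores are contrasted with a significance test. The Wilcoxon signed-rank test is used here because per-fold scores often do not follow a normal distribution; the models and dataset are hypothetical placeholders.

from scipy.stats import wilcoxon
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, flip_y=0.05, random_state=0)  # noisy placeholder data
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)   # identical folds for both methods

scores_a = cross_val_score(LogisticRegression(max_iter=1000), X, y, scoring="f1", cv=cv)
scores_b = cross_val_score(RandomForestClassifier(random_state=0), X, y, scoring="f1", cv=cv)

stat, p = wilcoxon(scores_a, scores_b)  # paired, nonparametric comparison of per-fold scores
print(f"Wilcoxon p value = {p:.4f}")    # claim superiority only if p falls below the chosen threshold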
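Finally, for the recommendation on reporting model costs and explainability, the sketch below records training and prediction times and computes permutation importance as one simple, model-agnostic indication of which inputs drive the predictions (assuming scikit-learn; data and model are hypothetical placeholders).

import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)  # placeholder data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0)
start = time.perf_counter()
model.fit(X_tr, y_tr)
train_time = time.perf_counter() - start

start = time.perf_counter()
model.predict(X_te)
predict_time = time.perf_counter() - start
print(f"Training time = {train_time:.3f} s, prediction time = {predict_time:.3f} s")  # report costs, not only accuracy

# Permutation importance: how much the test score drops when each feature is shuffled
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean)  # larger values indicate features the model relies on more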
However, limitations exist. The inferior quality scores based on the AMSTAR 2 tool might reflect incomplete reporting and lack of adherence to standardized review methods. There is neither an established bias risk tool specifically for big data or ML studies nor any systematic way of presenting the findings of such studies. Furthermore, most studies provided a narrative description of results, requiring summarization. Nevertheless, all of the reviews were inspected by most authors, and the most relevant data were condensed in the text or in descriptive tables.

Big data analytics provide public health and health care with powerful instruments to gather and analyze large volumes of heterogeneous data. Although research in this field has been growing exponentially in the last decade, the overall quality of evidence is found to be low to moderate. High variability of results was observed across different ML techniques and approaches, even for the same disease or condition. The diversity of big data tools and ML algorithms requires proper standardization of protocols and comparative approaches, and the process of tuning the hyperparameters of the algorithms is not uniformly reported. Important characteristics essential for replicability and external validation were not frequently available.

Additionally, the included reviews in this systematic review addressed different health-related tasks; however, studies assessing the impact on clinical outcomes remain scarce. Thus, evidence of applicability in daily medical practice is still needed. Further studies should focus on how big data analytics impact clinical outcomes and on creating proper methodological guidelines for reporting big data/ML studies, as well as using robust performance metrics to assess accuracy.
Acknowledgments
We highly appreciate the efforts provided by our experienced librarian Maria Björklund from Lund University, who kindly
prepared the search strategy used in this research. In addition, we thank Anneliese Arno (University College London and
Covidence Platform) for providing guidance in performing this research through Covidence. We also thank Raisa Eda de Resende,
Edson Amaro Júnior, and Kaíque Amâncio Alvim for helping the group with data extraction and double-checking the input data.
Authors' Contributions
IJBdN, MM, MG, NAM, and DNO designed the study. HA, IW, and IJBdN performed first- and second-stage screening, and
extracted the presented data. MM resolved any disagreements. HA, IW, and IJBdN carried out the quality assessment. IJBdN, MM,
MG, and DNO drafted the manuscript and its final version. DNO and NAM are staff members of the WHO. The authors alone
are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy, or views of the
WHO.
Conflicts of Interest
None declared.
Multimedia Appendix 1
Search strategy used in the research.
[DOCX File , 26 KB-Multimedia Appendix 1]
Multimedia Appendix 2
Quality assessment judgment using the AMSTAR 2 tool.
[DOCX File , 28 KB-Multimedia Appendix 2]
Multimedia Appendix 3
Main characteristics of included studies.
[DOCX File , 38 KB-Multimedia Appendix 3]
Multimedia Appendix 4
Results and limitations of included systematic reviews.
[DOCX File , 53 KB-Multimedia Appendix 4]
References
1. Ristevski B, Chen M. Big data analytics in medicine and healthcare. J Integr Bioinform 2018 May 10;15(3):20170030
[FREE Full text] [doi: 10.1515/jib-2017-0030] [Medline: 29746254]
2. Panahiazar M, Taslimitehrani V, Jadhav A, Pathak J. Empowering personalized medicine with big data and semantic web
technology: promises, challenges, and use cases. Proc IEEE Int Conf Big Data 2014 Oct;2014:790-795 [FREE Full text]
[doi: 10.1109/BigData.2014.7004307] [Medline: 25705726]
3. 13th General Programme of Work (GPW13) WHO Impact Framework. World Health Organization. 2019. URL: https://www.who.int/about/what-we-do/GPW13_WHO_Impact_Framework_Indicator_Metadata.pdf [accessed 2020-12-05]
4. The European Programme of Work, 2020–2025 – “United Action for Better Health”. World Health Organization, regional
office for Europe. 2020. URL: https://www.euro.who.int/en/health-topics/health-policy/european-programme-of-work/
about-the-european-programme-of-work/european-programme-of-work-20202025-united-action-for-better-health-in-europe
[accessed 2020-12-15]
5. Whitlock EP, Lin JS, Chou R, Shekelle P, Robinson KA. Using existing systematic reviews in complex systematic reviews.
Ann Intern Med 2008 May 20;148(10):776-782. [doi: 10.7326/0003-4819-148-10-200805200-00010] [Medline: 18490690]
6. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and
meta-analyses: the PRISMA statement. PLoS Med 2009 Jul 21;6(7):e1000097 [FREE Full text] [doi:
10.1371/journal.pmed.1000097] [Medline: 19621072]
7. Higgins JPT, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, Cochrane Bias Methods Group, Cochrane Statistical
Methods Group. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ 2011 Oct 18;343:d5928
[FREE Full text] [doi: 10.1136/bmj.d5928] [Medline: 22008217]
8. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of
randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 1999 Nov
27;354(9193):1896-1900. [doi: 10.1016/s0140-6736(99)04149-5] [Medline: 10584742]
9. Baştanlar Y, Ozuysal M. Introduction to machine learning. Methods Mol Biol 2014;1107:105-128. [doi:
10.1007/978-1-62703-748-8_7] [Medline: 24272434]
10. Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic
reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ 2017 Sep 21;358:j4008
[FREE Full text] [doi: 10.1136/bmj.j4008] [Medline: 28935701]
11. Freeman JD, Blacker B, Hatt G, Tan S, Ratcliff J, Woolf TB, et al. Use of big data and information and communications
technology in disasters: an integrative review. Disaster Med Public Health Prep 2019 Apr;13(2):353-367. [doi:
10.1017/dmp.2018.73] [Medline: 30047353]
12. Librenza-Garcia D, Kotzian BJ, Yang J, Mwangi B, Cao B, Pereira Lima LN, et al. The impact of machine learning
techniques in the study of bipolar disorder: A systematic review. Neurosci Biobehav Rev 2017 Sep;80:538-554. [doi:
10.1016/j.neubiorev.2017.07.004] [Medline: 28728937]
13. Sprockel J, Tejeda M, Yate J, Diaztagle J, González E. Intelligent systems tools in the diagnosis of acute coronary syndromes:
A systemic review. Arch Cardiol Mex 2018;88(3):178-189. [doi: 10.1016/j.acmx.2017.03.002] [Medline: 28359602]
14. Shatte ABR, Hutchinson DM, Teague SJ. Machine learning in mental health: a scoping review of methods and applications.
Psychol Med 2019 Jul;49(9):1426-1448. [doi: 10.1017/S0033291719000151] [Medline: 30744717]
15. Nielsen KB, Lautrup ML, Andersen JKH, Savarimuthu TR, Grauslund J. Deep learning-based algorithms in screening of
diabetic retinopathy: a systematic review of diagnostic performance. Ophthalmol Retina 2019 Apr;3(4):294-304. [doi:
10.1016/j.oret.2018.10.014] [Medline: 31014679]
16. Tomaselli Muensterman E, Tisdale JE. Predictive analytics for identification of patients at risk for QT interval prolongation:
a systematic review. Pharmacotherapy 2018 Aug;38(8):813-821. [doi: 10.1002/phar.2146] [Medline: 29882591]
17. Bridge J, Blakey JD, Bonnett L. A systematic review of methodology used in the development of prediction models for
future asthma exacerbation. BMC Med Res Methodol 2020 Feb 05;20(1):22 [FREE Full text] [doi:
10.1186/s12874-020-0913-7] [Medline: 32024484]
18. El Idrissi T, Idri A, Bakkoury Z. Systematic map and review of predictive techniques in diabetes self-management. Int J
Inf Manage 2019 Jun;46:263-277. [doi: 10.1016/j.ijinfomgt.2018.09.011]
19. Bernert RA, Hilberg AM, Melia R, Kim JP, Shah NH, Abnousi F. Artificial intelligence and suicide prevention: a systematic
review of machine learning investigations. Int J Environ Res Public Health 2020 Aug 15;17(16):5929 [FREE Full text]
[doi: 10.3390/ijerph17165929] [Medline: 32824149]
20. Woldaregay AZ, Årsand E, Walderhaug S, Albers D, Mamykina L, Botsis T, et al. Data-driven modeling and prediction
of blood glucose dynamics: Machine learning applications in type 1 diabetes. Artif Intell Med 2019 Jul;98:109-134 [FREE
Full text] [doi: 10.1016/j.artmed.2019.07.007] [Medline: 31383477]
21. Tripoliti EE, Papadopoulos TG, Karanasiou GS, Naka KK, Fotiadis DI. Heart failure: diagnosis, severity estimation and
prediction of adverse events through machine learning techniques. Comput Struct Biotechnol J 2017;15:26-47 [FREE Full
text] [doi: 10.1016/j.csbj.2016.11.001] [Medline: 27942354]
22. Yin Z, Sulieman LM, Malin BA. A systematic literature review of machine learning in online personal health data. J Am
Med Inform Assoc 2019 Jun 01;26(6):561-576 [FREE Full text] [doi: 10.1093/jamia/ocz009] [Medline: 30908576]
23. Kruse CS, Goswamy R, Raval Y, Marawi S. Challenges and opportunities of big data in health care: a systematic review.
JMIR Med Inform 2016 Nov 21;4(4):e38 [FREE Full text] [doi: 10.2196/medinform.5359] [Medline: 27872036]
24. Klarenbeek SE, Weekenstroo HH, Sedelaar JM, Fütterer JJ, Prokop M, Tummers M. The effect of higher level computerized
clinical decision support systems on oncology care: a systematic review. Cancers (Basel) 2020 Apr 22;12(4):1032 [FREE
Full text] [doi: 10.3390/cancers12041032] [Medline: 32331449]
25. Burke TA, Ammerman BA, Jacobucci R. The use of machine learning in the study of suicidal and non-suicidal self-injurious
thoughts and behaviors: A systematic review. J Affect Disord 2019 Feb 15;245:869-884. [doi: 10.1016/j.jad.2018.11.073]
[Medline: 30699872]
26. Fleuren LM, Klausch TLT, Zwager CL, Schoonmade LJ, Guo T, Roggeveen LF, et al. Machine learning for the prediction
of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med 2020 Mar;46(3):383-400
[FREE Full text] [doi: 10.1007/s00134-019-05872-y] [Medline: 31965266]
27. Harris M, Qi A, Jeagal L, Torabi N, Menzies D, Korobitsyn A, et al. A systematic review of the diagnostic accuracy of
artificial intelligence-based computer programs to analyze chest x-rays for pulmonary tuberculosis. PLoS One
2019;14(9):e0221339 [FREE Full text] [doi: 10.1371/journal.pone.0221339] [Medline: 31479448]
28. Patil S, Habib Awan K, Arakeri G, Jayampath Seneviratne C, Muddur N, Malik S, et al. Machine learning and its potential
applications to the genomic study of head and neck cancer-A systematic review. J Oral Pathol Med 2019 Oct;48(9):773-779.
[doi: 10.1111/jop.12854] [Medline: 30908732]
29. Pehrson L, Nielsen M, Ammitzbøl Lauridsen C. Automatic pulmonary nodule detection applying deep learning or machine
learning algorithms to the LIDC-IDRI database: a systematic review. Diagnostics (Basel) 2019 Mar 07;9(1):29 [FREE Full
text] [doi: 10.3390/diagnostics9010029] [Medline: 30866425]
30. Davidson L, Boland MR. Enabling pregnant women and their physicians to make informed medication decisions using
artificial intelligence. J Pharmacokinet Pharmacodyn 2020 Aug;47(4):305-318 [FREE Full text] [doi:
10.1007/s10928-020-09685-1] [Medline: 32279157]
31. Li D, Mikela Vilmun B, Frederik Carlsen J, Albrecht-Beste E, Ammitzbøl Lauridsen C, Bachmann Nielsen M, et al. The
performance of deep learning algorithms on automatic pulmonary nodule detection and classification tested on different
datasets that are not derived from LIDC-IDRI: a systematic review. Diagnostics (Basel) 2019 Nov 29;9(4):207 [FREE Full
text] [doi: 10.3390/diagnostics9040207] [Medline: 31795409]
32. Chaki J, Thillai Ganesh S, Cidham S, Ananda Theertan S. Machine learning and artificial intelligence based diabetes
mellitus detection and self-management: a systematic review. J King Saud Univ Comput Inf Sci 2020 Jul:In press [FREE
Full text] [doi: 10.1016/j.jksuci.2020.06.013]
33. Scardoni A, Balzarini F, Signorelli C, Cabitza F, Odone A. Artificial intelligence-based tools to control healthcare associated
infections: A systematic review of the literature. J Infect Public Health 2020 Aug;13(8):1061-1077 [FREE Full text] [doi:
10.1016/j.jiph.2020.06.006] [Medline: 32561275]
34. Luo G, Nkoy FL, Stone BL, Schmick D, Johnson MD. A systematic review of predictive models for asthma development
in children. BMC Med Inform Decis Mak 2015 Nov 28;15:99 [FREE Full text] [doi: 10.1186/s12911-015-0224-9] [Medline:
26615519]
35. Kannan S, Subbaram K, Ali S, Kannan H. The role of artificial intelligence and machine learning techniques: race for
COVID-19 vaccine. Arch Clin Infect Dis 2020 May 10;15(2):e103232. [doi: 10.5812/archcid.103232]
36. Gonçalves WGE, Dos Santos M, Lobato FMF, Ribeiro-Dos-Santos Â, de Araújo GS. Deep learning in gastric tissue diseases:
a systematic review. BMJ Open Gastroenterol 2020 Mar 26;7(1):e000371 [FREE Full text] [doi:
10.1136/bmjgast-2019-000371] [Medline: 32337060]
37. Albahri AS, Hamid RA, Alwan JK, Al-Qays ZT, Zaidan AA, Zaidan BB, et al. Role of biological data mining and machine
learning techniques in detecting and diagnosing the novel coronavirus (COVID-19): a systematic review. J Med Syst 2020
May 25;44(7):122 [FREE Full text] [doi: 10.1007/s10916-020-01582-x] [Medline: 32451808]
38. Kavakiotis I, Tsave O, Salifoglou A, Maglaveras N, Vlahavas I, Chouvarda I. Machine learning and data mining methods
in diabetes research. Comput Struct Biotechnol J 2017;15:104-116 [FREE Full text] [doi: 10.1016/j.csbj.2016.12.005]
[Medline: 28138367]
39. Galetsi P, Katsaliaki K, Kumar S. Values, challenges and future directions of big data analytics in healthcare: A systematic
review. Soc Sci Med 2019 Nov;241:112533. [doi: 10.1016/j.socscimed.2019.112533] [Medline: 31585681]
40. Abhari S, Niakan Kalhori SR, Ebrahimi M, Hasannejadasl H, Garavand A. Artificial intelligence applications in type 2
diabetes mellitus care: focus on machine learning methods. Healthc Inform Res 2019 Oct;25(4):248-261 [FREE Full text]
[doi: 10.4258/hir.2019.25.4.248] [Medline: 31777668]
41. Arani LA, Hosseini A, Asadi F, Masoud SA, Nazemi E. Intelligent computer systems for multiple sclerosis diagnosis: a
systematic review of reasoning techniques and methods. Acta Inform Med 2018 Dec;26(4):258-264 [FREE Full text] [doi:
10.5455/aim.2018.26.258-264] [Medline: 30692710]
42. Layeghian Javan S, Sepehri MM, Aghajani H. Toward analyzing and synthesizing previous research in early prediction of
cardiac arrest using machine learning based on a multi-layered integrative framework. J Biomed Inform 2018 Dec;88:70-89
[FREE Full text] [doi: 10.1016/j.jbi.2018.10.008] [Medline: 30389440]
43. Wang W, Kiik M, Peek N, Curcin V, Marshall IJ, Rudd AG, et al. A systematic review of machine learning models for
predicting outcomes of stroke with structured data. PLoS One 2020;15(6):e0234722 [FREE Full text] [doi:
10.1371/journal.pone.0234722] [Medline: 32530947]
44. Murray NM, Unberath M, Hager GD, Hui FK. Artificial intelligence to diagnose ischemic stroke and identify large vessel
occlusions: a systematic review. J Neurointerv Surg 2020 Feb;12(2):156-164. [doi: 10.1136/neurintsurg-2019-015135]
[Medline: 31594798]
45. Alonso SG, de la Torre-Díez I, Hamrioui S, López-Coronado M, Barreno DC, Nozaleda LM, et al. Data mining algorithms
and techniques in mental health: a systematic review. J Med Syst 2018 Jul 21;42(9):161. [doi: 10.1007/s10916-018-1018-2]
[Medline: 30030644]
46. GBD 2016 Disease Injury Incidence Prevalence Collaborators. Global, regional, and national incidence, prevalence, and
years lived with disability for 328 diseases and injuries for 195 countries, 1990-2016: a systematic analysis for the Global
Burden of Disease Study 2016. Lancet 2017 Sep 16;390(10100):1211-1259 [FREE Full text] [doi:
10.1016/S0140-6736(17)32154-2] [Medline: 28919117]
47. Nass SJ, Levit LA, Gostin LO, editors; Institute of Medicine (US) Committee on Health Research and the Privacy of Health Information: The HIPAA Privacy Rule. The value, importance, and oversight of health research. In: Beyond the HIPAA Privacy Rule: Enhancing Privacy, Improving Health Through Research. Washington, DC: National Academies Press; 2009:334.
48. Sivarajah U, Kamal MM, Irani Z, Weerakkody V. Critical analysis of Big Data challenges and analytical methods. J Bus
Res 2017 Jan;70:263-286. [doi: 10.1016/j.jbusres.2016.08.001]
49. Cassidy R, Singh NS, Schiratti P, Semwanga A, Binyaruka P, Sachingongu N, et al. Mathematical modelling for health
systems research: a systematic review of system dynamics and agent-based models. BMC Health Serv Res 2019 Nov
19;19(1):845 [FREE Full text] [doi: 10.1186/s12913-019-4627-7] [Medline: 31739783]
50. Ngiam KY, Khor IW. Big data and machine learning algorithms for health-care delivery. Lancet Oncol 2019
May;20(5):e262-e273. [doi: 10.1016/S1470-2045(19)30149-4] [Medline: 31044724]
51. Lee S, Mohr NM, Street WN, Nadkarni P. Machine learning in relation to emergency medicine clinical and operational
scenarios: an overview. West J Emerg Med 2019 Mar;20(2):219-227 [FREE Full text] [doi: 10.5811/westjem.2019.1.41244]
[Medline: 30881539]
52. Llewelyn H. Sensitivity and specificity are not appropriate for diagnostic reasoning. BMJ 2017 Sep 06;358:j4071. [doi:
10.1136/bmj.j4071] [Medline: 28877897]
53. Cunha W, Mangaravite V, Gomes C, Canuto S, Resende E, Nascimento C, et al. On the cost-effectiveness of neural and
non-neural approaches and representations for text classification: A comprehensive comparative study. Inf Process Manage
2021 May;58(3):102481. [doi: 10.1016/j.ipm.2020.102481]
54. Deo RC. Machine learning in medicine. Circulation 2015 Nov 17;132(20):1920-1930 [FREE Full text] [doi:
10.1161/CIRCULATIONAHA.115.001593] [Medline: 26572668]
55. National Institute for Health Research: PROSPERO - International Prospective Register of Systematic Reviews. University
of York Centre for Reviews and Dissemination. URL: https://www.crd.york.ac.uk/prospero/ [accessed 2020-12-04]
56. Rumsfeld JS, Joynt KE, Maddox TM. Big data analytics to improve cardiovascular care: promise and challenges. Nat Rev
Cardiol 2016 Jun;13(6):350-359. [doi: 10.1038/nrcardio.2016.42] [Medline: 27009423]
57. Tabary MY, Memariani A, Ebadati E. Chapter 3 - Developing a decision support system for big data analysis and cost
allocation in national healthcare. In: Dey N, Ashour AS, Bhat C, Fong SJ, editors. Healthcare data analytics and management.
Cambridge, MA: Academic Press; 2019:89-109.
58. Dagliati A, Tibollo V, Sacchi L, Malovini A, Limongelli I, Gabetta M, et al. Big data as a driver for clinical decision support
systems: a learning health systems perspective. Front Digit Humanit 2018 May 1;5:8. [doi: 10.3389/fdigh.2018.00008]
Abbreviations
AI: artificial intelligence
AUC: area under the receiver operating characteristic curve
AMSTAR 2: A Measurement Tool to Assess Systematic Reviews 2
CNN: convolutional neural network
DM: diabetes mellitus
DSS: decision support system
EPW: European Programme of Work
GPW13: Thirteenth General Programme of Work
ML: machine learning
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-analyses
QUOROM: Quality of Reporting of Meta-analyses
RF: random forest
SVM: support vector machine
WHO: World Health Organization
Edited by R Kukafka, G Eysenbach; submitted 19.01.21; peer-reviewed by Y Mejova, A Benis; comments to author 09.02.21; revised
version received 19.02.21; accepted 24.03.21; published 13.04.21
Please cite as:
Borges do Nascimento IJ, Marcolino MS, Abdulazeem HM, Weerasekara I, Azzopardi-Muscat N, Gonçalves MA, Novillo-Ortiz D
Impact of Big Data Analytics on People’s Health: Overview of Systematic Reviews and Recommendations for Future Studies
J Med Internet Res 2021;23(4):e27275
URL: https://www.jmir.org/2021/4/e27275
doi: 10.2196/27275
PMID:
©Israel Júnior Borges do Nascimento, Milena Soriano Marcolino, Hebatullah Mohamed Abdulazeem, Ishanka Weerasekara,
Natasha Azzopardi-Muscat, Marcos André Gonçalves, David Novillo-Ortiz. Originally published in the Journal of Medical
Internet Research (http://www.jmir.org), 13.04.2021. This is an open-access article distributed under the terms of the Creative
Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly
cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright
and license information must be included.