Machine Learning in Public Health: A Review
Department of Health Education, Bangabandhu Sheikh Mujibur Rahman Science and Technology University,
Gopalganj, Bangladesh
Abstract
In recent years, machine learning has been used for disease diagnosis and prediction in the public healthcare sector. It plays an essential role in healthcare and is rapidly being applied to education. It is one of the driving forces in science and technology, but the emergence of big data requires a paradigm shift away from traditional methods in how machine learning techniques are implemented. With the availability of large healthcare datasets and advances in machine learning techniques, computers are now well equipped to diagnose many health issues. Researchers in public health have used several machine learning techniques. Several of these, including Support Vector Machines (SVM), Decision Trees (DT), Naïve Bayes (NB), Random Forest (RF) and K-Nearest Neighbors (KNN), are widely used in predictive model design research and support effective and accurate decision-making. The predictive models discussed here are based on different supervised ML techniques, as well as various input characteristics and data samples. These predictive models can therefore be used to support healthcare professionals and patients worldwide in improving public health and global health. Finally, we outline some basic problems and challenges that researchers in public health face.
Keywords
Machine learning • Prediction • Classification • Public Health • Disease
*Address for correspondence: MD Asadullah, Department of Health Education, Bangabandhu Sheikh Mujibur Rahman Science and Technology
University, Gopalganj, Bangladesh; Email: [email protected]
Copyright: © 2021 Asadullah MD, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.
Asadullah MD, et al. J Health Edu Res Dev, Volume 9:2, 2021

use methods for pattern recognition and identification, such as machine learning. The repository should highlight the specifications of clinical machine learning tasks. It should thus motivate the ML community by providing a platform for the publication, exchange and collection of datasets, the benchmarking of statistical evaluators, and methods for challenging machine learning problems. The main purpose of applying classification methods is to allow healthcare organizations to provide accurate medication quantities. At every stage of the development and application of machine learning for advancing health, ethical design thinking is essential. To this end, physicians committed to honesty and innovation will need to work closely with software and data scientists to re-imagine clinical medicine and foresee its ethical implications. It is crucial that data from mobile health and consumer-facing technologies be systematically validated, especially where dynamic intervention is provided [2]. Three developments in machine learning may be of interest to public health researchers and practitioners. Machine learning techniques have shown success in the prediction and diagnosis of numerous critical diseases. In this approach, each instance in a dataset is represented by a set of features. Research comparing the quality of different classification/prediction methods for predicting the presence of disease, disease etiology or disease subtype is limited. A well-designed machine learning classification approach can be applied to many types of medical diagnosis.

Challenge | Description
Data schema | Increasing burden of disease, multimorbidity and disability, driven by aging and the epidemiological transition
Ethics, laws and regulations | Health care choices can have an immediate impact on a person's well-being or even life
Epidemic | Social health inequalities, a small number of local healthcare professionals, and weak healthcare infrastructure
Big data | Data mining methods, advantages and weaknesses of current works, and recommendations for future work
Treatment effect | Estimating treatment effects on patient outcomes in order to select the correct treatment
Clinical data | Real clinical information environment, with incomplete and erroneous data
Data regularity, timing and reliability | The regularity, pacing and granularity of available data govern the control of population health
Identifying characteristics | The features of communities, ecosystems and policies that define population health
For Health | Tackling health as an approach to the existing situation and challenges
Dataset imbalance | Forming an ensemble of multiple models, each trained on a data subset with matched numbers of positive and negative slides
Impurity and missing values | High-volume data also has problems with impurity and missing values
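The dataset-imbalance strategy in the table can be sketched in a few lines. That strategy is an ensemble of models, each trained on a data subset with matched numbers of positive and negative examples. This is a minimal, standard-library-only illustration that rests on several assumptions:

- Labels are binary.
- The majority class is randomly undersampled for each model.
- Predictions are combined by majority vote.
- The base learner is a nearest-centroid classifier, chosen only to keep the example short. It is not a method taken from the reviewed studies.

All function names are hypothetical.

```python
import random
from collections import Counter

def nearest_centroid_fit(X, y):
    """Return a dict mapping each class label to its mean feature vector."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def fit_balanced_ensemble(X, y, n_models=5, seed=0):
    """Train n_models base learners (binary labels assumed). Each learner sees
    every minority-class sample plus an equal-sized random subset of the
    majority class, drawn without replacement (random undersampling)."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, yi in enumerate(y) if yi == minority]
    maj_idx = [i for i, yi in enumerate(y) if yi != minority]
    models = []
    for _ in range(n_models):
        subset = min_idx + rng.sample(maj_idx, len(min_idx))
        models.append(nearest_centroid_fit([X[i] for i in subset],
                                           [y[i] for i in subset]))
    return models

def predict_ensemble(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(nearest_centroid_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Toy imbalanced data: 8 negatives (label 0) and 2 positives (label 1).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4],
     [0.4, 0.1], [0.2, 0.2], [0.1, 0.0], [2.0, 2.1], [2.2, 1.9]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
models = fit_balanced_ensemble(X, y, n_models=5, seed=42)
print(predict_ensemble(models, [2.1, 2.0]))  # 1
print(predict_ensemble(models, [0.1, 0.1]))  # 0
```

Using an odd number of models avoids tied votes in the binary case. In practice, any base classifier (SVM, DT, RF and so on) could replace the nearest-centroid learner.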
MLP, KNN, CART, SVM, NB | Breast cancer | MLP = 96.70%

outside the sample to a new dataset, also known as test data. Cross-validation is used because a model is fitted to a training dataset, and its performance must then be estimated on data that were not used for fitting. Cross-validation was applied in order to measure the numerical performance of a learning algorithm. This alone did not properly isolate and compare the performance of the different methods with respect to propensity score weighting. Through several steps, we measured the quality of the various propensity score matching methods. The classifier's accuracy is computed as the average accuracy across the k folds. In bootstrap validation, subsamples are drawn from the training dataset with replacement. Using 10-fold cross-validation was found to be a good and reasonable compromise between offering accurate performance estimates and

Disease | Cross-validation
Heart disease | 5-fold
Heart disease | 5-fold
Breast cancer | 10-fold
Breast cancer | 5-fold
Liver cancer | 10-fold
Heart disease | 10-fold
Cancer | 5-fold
Chronic kidney disease | 10-fold
Heart disease | 5-fold
Alzheimer's disease | 10-fold
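A small, standard-library-only Python sketch can illustrate the validation procedures described above. These are k-fold accuracy averaged over folds, and bootstrap validation with sampling with replacement. The sketch rests on several assumptions that do not come from the reviewed studies:

- A 1-nearest-neighbour classifier stands in for SVM, RF, NB and the other models.
- Folds are formed by shuffling and striding.
- Each bootstrap round is scored on its out-of-bag points.

All function names are hypothetical. Scoring the model on its own training data, as the first print does, shows why held-out estimates are needed.

```python
import random

def one_nn_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test points whose 1-NN prediction matches the true label."""
    hits = sum(one_nn_predict(train_X, train_y, x) == t
               for x, t in zip(test_X, test_y))
    return hits / len(test_y)

def k_fold_accuracy(X, y, k=10, seed=0):
    """Shuffle, split into k folds (needs len(y) >= k), hold each fold out once,
    train on the remaining k-1 folds, and return the mean fold accuracy."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        scores.append(accuracy([X[i] for i in train_idx], [y[i] for i in train_idx],
                               [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return sum(scores) / k

def bootstrap_accuracy(X, y, n_rounds=20, seed=0):
    """Each round draws n training indices *with replacement* and scores the
    model on the out-of-bag points (never drawn); returns the mean score."""
    rng = random.Random(seed)
    n, scores = len(y), []
    for _ in range(n_rounds):
        boot = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(boot))
        if not oob:  # every point was drawn; nothing left to evaluate on
            continue
        scores.append(accuracy([X[i] for i in boot], [y[i] for i in boot],
                               [X[i] for i in oob], [y[i] for i in oob]))
    return sum(scores) / len(scores)

# Pure-noise labels: 1-NN scores perfectly on its own training data
# (each point's nearest neighbour is itself); held-out scores should be
# much lower, since there is no real signal to learn.
rng = random.Random(1)
noise_X = [[0.1 * i, 0.1 * (i * 37 % 11)] for i in range(40)]
noise_y = [rng.randint(0, 1) for _ in range(40)]
print(accuracy(noise_X, noise_y, noise_X, noise_y))  # 1.0
print(k_fold_accuracy(noise_X, noise_y, k=10))
print(bootstrap_accuracy(noise_X, noise_y))
```

The stride-based split gives folds whose sizes differ by at most one. Stratified folds, which preserve the class ratio in each fold, are a common refinement when classes are imbalanced.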
and thus risk mistaking random noise or other confounding in the training data for real structure. This is a pervasive problem in numerical machine learning, because it is often possible to increase the complexity of a model until it achieves arbitrarily high prediction accuracy on the training data. One limitation of traditional medical scoring systems is that they combine the input variables linearly. They are therefore unable to model complex nonlinear interactions in medical domains. In this study, this weakness is addressed by using classification models that can implicitly detect complex nonlinear associations between independent and dependent variables. These models can also identify potential correlations between predictor variables.

Conclusion

To inform clinicians and policy makers, systems powered by machine learning will have to demonstrate results in practice through clinical trials or real-world performance observations. Eventually, approaches such as clustering and artificial neural networks will require a complete set of experiments. Most researchers used traditional machine learning algorithms such as SVM, RF, NB, LR, NN, KNN, ANN and DT to analyze public health data, and 10-fold cross-validation gave better results. However, the high volume of healthcare data poses a major challenge in public health, and handling big data is one of our biggest challenges. Besides this, many public health researchers face other problems. Most of the problems identified in the reviewed research papers are classification problems in public health data.

Moreover, overfitting and data imbalance are major problems in public health. In this review, we identified problems and challenges that every public health researcher should keep in mind, because most of the reviewed research papers discussed these problems and most researchers have faced them.

References
1. Li, Wei, Chai Yuanbo, Khan Fazlullah, Ullah Jan Syed Rooh, and Sahil Verma, et al. "A Comprehensive Survey on Machine Learning-Based Big Data Analytics for IoT-Enabled Smart Healthcare System." Mob Netw Appl 26 (2021): 1-19.
2. Panch, Trishan, Szolovits Peter, and Atun Rifat. "Artificial Intelligence, Machine Learning and Health Systems." J Glob Health 8 (2018).
3. Nair, Lekha R, D Shetty Sujala, and D Shetty Siddhanth. "Applying Spark Based Machine Learning Model on Streaming Big Data for Health Status Prediction." Comput Electr 65 (2018): 393-399.
4. Kubota, Ken J, A Chen Jason, and Little Max A. "Machine Learning for Large-Scale Wearable Sensor Data in Parkinson's Disease: Concepts, Promises, Pitfalls, and Futures." Mov Disord 31 (2016): 1314-1326.
5. Nakajima, Tetsushi, Katsumata Kenji, Kuwabara Hiroshi, and Soya Ryoko, et al. "Urinary Polyamine Biomarker Panels with Machine-Learning Differentiated Colorectal Cancers, Benign Disease, and Healthy Controls." Int J Mol Sci 19 (2018): 756.

How to cite this article: Asadullah, MD, Rashid Mamunar, Basu Priyanka and Murad Hossain MD. "Machine Learning in Public Health: A Review of the Problems and Challenges." J Health Edu Res Dev 9 (2021): 27767