Review Article

Journal of Health Education Research & Development
Volume 9:2, 2021
ISSN: 2380-5439 Open Access

Machine Learning in Public Health: A Review of the Problems and Challenges
MD Asadullah*, Mamunar Rashid, Priyanka Basu, Md Murad Hossain

Department of Health Education, Bangabandhu Sheikh Mujibur Rahman Science and Technology University,
Gopalganj, Bangladesh

Abstract

In recent years, machine learning has been used for disease diagnosis and prediction in the public healthcare sector. It plays an essential
role in healthcare and is rapidly being applied to education. It is one of the driving forces in science and technology, but the emergence of big
data requires a paradigm shift away from traditional methods in how machine learning techniques are implemented. With the availability of
large healthcare datasets and advances in machine learning techniques, computers are now well equipped to diagnose many health issues.
Researchers in public health have used several machine learning techniques. Several of these methods, including Support Vector
Machines (SVM), Decision Trees (DT), Naïve Bayes (NB), Random Forest (RF) and K-Nearest Neighbors (KNN), are widely used in
predictive model design research, resulting in effective and accurate decision-making. The predictive models discussed here are based on
different supervised ML techniques as well as various input characteristics and data samples. These predictive models can therefore be used
to support healthcare professionals and patients globally and to improve public health as well as global health. Finally, we outline some basic
problems and challenges that researchers in public health face.

Keywords
Machine learning • Prediction • Classification • Public Health • Disease

*Address for correspondence: MD Asadullah, Department of Health Education, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Bangladesh; Email: [email protected]
Copyright: © 2021 Asadullah MD, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Received: 08 September, 2021; Accepted: 22 September, 2021; Published: 29 September, 2021

Asadullah MD, et al. J Health Edu Res Dev, Volume 9:2, 2021

Introduction

Machine learning, a method of building a model that improves its performance through experience, belongs to the field of artificial intelligence and is increasingly being used in various fields of science [1]. Such algorithms can help track a person's progress, identify which variables make their symptoms worse, and predict how long full rehabilitation would take. Machine learning is likely to deliver technically superior results, but it will not be perfect: while it can deliver superior technical performance, it can also compound inequities. The intervention was particularly effective among the group with a moderate likelihood of participation. Targeting based on the results of a machine-learning prediction model has been useful in identifying suitable intervention targets. Traditional machine-learning approaches have been successful, mostly because the complexity of molecular interactions has been reduced by investigating only one or two dimensions of the molecular structure in the feature descriptors. A number of different ML classifiers are experimentally validated on a real data set in the cited study [5]. Machine learning is involved in many of these, but streaming data is addressed in only a few works. A machine learning library consists of common learning algorithms such as classification, clustering and collaborative filtering, which are useful when dealing with machine learning problems. Machine learning typically extends these methods to cope with high dimensionality and nonlinearity, which is of particular importance for wearable sensor data. It overlaps with artificial intelligence, but the problems it seeks to solve are usually familiar from traditional biomedical statistics. Feature extraction renders the machine-learning problem tractable because it greatly reduces the number of data dimensions. These techniques can help enhance discriminative ability by combining the predictive abilities of multiple metabolites. However, these methods are supervised, and therefore various validations are key factors in preventing overfitting. In this paper, a new approach is proposed to automatically identify fundus objects. The method uses image and data pre-processing techniques to improve the performance of machine learning classifiers. Machine learning techniques applied to these data are useful for data analysis and are used in specific fields. More recently, they have been used to analyze medical data and to support medical diagnosis of various complex diagnostic problems. Using machine learning classification algorithms for a particular disease, we can improve the accuracy, speed, reliability and performance of diagnosis over the current system. Machine learning has been used to estimate vegetation parameters and to detect disease, with less consideration given to the effects of disease symptoms on performance.

Machine Learning in Public Health

Machine learning plays an essential role in the healthcare field and is rapidly being applied to healthcare, including segmentation of medical images, authentication of images, fusion of multimodal images, computer-aided diagnosis, image-guided therapy, image classification and retrieval of image databases, where failure could be fatal. Statistical models developed using machine-learning methods can in many ways be viewed as extensions of more conventional health services research methodologies from epidemiology and health econometrics. In view of the wide availability of free packages to support this work, many researchers have been encouraged to apply deep learning to any data mining and pattern recognition topic related to health informatics. In medical fields, machine learning has also shown promise when the aim is to discover clusters in the data, such as in imaging research for therapeutic choice. Here, the new features can be checked against expert assessment by a radiologist or neurologist, which differs from the prediction setting, where observed labels exist in the data. Screening and prognosis of patients with cancer use methods for pattern recognition and identification such as machine learning. A repository should highlight the specifications of clinical machine learning tasks and thus motivate the ML community by providing a platform for the publication, exchange and collection of data sets and for benchmarking statistical evaluators and methods on challenging machine learning problems. The main purpose of applying the classification method is to allow healthcare organizations to provide accurate medication quantities. At every stage of development and application of machine learning in advancing health, ethical design thinking is essential. To this end, physicians committed to honesty and innovation will work closely with software and data scientists to re-imagine clinical medicine and foresee its ethical implications. It is crucial that data from mobile health and consumer-facing technologies be systematically validated, especially in cases where dynamic intervention is provided [2]. Three developments in machine learning may be of interest to public health researchers and practitioners. Machine learning techniques have shown success in the prediction and diagnosis of numerous critical diseases. In this strategy, a set of features is used to represent each instance in a dataset. Research comparing the quality of different classification and prediction methods for predicting the existence of disease, disease etiology, or disease subtype is minimal. For many types of medical diagnosis, a good machine learning approach to classification will apply.

Challenges in Public Health

Overall, health systems face multiple challenges: rising disease burden, multimorbidity and disability driven by aging and epidemiological transition, increased demand for health services, higher social expectations, and increased health spending. Healthcare poses unique machine learning challenges, since the requirements for explainability, model fidelity and performance in general are much higher than in most other fields. Ethical, legal and regulatory challenges are unique to health care, since health care decisions can have an immediate impact on a person's well-being or even life. The primary focus in health informatics is on the computational aspects of big data, including challenges, current big data mining techniques, strengths and limitations of current works, and an outline of directions for future work. A major challenge is posed by the high volume of healthcare data, the need for flexible processing and support for decentralized queries across multiple data sources. Global health has been described as an approach to the current situation and challenges, and digital health as an ideal way to address the health challenges associated with conflict-affected environments. There are a number of ways in which the proposed models of machine learning can help address public health challenges. The regularity, reliability and granularity of available data is a major challenge in tracking population health. Model estimates can play an important role in strategic decision-making if they are able to achieve sufficient precision, and machine learning models can provide a route to this required level of precision. Several writers describe different challenges in public health; these are summarized in Table 1.

Challenges | Description
Development | Challenges in the acquisition of talent and growth capital
Data schema | Increasing the burden of disease, multimorbidity and disability driven by aging and epidemiological transition
Ethics, laws and regulations | Health care choices can have an immediate impact on a person's well-being or even life.
Epidemic | Social health inequalities, a small number of local healthcare professionals, and a weak infrastructure for healthcare.
Big Data | Data mining methods, advantages and weaknesses of current works and recommendations for future work
Treatment effect | Treatment of patient outcomes in order to select the correct treatment
Clinical Data | Real clinical information environment, incomplete and erroneous data.
Data regularity, timing and reliability | The regularity, pacing and granularity of available data is the control of population health.
Characteristics identifying | The features of communities, ecosystems and policies are defined in population health
Health Tackling | Health as an approach to the existing situation and challenges.
Dataset imbalance | Forming an ensemble of multiple models with matched numbers of positive and negative slides trained on data subsets.
Biomarkers identify | Build diagnostic, prognostic or guided therapy predictive models
Screening | The area of early detection of cancer is packed with highlighting cautionary tales.

Table 1: Public health Challenges.

Problem Statement

A key aim in public health is reducing constraints such as the lack of resources (human and logistic) in healthcare centers, high population dispersion and lack of infrastructure. One problem with the concept of "data health" is the lack of a practical idea of how to implement healthcare programs effectively and efficiently: each insurer has sought effective strategies through trial and error [4]. The main problem is the unstructured nature of medical reports. High complexity and noise issues result from the multisource and multimodal nature of healthcare data. Additionally, high-volume data also has problems with impurity and missing values. All these issues are difficult to handle in terms of both size and reliability, although a range of methods have been developed to improve data accuracy and usability [2]. Machine learning methods are the leading option for achieving better results in classification and prediction problems. Classification plays a major role in a wide range of machine learning (ML) problems. Another major issue with data collection is the potential lack of label accuracy. Overfitting is a potential problem in machine learning. A general problem is that several of the existing datasets are difficult to use because of permission restrictions. Table 2 lists the numerous problems faced in public health.

Problem | Description
Classification | The situation was linear in nature for all armed and unarmed group datasets
Scalability | Exists with two of the most widely used interpretable machine learning models
Lack of infrastructure | Lack of resources in health care centers (human and logistic), high population dispersion
Effective and Efficient | Through trial and error, every insurer tried effective strategies
Exchange health information securely | Scientists and clinicians across institutional, provincial, or even national jurisdictional boundaries across a given healthcare organization.
Over fitting | Because of its storage limitation, it may not be appropriate for very large datasets with high dimensional features
Data Imbalanced | Which are commonly used to resolve big data clinical databases.
Clinical unstructured notes | The multisource and multimodality of health care data leads to high complexity and noise problems
Impurity and missing | The high-volume data also has problems with impurity and missing values
Missing variables | This results in the normal multivariate methods, while machine-learning approaches can still be appealing for other reasons
Prediction | The computer is equipped with a set of data to improve the classification model after it can be used for future predictions

Table 2: Problem Statement in Public Health.

Dataset

To generate the most effective results, machine learning algorithms analyze data repeatedly. Machine learning currently provides the basic machinery for scrutinizing such data. Today, medical clinics are well equipped with fully automatic machines that produce tremendous amounts of data, which are then collected and exchanged with information systems or doctors so that the necessary steps can be taken. Machine learning methods can be used to examine medical data and to identify various technical diagnostic conditions in medical diagnosis. Using machine learning, systems take patient data such as symptoms, laboratory data and other important attributes as input and produce reliable diagnostic results. Depending on the reliability of the test, the computer must determine which information will be used for future reference as the training dataset. Different authors have used different data to determine the quality of the proposed classifiers.

Classification Technique

In many real-world problems, classification is one of the most important decision-making techniques. A higher number of samples is selected for many classification problems, but this does not necessarily lead to higher classification accuracy. Supervised machine-learning algorithms are mainly used for classification or regression problems where the class label of the patient sample is already available. Classification tasks are found in a wide range of decision-making tasks in various fields such as medicine, science and industry. Several approaches to solving classification problems are suggested in the literature [3]. In the medical context, the identification quality of commonly used machine learning models, including k-Nearest Neighbors, Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression, has been evaluated. In this paper we summarize various research papers in tabular form (Table 4), showing the different methodologies and comparing their accuracy.

Technique | Disease Name | Highest Accuracy
SVM, RF, KNN, DT | Parkinson's | SVM=97.22%
NB, KNN, C4.5 DT, RF, SVM | Liver Disease | KNN=98.6%
LR, Adaboost, SVM, DT, DB | | SVM=94.4%
SVM, ANN | Malaria | SVM=89%
DNN | Diabetes | DNN=83.67%
MLP, KNN, CART, SVM, NB | Breast cancer | MLP=96.70%
NB, LS-SVM, Adabag, Adaboost | Breast cancer | Adaboost=99.08%
BN, LR, MLP, SMO, DT | Liver cancer | SMO=93.33%
NB, SVM, RF, LR, ANN | Heart disease | SVM=97.53%
MLP, SVM, KNN, C4.5, RF | Cancer | RF=99.45%
LR, NN, VM | Chronic kidney | VM=97.8%
KNN, SVM, RF, Adaboost | Heart Disease | RF=95.24%
PCA-KNN, PCA-SVM, EM-PCA-Fuzzy Rule-Based | Breast Cancer | EM-PCA-Fuzzy Rule-Based=93.6%
SVM, GEPSVM, TSVM | Alzheimer's | TSVM=92.75%
SVM, L1-Logistic, L2-Logistic, RF, RUSRF | Alzheimer's | SVM=73.33%
RF, SVM, AB, BT, GL | Diabetes | RUSRF=90.60%

Table 4: Techniques used in Public Health.

Cross Validation Technique

The predictive performance of the models is evaluated using the cross-validation technique, which estimates how each model performs outside the sample on a new dataset, also known as test data. The reason for using cross-validation is that, when we fit a model, we fit it to a training dataset. Cross-validation was applied in order to measure the numerical performance of a learning operator. This was not done in order to properly isolate and compare the performance of the different methods with respect to the weighting of the propensity score. Through several steps, we measured the quality of the various propensity score matching methods. The classifier's accuracy is calculated as the average accuracy over the k folds. In bootstrap validation, subsampling is done with replacement from the training dataset. 10-fold cross-validation was found to be a good and reasonable compromise between offering accurate performance estimates, being computationally feasible and preventing overfitting [4].

Disease Name | Validation Methods
Parkinson Disease | 10 fold
Liver Disease | 10 fold
Diabetes Disease | 10 fold
Malaria Disease | 5 fold
Heart Disease | 5 fold
Breast cancer Disease | 10 fold
Breast cancer Disease | 5 fold
Liver cancer Disease | 10 fold
Heart disease | 10 fold
Cancer Disease | 5 fold
Chronic kidney disease | 10 fold
Heart Disease | 5 fold
Alzheimer's Disease | 10 fold

Table 5: Summary of validation Technique in Public Health.

Model Evaluation Technique

After estimation, the performance of the predictive models is evaluated in terms of accuracy, precision and recall on unseen data, using the k-fold cross-validation technique to test their abilities. Classification performance is evaluated using the precision, sensitivity and specificity of each system, as this is a widely accepted tool for classification performance evaluation and generalization error estimation. It is important to mention that the F1 score can be affected by skewed class ratios when used as a quality indicator. Both AUC and F1 scores are compared using paired t-tests against Bonferroni-adjusted significance thresholds. The different methods of performance evaluation are summarized below.

Disease Name | Validation Methods
Parkinson Disease | 10 fold
Liver Disease | 10 fold
Diabetes Disease | 10 fold
Malaria Disease | 5 fold
Heart Disease | 5 fold
Breast cancer Disease | 10 fold
Breast cancer Disease | 5 fold
Liver cancer Disease | 10 fold
Heart disease | 10 fold
Cancer Disease | 5 fold
Chronic kidney disease | 10 fold
Heart Disease | 5 fold
Alzheimer's Disease | 10 fold

Table 6: Summary of Performance Evaluation Methods.

Limitations

While the application of machine learning approaches to healthcare problems is unavoidable given the complexity of processing massive amounts of data, the need to standardize interpretable ML in this field is critical. Although very broad, these data sets can also be very limited (e.g., system data may only be accessible for a small subset of individuals). Several machine learning methods effectively address these limitations but are still subject to the usual sources of bias commonly found in experimental studies [5]. The limitations of SVM are its interpretability, its computational cost for larger datasets, and the fact that it is essentially a binary classifier. A simplified decision tree with four attributes can serve a multi-class decision problem. An overfitted model is more complicated than the data can justify. It may have too many free parameters and thus risks mistaking random noise or other confounding in the training data for genuine disease-related structure. This is a pervasive problem in machine learning because it is often possible to set the complexity of the model as high as required to achieve arbitrarily high prediction accuracy. One limitation of traditional medical scoring systems is the presence of intrinsic linear combinations of variables in the input set, so they are unable to model complex nonlinear interactions in medical domains. In this study, this weakness is addressed by using classification models that can implicitly detect complex nonlinear associations between independent and dependent variables and can identify potential correlations between predictor variables.

Conclusion

To inform clinicians and policy makers, systems powered by machine learning will have to deliver results of interest in practice, through clinical trials or real-world performance observations. Eventually, classification approaches such as clustering and artificial neural networks would require a complete set of experiments. Most researchers used traditional machine learning algorithms such as SVM, RF, NB, LR, NN, KNN, ANN and DT to analyze public health data, and 10-fold cross-validation provided the better results. However, a major challenge in public health is posed by the high volume of healthcare data, and handling big data remains a big challenge. Besides, many public health researchers face problems. Most of the problems found in the reviewed research papers are classification problems in public health data. Moreover, overfitting and data imbalance are big problems in public health. In this review we identified problems and challenges that every public health researcher should keep in mind, because most of the research papers discussed these problems and most researchers have faced them.

References

1. Li, Wei, Chai Yuanbo, Khan Fazlullah, Ullah Jan Syed Rooh, and Sahil Verma, et al. "A comprehensive survey on machine learning-based big data analytics for IoT-enabled smart healthcare system." Mob Netw Appl 26, (2021): 1-19.
2. Panch, Trishan, Szolovits Peter, and Atun Rifat. "Artificial Intelligence, Machine Learning and Health Systems." J Glob Health 8, (2018).
3. Nair, Lekha R, D Shetty Sujala, and D Shetty Siddhanth. "Applying Spark Based Machine Learning Model on Streaming Big Data for Health Status Prediction." Comput Electr 65, (2018): 393-399.
4. Kubota, Ken J, A Chen Jason, and Little Max A. "Machine Learning for Large‐Scale Wearable Sensor Data in Parkinson's Disease: Concepts, Promises, Pitfalls, and Futures." Mov Disord 31, (2016): 1314-1326.
5. Nakajima, Tetsushi, Katsumata Kenji, Kuwabara Hiroshi, and Soya Ryoko, et al. "Urinary Polyamine Biomarker Panels with Machine-Learning Differentiated Colorectal Cancers, Benign Disease, and Healthy Controls." Int J Mol Sci 19, (2018): 756.

How to cite this article: Asadullah, MD, Rashid Mamunar, Basu Priyanka and Murad Hossain MD. "Machine Learning in Public Health: A Review of the Problems and Challenges." J Health Edu Res Dev 9 (2021) : 27767
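
As an illustration of the Cross Validation Technique section, the following minimal Python sketch shows how a classifier's accuracy can be computed as the average accuracy over k folds. It is not the procedure of any of the reviewed studies. The nearest-centroid classifier, the synthetic two-class data and all function names (`k_fold_indices`, `fit_centroids`, `predict_centroid`, `cross_val_accuracy`) are assumptions made for this example only.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Toy stand-in classifier (assumption): mean feature vector per class."""
    grouped = {}
    for features, label in zip(X, y):
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict_centroid(centroids, features):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=distance)


def cross_val_accuracy(X, y, k=10, seed=0):
    """Return the accuracy on each held-out fold and the mean over the k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict_centroid(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies, sum(accuracies) / k


# Synthetic two-class data (hypothetical, for illustration only).
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
X += [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(50)]
y = [0] * 50 + [1] * 50

fold_accuracies, mean_accuracy = cross_val_accuracy(X, y, k=10)
print(f"mean 10-fold accuracy: {mean_accuracy:.3f}")
```

Each sample is held out exactly once, so the mean over folds estimates performance on data the model was not fitted to. Setting `k=5` gives the 5-fold variant listed for some diseases in Table 5.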
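
The Model Evaluation Technique section names precision, sensitivity, specificity and the F1 score, and notes that F1 can be affected by the class ratio. The sketch below computes these measures from confusion-matrix counts using their standard definitions and applies them to two hypothetical test sets with the same sensitivity and specificity but different class ratios. The counts and the helper names (`classification_metrics`, `labels_from_counts`) are illustrative assumptions, not results from the reviewed papers.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, positive=1):
    """Standard definitions of the evaluation measures named in the text."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)  # also called recall
    specificity = safe_div(tn, tn + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


def labels_from_counts(tp, fn, tn, fp):
    """Build label lists that realise the given confusion-matrix counts."""
    y_true = [1] * (tp + fn) + [0] * (tn + fp)
    y_pred = [1] * tp + [0] * fn + [0] * tn + [1] * fp
    return y_true, y_pred


# Same sensitivity (0.8) and specificity (0.9), different class ratios.
balanced = classification_metrics(*labels_from_counts(tp=80, fn=20, tn=90, fp=10))
skewed = classification_metrics(*labels_from_counts(tp=8, fn=2, tn=171, fp=19))

for name, result in (("balanced", balanced), ("skewed", skewed)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

In this constructed example the per-class error rates are identical, yet the F1 score falls sharply when positives are rare, because the same false-positive rate now produces many more false positives than true positives.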
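
Table 1 describes one response to dataset imbalance: forming an ensemble of models, each trained on a data subset with matched numbers of positive and negative examples. A rough Python sketch of that idea follows, under stated assumptions: the one-dimensional threshold model, the majority-vote rule, the synthetic data and all names (`balanced_subsets`, `fit_threshold`, `ensemble_predict`) are hypothetical choices for illustration, not the method of the cited work.

```python
import random
from collections import Counter


def balanced_subsets(labels, positive=1, seed=0):
    """Pair every minority-class index with disjoint, equally sized
    chunks of the shuffled majority class."""
    pos = [i for i, label in enumerate(labels) if label == positive]
    neg = [i for i, label in enumerate(labels) if label != positive]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    majority = list(majority)
    random.Random(seed).shuffle(majority)
    size = len(minority)
    # Leftover majority samples that do not fill a whole chunk are dropped.
    return [minority + majority[start:start + size]
            for start in range(0, len(majority) - size + 1, size)]


def fit_threshold(xs, ys):
    """Toy 1-D model (assumption): threshold halfway between class means."""
    ones = [x for x, label in zip(xs, ys) if label == 1]
    zeros = [x for x, label in zip(xs, ys) if label == 0]
    mean1, mean0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
    return (mean0 + mean1) / 2.0, mean1 > mean0


def predict_threshold(model, x):
    """Predict 1 on the side of the threshold where the positive mean lies."""
    threshold, positive_above = model
    return int((x > threshold) == positive_above)


def ensemble_predict(models, x):
    """Majority vote over the per-subset models."""
    votes = Counter(predict_threshold(model, x) for model in models)
    return votes.most_common(1)[0][0]


# Synthetic imbalanced data: 20 positives, 100 negatives (hypothetical).
rng = random.Random(7)
xs = [rng.gauss(2.0, 1.0) for _ in range(20)]
xs += [rng.gauss(0.0, 1.0) for _ in range(100)]
ys = [1] * 20 + [0] * 100

subsets = balanced_subsets(ys)
models = [fit_threshold([xs[i] for i in s], [ys[i] for i in s]) for s in subsets]
print(len(subsets), ensemble_predict(models, 5.0), ensemble_predict(models, -5.0))
```

Every model sees all minority examples but only one slice of the majority class, so no single model is dominated by the majority label. An odd number of subsets avoids tied votes.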
