Paper CMBN
Copyright © 2023 the author(s). This is an open access article distributed under the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract: Diarrhea remains one of the leading causes of childhood morbidity and mortality worldwide, especially in
developing countries. Diarrhea is a digestive disorder characterized by the need to defecate more often than usual.
Diarrhea remains a major public health problem in Indonesia, with 40% of cases occurring in children under five
years old. Bandung City is one of the largest cities in West Java Province and has the sixth-highest number of diarrhea cases in
West Java. The high number of diarrhea cases in Bandung City is a serious public health concern, especially
among young children. In 2022, 6376 cases of diarrhea were recorded among children under five in Bandung City.
Therefore, this study aims to cluster sub-districts in Bandung City based on factors that influence
diarrhea cases. Five factors are used, namely the prevalence of diarrhea in children under five years old, clean and
healthy living behaviour, healthy latrine facilities, population density, and the number of babies under 6 months old who are
exclusively breastfed. Clustering with the Gaussian mixture model produced 5 regional groups
based on similarity of characteristics across the five factors. Cluster 2 comprises areas characterized by a high percentage
of households with healthy latrines and healthy living behaviour. Cluster 5 comprises areas characterized by a high prevalence
of diarrhea in children under five. The results of this clustering indicate the importance of interventions and strategies
by the government of Bandung City to prevent an increasing number of diarrhea cases among children under five.
The clustering can therefore provide better insight into the distribution of diarrhea cases to help achieve the Sustainable Development
Goals (SDGs) and improve the health status of children under five in Indonesia, particularly in Bandung City.
*Corresponding author
E-mail address: [email protected]
Received November 14, 2023
DEFI YUSTI FAIDAH, ASHILLA MAULA HUDZAIFA
1. INTRODUCTION
Diarrhea remains one of the leading causes of childhood morbidity and mortality worldwide,
especially in developing countries. Diarrhea is a digestive disorder characterised by the sufferer
passing stools more frequently than usual. It has a number of causes, for example
Escherichia coli bacteria, viruses such as Rotavirus, parasites, food allergies, Crohn's disease, and side
effects of certain drugs. Diarrhea can be classified based on the frequency and duration
of bowel movements and the characteristics of the faeces [1]. The classification
includes acute diarrhea lasting < 14 days, persistent diarrhea lasting > 14 days, and chronic diarrhea
lasting > 30 days [2][18]. Serious diarrhea can lead to malnutrition and, in the most severe cases,
death due to loss of salt and water from the body [19][20]. Indonesia, the fourth most populous
country in the world, is home to around 22 million children under five. The Health Ministry of the
Republic of Indonesia identifies diarrhea as the leading cause of death in children under 5 years of
age, with a mortality rate of 10.7% in 2019 [3]. In addition to causing death, prolonged diarrhea
can lead to malnutrition and stunting in children [4].
Diarrhea remains a major public health problem in Indonesia, with 40% of cases occurring
in children under five years old. Bandung is one of the major cities and has the sixth-highest number of diarrhea
cases in West Java. The high incidence of diarrhea in Bandung City is a serious public health
concern, especially for children under five. In 2022, 6376 cases of diarrhea in
children under five were recorded and treated in Bandung City. Figure 1 shows
that Babakan Ciparay is the sub-district with the most diarrhea cases in Bandung City.
Various factors cause diarrheal diseases, one of which is poor hygiene,
including improper sanitation facilities [17]. Germs that cause diarrhea are easily spread from one
person to another through contaminated water, food or objects [5]. Food hygiene is associated with
the development of diarrhea and malnutrition in children of low socioeconomic status [6]. In addition,
limited and inadequate sanitation facilities are associated with poor hygiene and may
increase the risk of diarrhea.
CLUSTERING OF CHILDHOOD DIARRHEAL DISEASES
Figure 1. High incidence of diarrhea among children under five in Bandung city (bar chart; vertical axis: cases, horizontal axis: sub-district)
Various programmes have been carried out to reduce the occurrence of diarrhea, one of which
is the provision of clean water and sanitation in areas that still have difficulty accessing clean water. The
provision of sanitation and clean water is Goal 6 of the SDGs [31]. However, morbidity and mortality
rates from diarrhea remain high due to the high prevalence of contributing factors. Therefore, this
study aims to cluster sub-districts in Bandung City based on the percentage of diarrhea prevalence,
percentage of households with healthy latrine facilities, percentage of households with clean and
healthy living behaviours, and population density per hectare as an effort to provide better insight
into the distribution of diarrhea cases for the achievement of Sustainable Development Goals
(SDGs).
The results of this study can show the importance of interventions and efforts by the Bandung
City government in preventing high numbers of diarrhea cases and improving the health
of children under five.
of Bandung City includes several variables, namely the prevalence of diarrhea in children
under five, the percentage of households with clean and healthy living behaviour, the
percentage of households with healthy latrine facilities, the population density per hectare,
and the number of babies under 6 months old who are exclusively breastfed. This study used 30
observations, one for each sub-district in Bandung City.
2.1.1. Prevalence of diarrhea in children under five
The prevalence of diarrhea in children under five years old is a record of the number of
diarrhea cases identified and treated. The data used is the prevalence of diarrhea in children
under five. In this study, the prevalence of diarrhea in children under five was measured in
each sub-district in Bandung City.
indicator to measure the population in an area [28]. This indicator is used to determine the
population density per-hectare of an area.
2.1.5. Number of babies under 6 months old who are exclusively breastfed
Breast milk is the ideal food for babies up to 6 months old in terms of physical and psychological
health [9]. Exclusive breastfeeding until the baby is 6 months old influences the optimal
development of the child's intelligence potential [10]. Exclusive breastfeeding before
6 months of age is very important for reducing the risk of developing various diseases, and breast milk
can accelerate recovery in sick children [29][30].
2.2. Data Standardization
Standardization is a technique used to transform data so that it has a mean equal to 0 and a
standard deviation equal to 1. Standardization is used in data analysis when the observed variables
have different scales or distributions [27]. In Equation (2), z denotes the standardized value,
x is the actual data value, µ is the mean of the data, and σ is the standard deviation of the data.
$$z = \frac{x - \mu}{\sigma} \tag{2}$$
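As an illustration of Equation (2), the following minimal Python sketch standardizes one variable. The function name `standardize` and the density values are hypothetical and are not taken from the study data. The sketch divides by the population standard deviation; this is an assumption, since the paper does not state which form was used, and R's `scale()` function divides by the sample standard deviation (n - 1) instead.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores (x - mu) / sigma for a list of numbers."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD (assumption; a sample SD is also common)
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mu) / sigma for x in values]

# Hypothetical population densities (people per hectare), for illustration only
density = [120.0, 80.0, 200.0, 160.0]
z = standardize(density)
print([round(v, 3) for v in z])
```

After the transformation the values have mean 0 and standard deviation 1, so variables measured in different units contribute on a comparable scale to the clustering.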
2.3. Variance Inflation Factor
Variance Inflation Factor (VIF) is a measure of the severity of multicollinearity in
multiple linear regression models involving more than one variable. It compares the variance
of an estimated coefficient when there is multicollinearity between predictor
variables with its variance when there is no multicollinearity. The formula for calculating VIF is as
follows [11].
$$VIF_i = \frac{1}{1 - R_i^2} \tag{3}$$
$R_i^2$ in the equation is the coefficient of determination obtained by regressing the ith variable
on the remaining variables. A VIF value > 10 indicates multicollinearity in the data, and the greater the VIF, the
more serious the multicollinearity [12].
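To make Equation (3) concrete, the sketch below converts between VIF and the coefficient of determination and applies the VIF > 10 rule. The VIF values are those reported later in Table 2; the function names and the dictionary layout are illustrative assumptions, and the implied R² values are derived here by inverting Equation (3), not reported by the authors.

```python
def vif_from_r2(r2):
    """Equation (3): VIF = 1 / (1 - R^2)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2)

def r2_from_vif(vif):
    """Invert Equation (3): R^2 = 1 - 1 / VIF."""
    return 1.0 - 1.0 / vif

# VIF values as reported in Table 2
table2 = {
    "Prevalence of diarrhea in children under five": 1.268,
    "Households with clean and healthy living behaviours": 1.129,
    "Households with healthy latrine facilities": 1.368,
    "Population density": 1.542,
    "Babies < 6 months old exclusively breastfed": 1.150,
}

implied_r2 = {name: r2_from_vif(v) for name, v in table2.items()}
flagged = [name for name, v in table2.items() if v > 10]  # threshold from the text

for name, r2 in implied_r2.items():
    print(f"{name}: implied R^2 = {r2:.3f}")
print("Variables with VIF > 10:", flagged)
```

Read this way, a VIF of 10 corresponds to an R² of 0.9, which is why values close to 1 are taken as evidence that a variable is nearly independent of the others.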
2.4. Gaussian Mixture Model
Gaussian Mixture Model (GMM) is a statistical model used to represent complex data
distributions by combining multiple Gaussian distributions [21][22]. In this model, data is
considered to come from several different components of the Gaussian distribution. GMM is
used in various fields including clustering [26], dimensionality reduction, data distribution
modelling, image restoration, and others. In this research, the use of GMM aims to perform
clustering.
GMM assumes that the number of Gaussian components equals the number of
clusters formed. Each Gaussian component is represented by its mean and
variance. The purpose of clustering using GMM is to determine the model parameters (mean
and covariance matrix) that best fit the data [13]. The models used for clustering differ in the
geometry formed by Gaussian components with different parameters [14], as shown in Table
1.
Table 1. Covariance matrix parameterizations and resulting geometry of the Mclust models in the Gaussian Mixture Model.
Symbol  Model                            Volume     Shape      Orientation      Geometry
EII     $\lambda I$                      Same       Same       -                Spherical
VII     $\lambda_k I$                    Different  Same       -                Spherical
EEI     $\lambda A$                      Same       Same       Coordinate axes  Diagonal
VEI     $\lambda_k A$                    Different  Same       Coordinate axes  Diagonal
EVI     $\lambda A_k$                    Same       Different  Coordinate axes  Diagonal
VVI     $\lambda_k A_k$                  Different  Different  Coordinate axes  Diagonal
EEE     $\lambda D A D^{T}$              Same       Same       Same             Ellipsoidal
EEV     $\lambda D_k A D_k^{T}$          Same       Same       Different        Ellipsoidal
VEV     $\lambda_k D_k A D_k^{T}$        Different  Same       Different        Ellipsoidal
VVV     $\lambda_k D_k A_k D_k^{T}$      Different  Different  Different        Ellipsoidal
$$f(X \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(X-\mu)^2}{2\sigma^2}\right) \tag{4}$$
µ and σ represent the mean and the standard deviation of the distribution. The probability
density function for d-dimensional multivariate data is expressed as follows:
$$f(X \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^{d}\,|\Sigma|}} \exp\left(-\frac{1}{2}(X-\mu)^{T}\Sigma^{-1}(X-\mu)\right) \tag{5}$$
where µ denotes the mean of the distribution as a d-dimensional vector, ∑ is the
covariance matrix of X, T denotes the transpose, and -1 denotes the inverse of the matrix
[15]. To maximize the likelihood of the data under the GMM, the Expectation-Maximization (EM)
algorithm can be used. The steps are as follows [13][23]:
1) Initialize the values of $\mu_k$, $\sigma_k$, and $\pi_k$ randomly for all clusters, where $\pi_k$ is the mixture
coefficient and k is the index of the cluster. The mixture density, a linear combination of the
component densities, is:
$$p(X) = \sum_{k=1}^{K} \pi_k f(X \mid \mu_k, \Sigma_k) \tag{6}$$
Likelihood value:
$$(C_k \mid x_i) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left(-\frac{(x_i-\mu_k)^2}{2\sigma_k^2}\right) \tag{7}$$
3) M-step: update the values of $\mu_k$, $\sigma_k$, and $\rho(C_k)$ with the following calculation:
$$MAP\{\hat{z}_{ik}\} = \begin{cases} 1 & \text{if } \max\{\hat{z}_{ik}\} \in k \\ 0 & \text{otherwise} \end{cases} \tag{12}$$
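The EM steps and the MAP assignment can be sketched for the simplest case, a univariate mixture built on the density in Equation (4). This is an illustrative simplification and not the authors' implementation: the study fits multivariate models in R, whereas the function names, the toy data, and the starting values below are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density, as in Equation (4)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def e_step(data, mus, sigmas, weights):
    """Posterior probability that each observation belongs to each component."""
    z = []
    for x in data:
        dens = [w * normal_pdf(x, m, s) for m, s, w in zip(mus, sigmas, weights)]
        total = sum(dens)
        z.append([d / total for d in dens])
    return z

def em_gmm_1d(data, mus, sigmas, weights, n_iter=50):
    """Run EM for a 1-D Gaussian mixture from the given starting values."""
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    n = len(data)
    for _ in range(n_iter):
        z = e_step(data, mus, sigmas, weights)
        # M-step: update mixing weight, mean and standard deviation per component
        for k in range(len(mus)):
            nk = sum(row[k] for row in z)
            weights[k] = nk / n
            mus[k] = sum(row[k] * x for row, x in zip(z, data)) / nk
            var = sum(row[k] * (x - mus[k]) ** 2 for row, x in zip(z, data)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-6)  # floor avoids a collapsed component
    # MAP assignment: each observation goes to its most probable component
    z = e_step(data, mus, sigmas, weights)
    labels = [row.index(max(row)) for row in z]
    return mus, sigmas, weights, labels

# Toy data with two well-separated groups (illustrative values only)
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
mus, sigmas, weights, labels = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print(labels)
```

Fixed starting values are used here so the run is reproducible; with random initialization, as described in step 1, the component labels may be permuted between runs.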
The best model in a Gaussian Mixture Model (GMM) analysis is commonly selected
using the Bayesian Information Criterion (BIC), which is evaluated for each combination
of covariance parameterization and number of clusters [16][24][25].
3. RESULTS
Clustering analysis with sub-districts as observations was conducted in R software on the research data, which included the
prevalence of diarrhea in children under five, the percentage of households with clean and healthy
living behaviours, the percentage of households with healthy latrine facilities, the population density per
hectare, and the number of babies under 6 months old who are exclusively breastfed.
The mapping of sub-districts in Bandung City based on the characteristics of each
Figure 3. Mapping of Sub-districts based on Percentage of Households with Clean and Healthy
Living Behaviours in Bandung City.
Twenty-five sub-districts in Bandung City have a percentage of households with clean and healthy
living behaviours above 50%, while the 5 sub-districts with a low percentage of households with clean and
healthy living behaviours are Cibeunying Kidul, Cidadap, Cicendo, Bojongloa Kidul, and Bandung
Wetan.
Figure 6. Mapping of Sub-districts based on Number of Babies under 6 Months Old who are
Exclusively Breastfed in Bandung City.
There are 5 sub-districts in the city of Bandung with a high number of exclusively breastfed
babies under 6 months old (above 400 babies), namely Andir, Coblong, Bandung
Kulon, Sukajadi, and Ujung Berung.
3.1. Multicollinearity Test
Multicollinearity between variables is tested as an initial stage in
variable selection. If there is multicollinearity among the variables used, it is necessary to select
variables with related methods, such as Principal Component Analysis (PCA). The results of
multicollinearity testing using the Variance Inflation Factor (VIF) can be viewed in Table 2.
Table 2. VIF values
Variables                                                VIF
Prevalence of diarrhea in children under five            1.268
Households with Clean and Healthy Living Behaviours      1.129
Households with Healthy Latrine Facilities               1.368
Population density                                       1.542
Babies < 6 months old who are exclusively breastfed      1.150
Based on Table 2, no VIF value exceeds 10, so it can be concluded that there
is no multicollinearity among the variables, and all variables will be used in the cluster analysis.
3.2. Identification of BIC Value
The BIC value is used to determine the best model and the number
of clusters formed. Based on the analysis results, a comparison chart of the BIC values of
various models was obtained. The Ellipsoidal, Equal Volume and Shape (EEV) model has the
highest BIC value, as shown in Figure 7.
The identification results of the Gaussian Mixture Model EEV based on the Expectation-
Maximization (EM) algorithm show a model with five components which can be viewed in Table
3.
Table 3. Five-component EEV model
Log-likelihood   n    df   BIC       ICL
-31.9397         30   84   -349.58   -349.5899
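The entries of Table 3 can be cross-checked with a short calculation. The sketch below assumes the "larger is better" form BIC = 2 log L - df ln(n), which is consistent with the text selecting the highest BIC, and counts the free parameters of an EEV model with 5 variables and 5 components. The function names are hypothetical; the log-likelihood, n, and df come from Table 3.

```python
import math

def eev_parameter_count(d, g):
    """Free parameters of a g-component, d-dimensional EEV mixture."""
    means = g * d                        # one mean vector per component
    mixing = g - 1                       # mixing proportions sum to 1
    volume = 1                           # a single shared lambda
    shape = d - 1                        # a single shared A with det(A) = 1
    orientation = g * d * (d - 1) // 2   # one orientation matrix D_k per component
    return means + mixing + volume + shape + orientation

def bic(loglik, df, n):
    """BIC in the form 2 log L - df ln(n); higher values are preferred."""
    return 2 * loglik - df * math.log(n)

df = eev_parameter_count(d=5, g=5)
value = bic(-31.9397, df, 30)
print(df, round(value, 2))
```

Under these assumptions the parameter count and the BIC agree with the df and BIC columns of Table 3. With 84 parameters estimated from 30 observations, the penalty term accounts for most of the BIC value.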
Bayesian Information Criterion (BIC) and Integrated Completed Likelihood (ICL) are metrics used
in Gaussian Mixture Model (GMM) analysis to identify the most appropriate number of Gaussian
components. In this study, the BIC value is used as the metric for determining the optimal number of
components. The BIC value of -349.58 is the highest value, obtained for the EEV model with 5 Gaussian
components, meaning that sub-districts in Bandung City can be grouped into 5 clusters
4. CONCLUSION
Clustering was carried out for 30 sub-districts based on 5 variables: the prevalence
of diarrhea in children under five, the percentage of households with clean and healthy living behaviors,
the percentage of households with healthy latrine facilities, the population density per hectare, and the number
of babies under 6 months old who are exclusively breastfed. Clustering with the Gaussian
Mixture Model (GMM) method, with Ellipsoidal, Equal Volume and Shape (EEV) as the best model
and 5 components, shows that cluster 5 is a group of areas with a high prevalence
of diarrhea in children under five in 2022. Intervention and efforts by the Bandung
City government are therefore needed to prevent further cases of diarrhea in children under five in six sub-districts:
Bandung Kidul, Bandung Wetan, Cicendo, Cidadap, Cinambo, and Sumur Bandung. Cluster 2 consists
of sub-districts with good sanitation, including Antapani, Arcamanik, Buah Batu, Cibiru, Gedebage,
Lengkong, Mandalajati, Panyileukan, and Sukasari.
It is important to educate the sub-districts included in cluster 5 about better hygiene and
sanitation facilities to prevent an increase in under-five diarrhea cases, achieve the Sustainable
Development Goals (SDGs), and improve the health of children under five in Bandung City.
FUNDING
This research is supported by the Department of Statistics, Padjadjaran University and the Rector
of Padjadjaran University.
CONFLICT OF INTERESTS
The authors declare that there is no conflict of interests.
REFERENCES
[1] N. J. CaJacob and M. B. Cohen, "Update on Diarrhea," Pediatrics in Review, vol. 37, no. 8, pp. 313-322, 2016.
[2] L. R. Schiller, D. S. Pardi, and J. H. Sellin, "Chronic diarrhea: diagnosis and management," Clinical
[3] "Prevalence and determinants of diarrhea among under-five children in five Southeast Asian countries: Evidence
[4] E. Yunitasari, R. Pradanie, H. Arifin, D. Fajrianti, and B. O. Lee, "Determinants of stunting prevention among
mothers with children aged 6–24 months," Open Access Macedonian Journal of Medical Sciences, vol. 9, no. B,
[5] M. M. Levine, D. Nasrin, S. Acácio, Q. Bassat, H. Powell, S. M. Tennant, et al., "Diarrheal disease and
subsequent risk of death in infants and children residing in low-income and middle-income countries: analysis
of the GEMS case-control study and 12-month GEMS-1A follow-on study," The Lancet Global Health, vol. 8,
[6] K. Takanashi, Y. Chonan, D. T. Quyen, N. C. Khan, K. C. Poudel, and M. Jimba, "Survey of food-hygiene
practices at home and childhood diarrhoea in Hanoi, Viet Nam," Journal of Health, Population and Nutrition, vol. 27, no. 5, pp.
602-611, 2009.
[7] M. B. Karo, "Perilaku hidup bersih dan sehat (PHBS) strategi pencegahan penyebaran Virus Covid-19," in
[8] N. Rohmah and F. Syahrul, "Hubungan kebiasaan cuci tangan dan penggunaan jamban sehat dengan kejadian
diare balita," Jurnal Berkala Epidemiologi, vol. 5, no. 1, pp. 95-106, 2017.
[9] S. Tanuwidjaya, "Konsep umum tumbuh dan kembang," in Tumbuh kembang anak dan remaja. Edisi ke-1, h. 1-
[10] L. Novita, D. A. Gurnida, and H. Garna, "Perbandingan fungsi kognitif bayi usia 6 bulan yang mendapat dan
yang tidak mendapat ASI eksklusif," Sari Pediatri, vol. 9, no. 6, pp. 429-34, 2016.
[11] D. H. Vu, K. M. Muttaqi, and A. P. Agalgaonkar, "A variance inflation factor and backward elimination based
robust regression model for forecasting monthly electricity demand using climatic variables," Applied Energy,
[12] R. S. Gómez, A. R. Sánchez, C. G. García, and J. G. Pérez, "The VIF and MSE in raise regression," Quantitative
[13] Z. Wahidah and D. T. Utari, "COMPARISON OF K-MEANS AND GAUSSIAN MIXTURE MODEL IN
PROFILING AREAS BY POVERTY INDICATORS," BAREKENG: Jurnal Ilmu Matematika dan Terapan, vol.
[14] L. Scrucca, "Identifying connected components in Gaussian finite mixture models for clustering," Computational
[15] S. Belciug and D. G. Iliescu, "Deep learning and Gaussian Mixture Modelling clustering mix. A new approach
for fetal morphology view plane differentiation," Journal of Biomedical Informatics, vol. 143, p. 104402, 2023.
[16] N. Shen and B. González, "Bayesian information criterion for linear mixed-effects models," arXiv preprint
arXiv:2104.14725, 2021.
[17] C. E. Troeger et al., "Quantifying risks and interventions that have affected the burden of diarrhea among children
younger than 5 years: an analysis of the Global Burden of Disease Study 2017," The Lancet Infectious Diseases,
[18] N. Thapar and I. R. Sanderson, "Diarrhea in children: an interface between developing and developed countries,"
[19] V. Diwan, Y. D. Sabde, E. Byström, and A. De Costa, "Treatment of pediatric diarrhea: a simulated client study
at private pharmacies of Ujjain, Madhya Pradesh, India," The Journal of Infection in Developing Countries, vol.
[20] G. Mengistu et al., "Self-reported and actual involvement of community pharmacy professionals in the
management of childhood diarrhea: a cross-sectional and simulated patient study at two towns of Eastern
Ethiopia," Clinical Medicine Insights: Pediatrics, vol. 13, pp. 1179556519855380, 2019.
[21] V. Melnykov and R. Maitra, "Finite mixture models and model-based clustering."
[22] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of
the American statistical Association, vol. 97, no. 458, pp. 611-631, 2002.
[23] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM
algorithm," Journal of the royal statistical society: series B (methodological), vol. 39, no. 1, pp. 1-22, 1977.
[24] G. Schwarz, "Estimating the dimension of a model," The annals of statistics, pp. 461-464, 1978.
[25] R. A. Stine, "Model selection using information theory and the MDL principle," Sociological Methods &
[26] P. N. Tan, M. Steinbach, and V. Kumar, "Data mining cluster analysis: basic concepts and algorithms,"
[27] A. Athifaturrofifah, R. Goejantoro, and D. Yuniarti, "Perbandingan Pengelompokan K-Means dan K-Medoids
Pada Data Potensi Kebakaran Hutan/Lahan Berdasarkan Persebaran Titik Panas," EKSPONENSIAL, vol. 10,
[28] A. P. Kusuma and D. M. Sukendra, "Analisis spasial kejadian demam berdarah dengue berdasarkan kepadatan
penduduk," Unnes Journal of Public Health, vol. 5, no. 1, pp. 48-56, 2016.
[29] S. Rini and Rohayati, "Kejadian Diare pada Bayi dengan Pemberian ASI," Journal of Nursing, vol. 11, no. 2, pp.
153-156, 2015.
[30] N. M. E. Wardani, K. A. Witarini, P. J. Putra, and I. W. D. Artana, "Pengaruh Pemberian ASI Eksklusif Terhadap
Kejadian Diare pada Anak Usia 1-3 Tahun," Jurnal Medika Udayana, vol. 11, no. 01, pp. 12-17, 2022.
[31] L. B. Wadu, A. F. Gultom, and F. Pantus, "Penyediaan Air Bersih Dan Sanitasi: Bentuk Keterlibatan Masyarakat
Dalam Pembangunan Berkelanjutan," Jurnal Pendidikan Kewarganegaraan, vol. 10, no. 2, pp. 80-88, 2020.