
CLUSTERING OF CHILDHOOD DIARRHEA DISEASES USING GAUSSIAN MIXTURE MODEL WITH VISUAL APPROACH


DEFI YUSTI FAIDAH*, ASHILLA MAULA HUDZAIFA

Department of Statistics, Padjadjaran University, Bandung 45363, Indonesia

Copyright © 2023 the author(s). This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract: Diarrhea remains one of the leading causes of childhood morbidity and mortality worldwide, especially in
developing countries. Diarrhea is a digestive disorder characterized by the need to defecate more often than usual.
It remains a major public health problem in Indonesia, with 40% of cases occurring in children under five
years old. Bandung City is one of the largest cities in West Java Province and has the sixth highest number of diarrhea
cases in the province. The high number of diarrhea cases in Bandung City is a serious public health concern, especially
among young children. In 2022, 6376 cases of diarrhea in children under five were recorded in Bandung City.
Therefore, this study aims to cluster sub-districts in Bandung City based on factors that influence
diarrhea cases. Five factors are used, namely the prevalence of diarrhea in children under five years old, clean and
healthy living behaviour, healthy latrine facilities, population density, and the number of babies less than 6 months
old who are exclusively breastfed. Clustering with the Gaussian mixture model produced 5 regional groups
based on similarity of characteristics across the five factors. Cluster 2 comprises areas characterized by a high
percentage of households with healthy latrines and healthy living behaviour. Cluster 5 comprises areas characterized
by a high prevalence of diarrhea in children under five. The results of this clustering indicate the importance of
interventions and strategies by the government of Bandung City to prevent an increase in diarrhea cases among
children under five. They can therefore provide better insight into the distribution of diarrhea cases to help achieve
the Sustainable Development Goals (SDGs) and improve the health status of children under five in Indonesia,
particularly in Bandung City.

Keywords: childhood; clustering; diarrhea; gaussian mixture model; SDGs.

2020 AMS Subject Classification: 62H30, 92D20

*Corresponding author
E-mail address: [email protected]
Received November 14, 2023

1. INTRODUCTION
Diarrhea remains one of the leading causes of childhood morbidity and mortality worldwide,
especially in developing countries. Diarrhea is a digestive disorder characterised by the sufferer
passing stools more frequently than usual. It has a number of causes, for example
Escherichia coli bacteria, viruses such as Rotavirus, parasites, food allergies, Crohn's disease, and
side effects of certain drugs. Diarrhea can be classified based on the frequency and duration
of bowel movements and the characteristics of the faeces [1]. The classification
includes acute diarrhea lasting < 14 days, persistent diarrhea lasting > 14 days, and chronic diarrhea
lasting > 30 days [2][18]. Serious diarrhea can lead to malnutrition and, in the most severe cases,
death due to loss of salt and water from the body [19][20]. Indonesia, the fourth most populous
country in the world, is home to around 22 million children under five. The Health Ministry of the
Republic of Indonesia identifies diarrhea as the leading cause of death in children under 5 years of
age, with a mortality rate of 10.7% in 2019 [3]. In addition to causing death, prolonged diarrhea
can lead to malnutrition and stunting in children [4].
Diarrhea remains a major public health problem in Indonesia, with 40% of cases occurring
in children under five years old. Bandung is one of the major cities in West Java and has the sixth
highest number of diarrhea cases in the province. The high incidence of diarrhea in Bandung City is
a serious public health concern, especially for children under five. In 2022, 6376 cases of diarrhea
in children under five were recorded as treated in Bandung City. Figure 1 shows
that Babakan Ciparay is the sub-district with the most diarrhea cases in Bandung City.
Various factors cause diarrheal diseases, one of which is poor hygiene,
including improper sanitation facilities [17]. Germs that cause diarrhea spread easily from one
person to another through contaminated water, food or objects [5]. Food hygiene is associated with
the development of diarrhea and malnutrition in children of low socioeconomic status [6]. In addition,
areas with limited and inadequate sanitation facilities are likely to have poor hygiene, which may
increase the risk of diarrhea.
[Figure 1: bar chart of diarrhea cases (vertical axis: CASES) by sub-district (horizontal axis: SUB-DISTRICT)]
Figure 1. High incidence of diarrhea among children under five in Bandung city
Various programmes have been carried out to reduce the occurrence of diarrhea, one of which
is the provision of clean water and sanitation in areas that still have difficulty accessing clean water.
The provision of sanitation and clean water is Goal 6 of the SDGs [31]. However, morbidity and mortality
rates from diarrhea remain high due to the high prevalence of contributing factors. Therefore, this
study aims to cluster sub-districts in Bandung City based on the prevalence of diarrhea, the
percentage of households with healthy latrine facilities, the percentage of households with clean and
healthy living behaviours, population density per hectare, and the number of babies less than 6 months
old who are exclusively breastfed, as an effort to provide better insight into the distribution of
diarrhea cases for the achievement of the Sustainable Development Goals (SDGs).
The results of this study can show the importance of interventions and efforts by the Bandung
City government in preventing high numbers of diarrhea cases and improving the health of
children under five.

2. MATERIALS AND METHODS


2.1. Data
The data used for this study are secondary data obtained from the data portal and health profile
of Bandung City. They include several variables, namely the prevalence of diarrhea in children
under five, the percentage of households with clean and healthy living behaviour, the
percentage of households with healthy latrine facilities, the population density per hectare,
and the number of babies less than 6 months old who are exclusively breastfed. This study used 30
observations, one for each sub-district in Bandung City.
2.1.1. Prevalence of diarrhea in children under five
The prevalence of diarrhea in children under five years old is based on records of the number of
diarrhea cases identified and treated. In this study, it was measured for each sub-district in
Bandung City.

$$\text{Prevalence} = \frac{\text{Number of cases of diarrhea in children under five}}{\text{Total cases in the region}} \times 100\% \tag{1}$$

2.1.2. Percentage of households with clean and healthy living behaviour


Clean and Healthy Living Behaviour (Perilaku Hidup Bersih dan Sehat, PHBS) covers all health
behaviours that are carried out out of awareness and involve taking an active role in community
health activities [7]. PHBS involves several settings, namely households, schools, workplaces,
health facilities, and public places (Health Profile of Bandung City 2022). The data used in the
study is the percentage of households identified as implementing clean and healthy living behaviours.
2.1.3. Percentage of households with healthy latrine facilities
Healthy latrines are proper sanitation facilities that help prevent various diseases. The
criteria included in the operational definition of a healthy latrine are a
latrine building that is closed and has a non-slip floor, no odor and no visible dirt, septic tank
distance ≥ 10 metres, available cleaning tools, and freedom from insects [8]. The data used in the
study is the percentage of households that have healthy latrine facilities.
2.1.4. Population density of a sub-district per-hectare
Population density is the number of people per unit area, here per hectare (ha). It is an
indicator used to measure the population of an area relative to its size [28].
2.1.5. Number of babies less than 6 months old who are exclusively breastfed
Breast milk is the ideal food for babies up to 6 months old in terms of physical and psychological
health [9]. Exclusive breastfeeding until the baby is 6 months old supports the optimal
development of the child's intelligence potential [10]. Exclusive breastfeeding before
6 months of age is very important for reducing the risk of developing various diseases, and breast
milk can accelerate recovery in sick children [29][30].
2.2. Data Standardization
Standardization is a technique used to transform data so that it has a mean equal to 0 and a
standard deviation equal to 1. It is used in data analysis when the observed variables
have different scales or distributions [27]. The standardized value is denoted z, where x is the
actual data value, µ is the mean of the data, and σ is the standard deviation of the data.
$$z = \frac{x - \mu}{\sigma} \tag{2}$$
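The standardization formula can be illustrated with a minimal Python sketch. The input values below are hypothetical and are not the study data, the function name `standardize` is introduced only for this illustration, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores (x - mean) / sd for a list of numbers."""
    mu = fmean(values)
    # Population standard deviation: an assumption, the paper does not specify.
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mu) / sigma for x in values]

# Hypothetical population densities (people per hectare) for five sub-districts.
density = [120.0, 85.0, 399.0, 150.0, 60.0]
z_density = standardize(density)
```

After the transformation the values have mean 0 and standard deviation 1, so variables measured in different units (percentages, counts, densities) contribute on a comparable scale.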
2.3. Variance Inflation Factor
The Variance Inflation Factor (VIF) measures the severity of multicollinearity in
multiple linear regression models, that is, models involving more than one predictor variable. It
compares the variance of an estimated coefficient when there is multicollinearity between predictor
variables with the variance when there is none. The formula for calculating the VIF is as
follows [11].
$$VIF_i = \frac{1}{1 - R_i^2} \tag{3}$$
$R_i^2$ is the coefficient of determination obtained by regressing the i-th variable on the other
variables. A VIF value > 10 indicates multicollinearity in the data, and the greater the VIF, the
more serious the multicollinearity [12].
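As a rough illustration of the VIF formula, the sketch below handles the simplest case of exactly two predictors, where the coefficient of determination from regressing one predictor on the other equals their squared Pearson correlation. The data are hypothetical, and the helper names `pearson_r` and `vif_from_r2` are introduced only for this example; with more than two predictors a full multiple regression would be needed to obtain each $R_i^2$.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_from_r2(r2):
    """VIF = 1 / (1 - R^2); undefined when R^2 = 1 (perfect collinearity)."""
    return 1.0 / (1.0 - r2)

# Hypothetical standardized values of two predictors for six sub-districts.
x1 = [0.2, -1.1, 0.7, 1.5, -0.9, -0.4]
x2 = [0.5, -0.8, 0.1, 1.2, -1.3, 0.3]

r2 = pearson_r(x1, x2) ** 2   # R^2 of regressing x1 on x2 (two-predictor case)
vif = vif_from_r2(r2)
```

Inverting the formula gives $R_i^2 = 1 - 1/VIF_i$, so a VIF of 10 corresponds to $R_i^2 = 0.9$, and a VIF of 1.542 corresponds to an $R_i^2$ of about 0.35.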
2.4. Gaussian Mixture Model
The Gaussian Mixture Model (GMM) is a statistical model used to represent complex data
distributions by combining multiple Gaussian distributions [21][22]. In this model, the data are
considered to come from several different Gaussian components. GMM is
used in various fields, including clustering [26], dimensionality reduction, data distribution
modelling, and image restoration. In this research, GMM is used to perform
clustering.
GMM assumes that the number of Gaussian components equals the number of
clusters formed. Each Gaussian component is represented by its mean and variance. The purpose
of clustering using GMM is to determine the model parameters (mean
and covariance matrix) that best fit the data [13]. The models used for clustering differ in the
geometry of the Gaussian components, which is determined by the parameterization of the
covariance matrix [14], as shown in Table 1.
Table 1. Covariance matrix and geometry of the Mclust models in the Gaussian Mixture Model.
Symbol  Model              Volume     Shape      Orientation      Geometry
EII     λI                 Same       Same       -                Spherical
VII     λ_k I              Different  Same       -                Spherical
EEI     λA                 Same       Same       Coordinate axes  Diagonal
VEI     λ_k A              Different  Same       Coordinate axes  Diagonal
EVI     λA_k               Same       Different  Coordinate axes  Diagonal
VVI     λ_k A_k            Different  Different  Coordinate axes  Diagonal
EEE     λDAD^T             Same       Same       Same             Ellipsoidal
EEV     λD_k A D_k^T       Same       Same       Different        Ellipsoidal
VEV     λ_k D_k A D_k^T    Different  Same       Different        Ellipsoidal
VVV     λ_k D_k A_k D_k^T  Different  Different  Different        Ellipsoidal

The probability density function for a one-dimensional Gaussian distribution is:

$$f(X \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(X-\mu)^2}{2\sigma^2}\right) \tag{4}$$

µ and σ represent the mean and the standard deviation of the distribution. The probability
density function for d-dimensional multivariate data is expressed as follows:

$$f(X \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left(-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)\right) \tag{5}$$

Where µ denotes the mean of the distribution as a d-dimensional vector, Σ is the
covariance matrix of X, T denotes the transpose, and -1 denotes the inverse of the matrix
[15]. To maximize the likelihood of the data under the GMM, the Expectation-Maximization (EM)
algorithm can be used. The steps are as follows [13][23]:
1) Initialize the values of $\mu_k$, $\sigma_k$, and $\pi_k$ randomly for all clusters, where $\pi_k$ is the
mixing coefficient and k is the index of the cluster. The mixture density is a linear combination
of the cluster densities:

$$p(X) = \sum_{k=1}^{K} \pi_k f(X \mid \mu_k, \Sigma_k) \tag{6}$$

2) E-Step: using the current parameters $\mu_k$, $\sigma_k$, and $\pi_k$, evaluate the likelihood and the
probability of each observation belonging to each cluster. Suppose the cluster $C_k$ is represented
by a Gaussian distribution $(\mu_k, \sigma_k)$. The probability that $x_i$ belongs to cluster $C_k$ can be
calculated by:

$$z_{ik} = \rho(C_k \mid x_i) = \frac{\rho(x_i \mid C_k)\,\rho(C_k)}{\rho(x_i)} \tag{7}$$

Likelihood value:

$$\rho(x_i \mid C_k) = \frac{1}{\sigma_k\sqrt{2\pi}} \exp\left(-\frac{(x_i-\mu_k)^2}{2\sigma_k^2}\right) \tag{8}$$

$$\rho(x_i) = \sum_{k} \rho(x_i \mid C_k)\,\rho(C_k) \tag{9}$$

3) M-Step: update the values of $\mu_k$, $\sigma_k$, and $\rho(C_k) = \pi_k$ with the following calculations:

$$\mu_k = \frac{\sum_i \rho(C_k \mid x_i)\, x_i}{\sum_i \rho(C_k \mid x_i)} \tag{10}$$

$$\sigma_k^2 = \frac{\sum_i \rho(C_k \mid x_i)\,(x_i - \mu_k)^2}{\sum_i \rho(C_k \mid x_i)} \tag{11}$$

$$\pi_k = \frac{\sum_i \rho(C_k \mid x_i)}{n} \tag{12}$$

4) Repeat steps 2 and 3 until the convergence criterion is met, that is, until the change in the
mean and variance between successive iterations falls below a set threshold. The observations
can then be assigned to clusters by the Maximum a Posteriori (MAP) method:

$$MAP\{\hat{z}_{ik}\} = \begin{cases} 1 & \text{if } \hat{z}_{ik} = \max_j \hat{z}_{ij} \\ 0 & \text{otherwise} \end{cases} \tag{13}$$
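The four steps above can be sketched for the one-dimensional case in plain Python. This is only an illustrative sketch, not the implementation used in the study (which was run in R on five variables): the data are hypothetical, the function names are introduced here, and fixed starting values replace the random initialization so that the result is reproducible.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """One-dimensional Gaussian density."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_gmm_1d(data, mus, sigmas, pis, max_iter=200, tol=1e-8):
    """Fit a 1-D Gaussian mixture by EM, starting from the given parameters."""
    n, K = len(data), len(mus)
    mus, sigmas, pis = list(mus), list(sigmas), list(pis)
    resp = []
    for _ in range(max_iter):
        # E-step: posterior probability z_ik of each cluster for each point.
        resp = []
        for x in data:
            weighted = [pis[k] * gaussian_pdf(x, mus[k], sigmas[k]) for k in range(K)]
            total = sum(weighted)
            resp.append([w / total for w in weighted])
        # M-step: re-estimate mean, variance and mixing weight of each cluster.
        new_mus, new_sigmas, new_pis = [], [], []
        for k in range(K):
            nk = sum(r[k] for r in resp)
            mu_k = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - mu_k) ** 2 for r, x in zip(resp, data)) / nk
            new_mus.append(mu_k)
            new_sigmas.append(math.sqrt(max(var_k, 1e-12)))  # guard against zero variance
            new_pis.append(nk / n)
        # Convergence check on the change in means and standard deviations.
        shift = max(abs(a - b) for a, b in zip(new_mus + new_sigmas, mus + sigmas))
        mus, sigmas, pis = new_mus, new_sigmas, new_pis
        if shift < tol:
            break
    # MAP assignment: each point goes to the cluster with the largest z_ik.
    labels = [max(range(K), key=lambda k: r[k]) for r in resp]
    return mus, sigmas, pis, labels

# Hypothetical, well-separated data (not the study data).
data = [-2.1, -1.9, -2.3, -1.8, -2.0, 1.9, 2.2, 2.0, 1.7, 2.4]
mus, sigmas, pis, labels = em_gmm_1d(data, mus=[-1.0, 1.0], sigmas=[1.0, 1.0], pis=[0.5, 0.5])
```

In the multivariate case the scalar variance is replaced by a covariance matrix, whose structure is constrained by the chosen model from Table 1.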

The best model in the Gaussian Mixture Model (GMM) analysis is selected using a general
approach, namely the Bayes Information Criterion (BIC), which is evaluated for each covariance
parameterization and number of clusters [16][24][25].

$$BIC_k = 2 \log P(y \mid \hat{\theta}_k, M_k) - V_k \log(n) \approx 2 \log P(y \mid M_k) \tag{14}$$

$P(y \mid M_k)$ : integrated likelihood for model $M_k$.

$P(y \mid \hat{\theta}_k, M_k)$ : maximized mixture likelihood for model $M_k$.

$V_k$ : number of independent parameters estimated in model $M_k$.

$n$ : number of observations.

The best model and number of clusters are determined based on the highest BIC value.

3. RESULTS
Clustering analysis with sub-districts as observations was conducted using R software. The
research data included the prevalence of diarrhea in children under five, the percentage of
households with clean and healthy living behaviours, the percentage of households with healthy
latrine facilities, population density per hectare, and the number of babies < 6 months old who
are exclusively breastfed. The mapping of sub-districts in Bandung City based on each
variable can be seen in Figures 2, 3, 4, 5, and 6.

Figure 2. Mapping of Sub-districts based on Prevalence Rate of Diarrhea in Bandung City.


The sub-district with the highest prevalence of diarrhea in children under five is Cinambo, at 0.895.

Figure 3. Mapping of Sub-districts based on Percentage of Households with Clean and Healthy
Living Behaviours in Bandung City.
In 25 sub-districts in Bandung City, more than 50% of households practise clean and healthy
living behaviours, while the 5 sub-districts with a low percentage of such households are
Cibeunying Kidul, Cidadap, Cicendo, Bojongloa Kidul, and Bandung Wetan.

Figure 4. Mapping of Sub-districts based on Percentage of Households with Healthy Latrine
Facilities in Bandung City.
In 12 sub-districts in Bandung City, 100% of households have healthy latrine
facilities, namely Buah Batu, Panyileukan, Antapani, Mandalajati, Gedebage,
Rancasari, Lengkong, Arcamanik, Sukasari, Ujung Berung, Cibiru, and Cinambo.

Figure 5. Mapping of Sub-districts based on Population Per-hectare in Bandung City.


The highest population density, 399 people per hectare, is in the sub-district of Bojong Kaler.

Figure 6. Mapping of Sub-districts based on Number of Babies less than 6 Months Old who are
Exclusively Breastfed in Bandung City.
There are 5 sub-districts in the city of Bandung with more than 400 babies less than 6
months old who are exclusively breastfed, namely Andir, Coblong, Bandung
Kulon, Sukajadi, and Ujung Berung.
3.1. Multicollinearity Test
Testing for multicollinearity between variables is carried out as an initial stage of
variable selection. If there is multicollinearity among the variables, the variables need to be
selected or transformed with a suitable method, such as Principal Component Analysis (PCA). The
results of multicollinearity testing using the Variance Inflation Factor (VIF) can be viewed in Table 2.
Table 2. VIF values
Variable                                               VIF
Prevalence of diarrhea in children under five          1.268
Households with Clean and Healthy Living Behaviours    1.129
Households with Healthy Latrine Facilities             1.368
Population density                                     1.542
Babies < 6 months old who are exclusively breastfed    1.150
Based on Table 2, no VIF value exceeds 10. It can therefore be concluded that there
is no multicollinearity among the variables, and all variables are used in the cluster analysis.
3.2. Identification of BIC Value
The BIC value is identified to determine the best model and the number
of clusters formed. The analysis produced a comparison chart of the BIC values of the
various models. Based on Figure 7, the Ellipsoidal, Equal Volume and Shape (EEV) model has the
highest BIC value.

Figure 7. BIC Value of GMM results.

The Gaussian Mixture Model EEV fitted with the Expectation-Maximization (EM) algorithm
has five components, as shown in Table 3.
Table 3. Five-component EEV model
Log-likelihood  n   df  BIC      ICL
-31.9397        30  84  -349.58  -349.5899
The Bayes Information Criterion (BIC) and Integrated Completed Likelihood (ICL) are metrics used
in Gaussian Mixture Model (GMM) analysis to identify the most appropriate number of Gaussian
components. In this study, the BIC value is used to determine the optimal number of
components. The BIC value of -349.58, obtained for the EEV model with 5 Gaussian components, is
the highest value, meaning that the sub-districts in Bandung City can be grouped into 5 clusters
based on the predetermined variables.
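The figures in Table 3 can be checked against the BIC formula with a short Python sketch. The parameter count below is a reconstruction under stated assumptions: it follows the decomposition λD_k A D_k^T of the EEV model in Table 1 and assumes the usual convention that the shape matrix A is normalised to determinant 1. The function names are introduced only for this illustration.

```python
import math

def eev_parameter_count(d, K):
    """Free parameters of an EEV mixture with d variables and K components."""
    means = K * d                        # one mean vector per component
    mixing = K - 1                       # mixing proportions sum to 1
    volume = 1                           # a single shared lambda
    shape = d - 1                        # shared diagonal A with det(A) = 1 (assumed convention)
    orientation = K * d * (d - 1) // 2   # one rotation matrix D_k per component
    return means + mixing + volume + shape + orientation

def bic(log_likelihood, n_params, n_obs):
    """BIC = 2 log L - V log(n); larger values indicate a better model."""
    return 2 * log_likelihood - n_params * math.log(n_obs)

df = eev_parameter_count(d=5, K=5)   # 5 variables, 5 clusters
bic_value = bic(-31.9397, df, 30)    # log-likelihood and n taken from Table 3
```

Under these assumptions the count reproduces df = 84, and the formula reproduces the reported BIC of -349.58 after rounding to two decimals. With 84 parameters estimated from only 30 observations, the penalty term accounts for most of the BIC value.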


3.3. Clustering
The grouping of sub-districts into 5 clusters based on the EEV model shows that cluster 1 consists
of 6 sub-districts, cluster 2 of 9 sub-districts, cluster 3 of 5 sub-districts, cluster 4
of 4 sub-districts, and cluster 5 of 6 sub-districts. The characteristics of each
cluster can be viewed in Table 4.
Table 4. Cluster means
Variable                                              Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5
Prevalence of Diarrhea in Children Under Five            -0.345     -0.051     -0.547     -0.057      0.915
Households with Clean and Healthy Living Behaviours      -0.238      0.790     -0.508      0.849     -1.089
Households with Healthy Latrine Facilities                0.037      0.884     -0.380     -1.328     -0.158
Population Density                                        0.149     -0.547      1.643      0.266     -0.877
Babies < 6 Months Old who are Exclusively Breastfed       1.476     -0.420     -0.001      0.098     -0.905
The values in Table 4 are the means that represent the centre of each component in the
data space. The means of the variables differ between Gaussian components,
indicating that sub-districts are assigned to clusters according to how close their
characteristics are to the centre of a particular cluster.
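One simple way to read the cluster means is to look for the variable with the largest standardized mean in each cluster. The sketch below does this with the values from Table 4; it is only an illustrative reading aid, not necessarily the procedure the authors used to describe the clusters, and the shortened variable labels are introduced here.

```python
# Shortened labels for the five variables, in the row order of Table 4.
variables = [
    "diarrhea prevalence",
    "clean and healthy living behaviours",
    "healthy latrine facilities",
    "population density",
    "exclusive breastfeeding",
]

# Standardized cluster means copied from Table 4 (one list per cluster).
cluster_means = {
    1: [-0.345, -0.238, 0.037, 0.149, 1.476],
    2: [-0.051, 0.790, 0.884, -0.547, -0.420],
    3: [-0.547, -0.508, -0.380, 1.643, -0.001],
    4: [-0.057, 0.849, -1.328, 0.266, 0.098],
    5: [0.915, -1.089, -0.158, -0.877, -0.905],
}

def dominant_variable(means):
    """Return the label of the variable with the largest mean in a cluster."""
    best = max(range(len(means)), key=lambda j: means[j])
    return variables[best]

dominant = {cluster: dominant_variable(m) for cluster, m in cluster_means.items()}
```

Because the variables are standardized, a mean well above 0 indicates a cluster that is above the city-wide average on that variable, and a mean well below 0 indicates the opposite.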
The grouping of the 30 sub-districts into 5 clusters can be viewed in Table 5, and the cluster
mapping can be viewed in Figure 8.

Figure 8. Cluster map of diarrhea in children under five in Bandung City.


In Figure 8, the different colours of the regions indicate the different clusters. Areas
shown in black are characterised by a high prevalence of diarrhea.
Table 5. Clustering and characteristics of each cluster
Cluster  Characteristics                                         Sub-districts
1        Components in cluster 1 tend to be characterized by     Andir, Bandung Kulon, Bojongloa Kidul,
         a high number of babies less than 6 months old who      Coblong, Rancasari, and Ujung Berung.
         are exclusively breastfed.
2        Components in cluster 2 tend to be characterized by     Antapani, Arcamanik, Buah Batu, Cibiru,
         a high percentage of households with clean and          Gedebage, Lengkong, Mandalajati,
         healthy living behaviours and with healthy latrine      Panyileukan, and Sukasari.
         facilities.
3        Components in cluster 3 tend to be characterized by     Astana Anyar, Batununggal, Bojong Kaler,
         a high population density per hectare.                  Cibeunying Kidul, and Kiaracondong.
4        Components in cluster 4 tend to be characterized by     Babakan Ciparay, Cibeunying Kaler, Regol,
         a high percentage of households with clean and          and Sukajadi.
         healthy living behaviours and a high population
         density per hectare.
5        Components in cluster 5 tend to be characterized by     Bandung Kidul, Bandung Wetan, Cicendo,
         a high prevalence of diarrhea among under-fives, as     Cidadap, Cinambo, and Sumur Bandung.
         well as a low percentage of households with clean
         and healthy living behaviours and a low number of
         babies less than 6 months old who are exclusively
         breastfed.

4. CONCLUSION
Clustering was carried out for 30 sub-districts based on the 5 variables studied: the prevalence
of diarrhea in children under five, the percentage of households with clean and healthy living behaviors,
the percentage of households with healthy latrine facilities, the population per hectare, and the number
of babies less than 6 months old who are exclusively breastfed. Clustering with the Gaussian
Mixture Model (GMM) method, for which the best model is the Ellipsoidal, Equal Volume and Shape (EEV)
model with 5 components, shows that cluster 5 is a group of areas that had a high prevalence
of diarrhea in children under five in 2022. Intervention and efforts by the Bandung City government
are therefore needed to prevent a further rise in diarrhea cases in children under five in six
sub-districts, namely Bandung Kidul, Bandung Wetan, Cicendo, Cidadap, Cinambo, and Sumur Bandung.
Cluster 2 consists of sub-districts with good sanitation, namely Antapani, Arcamanik, Buah Batu,
Cibiru, Gedebage, Lengkong, Mandalajati, Panyileukan, and Sukasari.
It is important to educate the sub-districts in cluster 5 about better hygiene and
sanitation facilities to prevent an increase in under-five diarrhea cases, in order to achieve the
Sustainable Development Goals (SDGs) and improve the health of children under five in Bandung City.

FUNDING
This research is supported by the Department of Statistics, Padjadjaran University and the Rector
of Padjadjaran University.

CONFLICT OF INTERESTS
The authors declare that there is no conflict of interests.

REFERENCES
[1] N. J. CaJacob and M. B. Cohen, "Update on Diarrhea," Pediatrics in Review, vol. 37, no. 8, pp. 313-322, 2016.

[2] L. R. Schiller, D. S. Pardi, and J. H. Sellin, "Chronic diarrhea: diagnosis and management," Clinical Gastroenterology and Hepatology, vol. 15, no. 2, pp. 182-193, 2017.

[3] "Prevalence and determinants of diarrhea among under-five children in five Southeast Asian countries: Evidence from the demographic health survey," ScienceDirect.

[4] E. Yunitasari, R. Pradanie, H. Arifin, D. Fajrianti, and B. O. Lee, "Determinants of stunting prevention among mothers with children aged 6–24 months," Open Access Macedonian Journal of Medical Sciences, vol. 9, no. B, pp. 378-384, 2021.

[5] M. M. Levine, D. Nasrin, S. Acácio, Q. Bassat, H. Powell, S. M. Tennant, et al., "Diarrheal disease and subsequent risk of death in infants and children residing in low-income and middle-income countries: analysis of the GEMS case-control study and 12-month GEMS-1A follow-on study," The Lancet Global Health, vol. 8, no. 2, pp. e204-e214, 2020.

[6] K. Takanashi, Y. Chonan, D. T. Quyen, N. C. Khan, K. C. Poudel, and M. Jimba, "Survei praktik kebersihan makanan di rumah dan diare masa kanak-kanak di Hanoi, Vietnam," J Kesehatan Popul Nutr, vol. 27, no. 5, pp. 602-611, 2009.

[7] M. B. Karo, "Perilaku hidup bersih dan sehat (PHBS) strategi pencegahan penyebaran Virus Covid-19," in Prosiding seminar nasional hardiknas, vol. 1, pp. 1-4, 2020.

[8] N. Rohmah and F. Syahrul, "Hubungan kebiasaan cuci tangan dan penggunaan jamban sehat dengan kejadian diare balita," Jurnal Berkala Epidemiologi, vol. 5, no. 1, pp. 95-106, 2017.

[9] S. Tanuwidjaya, "Konsep umum tumbuh dan kembang," in Tumbuh kembang anak dan remaja. Edisi ke-1, h. 1-12, Jakarta: Sagung Seto, 2002.

[10] L. Novita, D. A. Gurnida, and H. Garna, "Perbandingan fungsi kognitif bayi usia 6 bulan yang mendapat dan yang tidak mendapat ASI eksklusif," Sari Pediatri, vol. 9, no. 6, pp. 429-34, 2016.

[11] D. H. Vu, K. M. Muttaqi, and A. P. Agalgaonkar, "A variance inflation factor and backward elimination based robust regression model for forecasting monthly electricity demand using climatic variables," Applied Energy, vol. 140, pp. 385-394, 2015.



[12] R. S. Gómez, A. R. Sánchez, C. G. García, and J. G. Pérez, "The VIF and MSE in raise regression," Quantitative Methods for Economics and Finance, p. 325, 2021.

[13] Z. Wahidah and D. T. Utari, "COMPARISON OF K-MEANS AND GAUSSIAN MIXTURE MODEL IN PROFILING AREAS BY POVERTY INDICATORS," BAREKENG: Jurnal Ilmu Matematika dan Terapan, vol. 17, no. 2, pp. 0717-0726, 2023.

[14] L. Scrucca, "Identifying connected components in Gaussian finite mixture models for clustering," Computational Statistics & Data Analysis, vol. 93, pp. 5-17, 2016.

[15] S. Belciug and D. G. Iliescu, "Deep learning and Gaussian Mixture Modelling clustering mix. A new approach for fetal morphology view plane differentiation," Journal of Biomedical Informatics, vol. 143, p. 104402, 2023.

[16] N. Shen and B. González, "Bayesian information criterion for linear mixed-effects models," arXiv preprint arXiv:2104.14725, 2021.

[17] C. E. Troeger et al., "Quantifying risks and interventions that have affected the burden of diarrhea among children younger than 5 years: an analysis of the Global Burden of Disease Study 2017," The Lancet Infectious Diseases, vol. 20, no. 1, pp. 37-59, 2020.

[18] N. Thapar and I. R. Sanderson, "Diarrhea in children: an interface between developing and developed countries," The Lancet, vol. 363, no. 9409, pp. 641-653, 2004.

[19] V. Diwan, Y. D. Sabde, E. Byström, and A. De Costa, "Treatment of pediatric diarrhea: a simulated client study at private pharmacies of Ujjain, Madhya Pradesh, India," The Journal of Infection in Developing Countries, vol. 9, no. 05, pp. 505-511, 2015.

[20] G. Mengistu et al., "Self-reported and actual involvement of community pharmacy professionals in the management of childhood diarrhea: a cross-sectional and simulated patient study at two towns of Eastern Ethiopia," Clinical Medicine Insights: Pediatrics, vol. 13, pp. 1179556519855380, 2019.

[21] V. Melnykov and R. Maitra, "Finite mixture models and model-based clustering."

[22] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American statistical Association, vol. 97, no. 458, pp. 611-631, 2002.

[23] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the royal statistical society: series B (methodological), vol. 39, no. 1, pp. 1-22, 1977.

[24] G. Schwarz, "Estimating the dimension of a model," The annals of statistics, pp. 461-464, 1978.

[25] R. A. Stine, "Model selection using information theory and the MDL principle," Sociological Methods & Research, vol. 33, no. 2, pp. 230-260, 2004.

[26] P. N. Tan, M. Steinbach, and V. Kumar, "Data mining cluster analysis: basic concepts and algorithms," Introduction to data mining, pp. 487-533, 2013.

[27] A. Athifaturrofifah, R. Goejantoro, and D. Yuniarti, "Perbandingan Pengelompokan K-Means dan K-Medoids Pada Data Potensi Kebakaran Hutan/Lahan Berdasarkan Persebaran Titik Panas," EKSPONENSIAL, vol. 10, no. 2, pp. 143-152, 2020.

[28] A. P. Kusuma and D. M. Sukendra, "Analisis spasial kejadian demam berdarah dengue berdasarkan kepadatan penduduk," Unnes Journal of Public Health, vol. 5, no. 1, pp. 48-56, 2016.

[29] S. Rini and Rohayati, "Kejadian Diare pada Bayi dengan Pemberian ASI," Journal of Nursing, vol. 11, no. 2, pp. 153-156, 2015.

[30] N. M. E. Wardani, K. A. Witarini, P. J. Putra, and I. W. D. Artana, "Pengaruh Pemberian ASI Eksklusif Terhadap Kejadian Diare pada Anak Usia 1-3 Tahun," Jurnal Medika Udayana, vol. 11, no. 01, pp. 12-17, 2022.

[31] L. B. Wadu, A. F. Gultom, and F. Pantus, "Penyediaan Air Bersih Dan Sanitasi: Bentuk Keterlibatan Masyarakat Dalam Pembangunan Berkelanjutan," Jurnal Pendidikan Kewarganegaraan, vol. 10, no. 2, pp. 80-88, 2020.