ABSTRACT
In the realm of multi-label learning, instances are often characterized by a plurality
of labels, diverging from the single-label paradigm prevalent in conventional datasets.
Multi-label techniques often employ a similar feature space to build classification mod-
els for every label. Nevertheless, labels typically convey distinct semantic information
and should possess their own unique attributes. Several approaches have been suggested
to identify label-specific characteristics for creating distinct categorization models.
Our proposed methodology seeks to encapsulate and systematically represent label
correlations within the learning framework. The innovation of improved multi-label
Naïve Bayes (iMLNB) lies in its strategic expansion of the input space, which assimilates
meta information derived from the label space, thereby engendering a composite input
domain that encompasses both continuous and categorical variables. To accommodate
the heterogeneity of the expanded input space, we refine the likelihood parameters of
iMLNB using a joint density function, which is adept at handling the amalgamation of
data types. We subject our enhanced iMLNB model to a rigorous empirical evaluation,
utilizing six benchmark datasets. The performance of our approach is gauged against the
traditional multi-label Naïve Bayes (MLNB) algorithm and is quantified through a suite
of evaluation metrics. The empirical results not only affirm the competitive edge of our
proposed method over the conventional MLNB but also demonstrate its superiority
across the aforementioned metrics. This underscores the efficacy of modeling label
dependencies in multi-label learning environments and positions our approach as a
significant contribution to the field.

Subjects Algorithms and Analysis of Algorithms, Artificial Intelligence, Data Mining and Machine Learning, Neural Networks
Keywords Multi-label Naïve Bayesian classification, Label dependency, Input space expansion, Heterogeneous feature space, Mixed joint density distribution

INTRODUCTION
The advent of multi-label classification has brought forth a paradigm shift in machine
learning, challenging the traditional confines of single-label datasets (Han et al., 2023). In a
multi-label context, instances are inherently complex, often associated with multiple labels
simultaneously, which reflects the multifaceted nature of real-world data. This complexity
necessitates algorithms capable of discerning the subtle interdependencies among labels
to make accurate predictions. Naïve Bayes, a stalwart in the pantheon of classification
algorithms due to its simplicity and efficacy, traditionally operates under the assumption
of label independence—an assumption which is starkly at odds with the multi-label
environment. Our research introduces an innovative approach to multi-label Naïve
Bayes (iMLNB), which not only acknowledges but also capitalizes on the dependencies
between labels. By expanding the input space to include meta information from the label
space, we construct a more informative and nuanced feature set. This paper delineates
the development of this enhanced iMLNB model, its theoretical underpinnings, and the
empirical validation of its performance across diverse datasets. In doing so, we aim to set a
new benchmark for multi-label classification and open avenues for further exploration in
this dynamic field.
A traditional single-label classification (SLC) algorithm associates every instance in the data with exactly one output label $\ell$. SLC is defined as in Eq. (1):

$$SLC(h): f_i \mapsto \ell \qquad (1)$$

where $|\ell| = 1$. By definition (Han et al., 2023), multi-label classification (MLC) is a generalization of supervised single-label classification in which every data example may be associated with a set of class labels, each with a binary outcome. Let $F$ be the set of features, $F = \{f_1, f_2, \ldots, f_k\}$, and $L$ be the set of labels, $L = \{\ell_1, \ell_2, \ell_3, \ldots, \ell_m\}$; the multi-label dataset $D$ is then $D = \{F, L\}$, where $\ell_i = 1$ if the label is relevant to an instance and $\ell_i = 0$ otherwise. The objective of multi-label classification is to map a sample instance $\chi \in D$ to a label set $\ell \subseteq L$, as defined in Eq. (2):

$$MLC(h): \prod_{i=1}^{m} f_i \mapsto \ell \qquad (2)$$

where $|\ell| \geq 2$ and each label has a binary outcome. Owing to the growing number of application areas in which samples demand more than a single label, multi-label learning is an emerging field of machine learning. Classification tasks fall into three types: (i) binary classification, (ii) multi-class classification and (iii) multi-label classification. Multi-label classification is used in bioinformatics, document and music categorization (Huang et al., 2023; Trohidis et al., 2008), semantic image and video annotation (Zhang, 2024; Feng & Xu, 2010), drug side effect prediction (Zhang et al., 2016), species distribution modelling (Jones, Miller & White, 2011), etc. Naïve Bayes is a supervised learning method in which a simple probabilistic classifier based on Bayes' theorem is used to predict unseen data, with features assumed to be independent.
Naïve Bayes has been extended to multi-label Naïve Bayes (MLNB) under the assumption of label independence. MLNB uses conditional probability to estimate the likelihood of the class label conditioned on the predictive attribute space. The probability of a class given the attributes is the factorization of the joint probability, as in Eq. (3):

$$H_{NB}: X \to \big(P(\ell_1^b), P(\ell_2^b), \ldots, P(\ell_{|L|}^b)\big), \quad \text{with each } P(\ell_i^b) \propto \prod_{1 \le j \le n_d} P(x_j \mid \ell_i^b) \qquad (3)$$
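To make the factorization concrete, the following is a minimal Python sketch of the binary-relevance style multi-label Naïve Bayes baseline implied by Eq. (3); it is not the authors' implementation, and the variable names (X, Y) and the use of scikit-learn's GaussianNB are assumptions made for illustration (each label column is assumed to contain both classes).

```python
# Minimal sketch (not the authors' code): one Naive Bayes model per label,
# trained on the shared feature space under the label-independence assumption.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def fit_mlnb(X, Y):
    """X: (n_samples, n_features) continuous features; Y: (n_samples, n_labels) binary labels."""
    return [GaussianNB().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def predict_mlnb(models, X):
    """Return P(l_j = 1 | x) for every label, treating the labels independently."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])
```

Each label receives its own classifier over the same input space, which is exactly the independence assumption that the proposed iMLNB relaxes by expanding that space with label meta data.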
LITERATURE SURVEY
The exploration of multi-label methods has yielded significant insights into the
transformation of data within a multi-label environment. Du et al. (2024) underscored
the binary relevance (BR) method as a fundamental baseline for this transformation. BR
simplifies the complexity of multi-label problems by decomposing them into individual
binary classification tasks, one for each label within the label space L. This decomposition
allows for each instance in the training set to be distinctly labelled as relevant or irrelevant,
contingent upon its association with the respective multi-label example.
Building upon this foundation, an innovative modification to BR was introduced
by Radovanović et al. (2023), termed the classifier chain (CC). This method employs a
sequence of classifiers, each predicting the binary association of labels while enhancing the
feature space with the labels previously predicted in the chain. Despite its efficacy, the CC
method introduces an additional layer of complexity due to the necessity of maintaining
the order of the chains. This complexity was adeptly managed through the implementation
of ensemble methods, which aggregate the predictions of labels, allowing each label to
accrue votes. A threshold is then applied to discern the relevant labels, streamlining the
decision-making process.
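A rough sketch of a single chain, under the common convention of training each link on the true values of the earlier labels and chaining predictions at test time, may clarify the construction; the fixed label order, the variable names and the choice of GaussianNB as the base learner are assumptions, not details taken from the cited work.

```python
# Illustrative classifier-chain sketch: each classifier sees the original
# features plus the labels that precede it in the chain.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def fit_chain(X, Y):
    models, X_aug = [], X
    for j in range(Y.shape[1]):
        models.append(GaussianNB().fit(X_aug, Y[:, j]))
        X_aug = np.hstack([X_aug, Y[:, [j]]])    # append the true label as a feature
    return models

def predict_chain(models, X):
    X_aug, preds = X, []
    for m in models:
        p = m.predict(X_aug).reshape(-1, 1)      # predicted label feeds the next link
        preds.append(p)
        X_aug = np.hstack([X_aug, p])
    return np.hstack(preds)
```

An ensemble of such chains with different label orders can then vote per label, with a threshold deciding relevance, as described above.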
The utility of BR, despite its limitations, was further elucidated by Luaces et al. (2012).
The authors demonstrated BR’s effectiveness as a baseline method for multi-label learning,
particularly for its computational efficiency and its capability to produce optimal methods
when targeting macro-averaged measure-based loss functions. Their assertions were
substantiated through the use of benchmark datasets as well as synthetic datasets, which
also revealed BR’s competitive edge against more intricate, conventional approaches,
especially in scenarios characterized by high-dimensional and highly correlated label
spaces.
Addressing the often-overlooked label dependencies in BR, Alvares-Cherman, Metz &
Monard (2012) extended the BR approach to effectively predict label combinations by
enriching the feature space with comprehensive label information from the label space.
The burgeoning volume of social media usage and the consequent increase in data
generation was assessed by Rani et al. (2023). The authors highlighted the importance of
pre-processing techniques such as stemming and stop word elimination in sentiment
analysis, with their methodology showing that models employing these techniques,
particularly in conjunction with multi-class Naïve Bayes, significantly outperformed
others.
Lastly, the performance comparison between artificial neural networks (ANN),
multinomial Naïve Bayes, and support vector machine (SVM) approaches in data
classification was addressed by Rani et al. (2023). The authors selected ANN for its
versatility across various fields and tasks, including data generation, classification, and
regression. Their research utilized diverse datasets, ranging from single-label to multi-
label, and employed k-fold validation to ensure the robustness and reproducibility of the
model’s results, with performance metrics such as the F1-score serving as benchmarks for
evaluation. The ability to identify and interpret emotions conveyed through written text has
become crucial in various domains. Chochlakis et al. (2023) examined methods to leverage
label correlations in multi-label emotion recognition models in order to enhance emotion detection.
Evaluation metrics
One_error ↓. It quantifies how often the top-ranked label is not among the relevant labels of an instance; its values range from 0 to 1 and it is analogous to classification error in conventional single-label methods. A smaller one_error value indicates better performance, and it is defined in Eq. (6):

$$\text{One\_error}(\bar{h}) = \frac{1}{N}\sum_{i=1}^{N} \left[\!\left[\, \arg\max_{\lambda \in L} \bar{h}(x_i, \lambda) \notin x_i^{l} \,\right]\!\right] \qquad (6)$$
Ranking loss ↓. For a given example, it computes the fraction of label pairs that are ordered in reverse, as presented in Eq. (7):

$$\text{Ranking Loss} = \frac{1}{N}\sum_{i=1}^{N} \frac{\left|\{(l_a, l_b)\}\right|}{|x_i^{l}|\,|x_i^{\bar{l}}|} \qquad (7)$$

where the rank of $l_a$ is greater than that of $l_b$ and $(l_a, l_b) \in x_i^{l} \times x_i^{\bar{l}}$; the algorithm performs better when the ranking loss is smaller.
Average precision ↑. For each relevant label, it computes the fraction of relevant labels ranked at or above it, and then averages over all relevant labels and all examples. It is defined in Eq. (8):

$$\text{Average Precision} = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{|x_i^{l}|} \sum_{\lambda \in x_i^{l}} \frac{\left|\{\lambda' \in x_i^{l} : r_i(\lambda') \le r_i(\lambda)\}\right|}{r_i(\lambda)} \qquad (8)$$

where $r_i(\lambda)$ denotes the rank of label $\lambda$. The learning algorithm performs better with a higher average precision value, and when average precision equals 1 the method performs flawlessly.
Subset accuracy ↑. It is defined as the Jaccard similarity coefficient between the predicted label set $\bar{h}(x_i)$ and the true label set $y_i$, averaged over all instances, as given in Eq. (9):

$$\text{Subset\_accuracy} = \frac{1}{N}\sum_{i=1}^{N} \frac{|\bar{h}(x_i) \cap y_i|}{|\bar{h}(x_i) \cup y_i|} \qquad (9)$$
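The four measures above can be computed directly from a score matrix and a binary label matrix. The following numpy sketch is one possible reading of Eqs. (6)–(9); the variable names (Y_true, scores, Y_pred) are assumptions, and ties are counted as errors in the ranking loss for simplicity.

```python
# Hedged sketch of the evaluation metrics; Y_true and Y_pred are binary (N, L)
# matrices, scores is an (N, L) matrix of real-valued ranking scores.
import numpy as np

def one_error(Y_true, scores):                      # Eq. (6), lower is better
    top = np.argmax(scores, axis=1)
    return float(np.mean(Y_true[np.arange(len(Y_true)), top] == 0))

def ranking_loss(Y_true, scores):                   # Eq. (7), lower is better
    losses = []
    for y, s in zip(Y_true, scores):
        rel, irr = np.where(y == 1)[0], np.where(y == 0)[0]
        if len(rel) == 0 or len(irr) == 0:
            continue
        bad = sum(s[a] <= s[b] for a in rel for b in irr)
        losses.append(bad / (len(rel) * len(irr)))
    return float(np.mean(losses))

def average_precision(Y_true, scores):              # Eq. (8), higher is better
    aps = []
    for y, s in zip(Y_true, scores):
        rel = np.where(y == 1)[0]
        if len(rel) == 0:
            continue
        rank = (-s).argsort().argsort() + 1         # rank 1 = highest score
        aps.append(np.mean([np.sum(rank[rel] <= rank[l]) / rank[l] for l in rel]))
    return float(np.mean(aps))

def subset_accuracy(Y_true, Y_pred):                # Eq. (9), instance-wise Jaccard
    inter = np.sum((Y_true == 1) & (Y_pred == 1), axis=1)
    union = np.sum((Y_true == 1) | (Y_pred == 1), axis=1)
    return float(np.mean(np.where(union == 0, 1.0, inter / np.maximum(union, 1))))
```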
Proposed method
The multi-label Naïve Bayes is implemented on the basis of Bayes' rule, as in Eq. (13):

$$P(l_i^b \in L \mid \forall x_i \in D) = \operatorname*{argmax}_{b \in \{0,1\},\ 1 \le i \le |L|} \frac{\text{prior} \times \text{likelihood}}{\text{evidence}} \qquad (13)$$
The MLNB is implemented with binary relevance (BR) as its main baseline method. The posterior probability of a label for an unseen test instance is proportional to the product of the conditional probabilities of all attribute values, as in Eq. (14):

$$P(L \mid X) = \operatorname*{argmax}_{b \in \{0,1\},\ 1 \le i \le |L|} \frac{P(L = l_i^b)\,\prod_{1 \le j \le n_d} P(x_j \mid l_i^b)}{P(X = x_j \in D)} \qquad (14)$$

The prior probability $P(L = l_i^b)$ can be calculated from the labelled training data, and the denominator $P(X = x_j)$ does not depend on the label $L$.
The likelihood estimation is carried out according to the nature of the datasets used in this research. The BR method is criticized for its label-independence assumption. The improvement of MLNB is twofold: (1) feature space reconstruction with meta data, and (2) modelling the parameters to handle the new input space in a single shot. The input space expansion method involves two options: (1) the method uses the predicted values of the response variables as input for the following response; (2) the actual values of the response variables are used as input for predicting the next response. The first approach may propagate errors of the predicted values during the training phase. The latter approach uses true values only, so error propagation is minimized and the classifier performance is more accurate. This research work uses the latter approach for input space expansion. Figure 1 illustrates the proposed framework.
F is the newly expanded feature space containing all labels except the label of concern for the current iteration. The augmentation passes label information as meta data into the feature space, allowing Naïve Bayes to include label dependencies and overcoming the label-independence assumption of MLNB. In the training phase the meta data carries the actual label values, but in the testing phase the meta data consists of the predicted values of the response variables. However, the size of the newly generated feature space may grow exponentially with a huge label space. $E[l_i']$ is the expected value of $l_i'$. The responses with correlation $p > 0.5$ are taken as the responses with richer information; the others ($p < 0.5$) are excluded, as they may introduce noise into the feature space. The targets with richer information are appended to the feature space. This enhanced feature space contains the information of the predictors as well as the targets, and it includes all the relevant labels in the training phase. The expanded space is the union of multiple continuous predictors and discrete binary predictors. The multi-label Naïve Bayes therefore has to be enhanced to handle this expanded space with heterogeneous predictors.
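As one possible reading of the correlation filter just described, the sketch below appends to the feature space only those labels whose correlation with the label of concern exceeds 0.5; the function name, the use of the Pearson coefficient via numpy and the treatment of constant label columns are assumptions.

```python
# Hedged sketch of the input space expansion step: keep only the
# richer-information labels (correlation > 0.5) as meta features.
import numpy as np

def expand_features(X, Y, target_idx, threshold=0.5):
    """Return X augmented with the labels most correlated with label `target_idx`."""
    target, selected = Y[:, target_idx], []
    for j in range(Y.shape[1]):
        if j == target_idx:
            continue                                  # exclude the label of concern
        rho = np.corrcoef(target, Y[:, j])[0, 1]      # Pearson correlation between labels
        if not np.isnan(rho) and rho > threshold:
            selected.append(j)                        # richer-information labels only
    X_new = np.hstack([X, Y[:, selected]]) if selected else X
    return X_new, selected
```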
// n → number of instances in the training data, m → number of labels in the label space,
// smoothing parameter s // set to 1
Initialization: initialize D_new // initialize to null
Output: posterior probability of the response variables
Algorithm iMLNB
Begin
  Stage 1 – Feature space reconstruction
  for k ∈ 1 ... |L| do                     // for every label in the label space
      D_new ← { }                          // construct dataset with label information
      for (x, l) ∈ D do
          θ_ij = (1 + f_kj) / (q + Σ_{k=n+1}^{m} f_kj)        // discrete case: P(F_i = f_i | L = l_j; θ_ij)
          Likelihood = ∏_{i=0}^{n} g(f_i | l_j; µ_i, σ_i²) × ∏_{k=n+1}^{m} P(F_k = f_k | L = l_j; θ_kj)
      end for
      P(l_j = 1 | F) = argmax_{l_j ∈ L, f_i ∈ F} [ P(L = l_j) × ∏_{i=0}^{n} g(f_i | l_j; µ_i, σ_i²) × ∏_{k=n+1}^{m} P(F_k = f_k | L = l_j; θ_kj) ] / P(F = f_i)
      P(l_j = 0 | F) = 1 − P(l_j = 1 | F)
  end for
End
For each continuous predictor $F = f_i$, the probability density function is $g(f_i \mid l_j; \mu_{ij}, \sigma_{ij}^2)$, where $\mu_{ij}$ and $\sigma_{ij}^2$ are the mean and variance of $F = f_i$. This work assumes that the continuous predictors follow a Gaussian distribution. The conditional probability density function for the new feature space of length $m$ is defined in Eqs. (18) and (19).
By applying the Bayesian rule under the assumption of class-conditional independence among features and employing the law of total probability, Eq. (20) defines the Naïve Bayes classifier:

$$P(l_j = 1 \mid F) = \operatorname*{argmax}_{l_j \in L,\ f_i \in F} \frac{P(L = l_j) \times \prod_{i=0}^{n} g(f_i \mid l_j; \mu_i, \sigma_i^2) \times \prod_{k=n+1}^{m} P(F_k = f_k \mid L = l_j; \theta_{kj})}{P(F = f_i)} \qquad (20)$$
$P(F_k = f_k \mid L = l_j; \theta_{kj})$ is the Bernoulli probability distribution for the discrete predictors, and the parameter $\theta_{ij}$ is defined in Eq. (23):

$$\theta_{ij} = \frac{1 + f_{kj}}{q + \sum_{k=n+1}^{m} f_{kj}} \qquad (23)$$
$g(f_i \mid l_j; \mu_i, \sigma_i^2)$ is the Gaussian probability distribution for the continuous predictors, as in Eq. (24), and the parameters $\mu_i$ and $\sigma_i^2$ are given in Eqs. (25) and (26):

$$P(F_i = f_i \mid L = l_j) = g(f_i \mid l_j; \mu_{ij}, \sigma_{ij}^2) = \frac{1}{\sigma_{ij}\sqrt{2\pi}}\, e^{-\frac{(f_i - \mu_{ij})^2}{2\sigma_{ij}^2}} \qquad (24)$$
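A compact sketch of how such a mixed likelihood can be estimated and combined is given below; it is an assumed implementation rather than the authors' code, it uses a standard Laplace-smoothed Bernoulli estimate in place of the exact form of Eq. (23), and it presumes that both label values occur in the training data.

```python
# Hedged sketch of the mixed Gaussian x Bernoulli Naive Bayes of Eqs. (20) and (24):
# Xc holds the continuous predictors, Xd the appended binary meta-labels.
import numpy as np

def fit_mixed_nb(Xc, Xd, y, q=2, s=1.0):
    """Estimate per-class priors, Gaussian parameters and smoothed Bernoulli parameters."""
    params = {}
    for c in (0, 1):
        Xc_c, Xd_c = Xc[y == c], Xd[y == c]
        params[c] = {
            "prior": float(np.mean(y == c)),
            "mu": Xc_c.mean(axis=0),
            "var": Xc_c.var(axis=0) + 1e-9,                         # guard against zero variance
            "theta": (s + Xd_c.sum(axis=0)) / (q + Xd_c.shape[0]),  # Laplace-smoothed Bernoulli
        }
    return params

def posterior_pos(params, xc, xd):
    """Return P(l_j = 1 | x) from the Gaussian x Bernoulli product of Eq. (20)."""
    scores = {}
    for c, p in params.items():
        gauss = np.prod(np.exp(-(xc - p["mu"]) ** 2 / (2 * p["var"])) /
                        np.sqrt(2 * np.pi * p["var"]))
        bern = np.prod(np.where(xd == 1, p["theta"], 1 - p["theta"]))
        scores[c] = p["prior"] * gauss * bern
    total = scores[0] + scores[1]
    return scores[1] / total if total > 0 else 0.5
```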
In the testing phase, the predicted values of the labels are used for input space expansion, since test instances carry no label values; the labels already predicted for an instance serve as its meta data when predicting the remaining labels. The pseudo code for the testing phase is shown in Algorithm 2.
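A simplified test-phase sketch, reusing the posterior_pos helper above, is shown next; the single forward pass in label-index order (with labels not yet predicted defaulting to 0) is an assumption made for brevity and is not necessarily the ordering used in Algorithm 2.

```python
# Hedged sketch of the testing phase: meta features come from labels already
# predicted for the instance, since test instances carry no true label values.
import numpy as np

def predict_imlnb(models, selected_labels, X_test, n_labels):
    """models[j]: mixed-NB parameters for label j; selected_labels[j]: its meta-label indices."""
    Y_pred = np.zeros((X_test.shape[0], n_labels), dtype=int)
    for i, x in enumerate(X_test):
        for j in range(n_labels):
            meta = Y_pred[i, selected_labels[j]]       # previously predicted labels as meta data
            p1 = posterior_pos(models[j], x, meta)     # posterior from the sketch above
            Y_pred[i, j] = int(p1 >= 0.5)
    return Y_pred
```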
EXPERIMENTAL SETUP
This section describes the list of multi-label datasets used for this experimentation along
with the evaluation metrics used to evaluate the learning algorithms.
Datasets description
Sixteen benchmark datasets are used for the analysis of state-of-the-art algorithms and
the proposed method. All benchmark data used here are available in the MULAN data
repository. All datasets can be found at http://mulan.sourceforge.net/datasets-mlc.html and
http://meka.sourceforge.net/#datasets. Corel16k, MediaMill, CAL500, Enron, Corel5k, Rcv1
(subset1), Rcv2 (subset2), EUR-Lex (subject matters), tmc2007, Yeast, Bibtex, Medical,
Genbase, Slashdot, Emotions, and Scene are the datasets used for this research. The
descriptions of each dataset are as follows:
The datasets used for the experimentation are taken from various domains like music,
text, image, video, and biology.
Corel5k: Each image segment is represented by 36 features. There are 5,000 Corel images
in this dataset. The vocabulary consists of 374 words in total, so each image contains 4–5
keywords. Segmenting of image is calculated via normalized cuts. For every image, there
are typically 5–10 sectors, and only those bigger than a threshold are used. The final image
is depicted by 499 blobs generated by the aggregation of the regions using k-means. The
dataset was described by De Lima et al. (2022) and the same is available in the MULAN
source forge forum.
MediaMill: The American National Institute of Standards and Technology made an
effort to push the research in the areas of indexing and content-based retrieval of digital
video and automatic segmentation. A processed version of 85 h of video content from
TRECVID data was presented by Long et al. (2024). Each item in the MediaMill dataset
can correspond to one or more classes in a general video indexing problem. It includes
low-level multimedia elements that have already been pre-computed from the 85 h of
worldwide broadcast news footage in the TRECVID 2005/2006 benchmark. A total of
101 concept lexicons were chosen at random from the videos. Tony Blair, Desert, Prisoner,
Smoke, Waterfall, Football and Tree were used as the concepts. Features were taken from
visual content from specific regions within each key frame by choosing the colour-invariant
Table 2 Description of the benchmark multi-label datasets (#n: number of instances, #d: number of features, #l: number of labels, LC: label cardinality, LD: label density).
S. no  Dataset          Domain   #n      #d     #l   LC      LD
1.     Corel5k          Image    5,000   499    44   2.214   0.050
2.     Mediamill        Video    43,907  120    29   4.010   0.138
3.     CAL500           Music    502     68     174  25.058  0.202
4.     Enron            Text     1,702   1,001  53   3.378   0.130
5.     Corel16k         Image    13,766  500    161  2.867   0.018
6.     Rcv1 (subset 1)  Text     6,000   472    42   2.458   0.059
7.     Rcv1 (subset 2)  Text     6,000   472    39   2.170   0.056
8.     Eurlex-sm        Text     19,348  250    27   1.492   0.055
9.     tmc2007          Text     28,596  500    15   2.100   0.140
10.    Yeast            Biology  2,417   103    13   4.233   0.325
11.    Bibtex           Text     7,395   1,836  159  2.402   0.015
12.    Medical          Text     978     1,449  45   1.275   0.077
13.    Genbase          Biology  662     1,186  27   1.252   0.046
14.    Slashdot         Text     3,782   53     14   1.134   0.081
15.    Emotions         Music    593     72     6    1.869   0.311
16.    Scene            Image    2,407   294    6    1.074   0.179
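For reference, the LC and LD columns above follow the usual definitions: label cardinality is the average number of relevant labels per instance, and label density divides that value by the size of the label space. A small sketch, with Y as an assumed binary label matrix, is given below.

```python
# Label cardinality (LC) and label density (LD) of a binary (n x l) label matrix Y.
import numpy as np

def label_cardinality(Y):
    return float(Y.sum(axis=1).mean())

def label_density(Y):
    return label_cardinality(Y) / Y.shape[1]
```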
tmc2007: The problem categories of the documents include Design, Align and Contamination. A Boolean bag-of-words representation with 500 features is used as the feature representation of the free-text reports, and there are 28,596 reports and 22 possible problem categories (Lewis et al., 2004).
Yeast: This dataset comprises gene micro-array expressions and phylogenetic
information of yeast Saccharomyces cerevisiae organism which is used to predict the
functional class of genes of the yeast Saccharomyces cerevisiae based on micro-array
expressions data (Fürnkranz et al., 2008). The dataset is generated by conducting multiple
tests on yeast after modifying gene responses. Experiments include the diauxic shift, the mitotic
cell division cycle, temperature reduction shocks and sporulation. Each gene is associated
with a collection of functional classes whose maximum size may be more than 190, and is
described by the concatenation of micro-array expression data and phylogenetic profile.
In reality, the entire collection of functional classes is organized into hierarchies that can
reach a depth of four tiers and the authors of Fürnkranz et al. (2008) used a reduced set of
classes to 14 categories and examples of these classes including protein synthesis, cellular
biogenesis and metabolism. The dataset describes 2,417 genes using 103 numerical features.
Table 2 presents the description of the sixteen datasets used for this research.
Bibtex: The targets in this dataset have the automatic annotation of bookmarks and
Bibtex entries (Srivastava & Zane-Ulman, 2005) in the Bibsonomy social bookmarking
website. In this dataset, the issue of tagging Bibtex entries is addressed. The authors
kept journal, Bibtex abstract and title fields of the Bibtex entries as the relevant tags for
the desired classification system. Here all the textual contents are converted into binary
bag-of-words and these are treated as the features of this dataset. The authors specifically
focused on tags that were associated with 50 Bibtex entries and words appearing in a
sufficient number of entries.
Genbase: From the most prevalent families of protein function, 27 class labels are made up. Examples
of target label families include receptors, transferases, cytokines-and-growth-factors, and
DNA- or RNA-associated proteins.
Scene: The objective is to automatically predict the label set for test images by analyzing
images that have label sets (Trohidis et al., 2008). The 2,000 photos of natural scenes make
up the experimental data collection. Each image is given a series of labels that are manually
allocated, and the class labels that can be used include desert, mountains, sea, sunset, and
trees. Over 22% of the dataset consists of photos with more than one class, such as sea
and sunset, but several combined classifications, such as mountains, sunsets and trees, are
incredibly uncommon. These dataset descriptions are available in MULAN source forge
forum (Diplaris et al., 2005).
The proposed iMLNB achieves a low Hamming loss value of 0.01012, followed closely by MLNB at 0.01201, and
MLkNN at 0.0103, indicating a strong performance in accurately classifying example-label
pairs. Table 4 provides a comparative analysis of subset accuracy, which is another crucial
metric for evaluating multi-label classifiers. Subset accuracy requires an exact match
between the predicted and true label sets for an instance to be considered correct. The
performance ranking of the classifiers, according to subset accuracy, is as follows: BPMLL
performs the least accurately, followed by PCT, ML-C4.5, MLNB, MLkNN, and iMLNB,
which leads the group. The proposed iMLNB demonstrates significantly higher subset
accuracy, with a performance value of 0.8034, outperforming the existing MLNB method’s
score of 0.7998 and the scores of other comparative methods.
Tables 5 and 6 detail the label-based metrics of recall and F1-score, respectively, for the
evaluated methods, including the proposed iMLNB. In these label-centric evaluations, the
iMLNB method consistently demonstrates superior performance. Tree-based approaches,
on the other hand, fall short in recall values, indicating a tendency to miss relevant labels.
For the F1-score, which balances precision and recall, both the neural network-based method
and the k-nearest neighbor approach exhibit suboptimal performance, with tree-based
methods again yielding the least favorable results.
The classifiers’ ranking, based on the Hamming loss measure, places the improved
multi-label Naïve Bayes (iMLNB) as the most accurate, followed by the standard MLNB,
ML-C4.5, probabilistic classifier trees (PCT), multi-label k-nearest neighbors (MLkNN),
and finally the back-propagation multi-label learning (BPMLL). Notably, iMLNB achieves
the lowest Hamming loss in four out of the six datasets examined. While the MLkNN, a lazy
learning method, shows commendable performance following closely behind the Bayesian
approaches; the neural network-based method registers a markedly lower performance with
a subset accuracy of 0.2340, indicating a less effective approach in this context. This suggests
that while Bayesian methods are robust for multi-label classification, neural network-based
methods may require further refinement to achieve competitive performance in this
domain.
Tables 7 and 8 focus on evaluation of one-error and ranking loss. One-error measures
the frequency with which the top-ranked label is not in the set of true labels of an instance,
while ranking loss evaluates the average number of label pairs that are incorrectly ordered.
In these assessments, the iMLNB method outshines the others, delivering the most accurate
results. The standard MLNB method follows closely behind, securing the second-best
performance. The k-nearest neighbor and neural network methods trail behind the Naïve
Bayes-based approaches, indicating that while they have merit, they may not be as adept
at handling the complexities of multi-label classification as the Naïve Bayes variants,
particularly in the context of this study.
Ranking loss exhibits how often the top-ranked labels are not included in the set of
correct labels for the example. The proposed method offers most accurate predictions for
the top-ranked labels in the examples used for modelling. The average precision defines
the mean value of fraction of labels ranked above a particular label l ∈ L that is present in
L and the results for average precision are presented in Table 9. The predictions of iMLNB
have the smallest deviation from the true label set of instances. The iMLNB provides the
most accurate predictions, followed by the MLNB and MLkNN, respectively. The decision
tree gives low precision value.
Major strengths
Innovative approach
The study presents a new method for multi-label classification known as improved multi-
label Naïve Bayes. It allows for the investigation of label correlations by enlarging the input
space with meta information from the label space.
Table 8 Ranking loss comparison of existing methods and the proposed method.
Performance evaluation
The paper evaluates the upgraded iMLNB model’s performance through a thorough
empirical examination that uses six benchmark datasets. The performance of the suggested
method is contrasted with the conventional multi-label Naïve Bayes algorithm through
the use of a variety of assessment criteria. The empirical results indicate that the proposed
method outperforms the conventional MLNB algorithm across the evaluated metrics.
Table 10 shows the running times of all classifiers used in this research. The training time for
iMLNB is $O(|\Phi| \cdot D_{l_i \in L} + |L|)$, where $\Phi = L - \{l_j\}$. PCT shows the lowest training time, followed by MLNB.
The proposed iMLNB exhibits a slightly higher running time than the MLNB
method, as its feature space grows as the number of labels in the label space increases.
However, the time iMLNB consumes is similar to that of PCT. Figure 3 shows a
comparison of running times among the existing methods and the proposed method.
Table 11 consolidates the average performance metrics, providing a holistic view of the
effectiveness of the proposed iMLNB system in comparison to the existing MLNB method
and other contemporary state-of-the-art approaches. The data underscore the enhanced
capability of iMLNB, which consistently outperforms the other methods across various
measures. The test time for iMLNB is $O(D_{l_i \in L} \cdot |L|)$, since the number of labels for the feature
space expansion is unknown for the test data. Figure 4 illustrates the competence of the
proposed iMLNB method over the conventional methods. The iMLNB method outperforms the
other techniques in classification accuracy, albeit with a slightly higher training time than
MLNB. On average, however, PCT takes less time to model the training data, and the training
time for ML-C4.5 is longer than that of the neural network.
Figure 3 Comparison of running time among existing methods and proposed method.
Full-size DOI: 10.7717/peerjcs.2093/fig-3
CONCLUSION
In this study, we have revisited the traditional multi-label Naïve Bayes (MLNB) method
and introduced an enhanced classification approach, the improved multi-label Naïve Bayes
(iMLNB). The cornerstone of iMLNB is the expansion of the feature space to include all-
but-one label information, which allows for the modeling of label dependencies. However,
this expansion can lead to an exponentially growing input space and potential overfitting.
To mitigate this, iMLNB selectively enriches the input space with highly correlated labels,
ensuring a more robust and informative feature set. The iMLNB method employs a
two-stage process. Initially, it expands the feature space as described, and subsequently, it
constructs a model using this augmented space, which comprises a heterogeneous mix of
continuous and discrete data. To accommodate this diversity, the likelihood estimation
in Naïve Bayes is adapted to a mixed joint density function, combining Gaussian and
Bernoulli distributions, thus enabling the handling of both data types effectively. The
expanded feature space is pivotal, as it allows the iMLNB to integrate label correlation into
the prediction process, significantly enhancing classification accuracy. The empirical results
from this research are compelling, with the iMLNB method not only incorporating label
information more effectively during prediction but also improving classification accuracy
and reducing Hamming loss.
Funding
The authors received no funding for this work. The Open Access APC is provided by Qatar
National Library. The funders had no role in study design, data collection and analysis,
decision to publish, or preparation of the manuscript.
Competing Interests
The authors declare there are no competing interests.
Data Availability
The following information was supplied regarding data availability:
The dataset is available from an open-source Java library, MULAN: https://mulan.
sourceforge.net/datasets-mlc.html. These multi-label datasets consist of training examples
of a target function with multiple binary target variables.
The raw data and code are available in the Supplemental Files.
Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/
peerj-cs.2093#supplemental-information.
REFERENCES
Alvares-Cherman E, Metz J, Monard MC. 2012. Incorporating label dependency into
the binary relevance framework for multi-label classification. Expert Systems with
Applications 39(2):1647–1655 DOI 10.1016/j.eswa.2011.06.056.
Cheng Y, Qian K, Wang Y, Zhao D. 2020. Missing multi-label learning with non-
equilibrium based on classification margin. Applied Soft Computing 86:105924
DOI 10.1016/j.asoc.2019.105924.
Chochlakis G, Mahajan G, Baruah S, Burghardt K, Lerman K, Narayanan S. 2023.
Leveraging label correlations in a multi-label setting: a case study in emotion. In:
ICASSP 2023–2023 IEEE international conference on acoustics, speech and signal pro-
cessing (ICASSP). Piscataway: IEEE, 1–5 DOI 10.1109/ICASSP49357.2023.10096864.
De Lima RR, Fernandes AM, Bombasar JR, Da Silva BA, Crocker P, Leithardt VRQ.
2022. An empirical comparison of Portuguese and multilingual BERT models for auto-
classification of NCM codes in international trade. Big Data and Cognitive Computing
6(1):8 DOI 10.3390/bdcc6010008.
Diplaris S, Tsoumakas G, Mitkas PA, Vlahavas I. 2005. Protein classification with
multiple algorithms. In: Advances in informatics: 10th Panhellenic conference on