Garbage Content Estimation Using Internet of Things and Machine Learning
ABSTRACT Much garbage is produced daily in homes through everyday activities such as cooking and eating, and it must be managed adequately for human well-being and environmental protection. Although existing IoT-based smart garbage systems achieve high garbage classification accuracy, they provide only a small number of garbage categories, too few for reasonable practices of household garbage separation. To solve this problem, this study presents a new smart garbage bin system (SGBS) embedded with multiple sensors. We deployed temperature, humidity, and gas sensors to monitor the condition of the bin and identify the garbage content disposed of. We then introduce a new garbage content estimation method that trains a machine learning model on daily collected, fused sensor readings combined with detailed annotations of household garbage contents to perform garbage classification tasks. For evaluation, we deployed the SGBS in five households for one month. Leave-one-house-out cross-validation yielded an accuracy of 91% on 5 kitchen waste contents, 89% on 5 paper/softbox contents, and 85% on the 8 garbage categories.
INDEX TERMS IoT-based smart garbage system, garbage content estimation, machine learning algorithms.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
13000 VOLUME 11, 2023
E. Likotiko et al.: Garbage Content Estimation Using Internet of Things and Machine Learning
management [5], [6], [7], [8]. However, the existing systems have the following problems: first, they cannot learn the amount of garbage disposed of each time; second, they provide a small number of garbage categories, not enough for reasonable practices of household garbage separation; and third, they cannot understand the routine garbage disposal behaviour of households.
In our previous study [9], we addressed the first problem by proposing a smart garbage bin system with ToF and weight sensors and an ARIMA-model-based garbage growth prediction method. In this paper, we focus on the second and third problems. We propose a newly designed and developed smart garbage bin system (SGBS) embedded with multiple sensors to identify the garbage contents disposed of. The SGBS architecture comprises two subsystems.
The first subsystem is the smart garbage bin (SGB), embedded with DHT22 (temperature and humidity) and MQ135 gas sensors to monitor the conditions in the bin and identify the disposed garbage content. Because garbage contents differ in shape and moisture, the type of garbage content affects the humidity and air quality measured in the bin. The SGB is also embedded with ToF (time of flight) and load cell sensors to detect each new disposal of garbage content. The data are then updated and stored in the cloud via a Wi-Fi gateway.
The second subsystem is a garbage annotation mobile application (GAA). The GAA interface consists of 8 garbage categories and 25 garbage content identities, giving household users an easy way to annotate the garbage content they dispose of daily on a handy smartphone.
We conducted experiments in which the SGBS was deployed in five houses with heterogeneous characteristics to examine its impact. The household users used the installed smart garbage bin system daily and annotated the garbage contents they disposed of in the smart garbage bins, so information about the identified garbage and the amounts produced was continuously monitored and collected in a garbage log for each household. To perform garbage classification tasks, we introduce a new garbage content estimation method that trains a machine learning model on daily collected, fused sensor readings combined with detailed annotations of household garbage contents. Leave-one-house-out cross-validation yielded an accuracy of 91% on 5 kitchen waste contents, 89% on 5 paper/softbox contents, and 85% on the 8 garbage categories. In summary, the contributions of this work are:
1) Identification of garbage content and understanding of household garbage disposal behaviour, to encourage behaviour change in family garbage disposal and to improve home monitoring.
2) The provision of more satisfactory garbage content categories for the reasonable practice of separating garbage in the household.
3) Providing and discussing a new garbage content estimation model based on the daily garbage contents disposed of in households, built with data-efficient machine learning classifiers that achieve satisfactory relative accuracy.
The remainder of this paper is organized as follows. Section II reviews related work, covering municipal garbage separation rules and recent work on garbage classification using images and deep learning models. Section III describes the materials and tools used in the study, including system design and development details. Section IV presents the experiment and the data collection and pre-processing procedures. Section V introduces the garbage content estimation model and the step-by-step process of building it with machine learning algorithms. Section VI discusses the results of the classification tasks and compares our approach with the literature, and Section VII concludes the paper.
II. RELATED WORK
This section reviews related work from two perspectives. First, we provide an overview of the separation and disposal of garbage, with an emphasis on municipalities in Japan, where this study was conducted. Second, we discuss recent work on garbage classification from images using deep learning, to review and assess existing approaches. Finally, we briefly discuss our preliminary study.
A. SEPARATION AND DISPOSAL OF GARBAGE IN JAPAN
Garbage separation has been a greater challenge in developing countries than in developed countries, where various collection systems for house-separated garbage exist, such as in Sweden and Germany [10], China [11], and Japan [12]. In other developed countries, garbage is often separated into three categories: recyclable, household, and vegetation garbage, whereas in Japan the garbage separation and disposal system is different and complex. The rules for the separation and disposal of garbage depend on the local municipality, and each city in Japan provides a well-documented pamphlet explaining its garbage disposal rules. In general, garbage is divided into four categories: burnable garbage (kitchen waste, paper scraps, clothing, etc.), non-burnable garbage (metal, glass, ceramics and pottery, etc.), recyclable garbage (plastic bottles, container jars, cans, newspapers, etc.), and oversized garbage (large furniture, etc.) [12]. Each municipality uses this general division to classify garbage for its residents. Table 1 provides an overview of the division of burnable garbage content in four cities in Japan: Kashihara [13], Ikoma [14], Nara [15], and Kyoto [16]. Apart from the garbage descriptions in the municipal pamphlets, residents use designated plastic garbage bags of up to 45 litres to dispose of garbage. Moreover, the municipality sets the collection days for each garbage category; for instance, in Ikoma city [14], Mondays and Thursdays are reserved for the collection of burnable garbage only. These facts show that families in Japan play a hands-on role in following their municipal rules for garbage separation and disposal. However, the failure of households
to sort the garbage renders the whole system useless [7]. Therefore, automation tools are necessary to monitor daily family garbage disposal and to improve garbage separation and management.
TABLE 1. Overview of burnable garbage separation in Japan.
B. GARBAGE CLASSIFICATION FROM IMAGES WITH DEEP LEARNING MODELS
A possible solution to the existing challenges in household garbage separation and management is to adopt sustainable automation tools. Several works have been devoted to the automated detection and classification of garbage from images, which has become a popular replacement for manual garbage separation, taking advantage of rapid advances in computer vision and artificial intelligence. Various standard CNN architectures that perform image classification tasks with high accuracy have been proposed, such as VGGNet [17], AlexNet [18], ResNet [19], and DenseNet [20].
Nnamoko et al. [5] investigated the problem of manual household garbage separation into two categories, namely organic and recyclable. Their experiments were conducted with Sekar's waste classification image dataset available on Kaggle [21], using a bespoke 5-layer CNN architecture for image classification. Training was conducted with two input sizes, a smaller model (80 × 45 pixels) and a larger model (225 × 264 pixels), for performance comparison, and both obtained a similar cross-validation accuracy of 79%. Likewise, Mookkaiah et al. [22] proposed a model to identify and classify two types of garbage, biodegradable and non-biodegradable. Images were first collected in the respective garbage bin by a Raspberry Pi Camera Module v2, and the garbage classification task was then performed by a CNN architecture. However, separating garbage into only two categories is insufficient for practical household garbage separation. Besides, publicly available garbage image datasets are still scarce, and there is an information gap in the reported experimental procedures.
Furthermore, Wang et al. [7] addressed garbage sorting and classification at the source, that is, at the beginning of garbage collection, by combining IoT and CNN. The study used the TrashNet [23] dataset merged with other datasets, resulting in nine categories of garbage (kitchen waste, other waste, hazardous waste, plastic, glass, paper or cardboard, metal, fabric, and other recyclable waste). The study also developed an intelligent bin embedded with ultrasonic sensors and MQ9 and MQ135 gas sensors to monitor the running state of the garbage in the bin. Finally, the CNN model was deployed on mobile phones and cloud computing servers for garbage classification: citizens had to take pictures of garbage with their mobile phones and send them to a cloud server, which ran the deep learning algorithm to recognize the categories. Although the Xception and MobileNetV3 models achieved high accuracies of 92.44% and 92.00% in classifying nine types of garbage, the categories presented are too general and need to be refined for proper household garbage separation.
In addition, Ziouzios et al. [6] realized a distributed architecture for smart recycling using machine learning as a solution for garbage classification in collection facilities, addressing the problem of non-segregated garbage, which exists in both developing and developed countries. The TrashNet [23] dataset was used to train the models, with computation offloaded to the cloud. The CNN architecture classified the garbage materials into the categories paper, glass, plastic, metal, carton, and trash. Similarly, Sami et al. [24] used the TrashNet [23] dataset to automate garbage classification into six classes, including glass, paper, metal, cardboard, and trash, comparing a Support Vector Machine, Random Forest, Decision Tree, and CNN to find the algorithm that best fits the garbage classification problem. However, the available public garbage image datasets lack the garbage categories needed for proper household garbage classification. Therefore, the garbage categories presented in both studies [6], [24] are not practical for household garbage separation or for improving garbage management systems.
Despite the high accuracies achieved by existing solutions that automate the detection and classification of garbage from images with deep learning models, they still have problems: (Problem 1) they cannot learn the amount of garbage disposed of each time; (Problem 2) they provide a small number of garbage categories, not enough for reasonable practices of household garbage separation; (Problem 3) they cannot understand households' routine garbage disposal behaviour. To the best of our knowledge, an automation tool that can learn and identify the daily garbage content disposed of in homes and perform classification tasks, as investigated in this work, has yet to be considered.
of children, and city as the criteria for selecting participants for the experiment. Table 3 outlines the participants' information. All participants were well informed about the experiment and gave their consent to participate. Smart garbage bins were then distributed and installed in each house. Fig. 2 shows an overview of the deployed SGBS.
B. DATASETS
The experiment produced five garbage logs, one from each household. Each garbage log consists of data from the SGB (i.e., timestamp, filling level, weight, temperature, humidity, and air quality), collected at one-minute intervals, and data from the GAA (i.e., timestamp, garbage category, and content identity), collected only when a user disposes of and annotates garbage in a smart garbage bin. The frequency of garbage disposal and annotation differs between households because of household characteristics. Table 4 details all annotations of garbage contents made by the household users in houses 1 to 5 during the experiment. We therefore define the following rules to merge the multiple sensor data from the smart garbage bin (as features) with the garbage content annotations by the households (as labels) to create a single dataset for each house.
C. CLASS IMBALANCE
In all houses, some types of garbage content were disposed of less frequently than others, which makes them minority classes. Such minority class labels hamper model building: the model may learn to always predict the majority class regardless of the corresponding features. To address this, we use resampling to increase the size and quality of the training data and to avoid bias toward the majority class during training. There are two main approaches to random resampling: oversampling, which duplicates samples of the minority class, and undersampling, which deletes samples of the majority class. Because of the low number of annotations in garbage category 4 (Fabric/textile), category 5 (Plastic), category 6 (Dust), and category 7 (Plant) in all five houses (see Table 4), we applied oversampling to increase the minority classes using the imbalanced-learn library, which builds on scikit-learn. Table 5 and Table 6 show the total number of samples for the garbage categories and content identities before and after resampling.
V. GARBAGE CONTENT ESTIMATION MODEL
This study aims to identify the garbage contents disposed of and to classify the garbage contents disposed of daily in the household by adopting IoT and data-efficient machine learning algorithms. We therefore present a garbage content estimation model to classify
8 categories of garbage and a total of 25 garbage content identities, each relating to a particular category, as shown in Fig. 3 of the garbage annotation application. The following subsection details the process of building the classification models.
TABLE 5. Re-sampling and cross-validation split for the 8 garbage categories.
TABLE 6. Re-sampling and cross-validation split for the 25 garbage content identities.
A. MODEL BUILDING
Fig. 4 shows the model-building steps and the order of operations; below we explain the purpose of each step. We performed the classification tasks on daily collected, fused sensor readings combined with detailed annotations of household garbage contents, with the aim of finding the class to which a new, unseen observation belongs, either among the 8 garbage categories (Kitchen waste, Meal garbage, Paper/softbox, Fabric/textile, Plastic, Dust, Plant, and All others) or among the 25 garbage content identities (Food garbage, Edible food, Sink basin, Kitchen waste bag, Unclean cup, Unclean container, Unclean packages, Waste wood, Tissues, Mixed Papers, Milk/Juice box, Masks, Clothes, Shoe, bag, Rubber products, Disposable diapers, Plastic product, Toys, CD, Cigarette ashes/stick, Vacuum cleaner, Plant, and Others). During the model-building steps in Fig. 4, we consider only data-efficient methods, namely the Random forest, Naive Bayes, Extreme Gradient Boosting (Xgboost), and Decision tree algorithms, to build the garbage content estimation model. We chose them to allow a comparison of machine learning classifiers, because only a small dataset is available, because of their popularity, and because data preprocessing can be applied to handle minority class labels. We then defined the order of operations applied to the selected classifiers during the model-building steps.
More precisely, we train and test by splitting the dataset of each house into four equal-sized chunks (25% each), as shown in Table 5 and Table 6 for the garbage categories and content identities. To reduce overfitting as much as possible, we first use repeated k-fold cross-validation with k = 4 to evaluate the machine learning models in steps 1 and 2 (see Fig. 4), and we average the fold results to compute the final validation score for each investigated model configuration. The model created in step 1 uses the original (unbalanced) datasets, i.e., before resampling, whereas the model developed in step 2 uses the class-balanced datasets, i.e., after resampling, as discussed in Section IV-C (see Table 5 and Table 6). Thus, to compare performance on balanced and unbalanced datasets, our model-building process outputs two models, an unbalanced model and a balanced model (see Fig. 4).
Afterwards, to compare the cross-validation methods applied to the classifiers and to enlarge the training set, in step 3 (see Fig. 4) we switched to leave-one-house-out cross-validation, repeatedly training our models on the combined balanced datasets of four houses and testing them on the remaining house. This yields the leave-one-house-out model.
Furthermore, in step 4 (see Fig. 4) we built overall result models of the classification tasks for both the garbage categories and the content identities, aggregated over all houses, to investigate the overall performance of the classifiers. We first built the overall result model on all 8 garbage categories, i.e., Kitchen waste, Meal garbage, Paper/softbox, Fabric/textile, Plastic, Dust, Plant, and All others, found in House 1, House 2, House 3, House 4, and House 5. However, because each garbage category comprises between 2 and 5 specific garbage content identities (see Fig. 3), there are in total 25 different garbage content identities belonging to the eight categories that users are expected to annotate daily with the garbage annotation application. Because of this large number of garbage content identities and the differences in garbage disposal and annotation frequency between houses (see Table 4), we first selected the five garbage content identities of Kitchen waste (category 1), as it had a higher annotation frequency in house 3, house 4, and house 5. We also chose the five garbage content identities of Paper/softbox (category 3), as it had a higher annotation frequency in house 1 and house 2, to learn the performance of the classifiers on garbage content identities. In this way, we created three overall result models for garbage content estimation, namely:
1) Overall result model for general garbage categories
2) Overall result model for kitchen waste content identities
3) Overall result model for paper/softbox content identities
B. PERFORMANCE EVALUATION
Our evaluation of model performance is based on accuracy, i.e., the percentage of correct classifications. Moreover, we evaluate the performance of our models
using other metrics, such as confusion matrices, Precision, Recall, and F1-score. We report these more informative metrics especially for the overall result models, because these models aggregate the results for each garbage class label from all houses into a single metric. Furthermore, hyperparameter tuning was applied to all classifiers: Random forest, Naive Bayes, Extreme Gradient Boosting (Xgboost), and Decision tree. Adjusting parameters such as the number of estimators, the criterion, and the random state separately for each model increased the accuracy slightly. We then independently investigated model performance on the experimental datasets of House 1, House 2, House 3, House 4, and House 5 for the garbage category and garbage content identity classification tasks. The accuracies (in percent) obtained with 4-fold cross-validation and leave-one-house-out cross-validation for the four machine learning classifiers on the 8 garbage categories and 25 garbage content identities are summarized in Table 7, Table 8, Table 9, and Table 10.
TABLE 8. Leave-one-house-out cross-validation performance accuracy for the 8 garbage categories.
C. RESULTS
In this subsection, we describe the results of the classification tasks evaluated as described in Section V-B. Specifically, we compare the accuracy of the unbalanced, balanced, leave-one-house-out, and overall result models using the four machine learning classifiers.
1) UNBALANCED MODEL
The results of the unbalanced model with 4-fold cross-validation (see Table 7 and Table 9) show that Random forest performs slightly better than the other classifiers (Naive Bayes, Xgboost, and Decision tree) for the classification of both garbage categories and garbage content identities. For the garbage categories, the highest accuracy, 90%, was obtained in house 1, and the lowest, 67%, resulted from the Decision tree in the same house. For the garbage content identities, the highest accuracy was 93%, obtained in house 1 by Random forest, and the lowest was 80%, obtained by the Decision tree in house 4.
2) BALANCED MODEL
Next, we compared the four classifiers with the same 4-fold cross-validation method in all five houses on datasets balanced with the approach discussed in Section IV-C to deal with class imbalance. The results can be seen in Table 7 and Table 9. The accuracy decreased slightly compared with the unbalanced model. Still, Random forest achieved the highest accuracy and thus outperformed the rest of the classifiers. For the garbage categories, Random forest achieved 86%
in house 3, and the lowest accuracy was 63%, by the Decision tree in house 2. For garbage content identities, Random forest reached 88% in house 1 and house 2, and the lowest accuracy was 62%, by the Decision tree in house 5.
TABLE 9. 4-fold cross-validation performance accuracy for the 25 garbage content identities.
TABLE 10. Leave-one-house-out cross-validation performance accuracy for the 25 garbage content identities.
3) LEAVE ONE HOUSE OUT MODEL
In the next step, we compare the results of the repeated 4-fold cross-validation in step 2 with the leave-one-house-out (LoH) cross-validation in step 3 (see Fig. 4) to investigate the classification performance in all five houses. We applied LoH to the class-balanced datasets using the four classifiers in step 3, keeping the same order of operations as in step 2. With this approach, the combined data of four houses enlarge the training set, and each round tests on the dataset of only one house. The results of Random forest, Naive Bayes, XGBoost, and Decision tree for the garbage categories and garbage content identities are shown in Table 8 and Table 10. Accuracy clearly increases in each house compared with the balanced model evaluated by 4-fold cross-validation in Table 7 and Table 9. For the garbage categories, Random forest achieved the highest accuracy of 88% in house 3, while the Decision tree showed the lowest accuracy of 57% in house 1. For garbage content identities, the leave-one-house-out model achieved the highest accuracies of 91% and 90% with Random forest in house 1 and house 2, respectively, whereas the Decision tree performed unsatisfactorily, at 65% in house 5. Again, Random forest consistently outperformed the rest of the classifiers.
4) OVERALL RESULT MODEL
To assess the performance of the three overall result models described in Section V-A above, namely (1) the overall result model of garbage categories, (2) the overall result model of kitchen waste content identities, and (3) the overall result model of Paper/softbox content identities, we report their accuracies in Table 11. Moreover, we compared the Recall, Precision, and F1-score of the overall result models, as these metrics judge performance better by giving a measurement for each class label.
For the garbage categories overall result model (see Table 11), Random forest achieved the highest accuracy of 85%, followed by Naive Bayes at 82% and Xgboost at 80%, while the Decision tree lagged with the lowest accuracy of 64%. Table 12 summarizes the Recall, Precision, and F1-score of the 8 garbage categories overall result model using the Random forest classifier.
For the overall result model of kitchen waste content identities (see Table 11), i.e., food garbage, edible food, sink basin, kitchen waste bag, and others, Random forest again achieved the best classification accuracy of 91%, while the other models reached 88% (Naive Bayes), 84% (Xgboost), and 76% (Decision tree). Likewise, for the overall result model of the paper/softbox content identities (see Table 11), i.e., tissues, mixed papers, milk/juice box, masks, and others, Naive Bayes (85%), Xgboost (83%), and Decision tree (71%) were outperformed by Random forest at 89%. The Recall, Precision, and F1-score of the overall result models for the 5 kitchen waste and the 5 paper/softbox content identities, using Random forest as the best classifier, are summarized in Table 13 and Table 14.
The aggregated confusion matrices of each overall result model using Random forest are plotted in Fig. 6, where the columns represent the actual values (Truth) of the target class label and the rows represent the predicted values (Predicted). Correctly classified validation samples appear in the diagonal cells, and incorrectly classified samples appear in the off-diagonal cells.
In addition, to investigate the impact of the collected multiple sensor readings on the garbage content estimation model, we applied the feature importance method of the Random forest classifier, our chosen classifier for the garbage content estimation model. The results in Fig. 5 show that air quality, humidity, temperature, and fill level are the more relevant features for identifying garbage content in the smart bin. Therefore, the
identified garbage content disposed of daily and the annotation procedures contribute to the garbage classification tasks. Furthermore, the cross-validation approaches provided satisfactory results, especially leave-one-house-out cross-validation, which performed better than 4-fold cross-validation.
TABLE 13. Summary of 5 Kitchen waste content identities overall result model.
TABLE 14. Summary of 5 Paper/softbox content identities overall result model.
VI. DISCUSSION
In this section, we discuss our findings and their possible implications. Because of its satisfactory classification outcomes, we chose the Random forest algorithm as the best classifier, and we adopted the overall result models as the final models for our garbage content estimation tasks. Generally, the highest accuracy lies between 85% and 91% and the lowest is 64%, which is satisfactory for garbage content classification tasks. However, the small number of annotations for certain class labels (class imbalance) makes the classification task difficult. We start the detailed discussion by comparing the garbage annotations of each house and then the classification tasks performed by the machine learning algorithms, followed by the usefulness of the garbage content estimation model. Finally, we compare our approach with the literature.
A. COMPARISON OF HOUSEHOLD GARBAGE DISPOSAL ANNOTATION AND CLASSIFICATION
In general, we observed different garbage disposal behaviour in each of the five houses, owing to the heterogeneity of the families, such as lifestyle, family size, family type, number of children/infants, age group, and city. The study observed differences among the houses in the routine frequency of garbage disposal and in the type of garbage content disposed of. Using the smooth garbage annotation interface (see Fig. 3), which allowed household users to annotate garbage contents during disposal, the study found that certain garbage contents were important in some houses, i.e., disposed of and annotated daily, compared with others. Table 4 shows the annotation frequency of each garbage category among the houses, as briefly detailed below.
• House 1: as shown in Table 3, this house consists of a married couple in Kyoto prefecture. In this house, garbage category 3 (Paper/softbox) was the most important category, annotated 374 times during the experiment (see Table 4). In comparison, garbage category 5, which consists of plastic contents, was the least important, annotated only
5 times. The other categories had broadly similar annotation frequencies, such as Kitchen waste (78), Meal garbage (66), All others (74), and Dust (50). On the other hand, Fabric/textile had 21 annotations, and Plant had 19.
• House 2: consists of a married couple with two children living in Nara city (see Table 3). As in house 1 (see Table 4), garbage category 3 (Paper/softbox) was the most important category in this house, annotated 200 times during the experiment, and category 5 (Plastic) was the least annotated, only 4 times. Among the other categories, Kitchen waste had 37 annotations, Meal garbage 63, All others 24, Fabric/textile 16, Dust 11, and Plant 9. House 2 had fewer annotations than house 1.
• House 3: as shown in Table 3, this house comprises a young married couple in Ikoma city. Paper/softbox (category 3) remained the most important category and Plastic the minor category, as in houses 1 and 2, but in this house the study observed only slight differences in annotation frequency among the Kitchen waste, Meal garbage, and Paper/softbox categories. Table 4 shows that Paper/softbox (183) was the most important, followed by Meal garbage (125), with Kitchen waste (104) third in the garbage category ranking.
• House 4: while houses 1, 2, 3, and 5 comprise married couples, house 4 consists of two singles living in a shared house in Ikoma city (see Table 3). The study observed a lower annotation frequency in this house than in the other houses. However, as in houses 1, 2, and 3, garbage category 3 (Paper/softbox) had the highest annotation frequency and ranked as the most important, while Plastic was minor. The annotation frequencies in Table 4 are as follows: Paper/softbox had 61 annotations, followed by Kitchen waste (23) and Meal garbage (11), a ranking similar to that of house 3. In addition, not only Plastic but also Dust was a minor category, each annotated only once, and category 7 (Plant) was not annotated in this house.
• House 5: this house comprises a young married couple with an infant in Ikoma city (see Table 3). Unlike all other houses, the study observed a lower annotation frequency for garbage category 3 (Paper/softbox), which prevailed in houses 1, 2, 3, and 4 as the most important garbage category (see Table 4). Instead, kitchen waste was the most important category in this house, with 152 annotations, followed by Meal garbage (135) and Fabric/textile (77). The high annotation frequency of category 4 (Fabric/textile) was due to the frequent disposal of disposable diapers (the fourth garbage content in the Fabric/textile category; see Fig. 3). On the other hand, Plant (category 7) was annotated only once and therefore appeared as a minor category, similar to house 3. Plastic had 9 annotations, and Dust had 6.
Overall, the daily disposed garbage contents and the annotation frequency of each household affected the classification tasks in each house. For instance, with Random forest, the chosen classifier for this study (see Table 7 and Table 9), the accuracies for both garbage category and content identity classification in house 1 were higher than in house 4, which had fewer annotations. Moreover, the study found that the Decision tree was the weakest classifier compared with Random forest, Naive Bayes, and Xgboost on the datasets of all five houses. Beyond that, the leave-one-house-out cross-validation method performed better than the 4-fold cross-validation approach despite its computational cost (see Table 8 and Table 10). Therefore, in the overall result models, we aggregated the classification results of the same class label into one performance metric using the leave-one-house-out approach, which performed better than 4-fold cross-validation on the balanced model.
The following section compares our approaches with the literature.
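The leave-one-house-out evaluation and label-wise aggregation summarized above can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: the feature matrix is assumed to hold per-row SGB readings such as those listed in Section IV-B (fill level, weight, temperature, humidity, and air quality), `house` is an assumed array giving the house of each row, and `random_oversample` and `leave_one_house_out` are hypothetical helper names. Random oversampling is written out by hand in the spirit of imbalanced-learn's `RandomOverSampler`, and, as our own assumption, it is applied only to the four training houses so that duplicated rows never reach the held-out house. The Random forest uses a fixed `n_estimators=100` instead of the per-model tuning described in Section V-B. Predictions from the five held-out houses are pooled into one confusion matrix, and the Random forest feature importances are averaged over the folds, analogous to Fig. 5.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import LeaveOneGroupOut


def random_oversample(X, y, rng):
    """Hypothetical helper: duplicate randomly chosen rows of every minority
    class (sampling with replacement) until each class matches the largest."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # every original row is kept once
    for cls, n in zip(classes, counts):
        if n < target:
            members = np.flatnonzero(y == cls)
            keep.append(rng.choice(members, size=target - n, replace=True))
    idx = np.concatenate(keep)
    return X[idx], y[idx]


def leave_one_house_out(X, y, house, labels, seed=0):
    """Hypothetical helper: train on four houses, test on the fifth, and pool
    the held-out predictions of all folds for label-wise aggregation."""
    rng = np.random.default_rng(seed)
    splits = list(LeaveOneGroupOut().split(X, y, groups=house))
    per_house_acc, y_true, y_pred = {}, [], []
    importances = np.zeros(X.shape[1])
    for train, test in splits:
        # Assumption: balance only the training houses; the test house keeps its real class mix.
        X_tr, y_tr = random_oversample(X[train], y[train], rng)
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X_tr, y_tr)
        pred = clf.predict(X[test])
        per_house_acc[int(house[test][0])] = accuracy_score(y[test], pred)
        y_true.extend(y[test].tolist())
        y_pred.extend(pred.tolist())
        importances += clf.feature_importances_ / len(splits)
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return {
        "per_house_accuracy": per_house_acc,
        "pooled_accuracy": accuracy_score(y_true, y_pred),
        # scikit-learn convention: rows = true labels, columns = predicted labels.
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=labels),
        "feature_importance": importances,  # averaged over the per-fold models
        "y_true": y_true,
        "y_pred": y_pred,
    }


if __name__ == "__main__":
    # Synthetic stand-in for five garbage logs: 5 sensor features, 3 classes.
    rng = np.random.default_rng(1)
    house = np.repeat(np.arange(1, 6), 40)
    y = rng.integers(0, 3, size=house.size)
    X = rng.normal(size=(house.size, 5)) + y[:, None]  # class-dependent shift
    result = leave_one_house_out(X, y, house, labels=[0, 1, 2])
    print(result["per_house_accuracy"])
    print(result["confusion_matrix"])
```

The pooled `y_true` and `y_pred` can be passed to `sklearn.metrics.classification_report` to obtain per-class Precision, Recall, and F1-score of the kind summarized in Tables 12 to 14. Note that the returned confusion matrix is the transpose of the layout described for Fig. 6, where the columns hold the true labels.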