Article
Automatic Recognition of Indoor Fire and Combustible Material
with Material-Auxiliary Fire Dataset
Feifei Hou , Wenqing Zhao and Xinyu Fan *
School of Automation, Central South University, Changsha 410083, China; [email protected] (F.H.);
[email protected] (W.Z.)
* Correspondence: [email protected]
Abstract: Early and timely fire detection within enclosed spaces notably diminishes the response
time for emergency aid. Previous methods have mostly focused on singularly detecting either fire
or combustible materials, rarely integrating both aspects, leading to a lack of a comprehensive
understanding of indoor fire scenarios. Moreover, traditional fire load assessment methods such as
empirical formula-based assessment are time-consuming and face challenges in diverse scenarios.
In this paper, we collected a novel dataset of fire and materials, the Material-Auxiliary Fire Dataset
(MAFD), and combined this dataset with deep learning to achieve both fire and material recognition
and segmentation in the indoor scene. A sophisticated deep learning model, Dual Attention Network
(DANet), was specifically designed for image semantic segmentation to recognize fire and combustible
material. The experimental analysis of our MAFD database demonstrated that our approach achieved
an accuracy of 84.26% and outperformed the prevalent methods (e.g., PSPNet, CCNet, FCN, ISANet,
OCRNet), making a significant contribution to fire safety technology and enhancing the capacity to
identify potential hazards indoors.
Keywords: fire detection; combustible material recognition; deep learning; indoor fire scene;
semantic segmentation
MSC: 68T45
Figure 1. Indoor fire and combustible material recognition framework.
The subsequent sections of this paper are organized as follows. Section 2 provides an
overview of the literature concerning fire detection and material recognition. Our deep learning framework for fire and combustible material segmentation within indoor
scenes is detailed in Section 3. Section 4 introduces the development of our dataset, the
MAFD, and presents extensive experiments with it. Section 5 concludes the paper by
summarizing key points and proposing future research directions.
2. Literature Review
With the continuous advancement of technology, there is growing attention toward
developing efficient and reliable methods to identify fire and smoke. Numerous compre-
hensive reviews and surveys have been conducted within the realm of fire and smoke
detection. Among them, the methods utilized can be categorized into two main groups:
traditional methods and deep learning-based approaches.
Traditional methods are usually based on image processing algorithms such as edge
detection, morphological processing, and threshold segmentation. Wu et al. [6] used camera
sensors for fire smoke detection, extracting static and dynamic features, and achieving
strong results with AdaBoost. Russo et al. [7] proposed a method for the smoke detection
of surveillance cameras based on local binary pattern (LBP) and support vector machine
(SVM). Wang et al. [8] proposed a rapid smoke detection method using slope fitting in
video image histogram, addressing false alarms in early fire smoke detection. Cao et al. [9]
proposed patchwise dictionary learning within the wavelet domain to detect smoke in
forest fire videos. Their method aims to distinguish fire smoke from other challenging objects in the forest that share a similar visual grayscale appearance.
Gagliardi et al. [10] introduced video-based smoke detection technology using techniques
like the Kalman estimator, blob labeling, and decision-making processes. Hossain et al. [11]
introduced a novel technique for forest fire detection that relied on fire-specific color
features and the multi-color space local binary pattern to identify distinct attributes of
flames and smoke. They also employed support vector machines as classifiers. Their results
showed that this method has a higher performance compared to other color or texture-
based methods. However, these traditional methods typically require manual parameter selection and adjustment, adapt poorly to varying smoke densities and color variations, and suffer from several drawbacks including a high false alarm rate (FAR), restricted accuracy, and a reduced detection range.
In contrast, deep learning methods, by utilizing learned features to identify and seg-
ment fire and smoke patterns and adapting to various smoke conditions, have introduced
a novel research avenue for addressing early fire detection challenges. Jia et al. [12] utilized
domain knowledge and transfer learning from deep convolutional neural networks (CNN)
for video smoke detection and reduced the false positive rate of the video smoke detec-
tion (VSD) systems to some extent. However, low-level features were not utilized. Peng
et al. [13] combined manually crafted features with deep learning features. They utilized an
algorithm designed manually to extract areas suspected to contain smoke, which were then
processed using an enhanced SqueezeNet deep neural network for smoke detection. Cheng
et al. [14] employed Deeplabv3+ and conditional random fields for accurate segmentation,
established smoke thickness heatmaps and predicted smoke trends with generative adver-
sarial networks, contributing to fire protection and evacuation planning. To address issues
in video-based smoke detection, Yuan et al. [15] introduced a deep smoke segmentation
network designed to derive precise segmentation masks from unclear smoke images. Lin
et al. [16] devised an integrated detection framework by combining a faster Region-CNN
(RCNN) and 3D CNN, enhancing video smoke detection by maximizing the utilization
of temporal information within video sequences. Li et al. [17] introduced an adaptive
linear feature-reuse network (ALFRNet) for rapid forest fire smoke detection, effectively
reducing information loss and interference caused by image blurring during the smoke
image acquisition process. Liu et al. [18] introduced a smoke detection approach using
an ensemble of simple deep CNNs by capturing diverse smoke aspects and aggregating
subnetwork responses via majority voting, outperforming existing methods on newly
established noisy smoke image datasets. To meet the needs of complex aerial forest fire
smoke detection tasks, Zhan et al. [19] proposed an adjacent layer composite network based
on a recursive feature pyramid with deconvolution and dilated convolution and global
optimal non-maximum suppression (ARGNet) for the high-accuracy detection of forest fire
smoke. Hu et al. [20] proposed a novel method for early forest fire smoke detection called
multi-oriented detection. This method integrated a value conversion-attention mechanism
module and Mixed-Non-Maximum Suppression (Mixed-NMS) to overcome common mis-
detection and missed detection issues, elevating target detection accuracy. Since the majority of current computer vision-based fire detection methods can only identify either flames or smoke, Hosseini et al. [21] introduced a unified flame and smoke detection method, named UFS-Net, which can identify potential fire risks by categorizing video frames into eight distinct classes. Khan et al. [22] proposed an energy-efficient system
based on VGG-16 architecture for early smoke detection in both normal and foggy IoT
environments. He et al. [23] also proposed a method targeting foggy environments that
combined attention mechanisms and feature-level and decision-level fusion modules. From
various perspectives including overall, individual categories, small smoke, and challenging
negative sample detection, their approach achieved higher detection accuracy, precision,
recall, and F1 scores. To meet the requirements of smoke detection within an industrial
environment, Muhammad et al. [24] proposed an energy-friendly edge intelligence-assisted
method for smoke detection in foggy surveillance environments using deep CNN.
The concept of fire load is of utmost importance in fire safety and building resilience.
Combustible materials used indoors are among the main causes of fires. Numerous
studies have focused on material recognition. Strese et al. [25] proposed a tool-mediated
surface classification method. This method combines the extracted feature information
such as sound, image, friction, and acceleration with a naive Bayesian classifier to identify
different materials. Zhang et al. [26] proposed a novel hierarchical multi-feature fusion
(HMF2) model, aiming to gather essential information and employ a classifier for training
a novel material recognition model. They tested the simplicity, effectiveness, robustness,
and efficiency of the HMF2 model on two benchmark datasets. Lee et al. [27] proposed
a material-type identification method using a deep CNN based on color and reflectance
features. The proposed method was evaluated on public datasets, showing promising
results for material type identification.
Although researchers have conducted extensive algorithmic research in the field of material recognition and high-quality public datasets are available, there is currently no algorithmic research targeting complex indoor fire scenarios, nor is there a relevant public dataset. In addition, the main limitation of these methods is the lack of simultaneous evaluation of fire objects and fire loads. Our work addresses both gaps. Specifically,
Section 3 provides details of the proposed methodology, while Section 4 discusses the
experimental validation.
Figure 2. Dual Attention Network (DANet) framework with vector output distribution at each stage.
3.2. Attention Modules for Feature Representation
3.2.1. Position Attention Module
The input of the position attention module is a feature map A, expressed as C × H × W, where C represents the number of channels, H represents the height of the feature map, and W represents the width of the feature map. The specific working principle of this module is shown in Figure 2, and the vector size obtained at each stage is also marked in Figure 2. Since the focus of the position attention module is to mine the similarity relationship between pixels, in order to better use the attention module, the feature maps B, C, and D are reshaped to obtain three matrices of size C × N, where N = H × W. These three matrices correspond to Q, K, and V of the self-attention mechanism. Each step is described in detail below:
(1) Calculating the similarity matrix between pixels: the process is to obtain a similarity matrix between pixels with a size of N × N through Q^T × K, that is, the (N × C) matrix multiplied by the (C × N) matrix;
(2) Perform a softmax operation on the similarity matrix to obtain each relative factor that affects the pixel;
(3) Multiply the similarity matrix S after softmax with the V matrix, that is, multiply the (C × N) matrix by the (N × N) matrix, and finally obtain the recoded feature representation, whose size is also C × N, where the generation formula of S is shown in Equation (1). The purpose of multiplying the original matrix by the similarity matrix is to amplify the influence of pixels that are similar to it and reduce the influence of pixels that are not similar to it, which can also be called a re-encoding operation;
(4) Perform the reshape operation on the finally obtained new feature matrix to obtain a recoded feature map with a size of C × H × W;
(5) Add the feature map to the features extracted from the upper network to obtain the output E of the final position attention module, whose size is still C × H × W, where the generation formula of E is shown in Equation (2). The scaling factor α initially begins at 0 and gradually adjusts to attain higher weights.
S_{ji} = \frac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{N} \exp(B_i \cdot C_j)}  (1)

E_j = \alpha \sum_{i=1}^{N} s_{ji} D_i + A_j  (2)
The output of the channel attention module is computed analogously:

E_j = \beta \sum_{i=1}^{C} x_{ji} A_i + A_j  (4)
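For concreteness, the sketch below implements the position attention computation of steps (1)–(5) and Equations (1) and (2) in PyTorch; the 1 × 1 convolutions that generate B, C, and D and the channel-reduction ratio are illustrative assumptions rather than the exact layers used in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionAttention(nn.Module):
    """Minimal sketch of a DANet-style position attention block (Equations (1) and (2))."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions producing the B (query), C (key), and D (value) maps;
        # the reduction ratio of 8 is an assumption for illustration.
        self.query_conv = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.alpha = nn.Parameter(torch.zeros(1))  # scaling factor, initialized to 0

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        n, c, h, w = a.shape
        hw = h * w
        q = self.query_conv(a).view(n, -1, hw).permute(0, 2, 1)  # (batch, N, C') with N = H*W
        k = self.key_conv(a).view(n, -1, hw)                      # (batch, C', N)
        v = self.value_conv(a).view(n, -1, hw)                    # (batch, C, N)
        s = F.softmax(torch.bmm(q, k), dim=-1)                    # Equation (1): N x N similarity matrix
        out = torch.bmm(v, s.permute(0, 2, 1)).view(n, c, h, w)   # steps (3)-(4): re-encoded C x H x W map
        return self.alpha * out + a                               # Equation (2): E = alpha * sum + A
```

A channel attention branch in the spirit of Equation (4) can be sketched the same way, with the similarity matrix computed across channels instead of spatial positions.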
Figure 3. Dilated ResNet50 backbone.
Transfer learning aims to utilize previously acquired knowledge to efficiently solve new but similar problems. Unlike traditional machine learning methods, it capitalizes on knowledge gathered from auxiliary domains' data to enhance predictive modeling for disparate data patterns within the present domain. The fundamental idea of transfer learning is to extract the knowledge from a previous or source task and apply the extracted knowledge to a new/target task. A conceptual metaphor is that it will be easier for a child to learn how to recognize peaches if they have already learned how to recognize apples and pears.
We employed transfer learning to reduce the training difficulty for our relatively small dataset as well as to enhance performance. Transfer learning shows promise in minimizing the dependence on a large amount of target domain data by transferring knowledge from diverse yet related source domains [30]. The deep learning model was first pretrained on the large general dataset VOC, and then trained and tested on our dataset.
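As a hedged illustration of this strategy (not the authors' exact VOC-pretraining pipeline), the snippet below loads a backbone with publicly available pretrained weights, freezes its earliest stages, and fine-tunes the rest; the torchvision call style (pretrained=True, matching TorchVision 0.11) and the choice of frozen stages are assumptions made for illustration.

```python
import torch
from torchvision.models import resnet50

# Transfer-learning sketch: start from a publicly pretrained backbone and
# fine-tune it on the smaller target dataset (MAFD-style images).
backbone = resnet50(pretrained=True)  # TorchVision 0.11-style call (assumption)

# Freeze the earliest stages so that only higher-level features adapt to the
# new task; which stages to freeze is an illustrative choice.
for name, param in backbone.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1")):
        param.requires_grad = False

# Optimizer settings follow the training setup reported in Section 4.1
# (SGD, learning rate 5e-4, momentum 0.975, weight decay 4e-4).
optimizer = torch.optim.SGD(
    (p for p in backbone.parameters() if p.requires_grad),
    lr=5e-4,
    momentum=0.975,
    weight_decay=4e-4,
)
```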
4. Experiments and Results
4.1. Experimental Settings and Evaluation Metrics
The implementation was carried out in a Python environment (version 3.8.10, provided by the Python Software Foundation, Wilmington, DE, USA) using the PyTorch deep learning package (PyTorch 1.10.0+cu111, TorchVision 0.11.0+cu111) and a single NVIDIA GeForce RTX 3060 GPU. MMSegmentation [31] is an open-source PyTorch-based toolbox specifically designed for semantic segmentation tasks. It decomposes the semantic segmentation framework into different components; by combining different modules, a customized semantic segmentation framework can be easily built. The toolbox provides direct support for prevalent and contemporary semantic segmentation frameworks and provides pre-trained semantic segmentation models on various mainstream datasets. In this paper, the semantic segmentation framework provided in MMSegmentation was utilized for both model training and verification, employing mmcv version 2.0.0rc4 and MMSegmentation version 1.1.2. DANet employs ResNet as the model backbone. The images in our dataset were randomly divided into training (90%) and validation (10%) sets. The model was trained on the training set and tested on the validation set.
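A minimal sketch of the random 90%/10% split described above is given below; the image directory, file extension, fixed seed, and helper name are hypothetical and only illustrate the procedure.

```python
import random
from pathlib import Path


def split_dataset(image_dir: str, train_ratio: float = 0.9, seed: int = 0):
    """Randomly split image files into training and validation lists (90%/10%)."""
    images = sorted(Path(image_dir).glob("*.jpg"))  # hypothetical image location
    rng = random.Random(seed)                        # fixed seed for reproducibility
    rng.shuffle(images)
    cut = int(len(images) * train_ratio)
    return images[:cut], images[cut:]


train_files, val_files = split_dataset("MAFD/images")  # hypothetical path
```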
During training, images were resized to 512 × 512 pixels for input. The optimizer for the three models was stochastic gradient descent (SGD) with a learning rate of 5 × 10⁻⁴, a momentum of 0.975, and a weight decay (L2 regularization) factor of 0.0004. SGD was selected instead of adaptive optimization methods (e.g., AdaGrad, RMSProp, or Adam) due to its potential to achieve a higher test accuracy, converge toward a flatter minimum, and consequently yield improved generalization [32]. The decode head predicts the segmentation map from the feature map using the DAHead decoding head; its loss function is CrossEntropyLoss with a loss weight of 0.3. The auxiliary head encourages the backbone network to learn lower-level features that are not used for prediction; it uses the FCNHead decoding head, also with CrossEntropyLoss, and a loss weight of 0.15. Data augmentation used during training included horizontal flipping and random cropping. Training used the PolyLR scheduler, which reduces the learning rate according to a polynomial function down to a minimum learning rate of 1 × 10⁻⁶ and is updated at each iteration. The maximum number of training iterations was 40,000, with validation every 10,000 iterations. Table 1 displays the experimental settings.
Table 1. Experimental settings.

Setting                 Value
Batch size              1
Crop size               512 × 512
Momentum                0.975
Initial learning rate   0.0005
Weight decay            0.0004
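In MMSegmentation's Python-config style, these settings can be expressed roughly as in the fragment below; the base-config filename and the PolyLR power value are assumptions, and the fragment is a sketch rather than the authors' actual configuration file.

```python
# Hypothetical MMSegmentation (1.x) config fragment mirroring Table 1 and the
# training setup described above; not the authors' actual config file.
_base_ = ['danet_r101-d8_voc-512x512.py']  # hypothetical base config name

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=5e-4, momentum=0.975, weight_decay=4e-4))

param_scheduler = [
    dict(type='PolyLR', eta_min=1e-6, power=0.9,  # power is an assumed default
         begin=0, end=40000, by_epoch=False)]

model = dict(
    decode_head=dict(       # DAHead predicts the segmentation map
        type='DAHead',
        loss_decode=dict(type='CrossEntropyLoss', loss_weight=0.3)),
    auxiliary_head=dict(    # FCNHead supervises lower-level backbone features
        type='FCNHead',
        loss_decode=dict(type='CrossEntropyLoss', loss_weight=0.15)))

train_cfg = dict(type='IterBasedTrainLoop', max_iters=40000, val_interval=10000)
train_dataloader = dict(batch_size=1)
```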
The Acc (accuracy) [31] refers to the proportion of accurately classified pixels to the total
number of pixels in the segmentation result. In image segmentation, we usually compare
the predicted label for each pixel with the true label, and then calculate the accuracy rate
between them. Specifically, we can count the number of pixels in the segmentation result
that are the same as the real result, and then divide them by the total number of pixels
to obtain the segmentation accuracy. The higher the Acc, the more pixels are correctly
classified in the segmentation result, and the better the segmentation performance. The
calculation formula of Acc is as follows:
Acc = \frac{\sum_{i=0}^{k} p_{ii}}{\sum_{i=0}^{k} \sum_{j=0}^{k} p_{ij}}  (5)
Among them, i represents the real value, j represents the predicted value, p_ij represents the number of pixels with real value i that are predicted as j, and k represents the total number of categories.
The mIoU (mean intersection over union) refers to the average value of the intersection
and union ratios between the segmentation results and the real segmentation results. In
image segmentation, we usually compare the predicted value with the ground truth for
each class, and then calculate the IoU between them. Specifically, for each category, we can
count the number of pixels in the segmentation result and the ground truth result, and then
calculate the intersection and union between them. We can then divide the intersection
by the union to obtain the IoU for that category. Finally, we can average the IoU of all
categories to obtain the mIoU of the whole image. The higher the mIoU, the closer the
segmentation result is to the real result, and the better the segmentation performance. The
calculation formula of mIoU is as follows:
mIoU = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}  (6)
Among them, i represents the real value, j represents the predicted value, p_ij represents the number of pixels with real value i that are predicted as j, and k represents the total number of categories.
The mAcc (mean accuracy) refers to the average of the accuracy of each category. In
image segmentation, we usually compare the predicted value of each class with the true
value, and then calculate the accuracy between them. Specifically, for each category, we can
count the number of pixels that are correctly classified in the segmentation result and the
ground truth result, and count the number of pixels of that category in the ground truth
result. We can then divide the number of correctly classified pixels by the number of pixels
for that class to obtain the accuracy for that class. Finally, we can average the accuracies
across all classes to obtain the mAcc for the entire image. A higher mAcc means that each category is better recognized and distinguished, and the segmentation performance is better. The calculation formula of mAcc is as follows:
mAcc = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij}}  (7)
Among them, i represents the real value, j represents the predicted value, p_ij represents the number of pixels with real value i that are predicted as j, and k represents the total number of categories.
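To make Equations (5)–(7) concrete, the short sketch below computes Acc, mIoU, and mAcc from a confusion matrix of pixel counts; it is an illustrative NumPy implementation consistent with the formulas above, not the evaluation code used by MMSegmentation, and the toy pixel counts are made up.

```python
import numpy as np


def segmentation_metrics(conf: np.ndarray):
    """Compute Acc, mIoU, and mAcc from a (k+1) x (k+1) confusion matrix,
    where conf[i, j] counts pixels of true class i predicted as class j."""
    tp = np.diag(conf).astype(float)                  # p_ii
    acc = tp.sum() / conf.sum()                       # Equation (5)
    union = conf.sum(axis=1) + conf.sum(axis=0) - tp  # sum_j p_ij + sum_j p_ji - p_ii
    miou = np.mean(tp / union)                        # Equation (6)
    macc = np.mean(tp / conf.sum(axis=1))             # Equation (7)
    return acc, miou, macc


# Toy example with three classes (background, fire, fabric) and made-up counts.
conf = np.array([[50, 2, 3],
                 [4, 30, 1],
                 [2, 2, 26]])
print(segmentation_metrics(conf))
```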
Figure 4. Sample images with annotated category and name.
Table 3. The aAcc, mIoU, and mAcc (%) of the DANet-50 and DANet-101 models trained over 20 k
and 40 k iterations.
We then further optimized the parameters of the selected DANet-101 model and added training iterations. It was found that all indicators improved at first as the training iterations increased. When the training iterations were set to 100 k, the trained model performed best, and its performance across the fire, wood, and fabric categories was relatively balanced. When the training iterations continued to increase, the aAcc, mIoU, and mAcc all decreased. Therefore, the model parameters corresponding to 100 k training iterations were finally selected. Table 4 illustrates how the aAcc, mIoU, and mAcc changed for the DANet-101 model across diverse training iterations.
Table 4. The aAcc, mIoU, and mAcc (%) of DANet-101 using the training iterations of 20 k, 40 k, 60 k,
80 k, 100 k, and 120 k.
Figure 5 showcases the performance curves over 100 k iterations. The loss function represents the difference between the predicted output and the actual target. As the training steps increase, the changes in the loss function display the model's degree of fit to the training data during the training period as well as the optimization effect of the model. From Figure 5a, it can be observed that with an increase in training steps, the value of 'loss' gradually decreased to around 0.2, and 'aux.loss_ce' (cross-entropy loss) decreased to around 0.05. This indicates that the model progressively achieved a better fit to the training data. Figure 5b shows the changes in the test set classification evaluation indicators (mIoU, mAcc, and aAcc) with the step size. When the step reached 40,000, the aAcc, mIoU, and mAcc reached their maximum values of 84.26%, 64.85%, and 77.05%, respectively.
Figure 5. Performance curves of 100 k iterations: (a) the variation of the training set loss function with step size and (b) the changes in the test set classification evaluation indicators (mIoU, mAcc, and aAcc) with step size.
4.4. Experiment 2: Visualization Results of the Proposed Model
To verify the effectiveness of the proposed model, we provided visual results in various indoor scenes such as a living room, kitchen, restaurant, and office using the MAFD. As shown in Figure 6, the proposed scheme was able to segment fire objects well without any post-processing. The first and third rows in Figure 6 show the input fire images, while the corresponding output results segmented by the proposed model are shown in the second and fourth rows in Figure 6, where the fire objects are represented by red masks, wood objects are represented by a yellow mask, fabric objects are represented by a green mask, and other areas are darker. It can be seen that the scene in Figure 6a is the living room. The sofa in the living room is made of a flammable fabric material, and the tables and chairs are made of a flammable wood material. This complex indoor environment is a highly flammable place. Figure 6c shows the kitchen scene, and the kitchen stove is also a dangerous place because of its obvious fire source. The fires in Figure 6b (restaurant) and Figure 6d (office) were intense and accompanied by a lot of smoke. The above visualization results can mark flame, fabric, and wood in various scenes in the MAFD, which verifies that the scheme can realize the segmentation of flame, fabric, and wood in indoor fire scenes. Therefore, it is feasible to use the proposed scheme to identify fire, fabric, and wood.
Figure 6. Examples of the model prediction output.
4.5. Experiment 3: Comparison with State-of-the-Art Methods
To comprehensively demonstrate the performance of our proposed method, we conducted an evaluation comparing it with established state-of-the-art models such as PSPNet [36] (Pyramid Scene Parsing Network), CCNet [37] (Criss-Cross Attention Network), FCN [38], ISANet [39] (Interlaced Sparse Self-Attention Network), and OCRNet [40] (Object-Contextual Representations Network). They are all widely recognized models in the field of semantic segmentation. The basic experimental setup for the other models remained consistent with our approach, and all experiments were conducted exclusively on our MAFD.
For comparison, we selected fire and one material, fabric, to represent the evaluation. We first analyzed the comparative results based on aAcc, Acc.fire, and Acc.fabric. The results indicate that OCRNet performed relatively poorly in all of the evaluation metrics. Our proposed method exhibited the highest aAcc and Acc.fire. Although Acc.fabric was slightly lower by 0.94% compared to ISANet, considering the overall precision, the comprehensive performance of our method surpassed that of the existing models. Figure 7 visually displays the comparison results of accuracy (%) across different models, providing clear evidence of our model's performance.
Figure 7. Comparison results of accuracy (%) across different models (aAcc, Acc.fire, and Acc.fabric).
Table 5 shows an overall comparison of the mIoU, IoU.background, IoU.fire, and IoU.fabric across various models. Notably, OCRNet showed the poorest performance on the MAFD. In contrast, our proposed method outperformed the other models across all evaluation metrics. Comparatively, it is reasonable to conclude that the proposed method exhibits superior overall performance and higher IoU values among all existing models.
Figure 8. Exemplary visual results from various models: row (a) illustrates the input images, while rows (b–f) display the segmentation outcomes of PSPNet, CCNet, FCN, ISANet, and OCRNet, and row (g) displays the results of our approach.
at improving the accuracy of identifying minute details and complex relationships within
indoor scenes. Additionally, a new database, MAFD, was tailored to collect a wide array
of fire instances and potential combustible materials. Through meticulous annotation,
this dataset aims to provide a comprehensive resource for training and evaluating models
dedicated to fire detection and material recognition within indoor settings. Ultimately, the
experimental results indicated that our model on the MAFD achieved an aAcc of 84.26%
and mAcc of 77.05%. We pioneered the simultaneous estimation of fire instances and fire
load in indoor scenarios, offering a novel strategy for fire safety protection and assessment.
To expand the applicability of our research, there are still more tasks to be undertaken
in the future. Firstly, we will incorporate a wider array of combustible materials in the
MAFD such as paper, plastic, etc., to ensure a richer representation of potential fire hazards
within indoor environments. Secondly, we are refining the training parameters to augment
model training efficiency and accuracy. Thirdly, our research can serve as a new reference
for designing secure buildings and evaluating the fire resistance of structures.
Author Contributions: Conceptualization, F.H.; data curation, F.H.; methodology, F.H.; software,
W.Z.; validation, W.Z.; visualization, W.Z.; writing—original draft, F.H. and W.Z.; writing—review
and editing, X.F. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (no. 62203475) and the Changsha Natural Science Foundation (no. kq2208285).
Data Availability Statement: The data supporting the findings of this study are available to readers upon request.
Conflicts of Interest: We hereby declare that we have no conflicts of interest that could be perceived
as influencing the integrity or objectivity of our work.
References
1. Zhang, L.; Wang, G.X.; Yuan, T.; Peng, K.M. Research on Indoor Map. Geom. Spat. Inf. Technol. 2013, 43–47. [CrossRef]
2. Kuti, R.; Zólyomi, G.; László, G.; Hajdu, C.; Környei, L.; Hajdu, F. Examination of Effects of Indoor Fires on Building Structures
and People. Heliyon 2023, 9, e12720. [CrossRef] [PubMed]
3. Kodur, V.; Kumar, P.; Rafi, M.M. Fire Hazard in Buildings: Review, Assessment and Strategies for Improving Fire Safety. PSU Res.
Rev. 2020, 4, 1–23. [CrossRef]
4. Li, S.; Yun, J.; Feng, C.; Gao, Y.; Yang, J.; Sun, G.; Zhang, D. An Indoor Autonomous Inspection and Firefighting Robot Based on
SLAM and Flame Image Recognition. Fire 2023, 6, 93. [CrossRef]
5. Xie, Y.; Zhu, J.; Guo, Y.; You, J.; Feng, D.; Cao, Y. Early Indoor Occluded Fire Detection Based on Firelight Reflection Characteristics.
Fire Saf. J. 2022, 128, 103542. [CrossRef]
6. Wu, X.; Lu, X.; Leung, H. A Video Based Fire Smoke Detection Using Robust AdaBoost. Sensors 2018, 18, 3780. [CrossRef]
[PubMed]
7. Russo, A.U.; Deb, K.; Tista, S.C.; Islam, A. Smoke Detection Method Based on LBP and SVM from Surveillance Camera. In
Proceedings of the 2018 International Conference on Computer, Communication, Chemical, Material and Electronic Engineering
(IC4ME2), Rajshahi, Bangladesh, 8–9 February 2018.
8. Wang, H.; Zhang, Y.; Fan, X. Rapid Early Fire Smoke Detection System Using Slope Fitting in Video Image Histogram. Fire Technol.
2020, 56, 695–714. [CrossRef]
9. Wu, X.; Cao, Y.; Lu, X.; Leung, H. Patchwise Dictionary Learning for Video Forest Fire Smoke Detection in Wavelet Domain.
Neural Comput. Appl. 2021, 33, 7965–7977. [CrossRef]
10. Gagliardi, A.; Saponara, S. AdViSED: Advanced Video SmokE Detection for Real-Time Measurements in Antifire Indoor and
Outdoor Systems. Energies 2020, 13, 2098. [CrossRef]
11. Hossain, F.M.A.; Zhang, Y.M.; Tonima, M.A. Forest Fire Flame and Smoke Detection from UAV-Captured Images Using Fire-
Specific Color Features and Multi-Color Space Local Binary Pattern. J. Unmanned Veh. Syst. 2020, 8, 285–309. [CrossRef]
12. Jia, Y.; Chen, W.; Yang, M.; Wang, L.; Liu, D.; Zhang, Q. Video Smoke Detection with Domain Knowledge and Transfer Learning
from Deep Convolutional Neural Networks. Optik 2021, 240, 166947. [CrossRef]
13. Peng, Y.; Wang, Y. Real-Time Forest Smoke Detection Using Hand-Designed Features and Deep Learning. Comput. Electron. Agric.
2019, 167, 105029. [CrossRef]
14. Cheng, S.; Ma, J. Smoke Detection and Trend Prediction Method Based on Deeplabv3+ and Generative Adversarial Network. J.
Electron. Imaging 2019, 28, 1. [CrossRef]
15. Yuan, F.; Zhang, L.; Xia, X.; Wan, B.; Huang, Q.; Li, X. Deep Smoke Segmentation. Neurocomputing 2019, 357, 248–260. [CrossRef]
16. Lin, G.; Zhang, Y.; Xu, G.; Zhang, Q. Smoke Detection on Video Sequences Using 3D Convolutional Neural Networks. Fire Technol.
2019, 55, 1827–1847. [CrossRef]
17. Li, J.; Zhou, G.; Chen, A.; Wang, Y.; Jiang, J.; Hu, Y.; Lu, C. Adaptive Linear Feature-Reuse Network for Rapid Forest Fire Smoke
Detection Model. Ecol. Inform. 2022, 68, 101584. [CrossRef]
18. Liu, H.; Lei, F.; Tong, C.; Cui, C.; Wu, L. Visual Smoke Detection Based on Ensemble Deep CNNs. Displays 2021, 69, 102020.
[CrossRef]
19. Zhan, J.; Hu, Y.; Zhou, G.; Wang, Y.; Cai, W.; Li, L. A High-Precision Forest Fire Smoke Detection Approach Based on ARGNet.
Comput. Electron. Agric. 2022, 196, 106874. [CrossRef]
20. Hu, Y.; Zhan, J.; Zhou, G.; Chen, A.; Cai, W.; Guo, K.; Hu, Y.; Li, L. Fast Forest Fire Smoke Detection Using MVMNet. Knowl.-Based
Syst. 2022, 241, 108219. [CrossRef]
21. Hosseini, A.; Hashemzadeh, M.; Farajzadeh, N. UFS-Net: A Unified Flame and Smoke Detection Method for Early Detection of
Fire in Video Surveillance Applications Using CNNs. J. Comput. Sci. 2022, 61, 101638. [CrossRef]
22. Khan, S.; Muhammad, K.; Mumtaz, S.; Baik, S.W.; de Albuquerque, V.H.C. Energy-Efficient Deep CNN for Smoke Detection in
Foggy IoT Environment. IEEE Internet Things J. 2019, 6, 9237–9245. [CrossRef]
23. He, L.; Gong, X.; Zhang, S.; Wang, L.; Li, F. Efficient Attention Based Deep Fusion CNN for Smoke Detection in Fog Environment.
Neurocomputing 2021, 434, 224–238. [CrossRef]
24. Muhammad, K.; Khan, S.; Palade, V.; Mehmood, I.; de Albuquerque, V.H.C. Edge Intelligence-Assisted Smoke Detection in Foggy
Surveillance Environments. IEEE Trans. Industr. Inform. 2020, 16, 1067–1075. [CrossRef]
25. Strese, M.; Schuwerk, C.; Iepure, A.; Steinbach, E. Multimodal Feature-Based Surface Material Classification. IEEE Trans. Haptics
2017, 10, 226–239. [CrossRef] [PubMed]
26. Zhang, H.; Jiang, Z.; Xiong, Q.; Wu, J.; Yuan, T.; Li, G.; Huang, Y.; Ji, D. Gathering Effective Information for Real-Time Material
Recognition. IEEE Access 2020, 8, 159511–159529. [CrossRef]
27. Lee, S.; Lee, D.; Kim, H.-C.; Lee, S. Material Type Recognition of Indoor Scenes via Surface Reflectance Estimation. IEEE Access
2022, 10, 134–143. [CrossRef]
28. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the 2019
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
29. Yu, F.; Koltun, V.; Funkhouser, T. Dilated Residual Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
30. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE
Inst. Electr. Electron. Eng. 2021, 109, 43–76. [CrossRef]
31. GitHub—Open-Mmlab/Mmsegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark. Available online:
https://github.com/open-mmlab/mmsegmentation (accessed on 14 March 2023).
32. Wilson, A.C.; Roelofs, R.; Stern, M.; Srebro, N.; Recht, B. The Marginal Value of Adaptive Gradient Methods in Machine Learning.
arXiv 2017, arXiv:1705.08292.
33. Zhou, Y.-C.; Hu, Z.-Z.; Yan, K.-X.; Lin, J.-R. Deep Learning-Based Instance Segmentation for Indoor Fire Load Recognition. IEEE
Access 2021, 9, 148771–148782. [CrossRef]
34. Torralba, A.; Russell, B.C.; Yuen, J. LabelMe: Online Image Annotation and Applications. Proc. IEEE Inst. Electr. Electron. Eng.
2010, 98, 1467–1484. [CrossRef]
35. Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J.
Comput. Vis. 2010, 88, 303–338. [CrossRef]
36. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
37. Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. CCNet: Criss-Cross Attention for Semantic Segmentation. In
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2
November 2019.
38. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach.
Intell. 2017, 39, 640–651. [CrossRef] [PubMed]
39. Huang, L.; Yuan, Y.; Guo, J.; Zhang, C.; Chen, X.; Wang, J. Interlaced Sparse Self-Attention for Semantic Segmentation. arXiv 2019,
arXiv:1907.12273.
40. Yuan, Y.; Chen, X.; Wang, J. Object-Contextual Representations for Semantic Segmentation. In Computer Vision—ECCV 2020;
Lecture Notes in Computer Science; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 173–190,
ISBN 9783030585389.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.