Efficient Deep CNN-Based Fire Detection and Localization in Video Surveillance Application
Article history: Received May 13, 2022. Revised Jun 13, 2022. Accepted Jun 15, 2022.

Fire is a devastating natural disaster that affects both people and the environment. Recent research has suggested that computer vision could be used to construct a cost-effective automatic fire detection system. This paper describes a unique framework for fire detection using convolutional neural networks (CNNs). CNNs have yielded state-of-the-art performance in image classification and other computer vision tasks, and their use in fire detection systems can significantly enhance detection accuracy, resulting in fewer fire disasters and reduced ecological and social consequences. However, deploying CNN-based fire detection in everyday surveillance networks is a serious challenge because of the large memory and processing requirements of inference. In this study, we offer an innovative, energy-efficient, and computationally effective CNN model, based on the SqueezeNet architecture, for fire detection, localisation, and understanding of the fire scenario. It makes use of small convolutional kernels and avoids dense, fully connected layers, which reduces the computational load. This paper shows how the unique qualities of the problem at hand, combined with a wide range of fire data, can be used to balance fire detection effectiveness and precision.
Fig. 1. Overview of the proposed system for fire detection using a deep CNN.
Fig. 2. System architecture.
Fig. 3. Fire localization using the proposed deep CNN.
Algorithm 1 (fragment). Output: feature maps sensitive to fire.

2. METHODOLOGY
We use a model with an architecture similar to SqueezeNet that has been modified to fit our target problem. The original model was trained on the ImageNet dataset and is capable of classifying 1000 different objects. We adapted it to our problem by reducing the number of neurons in the final layer from 1000 to 2. By keeping the rest of the architecture similar to the original, we aimed to reuse the parameters to solve the fire detection problem more effectively. Two standard convolutional layers, three max pooling layers, one average pooling layer, and eight "fire modules" make up the model.

In the first convolutional layer, the input image is passed through 64 filters of size 3x3, resulting in 64 feature maps. The first max pooling layer, with a stride of two pixels and a neighborhood of 3x3 pixels, selects the maximum activations of these 64 feature maps. This halves the spatial dimensions of the feature maps, allowing the most valuable information to be retained while insignificant details are discarded. Following that, we employ two 128-filter fire modules, followed by a 256-filter fire module. Each fire module contains two further convolutions: squeeze and expand. Because each fire module has several filter resolutions and the Caffe framework lacks native support for such convolutional layers, an expand layer with two independent convolutional layers was added to each fire module: the first uses 1x1 filters, while the second uses 3x3 filters. The output of these two layers is concatenated in the channel dimension.

A significant number of weights need to be properly adjusted in CNNs, and a huge amount of training data is usually required for this. Insufficient training data can lead to overfitting of these parameters. The fully connected layers usually contain the most parameters and can cause significant overfitting. These problems can be avoided by introducing regularization layers such as dropout, or by replacing dense fully connected layers with convolutional layers. We used a pretrained SqueezeNet model and fine-tuned it for our classification problem with a reduced learning rate of 0.001. We also removed the last fully connected layers to make the architecture as efficient as possible in terms of classification accuracy. Fine-tuning was executed for 10 epochs; this increased the classification accuracy from 89.80% to 94.50%, an improvement of 4.7 percentage points.

Table 1. Comparison of various fire detection methods for Dataset1.

Technique            False positives   False negatives   Accuracy
Proposed before FT        9.99%            10.39%         89.80%
AlexNet after FT          9.07%             2.13%         94.39%
AlexNet before FT         9.23%            10.64%         90.07%
Foggia et al.            11.67%             0%            93.55%
De Lascio et al.         13.33%             0%            92.86%
Habibuglu et al.          5.88%            14.29%         90.32%
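The squeeze-and-expand structure of a fire module can be sketched as follows. This is a minimal NumPy illustration of the channel arithmetic only; the filter counts (16 squeeze, 64 + 64 expand) and function names are our illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise convolution. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))      # -> (C_out, H, W)

def conv3x3(x, w):
    """3x3 convolution, zero padding, stride 1.
    x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    c_in, h, wd = x.shape
    out = np.zeros((w.shape[0], h, wd))
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    for i in range(h):
        for j in range(wd):
            # Contract the (C_in, 3, 3) patch against every output filter.
            out[:, i, j] = np.tensordot(w, xp[:, i:i+3, j:j+3], axes=3)
    return out

def fire_module(x, s, e1, e3, rng):
    """SqueezeNet-style fire module: squeeze with 1x1 filters, then
    expand with parallel 1x1 and 3x3 filters, concatenated along the
    channel dimension (random weights stand in for trained ones)."""
    c_in = x.shape[0]
    w_s  = rng.standard_normal((s, c_in))           # squeeze 1x1
    w_e1 = rng.standard_normal((e1, s))             # expand 1x1
    w_e3 = rng.standard_normal((e3, s, 3, 3))       # expand 3x3
    sq  = np.maximum(conv1x1(x, w_s), 0)            # ReLU
    ex1 = np.maximum(conv1x1(sq, w_e1), 0)
    ex3 = np.maximum(conv3x3(sq, w_e3), 0)
    return np.concatenate([ex1, ex3], axis=0)       # channel concat

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8, 8))   # e.g. output of the first conv + pool
y = fire_module(x, s=16, e1=64, e3=64, rng=rng)
print(y.shape)                         # (128, 8, 8): a "128-filter" module
```

The concatenation in the last line is the step that Caffe's standard layers do not express directly, which is why the paper builds each fire module from two independent convolutional layers.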
Fig 6. Visual fire localization results of our CNNFire approach and other fire localization methods. (a) Input image (b) Ground truth. (c)
BoWFire. (d) Color classification. (e) Celik. (f) Chen. (g) Rossi. (h) Rudz. (i) CNNFire.
Table row (fragment) — Proposed method before FT: 0.86, 0.97, 0.91 (values consistent with precision, recall, and F-measure).
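If the three values in the fragment above are read as precision, recall, and F-measure, they are internally consistent, since F = 2PR/(P + R):

```python
p, r = 0.86, 0.97
f = 2 * p * r / (p + r)   # harmonic mean of precision and recall
print(round(f, 2))        # 0.91
```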
Fig. 7. Representative images from Dataset2. The top four
images include fires, while the remaining four images represent
fire-like normal images.
Fig. 8. Fire localization results from our CNNFire and other schemes with false positives. (a) Input image. (b) Ground truth.
(c) BoWFire. (d) Color classification. (e) Celik. (f) Chen. (g) Rossi. (h) Rudz. (i) CNNFire.
4.3. FIRE LOCALIZATION RESULTS AND DISCUSSION

The performance of our technique is evaluated in this part in terms of fire localisation and comprehension of the scene under observation. To examine the performance of fire localisation, true positive and false positive rates were calculated. Because the feature maps we used to locate fire were smaller than the ground truth images, they were scaled to match the ground truth images' dimensions. A reason for choosing SqueezeNet was the model's ability to provide larger feature maps by using smaller kernels and avoiding pooling layers. Resizing the feature maps to match the ground truth images then allowed us to execute a more accurate localization.

Fig. 5 shows the results of all methods for a sample image from Dataset2. The BoWFire, color classification, Celik, and Rudz results are nearly identical. In this context, Rossi provides the worst outcomes, while Chen outperforms Rossi. The results from CNNFire are similar to the ground truth. Fig. 7 highlights the performance of all methods for another sample image, with a higher probability of false positives. Although BoWFire has no false positives for this case, it misses some fire regions, as is evident from its result. Color classification and Celik detect the fire regions correctly, but mark larger regions as false positives. Chen fails to detect the fire regions of the ground truth image.

In addition to fire detection and localisation, our system can assess the severity of the observed fire as well as the object under observation. For this, we retrieved the ZOI from the input image and segmented the fire regions. The ZOI image was then fed into the SqueezeNet model, which had been trained on the 1000-class ImageNet dataset. The SqueezeNet model's label for the ZOI image is then paired with the fire's severity for reporting to the fire department. A set of sample cases from this experiment is given in Fig. 8.
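The evaluation step described in this section — upscaling a small fire-sensitive feature map to ground-truth resolution and scoring pixel-wise true and false positive rates — can be sketched as follows. The nearest-neighbour resizing and the 0.5 binarization threshold are our illustrative choices, not values taken from the paper.

```python
import numpy as np

def resize_nearest(fm, h, w):
    """Nearest-neighbour upscaling of a 2-D feature map to (h, w)."""
    rows = np.arange(h) * fm.shape[0] // h
    cols = np.arange(w) * fm.shape[1] // w
    return fm[np.ix_(rows, cols)]

def localization_rates(fm, gt, thr=0.5):
    """Binarize a feature map, compare with a boolean ground-truth mask,
    and return (true positive rate, false positive rate) over pixels."""
    pred = resize_nearest(fm, *gt.shape) >= thr
    tpr = (pred & gt).sum() / gt.sum()
    fpr = (pred & ~gt).sum() / (~gt).sum()
    return tpr, fpr

# Toy example: a 4x4 feature map scored against an 8x8 ground-truth mask.
fm = np.zeros((4, 4)); fm[:2, :2] = 0.9              # activations top-left
gt = np.zeros((8, 8), dtype=bool); gt[:4, :4] = True  # fire region top-left
tpr, fpr = localization_rates(fm, gt)
print(tpr, fpr)   # 1.0 0.0
```

In this toy case the upscaled prediction covers the ground-truth region exactly, so the true positive rate is 1.0 with no false positives.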
Fig. 9. Sample outputs from our overall system: the first column shows input images with labels predicted by our CNN
model and their probabilities, with the highest probability taken as the final class label; the second column shows three
feature maps (F8, F26, and F32) selected by Algorithm 1; the third column highlights the results for each image using
Algorithm 2; the fourth column shows the severity of the fire and ZOI images with a label assigned by the SqueezeNet
model; and the final column shows the alert that should be sent to emergency services, such as the fire brigade. (a) Fire:
98.76%, normal: 1.24%. (b) Fire: 98.8%, normal: 1.2%. (c) Fire: 99.53%, normal: 0.47%.
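The final reporting step shown in Fig. 9 — combining the CNN's fire probability with the ZOI label into an alert for emergency services — could be organised along these lines. The severity thresholds, function name, and message format here are our assumptions for illustration; the paper does not specify them.

```python
def build_alert(fire_prob, zoi_label):
    """Map the CNN's fire probability to a coarse severity level and
    pair it with the SqueezeNet label of the zone of interest (ZOI)."""
    if fire_prob >= 0.98:
        severity = "high"
    elif fire_prob >= 0.90:
        severity = "medium"
    else:
        severity = "low"
    return f"ALERT: {severity}-severity fire near '{zoi_label}' (p={fire_prob:.2%})"

# Example using the fire probability from panel (a) of Fig. 9 and a
# hypothetical ZOI label:
print(build_alert(0.9876, "warehouse"))
# ALERT: high-severity fire near 'warehouse' (p=98.76%)
```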
REFERENCES