A Video Smoke Detection Algorithm Based On Cascade Classification and Deep Learning
Copyright ⓒ 2018 KSII
Received March 30, 2018; revised June 4, 2018; accepted July 31, 2018;
published December 31, 2018
Abstract
Fires are a common cause of catastrophic personal injuries and devastating property damage.
Every year, many fires occur and threaten human lives and property around the world.
Smoke provides an important early sign of fire, so smoke detection is typically the first step
in fire-alarm systems. In this paper we propose an automatic smoke detection system built on
camera surveillance and image processing technologies. The key ideas of our algorithm are to
detect and track smoke as a moving object and to distinguish smoke from non-smoke objects using
a convolutional neural network (CNN) model within a cascade classifier. The results of our
experiment, in comparison with those of some earlier
studies, show that the proposed algorithm is very effective not only in detecting smoke, but
also in reducing false positives.
This work was supported by the research grant of the Kongju National University in 2017.
1. Introduction
Every year, fires cause thousands of human deaths and billions of dollars in property damage.
In most cases, however, the fire damage can be prevented, or at least reduced, if the fires are
detected earlier. It is therefore very important to develop an automatic fire-alarm system.
Since smoke is the most important clue in the early stages of a fire, smoke detection should be
a first step in the effort to detect the fire early.
Many of the existing smoke detection systems make use of sensors like ionization detectors,
photoelectric sensors, and carbon dioxide detectors. The accuracy of such systems depends
primarily on the reliability and placement of the sensors. To achieve high detection precision,
the sensors must be distributed densely, which can be difficult to arrange, especially in large
outdoor spaces.
Recently, digital cameras have been evolving rapidly in the field of security surveillance.
Compared to sensor-based systems, security cameras are easy to install and can be used to
monitor large open areas. A recent trend is therefore to replace sensor-based systems with
smoke detection systems built on surveillance cameras and video analysis techniques.
A large number of image processing algorithms have been proposed for smoke detection
by video analysis, and some of them have achieved considerable success. In the following
paragraphs, we will discuss some of the most popular and successful technologies which are
used for smoke detection.
Most of the proposed algorithms consider smoke as moving objects and assume that smoke
will change the background appearance when it appears [2, 3, 5, 6]. Detecting the background
change is a technique that is frequently used as the first step of such an algorithm to detect
candidate smoke regions and eliminate stationary non-smoke objects. The most effective
techniques for detecting background changes include background modeling [1], background
subtraction and optical flow estimation.
Background change detection only locates candidate smoke regions; it cannot distinguish smoke
from other moving objects such as humans and vehicles, or from varying background illumination.
Further analysis steps are therefore required to verify the detected smoke regions.
Color is widely used to classify smoke [2, 3, 5, 6], which can appear dark gray, gray, light
gray, or white. In reality, however, many objects have similar colors, and in some cases smoke
is semitransparent, so its apparent color is affected by the background.
Therefore, color alone is not a reliable clue to the presence of smoke.
Features such as the randomness of smoke area size [2], the roughness of smoke contours [4, 8],
and the growth of smoke regions [8] have also been proposed to eliminate false positives during
smoke detection; however, none of these features is perfect, and all of them still produce both
false positives and false negatives in certain situations.
Another interesting approach for smoke classification is wavelet-based analysis [4, 5].
When the whole or part of a background image is blurred by smoke, the high frequency
components and sharpness of the image can decrease on the surface of smoke regions.
Calculating a decrease in wavelet energy provides an important clue for smoke detection.
However, this feature is not always reliable. For example, smoke appearing in front of a smooth
background can increase the edge energy, while a non-smoke object with a large, smooth surface
can decrease the sharpness and edge energy of the background.
Recently, some image classification algorithms based on local image features (i.e., HOG
Candidate smoke region detection: We use the most popular and efficient background
modeling algorithm, Mixture of Gaussian Background Modeling (MOG) [1], to detect
changes in background pixels. Then we cluster connected foreground pixels into sub-regions,
which become the candidate smoke regions.
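As an illustration, a minimal Python sketch of this step is shown below, assuming OpenCV is
available; the MOG2 variant, the parameter values, and the minimum region area are stand-ins
rather than the exact configuration of our system.

```python
import cv2

# Background model for detecting changed (foreground) pixels.
# MOG2 is used here as a stand-in for the MOG model of [1]; parameters are illustrative.
mog = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=False)

def candidate_smoke_regions(frame, min_area=200):
    """Return bounding boxes of moving sub-regions (candidate smoke regions)."""
    fg_mask = mog.apply(frame)                    # pixels that differ from the background
    fg_mask = cv2.medianBlur(fg_mask, 5)          # suppress isolated noise pixels
    # Cluster connected foreground pixels into sub-regions.
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```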
Smoke classifier: A set of classifiers (layers) called the “cascade model” is used to classify
smoke and non-smoke regions. A candidate smoke region will be classified as true smoke if it
passes all layers of the model. In the top layer, we use weak but rapidly processed features
such as color, randomness of size variation, and edge energy to eliminate non-smoke regions.
However, we have to choose thresholds to make sure that only non-smoke regions are
eliminated. A lenient threshold could make for more false positives, but these will be reduced
by later classifier layers. The final layer of the cascade model is a reliable deep learning
image classifier for verifying candidate smoke regions.
Temporal analysis: The final decision stage, which increases the precision of smoke detection.
1) Color classification
Usually, smoke is dark gray, gray, light gray, or white in color. Therefore, an image pixel is
classified as a smoke-color pixel if the differences among its red, green, and blue channel
intensities $I_R$, $I_G$, and $I_B$ are all smaller than a threshold $th_c$ and its overall
intensity $I$ lies within a fixed range. In our experiments, $th_c$ ranged from 5 to 25 and
$80 < I < 220$ for smoke pixels.
(c) Smoke mask before color classification (d) Non-smoke color pixels (red mask)
Fig. 4. Color classification of pixels inside a candidate smoke region
Fig. 4 shows the classification of image pixels inside candidate smoke regions into smoke-color
and non-smoke-color pixels. For a true smoke region, the number of non-smoke-color pixels is
very low. Therefore, to classify a candidate smoke region as smoke or non-smoke using the color
feature, we count the smoke-color pixels $N_{color}$ and the total pixels $N_{total}$ inside the
region. If the ratio of $N_{color}$ to $N_{total}$ is lower than a certain threshold $thr_c$,
the region is classified as non-smoke and eliminated from further processing. $thr_c$ is
determined experimentally so that only non-smoke regions are rejected in this step.
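A minimal sketch of this color check is given below, reading the grayish-pixel rule as "all
channel differences below $th_c$ and intensity between 80 and 220"; the region-level threshold
thr_c used here is an assumed value, not the one selected in our experiments.

```python
import numpy as np

def passes_color_check(region_bgr, th_c=15, i_low=80, i_high=220, thr_c=0.5):
    """Return False if the fraction of smoke-colored pixels in the region is too small."""
    region = region_bgr.astype(np.int32)
    intensity = region.mean(axis=2)                      # per-pixel intensity I
    spread = region.max(axis=2) - region.min(axis=2)     # largest channel difference
    smoke_color = (spread < th_c) & (intensity > i_low) & (intensity < i_high)
    ratio = smoke_color.sum() / smoke_color.size         # N_color / N_total
    return ratio >= thr_c
```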
The analysis of a growing region is as follows. We track the size of a candidate smoke region
over a period of time. If the size of the region gradually increases, the region is determined
to be a growing region and is classified as possible smoke for the next process.
Let $f_{growth}$ and $n_{growth}$ be the growing factor and the growing-step counter of each
candidate smoke region. Initially, $n_{growth}$ is set to zero, and $n_{growth}$ is incremented
by one whenever the size of the candidate smoke region reaches a certain value, as described in
equation (6), where $S_t$ and $S_{t+1}$ are the sizes of the candidate smoke region at the
current and next growing steps. If $N_{frame}$ is the number of analyzed frames, the candidate
smoke region is classified as a growing region if it meets the following condition:
$\frac{n_{growth}}{N_{frame}} > th_{growth}$  (7)
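A small sketch of this growth test is shown below; because equation (6) is not reproduced here,
the per-step growth check (a fixed factor f_growth) and the threshold values are assumptions.

```python
class GrowthTracker:
    """Track whether a candidate smoke region keeps growing (condition (7))."""

    def __init__(self, f_growth=1.1, th_growth=0.5):
        self.f_growth = f_growth     # assumed growth factor per step
        self.th_growth = th_growth   # decision threshold of condition (7)
        self.n_growth = 0            # growing-step counter
        self.n_frame = 0             # number of analyzed frames
        self.last_size = None

    def update(self, size):
        """Feed the region size measured in the current frame."""
        self.n_frame += 1
        if self.last_size is not None and size >= self.f_growth * self.last_size:
            self.n_growth += 1       # the region grew enough in this step
        self.last_size = size

    def is_growing(self):
        return self.n_frame > 0 and self.n_growth / self.n_frame > self.th_growth
```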
$dA_t = \frac{dS_t}{S_t}$  (9)
and the standard deviation of the size variation $StdS_t$ over the $n$ most recent frames at
time $t$ is given by
$StdS_t = \sqrt{\frac{1}{n}\sum_{i=0}^{n-1}\left(dS_{t-i} - \overline{dS}\right)^2}$  (10)
A candidate smoke region is passed to the next layer only if $dA_t$ and $StdS_t$ satisfy
threshold conditions, where $th_{dA}$ and $th_{StdS}$ are the corresponding decision thresholds,
selected experimentally.
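A minimal sketch of these size-variation features is given below; it assumes $dS_t$ is the
frame-to-frame size difference and leaves the comparison against $th_{dA}$ and $th_{StdS}$ to
the caller.

```python
import numpy as np

def size_variation_features(sizes, n=25):
    """Compute the mean normalized size variation dA_t and its standard deviation StdS_t
    over the n most recent frames (eqs. (9)-(10)). The caller compares the returned values
    against the decision thresholds th_dA and th_StdS."""
    s = np.asarray(sizes[-(n + 1):], dtype=float)
    ds = np.diff(s)            # dS_t, assumed here to be S_{t+1} - S_t
    da = ds / s[:-1]           # dA_t = dS_t / S_t
    return da.mean(), ds.std()
```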
In order to estimate $N_{G+}$ and $N_{G-}$, we first calculate the gradient magnitude of the
background image, where $Gb_{x,y}$, $Gbx_{x,y}$, $Gby_{x,y}$, and $B_{x,y}$ denote the gradient
magnitude, vertical gradient, horizontal gradient, and intensity of the current background image
at position $(x, y)$, respectively.
Similarly, we calculate the gradient magnitude for the current frame, where $G_{x,y}$,
$Gx_{x,y}$, $Gy_{x,y}$, and $I_{x,y}$ denote the gradient magnitude, vertical gradient,
horizontal gradient, and intensity of the current frame at position $(x, y)$, respectively.
A pixel is considered a lost edge-magnitude pixel if its gradient magnitude in the current frame
is significantly lower than in the background image, and a gained edge-magnitude pixel if it is
significantly higher.
Fig. 7 shows representative maps of lost edge magnitude pixels and gained edge magnitude
pixels. For a true smoke region, we can easily see that the number of gained edge magnitude
pixels is much smaller than the number of lost edge magnitude pixels. Using these edge
magnitude-based features, we identify a candidate as a smoke region if it satisfies the
following condition:
$\frac{N_{G+}}{N_{G-}} < th_e$  (21)
(g) Gained edge pixels map (h) Lost edge pixels map
Fig. 6. Maps of lost edge magnitude pixels and gained edge magnitude pixels for a true smoke region
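The sketch below illustrates how the gained and lost edge-magnitude counts and the ratio of
condition (21) can be computed; the Sobel operator and the per-pixel margin th_mag are
assumptions, since the exact gradient equations are summarized rather than reproduced above.

```python
import cv2
import numpy as np

def edge_energy_ratio(background_gray, frame_gray, region_mask, th_mag=10.0):
    """Return N_G+ / N_G- for a candidate region (compared against th_e in condition (21))."""
    def grad_mag(img):
        gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)   # horizontal gradient
        gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)   # vertical gradient
        return cv2.magnitude(gx, gy)

    gb = grad_mag(background_gray)               # background gradient magnitude Gb
    g = grad_mag(frame_gray)                     # current-frame gradient magnitude G
    gained = (g > gb + th_mag) & region_mask     # edge magnitude increased
    lost = (g < gb - th_mag) & region_mask       # edge magnitude lost (blurred by smoke)
    n_lost = lost.sum()
    return gained.sum() / n_lost if n_lost > 0 else np.inf
```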
Transfer learning
Training a CNN requires a large dataset and a great deal of computational time. This makes it
difficult to retrain or update a trained model for other categories. Transfer learning
[10, 11, 12] aims to overcome these difficulties: instead of retraining the network from
scratch, it adapts a model already trained on a different dataset to train a new classifier.
Fine-tuning is one transfer-learning approach: the trained model is further trained on the new
dataset by continuing back-propagation. One can either fine-tune all layers of the network or
keep some layers fixed.
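The sketch below illustrates this fine-tuning idea in PyTorch; the AlexNet backbone is a
stand-in for the pre-trained model used in our system, and the layer names and learning rates
are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone (AlexNet, as a stand-in for a CaffeNet-style model [14]).
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Replace the last layer with a new two-class layer (smoke / non-smoke) trained from scratch.
model.classifier[6] = nn.Linear(4096, 2)

# Fine-tune with a smaller learning rate for the pre-trained weights than for the new layer.
pretrained = [p for name, p in model.named_parameters() if not name.startswith("classifier.6")]
optimizer = torch.optim.SGD(
    [
        {"params": pretrained, "lr": 0.001},                       # keep pre-trained weights close
        {"params": model.classifier[6].parameters(), "lr": 0.01},  # new layer learns faster
    ],
    momentum=0.9,
)
# Drop the learning rates by a factor of ten every 2,000 iterations (one scheduler.step() per iteration).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2000, gamma=0.1)
```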
CNN image classification is robust against distortions such as changes in shape, pose, scale,
and lighting conditions, as well as partial occlusions. Experimental results show that a CNN can
achieve state-of-the-art performance in image classification and can reliably separate smoke
from non-smoke objects.
In order to train the CNN network for smoke classification, we replaced the last layer with
a new one trained from scratch using the back-propagation algorithm with our image dataset,
which has only two categories (smoke and non-smoke). The layers of the cascade model work
together as follows.
A candidate smoke region (Fig. 2f) is an area containing a moving object detected by the Mixture
of Gaussians background model (Section 2.1). The cascade smoke classification model consists of
a sequence of smoke classifiers; the input to the cascade is a single sub-image containing the
candidate smoke region. If a candidate smoke region is classified as non-smoke at the current
layer, it is rejected immediately; otherwise, it is passed to the next layer for further
classification. A candidate smoke region is classified as true smoke only if it passes all
layers of the cascade model.
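A minimal sketch of this control flow is shown below; each layer is one of the checks described
above (color, growth, size variation, edge energy) with the CNN classifier last, and the
function names are illustrative.

```python
def classify_candidate(sub_image, layers):
    """Run a candidate sub-image through the cascade layers in order.

    `layers` is a list of functions, each returning True (possible smoke) or False
    (non-smoke), ordered from the cheap checks to the CNN classifier."""
    for layer in layers:
        if not layer(sub_image):
            return False      # rejected immediately at this layer
    return True               # passed every layer: classified as true smoke
```

A region classified as true smoke by the cascade is still subject to the temporal analysis stage
before the final decision is made.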
A non-smoke image may be of a human, a vehicle, or just a simple background image. Our
dataset has 10,000 smoke images and 10,000 non-smoke images for training and another 2,000
smoke and 2,000 non-smoke images for evaluation.
Non-smoke object images were collected from various sources such as the PETA dataset
(pedestrian images), the Cars dataset (vehicles), and the PASCAL dataset (backgrounds and
other moving objects). We also manually segmented non-smoke objects from surveillance
videos which were recorded for the IVS project [16].
Smoke object images were manually segmented from IVS project smoke videos [16], videos from
YouTube, and other videos from the internet (http://signal.ee.bilkent.edu.tr/VisiFire). We also
uploaded our results on the test videos to YouTube
(https://www.youtube.com/playlist?list=PLh7GPJJcJClgFKTBJbC9dJ7p6VSftWg6n).
We began the fine-tuning process with a learning rate of 0.01, and dropped it by a factor of
ten every 2,000 iterations. We used a smaller learning rate for weights being fine-tuned under
the assumption that the pre-trained CNN weights were relatively good; we didn’t want to
distort them too quickly or by too much. The optimization process was run for a maximum of
50,000 iterations. The accuracy of the trained smoke classification model is 98%, with a false
negative rate of 2.3% and a false positive rate of 1.7%.
We used 15 videos, including 10 smoke videos and 5 non-smoke videos for evaluation. Fig.
9 shows some example results from our experiments. The upper left image is movie_01, and
the lower right image is movie_15.
Fig. 9. Smoke detection results. Red boundaries are detected smoke regions; green boundaries are
non-smoke moving objects
An algorithm that does not use a CNN classifier generally requires a high decision threshold to
reduce false positives, but it then misses many real smoke regions and still produces a
relatively high false positive rate. The details of the experiment and evaluation are summarized
in Table 1.
The algorithm based on motion vectors, surface roughness, and area randomness [2] missed the
smoke in movie_06 and falsely detected smoke in movie_11 and movie_15. In this algorithm, a
region is classified as true smoke if the variation of its motion vector is large and its size
changes randomly and quickly. However, in movie_06, which was recorded inside a tunnel, the air
flow is low and the smoke spreads gradually into its surroundings; in that case, neither the
direction nor the size of the smoke varies much, so the smoke may be classified as a non-smoke
object. The algorithm also has problems in movie_11 and movie_15, where a person or a group of
people is moving around; in this case, the direction, size change, color, and surface roughness
are all similar to smoke characteristics and lead to false detections.
The algorithm based on the decrease of background edge energy [4] has difficulties when the
background edges or the object surface edges are weak. In movie_06, smoke appears where the
background edges are weak, so the edge energy of the background increases instead of decreasing;
conversely, in movie_15 the edge energy decreases on objects with weak surface edges. The
algorithm also uses boundary roughness to separate smoke from non-smoke, assuming that the
boundaries of non-smoke objects look smoother than those of smoke; however, this is not always
true. As shown in movie_06 and movie_15 of Fig. 9, the smoke boundaries look smoother than those
of the non-smoke objects.
Another class of algorithms is based on machine learning, classifying smoke using
bag-of-features histograms and random forest classifiers [6]. Such an algorithm has some
advantages over heuristic rule classifiers, but it still has drawbacks: it ignores the spatial
relationships between objects, it mixes background information into the computed features, and
it struggles especially when the smoke is semi-transparent against the background.
In contrast to other techniques, our algorithm selects decision thresholds that remove only
moving objects with a low probability of being smoke. Owing to the robustness and high accuracy
of the CNN classifier, our algorithm not only reduces false positives but also achieves
excellent detection rates. As shown in Table 1, our algorithm successfully detected smoke in
every short video that contained smoke, and no false positives were returned [17].
By using NVIDIA-accelerated computing hardware, our algorithm also achieves very good computing
performance. The processing time of the top classifier layers of the cascade model is less than
one millisecond, and the processing time of the CNN classifier is about eight milliseconds per
image, in addition to the processing time of the other parts. The system performance results are
summarized in Table 2. Experimental results show that our system can handle 40 frames per second
and can detect smoke within 3 to 10 seconds, making it suitable for real-time applications.
References
[1] C. Stauffer and W. E. L. Grimson, “Adaptive background mixture models for real-time tracking,”
in Proc. of IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recognition, vol. 2, 1999.
Article (CrossRef Link)
[2] T. X. Tung and J.-M. Kim, “An effective four-stage smoke-detection algorithm using video
images for early fire-alarm systems,” Fire Safety J., vol. 46, no. 5, pp. 276-282, Jul. 2011.
Article (CrossRef Link)
[3] W. Zheng, W. Xingang, A. Wenchuan, and C. Jianfeng, “Target-tracking based early fire smoke
detection in video,” in Proc. of ICIG ‘09, pp. 172-176, Sept. 2009. Article (CrossRef Link)
[4] B. U. Toreyin, Y. Dedeoglu, and A. Enis Cetin, “Contour based smoke detection in video using
wavelets,” in Proc. of 14th Eur. Signal Process. Conf., pp. 1-5, Sept. 2006.
Article (CrossRef Link)
[5] C.-Y. Lee, C.-T. Lin, C.-T. Hong, and M.-T. Su, “Smoke detection using spatial and temporal
analysis,” Int. J. Innovative Comput., Inf. and Contr., vol. 8, no. 7(A), Jul. 2012.
Article (CrossRef Link)
[6] B. C. Ko, J. Y. Kwak, and J. Y. Nam, “Wildfire smoke detection using temporal–spatial features
and random forest classifiers,” Opt. Eng., vol. 51, no. 1, Feb. 2012. Article (CrossRef Link)
[7] A. Bosch, A. Zisserman, and X. Munoz, “Image classification using random forests and ferns,” in
Proc. of ICCV 2007, pp. 1-8, Oct. 2007. Article (CrossRef Link)
[8] A. Genovese, R. D. Labati, V. Piuri, and F. Scotti, “Wildfire smoke detection using computational
intelligence techniques,” CIMSA, pp. 1-6, Sept. 2011. Article (CrossRef Link)
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton. “ImageNet classification with deep convolutional
neural networks,” in Proc. of NIPS’12, Dec. 2012. Article (CrossRef Link)
[10] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, “Learning and transferring mid-level image
representations using convolutional neural networks,” 2014 IEEE CVPR, pp. 1717-1724, Jun.
2014. Article (CrossRef Link)
[11] Y. Bengio, “Deep learning of representations for unsupervised and transfer learning,” in Proc. of
UTLW'11 Proc. 2011 Int. Conf. Unsupervised and Transfer Learning workshop, vol. 27, pp. 17-37,
2012. Article (CrossRef Link)
[12] A. K. Reyes, J. C. Caicedo, and J. E. Camargo, “Fine-tuning deep convolutional networks for plant
recognition,” in Proc. of Working Notes of CLEF 2015-Conf. and Labs of the Evaluation Forum
CLEF 2015, Sept. 2015. Article (CrossRef Link)
[13] Convolutional Neural Networks, Article (CrossRef Link)
[14] BAIR/BVLC CaffeNet Model, Article (CrossRef Link)
Nguyen Manh Dung received a B.S. degree from the Department of Electronics and
Telecommunication Engineering at Hanoi University of Science and Technology in 2005 and an
M.S. degree from the Department of Information and Communication at Kongju National University
in 2009. He was a senior research engineer in the Research and Development Department of IVS
Technology. Since 2017 he has been a Ph.D. student in the DC lab of the Department of
Information and Communication at Kongju National University. His interests include embedded
systems, image processing, and video analysis algorithms for surveillance camera systems.
Soonghwan Ro received B.S., M.S., and Ph.D. degrees from the Department of Electronics
Engineering at Korea University in 1987, 1989, and 1993, respectively. He was a research
engineer at the Electronics and Telecommunications Research Institute and at the University of
Birmingham in 1997 and 2003, respectively. Since March 1994 he has been a professor at Kongju
National University, Korea. His research interests include 5G communication, mobile networks,
and embedded systems.