ForestFireDetectionUsingCombinedArchitecture_ResearchGatePreprint
All content following this page was uploaded by Soham Ghosh on 19 September 2021.
This article has been accepted for publication in 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA). This is the author's version
which has not been fully edited and content may change prior to final publication. Citation information: 10.1109/CAIDA51941.2021.9425170
© 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more
information.
local and regional fire departments plan for the season. In [5], the authors proposed the visualization of an optimal power shut-off problem, enabling grid operators to maximize the amount of transmitted power while selectively de-energizing electric grid components to minimize the risk of wildfire ignitions. Some success has also been reported in [6] with optimal placement of specialized cameras combined with an intelligent video smoke detection algorithm.

Among the many recent technologies available to detect wildfires in the early stages, artificial intelligence-based CNN seems to be at the forefront. Wildfire data captured by UAVs and satellites using optical and thermal imaging systems can facilitate precise training of a neural network [7]–[11]. With progress in the field of computer vision, sophisticated CNN architectures like AlexNet, GoogLeNet, and ResNet have emerged as winners of the ILSVRC ImageNet challenge [12], [13]. Past researchers reported success in formulating variants of the above-mentioned CNN architectures for forest fire classification based on UAV-captured imagery [7], [8]. However, the suggested architectures often produce high false alarm rates and cannot detect smoke. Furthermore, there is a research gap in optimizer selection and hyperparameter tuning, which can significantly improve model performance.

This study aims to detect forest fires from UAV-captured raw images. Past studies [7], [8] focused on implementing complex CNN models that required more extensive computation time and achieved relatively low accuracy on the test set. In contrast, this study proposes a combined architecture of a simple separable CNN model with regularization and a digital image processing unit with thresholding and segmentation. The proposed architecture identified forest fires with 98.10% sensitivity and a comparatively lower specificity of 87.09%, surpassing the performance demonstrated by more complicated algorithms.

The remainder of the paper is organized as follows: Section III presents a comprehensive overview of the study's dataset and illustrates the study background. Section IV highlights the methods adopted in this study for forest fire classification. Section V illustrates the simulation results and performance evaluation metrics. Finally, concluding remarks and future implications are discussed in Section VI.

III. DATA DESCRIPTION AND STUDY BACKGROUND

The study's dataset was obtained from IEEE DataPort and contains UAV-captured aerial imagery from a prescribed pile burn in a northern Arizona pine forest in the US [14]. The contributing authors assembled the static repositories containing the training/validation and test dataset images by preprocessing the in-situ video data. Typical of the geographical location [15], [16], the dataset images offer a vast range of natural backdrops such as dense undergrowth, scrub, lakes, riversides, and sunsets, with and without snow.

Geographical features play a crucial role in predictive modelling [17]. Knowledge of the topographic features of the study area helps narrow down places where the formulated model can be successfully implemented. For instance, as observed in [4], the US states of Texas, California, Oregon, and Arizona are among the high-risk forest fire areas. Given the topographical similarities, CNN architectures trained on aerial imaging from northern Arizona can be implemented in the US states of California and Oregon. However, the same neural network may require added training when assessing forest fires in Texas due to its distinct topography.

This study's initial simulation and modeling requirements were fulfilled using Python 3.8.5 on standard i7 hardware. Google Colab Pro was used to support the full-scale model performance optimization using Scikit-Learn's GridSearchCV. Python libraries used for the simulation study include but are not limited to TensorFlow 2.0, Keras, OpenCV 3.4.13, and Scikit-Image 0.18.1.

IV. METHODOLOGY

A. Preview of Neural Networks, CNNs, and Separable CNNs

The CNN algorithm attempts to mimic the visual cortex, whose neurons react to small receptive fields at specific locations in the visual field [12]. A CNN requires less computation time than a fully connected deep neural network because its spatially independent algorithm does not require using all available pixels from an input image in the initial layers [18]. Mathematically, a standard convolution operation can be expressed as in (1):

$$Y_i = b_i + \sum_{j=1}^{N_I} W_i \cdot K_{ji}; \quad j = 1, 2, \cdots, N_O \tag{1}$$

To limit overfitting of input data, a CNN uses a subsampling layer, called the pooling layer, which aggregates or extracts the maximum value from its input. A CNN model's only requirement is that the first layer should be an input layer, while the last should be the output layer.

The separable convolutional neural network is a variant of CNN. Unlike a regular CNN architecture, a separable CNN architecture models the spatial features (like shapes) and cross-channel features (like detailed patterns) separately for each available color channel. Spatial feature extraction in a separable CNN is performed using a depthwise convolution operation and can be mathematically expressed as in (2), while the extraction of cross-channel features is performed using the pointwise convolution operation shown in (3):

$$Y_i = b_i + W_j \cdot K_i; \quad i, j = 1, 2, \cdots, N_I \tag{2}$$

$$y_i^{j} = \sum_{k=1}^{K} \omega_k \frac{1}{|\Omega_i(k)|} \sum_{p_l \in \Omega_i(k)} y_l^{j-1} \tag{3}$$

Since image features are analyzed using two different procedures, as mentioned in (2) and (3), a separable CNN is computationally faster and shows better classification properties than the existing CNN architectures [12]. An excellent explanation of the working theory of depthwise and pointwise convolution is found in [19], [20].
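The computational advantage of splitting a convolution into a depthwise step and a pointwise step can be illustrated by counting weights. The sketch below is only an illustration under stated assumptions: the layer sizes (a 3×3 kernel, 3 input channels, 32 output channels) and the function names are hypothetical and are not taken from the study, and bias terms are ignored.

```python
def standard_conv_weights(kernel, c_in, c_out):
    """Weights in a standard 2-D convolution layer (biases ignored)."""
    # Every output channel has its own kernel x kernel filter per input channel.
    return kernel * kernel * c_in * c_out


def separable_conv_weights(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution layer (biases ignored)."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration.
KERNEL, C_IN, C_OUT = 3, 3, 32
standard = standard_conv_weights(KERNEL, C_IN, C_OUT)
separable = separable_conv_weights(KERNEL, C_IN, C_OUT)
print(f"standard: {standard}, separable: {separable}")
```

For these assumed sizes the standard layer needs 864 weights and the separable layer 123, which is the kind of reduction that makes a separable layer cheaper to train and evaluate.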
B. Preview of Thresholding and Segmentation

As the name suggests, segmentation is tasked with segregating, or segmenting, an object in an image. Similar objects often show uniformities in texture, color, and brightness. Such uniformities are used by segmentation to segregate objects in a raw image [21]–[23]. Thresholding is one of the standard methods of segmentation. In a simple thresholding method, also called binary thresholding, each pixel in an image is compared to a given threshold value $T$. Pixel values ($pixel_{value}$) smaller than the threshold $T$ are set to zero, while those equal to or greater than $T$ are set to a user-specified maximum value. For multichannel (or colored) images, the same threshold value $T$ is applied to each channel one at a time. A single merged image consisting of the thresholded images from the individual channels is presented as the final thresholded output. Mathematically, thresholding for a grayscale image with only one color channel can be defined as in (4):

$$pixel_{value} = \begin{cases} 0, & pixel_{value} < T \\ 1, & \text{otherwise} \end{cases} \tag{4}$$

Another straightforward yet powerful object extraction method entails specifying a range of HSV values. Objects with similarities in color and brightness can be easily extracted using the HSV colorspace thresholding method [22], [24]. User-specified HSV values act as lower ($T_{min}$) and higher ($T_{max}$) threshold values. Pixel values ($pixel_{value}$) that fall outside the range from $T_{min}$ to $T_{max}$ are set to 0. Mathematically, object detection for grayscale images using HSV colorspace can be expressed using (5):

$$pixel_{value} = \begin{cases} pixel_{value}, & T_{min} \leq pixel_{value} \leq T_{max} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

C. Methods Adopted

Variants of well-known CNN architectures like AlexNet, LeNet, EfficientNet, and ResNet were initially implemented and compared in this study. The complexity of these architectures allowed them to fit the training set closely (99% training accuracy), but they could not yield an accuracy above 79% on the test set. Root-cause analysis of this discrepancy in accuracies indicated overfitting. Further analysis of the misclassified images for both the fire and no-fire classes revealed the presence of smoke and fog. As the color and saturation of a raw image can be strongly affected by smoke or fog, the misclassification seemed justified. The remainder of this section gives a detailed overview of the methods adopted to address the overfitting and smoke/fog problems.

1) Measures Taken Against Overfitting: Instead of having a complicated model with many convolution layers, simple CNN models with regular two-dimensional convolution layers were compared to simple models with separable convolution layers. The finalization of the CNN model was based on performance metrics: AUC, accuracy, precision, recall or sensitivity, and specificity against the test set. The simpler model with two separable convolution layers showed favorable performance in terms of validation accuracy and computational time against models with one, two, three, four, and six convolution layers. A max-pooling layer following each of the two convolution layers showed better results than average pooling. The adopted model showed optimum performance with a kernel size of 3×3 using the ReLU activation function. Activation functions add non-linearity between layers in a neural network by determining which neurons will activate in a convolution layer. Mathematically, ReLU is a continuous function defined as $ReLU(z) = \max(0, z)$, with its derivative being 0 for $z < 0$ and 1 for $z > 0$ [12].

In addition to selecting a less complicated neural network, three types of regularization techniques were used in the model to address the overfitting problem: L2 regularization in each convolution, batch normalization, and a dropout layer before the output layer, along with data augmentation and early stopping. Each of these regularization techniques is explained below.

• The L2 regularization uses ridge regression, which aims to decrease the overall MSE. Theoretically, L2 regularization minimizes the MSE by decreasing the overall variance while introducing a small bias.
• Batch normalization is usually included in a model to prevent vanishing/exploding gradient problems. As the name suggests, the batch normalization technique normalizes and zero-centers each input (6c), followed by scaling and shifting the result, which helps the algorithm to learn the optimal mean and scale for an input layer (6d) [12]. The batch normalization algorithm is expressed as in (6a-d):

$$\mu_B = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \tag{6a}$$

$$\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \mu_B\right)^2 \tag{6b}$$

$$\hat{x}^{(i)} = \frac{x^{(i)} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{6c}$$

$$z^{(i)} = \gamma \hat{x}^{(i)} + \beta \tag{6d}$$

  Due to the iterative nature of the algorithm, adding batch normalization can increase model complexity and slow down predictions. However, it speeds up the learning process [25] and ultimately imposes a regularizing effect that reduces overfitting. A more detailed overview of batch normalization is available in [12].
• Finally, a dropout layer is added just before the output layer. The dropout layer facilitates a more robust model, which can generalize better. The hyperparameter ($p$) in a dropout layer is the probability that a neuron in this layer is dropped at each training step [12].
• Data augmentation is yet another technique to prevent overfitting and improve model performance in imbalanced classes [26]–[28]. Data augmentation artificially increases the training set's size by creating randomly modified data from the existing ones and benefits the learning process by providing more data to be trained on.
• Early stopping regularizes iterative learning by inspecting the validation error or accuracy. It stops the learning process on the training set as soon as the validation error starts to increase or the validation accuracy decreases.

The detailed construction of the proposed separable CNN architecture combined with measures to mitigate overfitting is shown in Fig. 1.

Fig. 1: Architecture of the proposed simple separable convolution neural network

2) Measures Against Smoke and Fog: Thresholding and segmentation were effective against the smoke and fog misclassification problem. The digital image processing technique proposed in this paper includes two steps: multichannel binary thresholding followed by segmentation using two sets of HSV colorspace filters. Fig. 2 shows how thresholding and segmentation helped extract a desired part of the raw image. Fig. 2(a) shows two raw images, one with two pile burns or brushfires and another without a nearby pile burn. Fig. 2(b) represents the multichannel binary thresholding output. A threshold value of 100 is considered, while the maximum value is set to 255. Fig. 2(c) is the final output from the digital image processing. Two sets of HSV colorspace filters are used to extract the pile burn region. The decision of fire or no fire is based on whether the final output image contains any pixel values other than 0 (0 defines the color black), as observed in the case of fire.

Fig. 2: Digital image processing on raw in-situ field data. (a) Raw digital image files. (b) Applying multichannel binary thresholding. (c) Applying HSV colorspace thresholding.

V. SIMULATION RESULTS AND DISCUSSION

This section delivers a detailed overview of the proposed model metrics and evaluates model performance using the optimizers available in TensorFlow Keras, such as Nadam, Adam, Adagrad, SGD, Ftrl, and RMSprop. Further, a quick demonstration of the tuning of hyperparameters such as the batch size and the learning rate is presented.
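The two-step image processing of Section IV-C2 can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the study's implementation: the function names and the two HSV ranges in `HSV_FILTERS` are hypothetical placeholders, since the filter values used in the study are not given here. Only the threshold of 100 and the maximum value of 255 come from the text. The segmentation step assumes the image has already been converted to the HSV colorspace, for example with OpenCV's `cv2.cvtColor`.

```python
import numpy as np


def binary_threshold(image, t=100, max_value=255):
    """Binary thresholding applied to every channel independently.

    Pixels below t become 0; all others become max_value.
    """
    return np.where(image < t, 0, max_value).astype(np.uint8)


def hsv_segment(hsv_image, filters):
    """Keep pixels inside at least one (t_min, t_max) HSV range; zero the rest."""
    keep = np.zeros(hsv_image.shape[:2], dtype=bool)
    for t_min, t_max in filters:
        lower = np.asarray(t_min)
        upper = np.asarray(t_max)
        # A pixel is kept only if all three channels lie inside the range.
        keep |= np.all((hsv_image >= lower) & (hsv_image <= upper), axis=-1)
    return np.where(keep[..., None], hsv_image, 0).astype(hsv_image.dtype)


def contains_fire(segmented):
    """Fire is declared when any pixel of the final output is non-zero."""
    return bool(np.any(segmented != 0))


# Hypothetical HSV ranges for illustration only (not the study's values).
HSV_FILTERS = [
    ((0, 100, 200), (30, 255, 255)),
    ((160, 100, 200), (179, 255, 255)),
]

# Tiny synthetic HSV image: two bright, saturated pixels and two dull ones.
hsv = np.array(
    [[[10, 200, 255], [90, 30, 120]],
     [[0, 0, 0], [25, 255, 255]]],
    dtype=np.uint8,
)
segmented = hsv_segment(hsv, HSV_FILTERS)
print(contains_fire(segmented))
```

Using a list of ranges keeps the sketch close to the description of two sets of HSV filters: a pixel survives if it matches either range, and the fire decision reduces to checking whether anything survived.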
only be guaranteed when assessed within a similar topographical location.

REFERENCES

[1] T. J. Brown, B. L. Hall and A. L. Westerling, “The Impact of Twenty-First Century Climate Change on Wildland Fire Danger in the Western United States: An Applications Perspective,” Climatic Change, 62 (1-3), pp. 365-388, 2004.
[2] T. Wang, A. Li, W. Xu, J. Yang and Z. Zhang, “The Applied Research on WUI Fire Risk Prevention and Control,” in 2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC), Beijing, China, 2020.
[3] J. W. Muhs, M. Parvania and M. Shahidehpour, “Wildfire Risk Mitigation: A Paradigm Shift in Power Systems Planning and Operation,” IEEE Open Access Journal of Power and Energy, vol. 7, pp. 366-375, 2020.
[4] S. Ghosh and S. Dutta, “A Comprehensive Forecasting, Risk Modelling and Optimization Framework for Electric Grid Hardening and Wildfire Prevention in the US,” International Journal of Energy Engineering, vol. 10, no. 3, pp. 80-89, 2020.
[5] N. Rhodes, L. Ntaimo and L. Roald, “Balancing Wildfire Risk and Power Outages through Optimized Power Shut-Offs,” IEEE Transactions on Power Systems, no. doi: 10.1109/TPWRS.2020.3046796, 2020.
[6] J. Shi, W. Wang, Y. Gao and N. Yu, “Optimal Placement and Intelligent Smoke Detection Algorithm for Wildfire-Monitoring Cameras,” IEEE Access, vol. 8, pp. 72326-72339, 2020.
[7] W. Lee, S. Kim, Y.-T. Lee, H.-W. Lee and M. Choi, “Deep Neural Networks for Wild Fire Detection with Unmanned Aerial Vehicle,” in 2017 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2017.
[8] A. Shamsoshoara, F. Afghah, A. Razi, L. Zheng, P. Z. Fulé and E. Blasch, “Aerial Imagery Pile burn detection using Deep Learning: the FLAME dataset,” arXiv preprint arXiv:2012.14036, 2020.
[9] V. Yaloveha, D. Hlavcheva and A. Podoro, “Fire Hazard Research of Forest Areas based on the use of Convolutional and Capsule Neural Networks,” in 2019 IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON), Lviv, Ukraine, 2019.
[10] O. Ghorbanzadeh, T. Blaschke, K. Gholamnia, and J. Aryal, “Forest Fire Susceptibility and Risk Mapping Using Social/Infrastructural Vulnerability and Environmental Variables,” Fire, vol. 2, no. 3, p. 50, Sep. 2019.
[11] Zhang, G., Wang, M. & Liu, K. “Forest Fire Susceptibility Modelling Using a Convolutional Neural Network for Yunnan Province of China.” Int J Disaster Risk Sci 10, 366-403 (2019). https://doi.org//10/1007/s13753-019-00233-1
[12] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, Sebastopol, California: O’Reilly, 2019.
[13] ImageNet, “Large Scale Visual Recognition Challenge (ILSVRC),” ImageNet, 2010. [Online]. Available: http://www.image-net.org/challenges/LSVRC/. [Accessed 12 2020].
[14] Alireza Shamsoshoara, Fatemeh Afghah, Abolfazl Razi, Liming Zheng, Peter Fulé, Erik Blasch, November 19, 2020, “The FLAME dataset: Aerial Imagery Pile burn detection using drones (UAVs)”, IEEE DataPort, doi: https://dx.doi.org/10.21227/qad6-r683.
[15] Graham, Russell T.; Jain, Theresa B. 2005. “Ponderosa Pine Ecosystems”. In: Ritchie, Martin W.; Maguire, Douglas A.; Youngblood, Andrew, tech. coordinators. Proceedings of the Symposium on Ponderosa Pine: Issues, Trends, and Management, 2004 October 18-21, Klamath Falls, OR. Gen. Tech. Rep PSW-GTR-198. Albany, CA: Pacific Southwest Research Station, Forest Service, U.S. Department of Agriculture: 1-32.
[16] Moir, William H.; Geils, Brian W.; Benoit, Mary Ann; Scurlock, Dan. 1997. “Ecology of Southwestern Ponderosa Pine Forests”. In: Block, William M.; Finch, Deborah M. (Tech. eds.). Songbird ecology in southwestern ponderosa pine forests: a literature review. Gen. Tech. Rep. RM-292. Fort Collins, CO: U.S. Department of Agriculture, Forest Service, Rocky Mountain Forest and Range Experiment Station. p. 3-27.
[17] E. Maxwell, P. Pourmohammadi, and J. D. Poyner, “Mapping the Topographic Features of Mining-Related Valley Fills Using Mask R-CNN Deep Learning and Digital Elevation Data,” Remote Sensing, vol. 12, no. 3, p. 547, Feb. 2020.
[18] Albawi, S., Mohammed, T. A., & Al-Zawi, S. (2017). “Understanding of a Convolutional Neural Network”, 2017 International Conference on Engineering and Technology (ICET). doi:10.1109/icengtechnol.2017.8308186
[19] Dang, L., Pang, P., & Lee, J. (2020). “Depth-Wise Separable Convolution Neural Network with Residual Connection for Hyperspectral Image Classification”. Remote Sensing, 12(20), 3408. doi:10.3390/rs12203408
[20] B. Hua, M. Tran and S. Yeung, “Pointwise Convolutional Neural Networks,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 984-993, doi: 10.1109/CVPR.2018.00109.
[21] R. Szeliski, Computer Vision: Algorithms and Applications, London: Springer, 2020.
[22] E. R. Davies, Computer and Machine Vision, 4. Edition, Ed., San Diego: Elsevier Science Publishing Co Inc, 2012.
[23] Y. Zhang and Z. Xia, “Research on the Image Segmentation Based on Improved Threshold Extractions,” 2018 IEEE 3rd International Conference on Cloud Computing and Internet of Things (CCIOT), Dalian, China, 2018, pp. 386-389, doi: 10.1109/CCIOT45285.2018.9032505.
[24] N. B. Tran, M. A. Tanase, L. T. Bennett, C. Aponte, “Fire-severity Classification Across Temperate Australian forests: Random Forests Versus Spectral Index Thresholding,” Proc. SPIE 11149, Remote Sensing for Agriculture, Ecosystems, and Hydrology XXI, 111490U (21 October 2019); doi: 10.1117/12.2535616
[25] Leslie N. Smith, “A Disciplined Approach to Neural Network Hyper-Parameters: Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay”, arXiv preprint arXiv:1803.09820 (2018).
[26] E. D. Cubuk, B. Zoph, D. Mané, V. Vasudevan and Q. V. Le, “AutoAugment: Learning Augmentation Strategies From Data”, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 113-123, doi: 10.1109/CVPR.2019.00020.
[27] Shorten, C., & Khoshgoftaar, T. M. (2019). “A survey on Image Data Augmentation for Deep Learning”, Journal of Big Data, 6(1). doi:10.1186/s40537-019-0197-0
[28] Wong, S. C., Gatt, A., Stamatescu, V., & McDonnell, M. D. (2016). “Understanding Data Augmentation for Classification: When to Warp?”, 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA). doi:10.1109/dicta.2016.7797091