UAV Aerial Image-Based Forest Fire Detection Using Deep Learning
Manuscript
In partial fulfillment of the requirements for
The Degree of Master in Aeronautics
Specialty: Avionics
A Thesis Submitted by
Akila Keddous
Blida 2020-2021
Acknowledgments
This thesis is submitted in partial fulfillment of the requirements for the degree of Master
at the University of Saad Dahleb Blida 01. The presented thesis work was carried out at the
Laboratory of Aeronautical Sciences at the Institute of Aeronautics and Space Studies (IAES),
under the supervision of Prof. Lagha Mohand and Dr. Choutri Kheireddine.
My thanks go first to ALLAH Almighty, who has illuminated my path with the glow of
knowledge and science, and who granted me the will, the health, and the patience during all
these years of study.
Fortunately, I have been given great opportunities to collaborate with, and benefit
significantly from, many outstanding people during my final year project. In particular, my
deepest gratitude goes to my supervisors: Dr. Choutri Kheireddine, who provided me with
valuable chances and resources, allowing me to successfully handle various challenges and
difficulties in the critical phases of my project (without his consistent support and guidance, I
could not have achieved these results), and Prof. Lagha Mohand, for providing useful feedback,
strong support, and a positive attitude to both work and life. It is really my great privilege to
work under the supervision of such passionate people with great diligence for research.
I am also very grateful to the Drone Girls Group: Yasmine, Nihad, Nouha and Fairouz.
Thank you for your great help, from the discussion of challenges in the project and paper
writing, through experimental and algorithm validation, to the problems I met in my daily life.
It was my great pleasure to conduct research with them, share opinions with them, and learn
from them.
I would like to thank my best friends Amina, Kaouter, Chaima, Merieme, Rayane,
Souhila and Nesrine: thank you for always being there for me and encouraging me through the
tough times; you really made my study years more joyful. My good friend Merouane, words
cannot express how thankful I am for the help that you've provided.
Finally, I would like to dedicate this work to my parents Nacir and Cherifa and my two
brothers Hamza and Mustapha, for their continuous support, for always pushing me forward,
and for providing me with everything I need. Thank you for always believing in me; without
your encouragement, I would not be here today.
Abstract
Forest fires are very dangerous: once they become widespread, they are very difficult to
extinguish. In this work, an unmanned aerial vehicle (UAV) image-based real-time forest fire
detection approach is proposed. We took advantage of recent developments in computer vision
systems and the rapid maneuverability of unmanned aerial vehicles to improve real-time
detection performance: we designed and implemented a YOLOv2 convolutional neural network
model in MATLAB and trained it on an aerial dataset. Experimental results show that our
proposed system has high detection performance, and its detection speed reaches 58 frames per
second with a mean average precision of 0.87, thereby satisfying the requirements of real-time
detection (speed and accuracy).
Keywords: Real-Time Fire Detection, Deep Learning, Convolutional Neural Network,
Computer Vision, Unmanned Aerial Vehicles, Fire Datasets, YOLOv2.
Résumé
Forest fires are very dangerous: once they have spread, they are very difficult to extinguish. In
this work, we propose a real-time forest fire detection approach based on images obtained from
unmanned aerial vehicles (UAVs). To improve real-time detection performance, we took
advantage of recent developments in computer vision systems and the rapid maneuverability of
unmanned aerial vehicles: we designed and implemented a convolutional neural network model
based on the YOLOv2 architecture in MATLAB, trained on an aerial dataset. Experimental
results show that our proposed system exhibits high detection performance, with a detection
speed reaching 58 frames per second and a mean average precision of 0.87, thereby satisfying
the requirements of real-time detection (speed and accuracy).
ملخص
Forest fires are extremely dangerous: once they spread widely, they are very difficult to
extinguish. In this work, a real-time forest fire detection approach based on images provided by
drones is proposed. We took advantage of recent developments in computer vision systems and
the rapid maneuverability of drones to improve real-time detection performance: we designed
and implemented a convolutional neural network model based on the YOLOv2 architecture in
MATLAB, trained on an aerial dataset. Experimental results show that our proposed system
achieves high detection performance, with a detection speed reaching 58 frames per second and
a mean average precision of 0.87, meeting the requirements of real-time detection.
Keywords: real-time detection, deep learning, convolutional neural networks, computer vision,
fire datasets, unmanned aerial vehicles, YOLOv2.
Contents
Acknowledgments
Abstract
List of Figures
List of Tables
Acronyms
General Introduction
Chapter I: Introduction to Forest Fire Detection
1. Introduction
2. Statistics on Forest Fire
2.1. Forest Fire Worldwide
2.2. Forest Fire in Algeria
3. Forest Fire Detection Existing Methods
3.1. Using Sensors
3.2. Using Computer Vision
3.3. Using Deep Learning
4. Forest Fire Detection Existing Systems
4.1. Terrestrial Systems
4.2. Unmanned Aerial Vehicle Systems
4.3. Spaceborne (Satellite) Systems
5. State of the Art
5.1. Computer Vision-Based Traditional Approach
5.2. Computer Vision-Based Deep Learning Approach
6. Conclusion
Chapter II: Deep Learning for Computer Vision
1. Introduction
2. Artificial Intelligence, Machine Learning and Deep Learning
2.1. Artificial Intelligence
2.2. Machine Learning
2.3. Deep Learning
3. Convolutional Neural Networks
3.1. Definition
3.2. CNNs Principle of Operation
4. Computer Vision
4.1. Definition
4.2. The Evolution of Computer Vision
4.3. Computer Vision with the Machine Learning Approach
4.4. Computer Vision with the Deep Learning Approach
5. Object Recognition
5.1. Definition
5.2. Object Recognition Computer Vision Tasks
5.3. Comparison between Single-Object Localization and Object Detection
5.4. Popular Families of Object Recognition Deep Learning Models
6. Conclusion
Chapter III: Forest Fire Detection Model Using YOLOv2
1. Introduction
2. Fire Detection Dataset
2.1. Popular Fire Detection Datasets
2.2. Building our Personalized Dataset
3. Fire Detection Model
3.1. YOLOv2 Network Architecture
3.2. YOLOv2 Proposed Network Design
3.3. YOLOv2 Network Training Process
3.4. Training Evaluation
4. Conclusion
Chapter IV: Real-Time Model Evaluation and Testing
1. Introduction
2. Testing the Trained Model
2.1. Test Set
2.2. Test Detection Scenarios
2.3. Test Detection Results
3. Model Evaluation
3.1. Evaluation Metrics
3.2. Evaluation Results
3.3. Results Discussion
4. Real-Time Detection Testing
4.1. Conducted Experiment
4.2. Test Results
4.3. Results Discussion
5. Conclusion
Conclusion and Perspectives
References
List of Figures
Chapter 1
Figure I.1: Annual evolution in areas covered by fire in Algeria (period 1876-2012) [20].
Figure I.2: Average annual fire risk in Algeria [20].
Figure I.3: Generalized multispectral imaging systems for early fire detection [1].
Figure I.4: Schematic illustration of the UAV-based forest fire detection system [4].
Chapter 2
Chapter 3
Figure III.6: Label summary graph of manually labeled datasets.
Figure III.7: Color Thresholder app interface.
Figure III.8: Binary fire mask.
Figure III.9: Segmentation fire mask.
Figure III.10: Loading the dataset to run the automatic custom algorithm.
Figure III.11: Sample frames of fire bounding box automated labeling results.
Figure III.12: FLAME dataset label summary graph.
Figure III.13: Ground truth bounding box coordinate tables.
Figure III.14: Ground truth bounding box coordinates sample.
Figure III.15: Sample frames from the ALL New dataset for ground truth testing.
Figure III.16: Sample frames from binary mask testing results.
Figure III.17: Sample frames from bounding box testing results.
Figure III.18: The YOLO network architecture [25].
Figure III.19: Network architecture layers.
Figure III.20: Network layers analysis.
Figure III.21: YOLOv2 detector properties.
Figure III.22: Training progress information for the first 10 epochs.
Figure III.23: Training progress information for the last epochs.
Figure III.24: Neural network loss visualization [43].
Figure III.25: Training loss graph.
Chapter 4
List of Tables
Chapter 1
Table I.1: Total reported fire statistics by country, 1993-2018 [6].
Table I.2: Common indicators of fire statistics in the countries of the world in 2018 [6].
Chapter 3
Chapter 4
Acronyms
UAV Unmanned Aerial Vehicle
LIDAR Light Detection and Ranging
IR Infrared
CCD Charge Coupled Device
NN Neural Network
CNN Convolutional Neural Network
R-CNN Region-Based Convolutional Neural Network
RNN Recurrent Neural Network
ANN Artificial Neural Network
DNN Deep Neural Network
LSTM Long Short-Term Memory
SVM Support Vector Machines
ReLU Rectified Linear Unit
AI Artificial Intelligence
ML Machine Learning
DL Deep Learning
FLAME Fire Luminosity Airborne-Based Machine Learning Evaluation
YOLO You Only Look Once
YOLOv2 You Only Look Once Version 2
VGG Visual Geometry Group
RGB Red, Green, Blue
ROI Region of Interest
ADAM Adaptive Moment Estimation
CPU Central Processing Unit
GPU Graphics Processing Unit
IOU Intersection Over Union
TP True Positive
FP False Positive
FN False Negative
TN True Negative
mAP Mean Average Precision
mr Miss Rate
fppi False Positive Per Image
FPS Frame Per Second
GB Gigabyte
GHz Gigahertz
LAMR Log Average Miss Rate
General Introduction
The threat to people's lives and property caused by fires has become increasingly serious. In
particular, the growing number of large-scale forest fire events around the world has made the
development of automatic wildfire detection an urgent need for fire warning systems.
Traditionally, forest fires were mainly detected by human observation; however, this approach
is inefficient, as it is prone to human error and fatigue. On the other hand, conventional sensors
for the detection of heat, smoke, flame, and gas typically need time for the particles to reach the
sensors and activate them; hence, a large number of sensors must be installed to cover large
areas. To tackle these disadvantages, more advanced automatic forest fire detection methods
have been developed using satellites, ground-based equipment, and manned/unmanned aerial
vehicles (UAVs). In particular, recent advances in aerial monitoring systems can provide first
responders and operational forces with more accurate data on fire behavior for enhanced fire
management, and the use of UAVs for fire monitoring has been gaining traction in recent years.
Moreover, UAVs offer new features and convenience, including fast deployment, high
maneuverability, wider and adjustable viewpoints, and less human intervention. Recent studies
investigated the use of UAVs in disaster relief scenarios and operations such as wildfires and
floods, particularly as a temporary solution when terrestrial networks fail due to damaged
infrastructure, communication problems, spectrum scarcity, or congestion.
As camera-based fire detection technology has gradually been applied along with the
development of computer vision and artificial intelligence, deep learning vision-based detection
techniques, which can supply intuitive real-time data, detect a wide range of objects, and record
information conveniently, have become a crucial element of UAV-based forest fire detection
systems.
How can we build a UAV-based forest fire detector using deep learning technologies? How can
we balance detection performance against real-time requirements?
In order to provide answers to these questions, we have structured our work in four chapters:
Chapter 1: in this chapter we first introduce the problem of forest fire detection and its
statistics; we then expose the different methods and systems used for detection and discuss
previous work conducted in this field (state of the art).
Chapter 2: the second chapter is devoted to presenting generalities about the deep learning
approach in computer vision, as well as notions about object detection and popular
convolutional neural network models and architectures.
Chapter 3: in the third chapter we discuss the process of crafting a deep computer vision
model, from dataset creation and labeling to YOLOv2 network design and training.
Chapter 4: in this chapter we test the built detector and evaluate its performance for
real-time fire detection.
We end this thesis with a general conclusion which summarizes the main ideas we have
detailed and proposes recommendations for the continuation of this work (perspectives).
Chapter I: Introduction to Forest
Fire Detection
1. Introduction:
Forest fires can potentially result in great environmental disasters, causing vast economic and
ecological losses as well as endangering human lives. In order to preserve natural resources and
protect human safety and property, forest fire detection and accurate monitoring of the
disturbance type, size, and impact over large areas are becoming increasingly important and
attracting enormous interest around the world.
Recent advances in computer vision, machine learning and deep learning offer new tools for
detecting and monitoring forest fires, while the development of new materials and
microelectronics has allowed sensors to become more efficient at identifying active forest fires.
In this chapter we cover some statistics about forest fires and the methods and systems utilized
for detection; we also take an overview of the state of the art to get an idea of the work done on
this detection problem.
2. Statistics on Forest Fire
2.1. Forest Fire Worldwide:
Table I.1 below shows total fire statistics from 1993 to 2018 for 27-57 countries, collectively
representing 0.9-3.8 billion inhabitants of the Earth, depending on the year of reporting. In these
countries, 2.5-4.5 million fires and 17-62 thousand fire deaths were reported to fire services,
depending on the year [6].
Table I.2 below shows that in 46 countries, representing 2.7 billion inhabitants (36% of the
world's population), 50 million calls (18.1 calls per 1000 inh.), 4.6 million fires (9.2% of all
calls, 1.7 fires per 1000 inh.), 30.8 thousand civilian fire deaths (1.1 fire deaths per 100 thousand
inh.) and 51.3 thousand civilian fire injuries (1.9 fire injuries per 100 thousand inh.) were
reported by fire services in 2018 [6].
| № | Country | Population (thous. inh.) | Calls | Fires | Fire deaths | Fire injuries | Calls per 1000 inh. | Fires per 1000 inh. | Deaths per 100,000 inh. | Deaths per 100 fires | Injuries per 100,000 inh. | Injuries per 100 fires |
|---|---------|---|---|---|---|---|---|---|---|---|---|---|
| 1 | India | 1,359,000 | - | 1,600,000 | 12,747 | - | - | 1.2 | 0.9 | 0.8 | - | - |
| 2 | USA | 327,167 | 36,746,500 | 1,318,500 | 3,655 | 15,200 | 112.3 | 4.0 | 1.1 | 0.3 | 4.6 | 1.2 |
| 3 | Bangladesh | 162,951 | - | 19,642 | 130 | 664 | - | 0.1 | 0.1 | 0.7 | 0.4 | 3.4 |
| 4 | Russia | 146,781 | 760,653 | 144,199 | 7,913 | 9,650 | 5.2 | 1.0 | 5.4 | 5.5 | 6.6 | 6.7 |
| 5 | Philippines | 106,700 | - | 16,675 | 326 | - | - | 0.2 | 0.3 | 2.0 | - | - |
| 6 | Vietnam | 95,990 | - | 4,182 | 90 | 208 | - | 0.0 | 0.1 | 2.2 | 0.2 | 5.0 |
| 7 | France | 66,628 | 4,942,906 | 305,500 | 262 | 1,282 | 74.2 | 4.6 | 0.4 | 0.1 | 1.9 | 0.4 |
| 8 | Great Britain | 64,553 | 695,101 | 204,525 | 400 | 8,944 | 10.8 | 3.2 | 0.6 | 0.2 | 13.9 | 4.4 |
| 9 | Italy | 61,000 | 908,887 | 213,116 | - | - | 14.9 | 3.5 | - | - | - | - |
| 10 | Republic of Korea | 51,629 | 2,656,700 | 42,338 | 369 | 2,225 | 51.5 | 0.8 | 0.7 | 0.9 | 4.3 | 5.3 |
| 11 | Ukraine | 42,270 | 230,952 | 78,602 | 1,967 | 1,516 | 5.5 | 1.9 | 4.7 | 2.5 | 3.6 | 1.9 |
| 12 | Poland | 38,411 | 502,200 | 149,434 | 527 | 4,335 | 13.1 | 3.9 | 1.4 | 0.4 | 11.3 | 2.9 |
| 13 | Peru | 32,000 | 121,998 | 13,729 | - | - | 3.8 | 0.4 | - | - | - | - |
| 14 | Kazakhstan | 18,611 | 55,102 | 14,557 | 434 | 412 | 3.0 | 0.8 | 2.3 | 3.0 | 2.2 | 2.8 |
| 15 | Netherlands | 17,181 | 148,900 | 76,020 | 52 | - | 8.7 | 4.4 | 0.3 | 0.1 | - | - |
| 16 | Greece | 10,788 | 65,298 | 24,459 | 131 | 187 | 6.1 | 2.3 | 1.2 | 0.5 | 1.7 | 0.8 |
| 17 | Czech Republic | 10,650 | - | 20,720 | 100 | 1,466 | - | 1.9 | 0.9 | 0.5 | 13.8 | 7.1 |
| 18 | Jordan | 10,378 | 56,326 | 24,490 | 24 | 1,058 | 5.4 | 2.4 | 0.2 | 0.1 | 10.2 | 4.3 |
| 19 | Sweden | 10,230 | 133,955 | 31,376 | 73 | 390 | 6.7 | 1.9 | 1.0 | 0.2 | 3.8 | 1.2 |
| 20 | Hungary | 9,778 | 68,337 | 19,355 | 106 | 832 | 7.0 | 2.0 | 1.1 | 0.5 | 8.5 | 4.3 |
| 21 | Belarus | 9,475 | 52,974 | 6,435 | 525 | 311 | 5.6 | 0.7 | 5.5 | 8.2 | 3.3 | 4.8 |
| 22 | Austria | 8,837 | 278,672 | 43,554 | - | - | 31.5 | 4.9 | - | - | - | - |
| 23 | Switzerland | 8,500 | 77,304 | 13,178 | - | - | 9.1 | 1.6 | - | - | - | - |
| 24 | Bulgaria | 7,050 | 56,120 | 29,448 | 145 | 285 | 8.0 | 4.2 | 2.1 | 0.5 | 4.0 | 1.0 |
| 25 | Denmark | 5,786 | 42,876 | 15,081 | 71 | - | 7.4 | 2.6 | 1.2 | 0.5 | - | - |
| 26 | Singapore | 5,612 | 191,492 | 3,885 | 4 | 90 | 34.1 | 0.7 | 0.1 | 0.1 | 1.6 | 2.3 |
| 27 | Kyrgyzstan | 5,522 | - | 4,808 | 59 | 54 | - | 0.9 | 1.1 | 1.2 | 1.0 | 1.1 |
| 28 | Finland | 5,483 | 113,464 | 14,264 | 58 | 670 | 20.7 | 2.6 | 1.1 | 0.4 | 12.2 | 4.7 |
| 29 | Slovakia | 5,450 | 31,326 | 9,288 | 49 | 194 | 5.7 | 1.7 | 0.9 | 0.5 | 3.6 | 2.1 |
| 30 | Costa Rica | 4,973 | 41,881 | 23,862 | 30 | 73 | 8.4 | 4.8 | 0.6 | 0.1 | 1.5 | 0.3 |
| 31 | Ireland | 4,920 | 242,631 | 28,534 | 18 | - | 49.3 | 5.8 | 0.4 | 0.1 | - | - |
| 32 | New Zealand | 4,748 | 82,136 | 18,580 | - | - | 17.3 | 3.9 | - | - | - | - |
| 33 | Oman | 4,298 | - | 4,602 | - | - | - | 1.1 | - | - | - | - |
| 34 | Croatia | 4,087 | 22,927 | 9,968 | 23 | 117 | 5.6 | 2.4 | 0.6 | 0.2 | 2.9 | 1.2 |
| 35 | Mongolia | 3,238 | - | 3,612 | 64 | - | - | 1.1 | 2.0 | 1.8 | - | - |
| 36 | Lithuania | 2,848 | 22,142 | 11,848 | 315 | 387 | 7.8 | 4.2 | 11.1 | 2.7 | 13.6 | 3.3 |
| 37 | Qatar | 2,839 | 3,125 | 1,922 | 2 | 115 | 1.1 | 0.7 | 0.1 | 0.1 | 4.1 | 6.0 |
| 38 | Slovenia | 2,081 | 153,313 | 4,119 | 7 | 284 | 73.7 | 2.0 | 0.3 | 0.2 | 13.6 | 6.9 |
| 39 | Latvia | 1,950 | - | 9,134 | 81 | 301 | - | 4.7 | 4.2 | 0.9 | 15.4 | 3.3 |
| 40 | Estonia | 1,317 | 26,163 | 5,353 | 50 | 100 | 19.9 | 4.1 | 3.8 | 0.9 | 7.6 | 1.9 |
| 41 | Mauritius | 1,300 | 12,634 | 6,664 | - | - | 9.7 | 5.1 | - | - | - | - |
| 42 | Bhutan | 817 | - | 100 | 3 | 0 | - | 0.1 | 0.4 | 3.0 | 0.0 | 0.0 |
| 43 | Luxemburg | 602 | 61,157 | 2,228 | 0 | - | 101.6 | 3.7 | 0.0 | 0.0 | - | - |
| 44 | Brunei | 442 | - | 1,249 | 2 | 1 | - | 2.8 | 0.5 | 0.2 | 0.2 | 0.1 |
| 45 | Barbados | 277 | - | 1,925 | - | - | - | 6.9 | - | - | - | - |
| 46 | Liechtenstein | 38 | - | 42 | 0 | 0 | - | 1.1 | - | - | - | - |
| Total | | 2,745,186 | 49,606,152 | 4,595,102 | 30,812 | 51,351 | 18.1 | 1.7 | 1.1 | 0.7 | 1.9 | 1.1 |
Table I.2: Common indicators of fire statistics in the countries of the world in 2018 [6].
2.2. Forest Fire in Algeria:
Figure I.1: Annual evolution in areas covered by fire in Algeria (period 1876-2012) [20].
From 1985 to 2012, the annual evolution reveals the variability of forest fires and especially
the existence of catastrophic years in 1993, 1994, 2000, 2007 and 2012. While a large majority
of fires (82.1%) are brought under control before the area covered exceeds 10 ha, 3.2% of fires
affect areas of more than 100 ha, of which 0.6% are greater than 500 ha. This last category
represents 272 fires (including 109 for 1994 alone) [20].
If we go back in time, we see that the colonial period was disastrous: a cumulative area of
3,506,942 ha was covered by fire over a period of 87 years (1876-1962), i.e. an average of
41,258 ha/year. The catastrophic balances of more than 100,000 ha/year (exceptionally more
than 150,000, or even 200,000 ha) in 1881, 1892, 1894, 1902, 1913, 1919, 1956, 1957 and 1958
mark dark years which generally coincide with troubled times [20].
After independence, the affected areas declined slightly, with an average of 35,315 ha/year
over the period 1963-2012. This did not prevent the occurrence of new dark years in 1965,
1967, 1971, 1977, 1978, 1993, 2000, 2007 and 2012. Three of them were particularly
catastrophic: 1983, 1994 and 2012, with 221,367 ha, 271,598 ha and 99,061 ha covered
respectively. These three years alone amount to nearly 600,000 ha of burnt area, or 34% of the
total for the period 1963-2012. Such "out of the ordinary" areas can of course be favoured, at
least in large part, by climatic conditions very favourable to the outbreak and propagation of
fire, but they depend essentially on the human factor: prior disorder, poor management and
above all instability [20].
It has long been known that, in times of political trouble, Algerian forests always pay a heavy
price to fires. Moreover, F. Ramade (1997) stigmatizes the political disorders which, as in
Algeria, have been "since 1992 at the origin of several fires which have devastated vast forests,
in particular in Kabylia" [20].
Spatially, we note that the risk of forest fires is mainly concentrated in the coastal wilayas
of northeast Algeria, from Tizi Ouzou to El Tarf (Figure I.2), corresponding to heavily wooded
and hilly wilayas with a high population density and a lack of land for urbanization.
We can say that Algeria is one of the countries where the problem of forest fires is relatively
little known to the scientific community: if in absolute value the burned areas remain relatively
modest compared to other countries around the Mediterranean, the scarcity of forests and the
threat of desertification mean that these fires have a particularly disastrous impact. Algeria has
4.1 million hectares of forests, i.e. an afforestation rate of 1.76%. However, the high frequency
of fires, which follow one another with a return interval of less than 10 years, has a catastrophic
ecological impact.
The analysis of fires during the period 1985-2016, at the level of the 40 wilayas of northern
Algeria (the most wooded part), shows that 42,555 fires covered a total forest area of more than
910,640 hectares [20].
3.2. Using Computer Vision
The fire detection system's performance is determined by the fire pixel classifier, which
generates the main regions on which the rest of the system operates. As a result, a highly
accurate fire pixel classifier with a low false detection rate is required. A typical video flame
detection system uses background subtraction and color analysis to find probable flame regions
in the video frame, and then uses a collection of extracted fire attributes, including color
probability, to distinguish between fire and non-fire objects.
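To make the color-analysis step concrete, here is a minimal MATLAB sketch that flags probable flame pixels using one simple YCbCr rule from the fire-detection literature; the rule, the blob-size threshold, and the file name frame.jpg are illustrative assumptions, not the exact method described above:

```matlab
% Minimal fire-pixel color analysis sketch (illustrative thresholds).
I = imread('frame.jpg');              % hypothetical video frame
J = rgb2ycbcr(I);                     % color analysis is often done in YCbCr
Y  = J(:,:,1);  Cb = J(:,:,2);  Cr = J(:,:,3);
mask = (Y > Cb) & (Cr > Cb);          % flame pixels: high luminance, red chroma dominant
mask = bwareaopen(mask, 50);          % drop tiny blobs (likely noise)
imshow(labeloverlay(I, mask))         % visualize candidate flame regions
```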
Although some methods deal directly with fire pixel categorization, others deal with the spatial
variance, temporal variation, and contour variability of candidate blob regions. The fire pixel
categorization can be evaluated in both greyscale and color video sequences.
3.3. Using Deep Learning:
Deep learning has recently been successfully applied to a variety of disciplines, including image
object detection and classification, audio recognition, and natural language processing. To
improve performance, researchers have conducted a number of studies on fire detection using
deep learning.
The deep learning approach differs in several ways from conventional computer vision-based
fire detection. The first difference is that the features are not designed by an expert, but rather
are automatically captured by the network after training with a large amount of diverse training
data. Therefore, the effort of finding proper handcrafted features is shifted to designing a proper
network and preparing the training data [3].
Another distinction is that the detector/classifier can be obtained by training features in the same
neural network at the same time. As a result, with an efficient training method, the right network
structure becomes even more essential.
4. Forest Fire Detection Existing Systems
4.1. Terrestrial Systems
Terrestrial early detection systems consist of either individual sensors (fixed, PTZ, or 360°
cameras) or networks of ground sensors [1]. To provide adequate visibility, these sensors must
be carefully placed. As a result, they are generally found in watchtowers: structures built on
high viewpoints to monitor high-risk scenarios, which can be utilized not only for fire detection
but also for verification and localization. Two types of cameras are used for early fire detection,
optical cameras and infrared cameras, which may gather data at resolutions ranging from low
to ultra-high for various fire detection scenarios.
Early detection devices that combine the two types have lately been introduced. Computer-
based solutions can process a large amount of data while retaining a high level of accuracy and
a low false alarm rate.
4.2. Unmanned Aerial Vehicle Systems
Terrestrial imaging systems can detect both flame and smoke, but in many cases it is almost
impossible to view, in a timely manner, the flames of a wildfire from a ground-based camera
or a camera mounted on a forest watchtower. To this aim, autonomous unmanned aerial vehicles
(UAVs) can provide a larger and more accurate view of the fire from above, even in regions
that are inaccessible or too risky for firefighting crews to operate in. Fixed or rotary-wing UAVs
cover a larger area and are more flexible, allowing for changes in the monitored area, although
they are subject to weather and have a limited flight endurance. Unmanned aerial vehicles
devoted to forest fire monitoring and detection are in high demand due to their rapid
manoeuvrability and improved personnel safety. A typical UAV-based forest fire surveillance
system is illustrated in Figure I.4; it is composed of a team of UAVs, different kinds of on-board
sensors, and a central ground station. The goal is to use UAVs to identify and track flames,
predict their spread, and provide real-time fire information to human firefighters, as well as to
use UAVs to suppress fires. The system may perform fire monitoring (search for a prospective
fire), detection (identify a potential fire and alert firefighting personnel), diagnosis (calculate
parameters of the fire position, extent, and evolution), and prognosis (predict the fire
propagation).
As can be observed, one of the most significant parts of the UAV-based forest fire monitoring
system is the computer vision-based fire detection technique. This is due to its multiple
advantages, including the ability to monitor a wide range of objects, provide intuitive and real-
time images, and conveniently record information. More specifically, charge-coupled device
(CCD) cameras and infrared (IR) cameras are usually mounted on UAVs, and massive efforts
have been dedicated to the development of more effective image processing schemes for fire
detection.
Color and motion features in CCD camera visual images are typically used for fire detection.
However, in some outdoor applications, the use of CCD cameras is commonly considered
insufficiently durable and trustworthy. Given highly complex, non-structured forest
environments, the possibility of smoke covering the fire, or the presence of analogues of fire
such as reddish leaves swinging in the wind and light reflections, the false fire alert rate is
typically quite high. Because IR images can be obtained in weak or no light conditions and
smoke appears transparent in IR images, IR cameras are extensively used to capture
monochromatic images both day and night, even though they are more expensive than CCD
cameras. The use of this method is expected to lower the rate of false fire alarms and improve
the forest fire detection system's adaptive capabilities in various operating conditions.
Figure I.3: Generalized multispectral imaging systems for early fire detection [1].
Figure I.4: Schematic illustration of the UAV-based forest fire detection system [4].
5. State of the Art
5.1. Computer Vision-Based Traditional Approach
Borges and Izquierdo [10] adopted the Bayes classifier to detect fires based on features
additional to colour, such as the area, surface, and boundary of the fire region. In addition,
Foggia et al. [11] proposed a multi-expert system which combines the analysis results of a fire's
colour, shape, and motion characteristics. Although insufficient on their own, the features
supplementary to colour, including texture, shape, and optical flow, can reduce false detections.
Nevertheless, these approaches require domain knowledge of fires in captured images, which is
essential to explore hand-crafted features, and they cannot reflect well the information spatially
and temporally involved in fire environments. In addition, almost all methods using the
conventional approach only use a still image or consecutive pairs of frames to detect fire.
Therefore, they only consider the short-term dynamic behaviour of fire, whereas a fire has a
longer-term dynamic behaviour [3].
5.2. Computer Vision-Based Deep Learning Approach
In recent years, deep learning has been widely used in a myriad of computer vision applications
because of its high recognition capability. To the best of our knowledge, Gunay et al. [12]
published the first paper to use deep learning in dynamic texture recognition, including
wildfires. In early deep learning-based forest fire detection, researchers designed convolutional
neural networks from scratch and trained them with collected or synthesized images. For
example, Zhao et al. [13] designed a 15-layer convolutional neural network (CNN) to detect
forest fire. What is more, to locate the fire and smoke in frames [5], Sebastien [14] proposed a
fire detection network based on a CNN where the features are learned simultaneously with a
Multilayer Perceptron (MLP)-type neural net classifier during training.
Zhang et al. [15] also proposed a CNN-based fire detection method which operates in a cascaded
fashion. In their method, the full image is first tested by a global image-level classifier, and if a
fire is detected, a fine-grained patch classifier is then used to precisely localize the fire patches.
Muhammad et al. [16] proposed a fire surveillance system based on a fine-tuned CNN fire
detector. This is an efficient CNN architecture for fire detection, localization, and semantic
understanding of the scene of the fire, inspired by the SqueezeNet architecture. In the deep
layers of a CNN, a unit has a wide receptive field, so its activation can be treated as a feature
that contains a large area of context information. This is another advantage of the features
learned with a CNN for fire detection. Even though CNNs showed overwhelmingly superior
classification performance against traditional computer vision methods, locating objects has
remained another problem [3].
Although the CNN-based approaches provide excellent performance, it is hard to capture the
dynamic behaviour of fire, which can be obtained with recursive-type neural networks (RNNs).
LSTM, proposed by Hochreiter and Schmidhuber [17], is an RNN model that solves the
vanishing gradient problem of RNNs. LSTM can accumulate temporal features for decision
making through memory cells which preserve the internal states and the recurrent behaviour.
However, the number of recursions is usually limited, which makes it difficult to capture the
long-term dynamic behaviour necessary to make a decision. Therefore, special care must be
taken to base the decision on long-term behaviour with LSTM.
Recently, Hu et al. [18] used LSTM for fire detection, where CNN features are extracted from
optical flows of consecutive frames and temporally accumulated in an LSTM network. The final
decision is made based on the fusion of successive temporal features. Their approach, however,
computes the optical flow to prepare the input of the CNN rather than directly using RGB
frames.
6. Conclusion:
We have devoted this chapter to the presentation of some generalities on forest fire detection
systems and methods, while presenting a literature review of the most important works and
research papers published during the last decade. As discussed above, the combination of a deep
learning method and a UAV-based system has been considered a powerful approach in recent
years. In order to detect forest fires with great precision and a decreasing rate of false fire
alarms, a neural network model is trained using a UAV-based aerial image dataset. The next
chapter is dedicated to presenting generalities about the deep learning approach in computer
vision, as well as notions about object detection and CNNs.
Chapter II: Deep Learning for
Computer Vision
1. Introduction
Deep Learning is a branch of artificial intelligence and data science. This field has undergone
immense development, especially in recent years, and big companies like Google, Facebook,
Microsoft and Amazon are working on this topic.
Our job is to find a solution for the detection of forest fires through the use of deep learning in
computer vision on board a UAV.
In this chapter we give general information about Artificial Intelligence, Machine Learning and
Deep Learning technologies; we then present convolutional neural networks, computer vision
and object recognition; finally, we take a closer look at the most popular deep learning model
families.
2. Artificial Intelligence, Machine Learning and Deep Learning
2.1. Artificial Intelligence
Artificial Intelligence is the science of making computers behave like humans in terms of
making decisions, text processing, translation, etc. AI is a big umbrella that has Machine
Learning and Deep Learning under it [31]. Figure II.1 below clearly illustrates the correlation
between the three.
Historically, four approaches to AI have been followed, each by different people with
different methods. A human-centered approach must in part be an empirical science, involving
observation and hypotheses about human behavior, while a rationalist approach involves a
combination of mathematics and engineering. These four approaches are: thinking humanly,
acting humanly, thinking rationally, and acting rationally.
2.2. Machine Learning
2.2.1. Definition:
Machine learning is a subset of artificial intelligence (AI) that provides systems the ability to
automatically learn and improve from experience without being explicitly programmed.
Machine learning focuses on the development of computer programs that can access data and
use it to learn for themselves [33].
2.2.2. Different Types of Machine Learning:
Machine learning can be divided into three categories (a minimal supervised-learning sketch
follows this list):
A. Supervised Learning: we give the algorithm labeled data, and the algorithm has to learn
from it and figure out how to solve future similar problems. It is like giving the algorithm
problems together with their answers: the algorithm has to learn how these problems were
solved in order to solve future problems in a similar manner.
B. Unsupervised Learning: we give the algorithm a problem without any labeled data or
any prior knowledge of what the answer could be. It is like giving the algorithm
problems without any answers: the algorithm has to find the best answer by deriving
insights from the data.
C. Reinforcement Learning: the training of machine learning models to make
a sequence of decisions. The agent learns to achieve a goal in an uncertain, potentially
complex environment. In reinforcement learning, an artificial intelligence faces
a game-like situation and employs trial and error to come up with a solution
to the problem. To get the machine to do what the programmer wants, the artificial
intelligence gets either rewards or penalties for the actions it performs; its goal is
to maximize the total reward.
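As a concrete illustration of the supervised setting, the following MATLAB sketch trains a classifier on labeled examples and predicts the label of a new one; it uses MATLAB's built-in fisheriris sample data rather than fire imagery, purely for illustration:

```matlab
% Supervised learning sketch: labeled examples in, a predictive model out.
load fisheriris                                   % built-in labeled dataset (measurements + species)
mdl  = fitcknn(meas, species, 'NumNeighbors', 5); % train a K-Nearest Neighbors classifier
pred = predict(mdl, [5.9 3.0 5.1 1.8])            % predict the label of a new, unseen sample
```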
A. Models:
o Classification Model: a type of ML model whose outputs belong to a finite set of values
(for example: good, average, bad). Some of the most common classification models are
logistic regression, Support Vector Machines (SVM), K-Nearest Neighbors (KNN) and
neural networks.
o Regression Model: a type of ML model whose outputs are numbers (for example:
tomorrow's temperature). Some of the most common regression models are linear
regression, decision trees, random forests and neural networks.
B. Datasets:
A dataset corresponds to one or more database tables, where each column of a table
represents a particular variable and each row corresponds to a given record of the dataset
in question. The dataset lists the values of each of the variables, such as the height and
weight of an object, for each member of the dataset. The dataset is often separated into
three parts (a MATLAB split sketch follows this list), namely:
o Training data: the data used to refine the parameters of the model and thus allow it to
generalize to unknown data.
o Validation data: used during training to estimate the performance of the model on
unseen data and to tune its hyperparameters.
o Test data: used only after training to provide an unbiased evaluation of the final model.
Figure II.6: Comparing a machine learning approach to categorization with deep learning [29].
Figure II.7: The relation between the performance of the algorithms and the amount of input data.
The filters can start with very basic qualities, like brightness and edges, and progress to more
complicated attributes that uniquely describe the object.
A. Feature Learning, Layers, and Classification
A CNN, like other neural networks, has an input layer, an output layer, and a number of hidden
layers in between.
Figure II.8: Neural networks are organized in layers consisting of a set of interconnected nodes.
These layers perform operations on the data with the goal of learning data-specific
features. Convolution, activation (ReLU), and pooling are three of the most popular layers.
− Convolution passes the input images through a series of convolutional filters, each of which
activates different aspects of the images.
− Rectified linear unit (ReLU) maps negative values to zero and keeps positive values,
which makes training faster and more effective. Because only the activated features are
carried forward into the next layer, this operation is frequently referred to as activation.
− Pooling reduces the number of parameters that the network needs to learn by performing
nonlinear downsampling on the output.
The same operations are repeated over tens or hundreds of layers, with each layer learning to
recognize different features; a minimal MATLAB sketch of such a layer stack is shown below.
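The following layer stack is an illustrative sketch (assuming MATLAB's Deep Learning Toolbox); the input size, filter counts and the two-class output are arbitrary choices, not the network used later in this thesis:

```matlab
% Minimal CNN: convolution, ReLU (activation) and pooling layers.
layers = [
    imageInputLayer([224 224 3])                  % RGB input image
    convolution2dLayer(3, 16, 'Padding', 'same')  % 16 filters of size 3x3
    reluLayer                                     % activation: negatives mapped to zero
    maxPooling2dLayer(2, 'Stride', 2)             % nonlinear downsampling
    convolution2dLayer(3, 32, 'Padding', 'same')  % deeper layer, more complex features
    reluLayer
    fullyConnectedLayer(2)                        % e.g. two classes: fire / non-fire
    softmaxLayer
    classificationLayer];
```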
4. Computer Vision
4.1. Definition
Computer vision is a field of artificial intelligence that focuses on replicating parts of the
complexity of the human vision system and enabling computers to "see" and understand the
content of digital images. The purpose of computer vision is to comprehend the content of
digital images by extracting a description from the image, which could be an object, a text
description, or a three-dimensional model, among other things. Typically, this entails the
creation of techniques that seek to replicate human eyesight.
4.2. The Evolution of Computer Vision
The tasks that computer vision could perform before the introduction of deep learning were
relatively limited, and they required a lot of manual coding and work from developers and
human operators. For example, if we wanted to run an object recognition model, we would have
to do the following:
− Create a database: individual images of all the subjects we wish to track must be captured
in a precise format.
− Annotate images: we would then have to enter numerous critical data points for each
individual image.
− Capture new images: then, whether from photographs or video content, we would have to
capture new images and repeat the measurement process, this time noting the image's key
points. The angle from which the photo was shot has to be considered as well.
After all of this painstaking labour, the application would eventually be able to compare the
measurements in the new image to those in its database and tell us whether it matched any of
the profiles it was tracking. In actuality, the majority of the job was done manually, with very
little automation, and the margin of error was still significant.
Artificial intelligence has made enormous strides in recent years, surpassing humans in
several tasks linked to recognizing and labeling objects, thanks to breakthroughs in deep
learning and neural networks and to the amount of data we generate today.
4.3. Computer Vision with the Machine Learning Approach
When it came to solving computer vision problems, ML offered a fresh perspective. Thanks to
machine learning, developers no longer had to manually code every rule into their vision
applications. Instead, they created "features": mini programs that could detect specific patterns
in images. They then utilized a statistical learning technique, such as linear regression, logistic
regression, decision trees, or support vector machines (SVM), to find patterns, classify images,
and detect objects in them.
Building such a machine learning application, for example one for detecting breast cancer in
medical images, necessitated the collaboration of dozens of engineers and breast cancer experts
and took a long time.
4.4. Computer Vision with the Deep Learning Approach
Deep learning approached machine learning in a fundamentally new way. Deep learning is
based on neural networks, general-purpose function approximators capable of solving any
problem that can be represented by examples. When you provide a neural network with a large
number of labeled examples of a certain type of data, it can uncover common patterns between
them and turn them into a mathematical function that can be used to categorize future pieces of
information.
DL is an extremely effective approach for computer vision. Creating a good deep learning
system usually boils down to accumulating a huge amount of labeled training data and
fine-tuning parameters such as the type and number of layers.
Deep learning is easier and faster to design and deploy than prior types of machine learning,
and it is used in most contemporary computer vision applications, including fire detection,
self-driving cars, and facial recognition. Deep learning has progressed from a theoretical notion
to a practical application thanks to breakthroughs in hardware and cloud computing resources.
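Tying this back to the earlier sketches, a minimal (hypothetical) MATLAB training call could look as follows; imdsTrain, imdsVal and layers are the variables from the sketches in the previous sections, and the option values are arbitrary:

```matlab
% Train the sketched CNN on the labeled image split from the earlier sketches.
augTrain = augmentedImageDatastore([224 224], imdsTrain); % resize frames to the input size
augVal   = augmentedImageDatastore([224 224], imdsVal);
opts = trainingOptions('adam', 'MaxEpochs', 10, ...
    'InitialLearnRate', 1e-3, 'ValidationData', augVal);
net  = trainNetwork(augTrain, layers, opts);              % learn features from labeled examples
pred = classify(net, imresize(imread('frame.jpg'), [224 224]))  % categorize a new image
```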
5. Object Recognition
5.1. Definition
Object recognition is a broad term that refers to a group of related computer vision tasks that
involve identifying objects in digital images. Object localization entails drawing a bounding
box around one or more objects in an image, whereas image classification entails assigning a
class label to an image. Object detection is a more difficult task that combines the two: it draws
a bounding box around each object of interest in the image and assigns a class label to it. Object
recognition is the umbrella term for all of these problems.
5.2. Object Recognition Computer Vision Tasks
− Image Classification: predict the type or class of an object in an image.
Input: a single-object image, such as a photograph.
Output: a class label (e.g. one or more integers that are mapped to class labels).
− Object Localization: detect the presence of objects in an image and use a bounding box to
denote their location.
Input: an image featuring one or more objects, such as a photograph.
Output: one or more bounding boxes (e.g. defined by a point, width, and height).
− Object Detection: locate the presence of objects in an image with a bounding box, together
with the types or classes of the objects found.
Input: an image featuring one or more objects, such as a photograph.
Output: one or more bounding boxes (e.g. defined by a point, width, and height), and a
class label for each bounding box.
− Object Segmentation: also known as "object instance segmentation" or "semantic
segmentation", a further development of this split of computer vision tasks, in which
instances of recognized objects are indicated by highlighting the exact pixels of the object
rather than a coarse bounding box.
Object recognition, as we can see from this split, pertains to a group of challenging computer
vision tasks that are summarized in Figure II.13 below.
5.3. Comparison between Single-Object Localization and Object Detection:
With the goal of encouraging autonomous and distinct innovations at each level that can be
exploited more widely, the three equivalent task types are defined as follows (a MATLAB
detection sketch follows this list):
− Image classification: algorithms generate a list of object categories present in the image.
− Single-object localization: algorithms generate a list of object categories present in the
image, as well as an axis-aligned bounding box indicating the position and scale of one
instance of each object category.
− Object detection: algorithms generate a list of object categories present in the image, as
well as an axis-aligned bounding box indicating the position and scale of every instance of
each object category.
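To make the detection task concrete, here is a minimal MATLAB sketch of querying a trained detector for boxes, scores and labels; the variable detector (for example, a trained yolov2ObjectDetector such as the one built in Chapter III) and the file testFrame.jpg are assumptions:

```matlab
% Object detection: bounding boxes, confidence scores and class labels for one image.
I = imread('testFrame.jpg');                    % hypothetical input image
[bboxes, scores, labels] = detect(detector, I); % detector: a trained object detector
annotated = insertObjectAnnotation(I, 'rectangle', bboxes, cellstr(labels));
imshow(annotated)                               % each box: [x y width height] plus a label
```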
Figure II.15: Overview of object recognition computer vision tasks for a fire scene example.
Although the model is substantially faster to train and to make predictions with, each input
image still requires a set of candidate regions to be proposed.
c) Faster R-CNN
The model architecture was further improved for both speed of training and detection by
Shaoqing Ren, et al. at Microsoft Research in the 2016 paper titled "Faster R-CNN: Towards
Real-Time Object Detection with Region Proposal Networks."
The prediction is binary, indicating the presence of an object or not, the so-called "objectness"
of the proposed region.
5.4.3. YOLO Model Family
Another popular family of object recognition models is referred to collectively as YOLO or
"You Only Look Once", developed by Joseph Redmon, et al.
Although R-CNN models are generally more accurate, the YOLO family of models is far faster,
achieving real-time object detection.
a) YOLO:
The YOLO model was first described by Joseph Redmon, et al. in the 2015 paper titled "You
Only Look Once: Unified, Real-Time Object Detection." Note that Ross Girshick, developer of
R-CNN, was also an author and contributor to this work, then at Facebook AI Research.
The approach uses a single end-to-end trained neural network that takes a photograph as input
and directly predicts bounding boxes and class labels for each bounding box. Although the
technique operates at 45 frames per second, and up to 155 frames per second for a speed-
optimized version of the model, it has lower prediction accuracy (e.g., more localization errors).
The model divides the input image into a grid of cells, with each cell responsible for predicting
a bounding box if the center of a bounding box falls within it. Each grid cell predicts a bounding
box in terms of its x, y coordinates, its width and height, and a confidence score; each cell also
makes a class prediction.
For instance, an image may be divided into a 7×7 grid of cells, with each cell predicting two
bounding boxes, yielding 98 proposed bounding box predictions. The bounding boxes with their
confidences and the class probability map are then combined into a final set of bounding boxes
and class labels (a small numeric sketch of this output encoding follows). The model's two
outputs are summarized in Figure II.15.
b) YOLOv2 (YOLO9000):
The model was updated by Joseph Redmon and Ali Farhadi in an effort to further improve
model performance in their 2016 paper titled "YOLO9000: Better, Faster, Stronger." Although
this variation of the model is referred to as YOLOv2, an instance of the model described there
was trained on two object recognition datasets in parallel and is capable of predicting 9,000
object classes, hence the name "YOLO9000" [27].
The model underwent a variety of training and architectural improvements, including the use
of batch normalization and high-resolution input images.
The YOLOv2 model, like Faster R-CNN, employs anchor boxes: pre-defined bounding boxes
with relevant shapes and sizes that are refined during training. A k-means analysis on the
training dataset is used to select the anchor box dimensions (a hedged MATLAB sketch is given
below).
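In MATLAB this k-means selection is available as estimateAnchorBoxes; in the sketch below, trainingData stands for a datastore of ground-truth boxes (an assumption about how the data was prepared) and the number of anchors is a design choice:

```matlab
% Estimate anchor box sizes from labeled training boxes via k-means (IoU distance).
numAnchors = 4;                                               % design choice
[anchorBoxes, meanIoU] = estimateAnchorBoxes(trainingData, numAnchors);
anchorBoxes                                                   % [height width] of each anchor, in pixels
meanIoU                                                       % how well the anchors fit the data
```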
Importantly, the bounding box predictions are adjusted so that small changes have a less
dramatic influence on the predictions, resulting in a more stable model. Rather than directly
predicting location and size, offsets for moving and reshaping the pre-defined anchor boxes
relative to a grid cell are predicted and damped by a logistic function.
Further improvements to the model were proposed by Joseph Redmon and Ali Farhadi in their
2018 paper titled "YOLOv3: An Incremental Improvement." The improvements were
reasonably minor, including a deeper feature-extractor network and minor representational
changes [22].
6. Conclusion:
We have devoted this chapter to the presentation of some generalities on deep learning
approaches to computer vision, presenting the convolutional neural network and its popular
model families. As discussed above, the use of deep convolutional neural networks for object
recognition has been a popular solution in recent years due to its impressive results and its
evolution, both in its various architectures and in its model families. In order to detect forest
fires with great precision in real time, we chose to work with the YOLOv2 model, since it is
fast and accurate when it comes to real-time detection. The next chapter is dedicated to diving
deep into the YOLOv2 model and the process of constructing the fire detection model in
MATLAB, from data collection to labelling to training.
Chapter III: Forest Fire Detection
Model Using YOLOv2
1. Introduction:
Recent advances in artificial intelligence (AI) and deep learning have made image-based modeling and analysis (e.g., classification, real-time prediction, and image segmentation) even more successful in different applications. Also, with the advent of nanoscale semiconductor technology, a new generation of Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs) can provide extraordinary computation capability for data-driven methods. Moreover, modern drones and UAVs can be equipped with tiny edge TPU/GPU platforms to perform on-board processing on the fly, facilitating early fire detection before a catastrophic event happens.
To build such a deep learning computer vision system for fire detection, we need two essential elements: the dataset and the trained model.
In this chapter, we discuss the techniques and approaches we used to design and train the fire detection model. The chapter is divided into two main processes: first the fire detection dataset preparation process, then the training process.
2.1.1 THE FLAME DATASET: Aerial Imagery Pile Burn Detection Using Drones (UAVs) [23].
This study provides the aerial imagery FLAME (Fire Luminosity Airborne-based Machine learning Evaluation) dataset, collected using drones during a prescribed pile burn in Northern Arizona, USA.
The test was conducted with fire managers from the Flagstaff (Arizona) Fire Department, who carried out a burn of piled slash on city-owned lands in a ponderosa pine forest on Observatory Mesa.
The prescribed fire took place on January 16th, 2020, at a temperature of 43°F (~6°C), with partly cloudy conditions and no wind.
This dataset consists of several repositories, including raw aerial videos recorded by drone cameras and raw heatmap footage recorded by an infrared thermal camera. To help researchers, two well-known tasks, fire classification and fire segmentation, are defined on the dataset. For approaches such as Neural Networks (NNs) and fire classification, 39,375 frames are labeled ("Fire" vs "Non-Fire") for the training phase, and another 8,617 frames are labeled as test data. 2,003 frames are considered for fire segmentation and, accordingly, 2,003 masks are generated as ground truth data with pixel-wise annotation.
The FLAME dataset, including all images, videos, and data, is available on IEEE-Dataport [35].
2.1.2 FiSmo Dataset: A Compilation of Datasets from Emergency Situations for Fire and
Smoke Analysis [24].
FiSmo is a compilation of datasets from emergency situations, composed of images, videos, regions of interest (ROIs), annotations, and features. These datasets were employed in earlier studies on fire and smoke analysis.
[Table: the FiSmo-Images subsets, listing each dataset's name, purpose, number of images, features, and ROIs.]
During our research, we noticed that despite the number of papers and the recent work in the field of fire detection, there is currently a scarcity of diverse fire datasets and a severe lack of benchmark datasets for wildfire detection images; it is therefore very difficult to find one that simulates the nature of wildfire scenarios in Algeria.
One limitation of the FiSmo datasets [24] is that, although vast, they are not diverse enough to be used alone for training a network that is expected to perform well in realistic fire detection scenarios.
Moreover, FiSmo is based on terrestrial images of fire, and to the best of our knowledge there exists no aerial imaging dataset for fire analysis besides the FLAME dataset [23]. Note that aerial imagery exhibits different properties, such as low resolution and a top-view perspective, substantially different from images taken by ground cameras. Unfortunately, FLAME does not appear to be diverse enough either, as it contains a large number of similar images from a single restricted scenario that does not fully reflect probable Algerian wildfire scenarios (a pile burn in Northern Arizona at 43°F (~6°C), with partly cloudy conditions and no wind).
Facing these dataset challenges, we set out to create a diverse, personalized dataset, mainly using [23] and [24] plus various online resources and our modest means, on which we rely in the training phase.
[Table: composition of the collected raw dataset; 6,592 images in total.]
After collecting the necessary amount of raw data, we move on to the next essential step, whose success represents 80% of the model's success: data labeling. For the model to classify whether a frame contains fire or not and to detect the ROI, i.e., the location of the fire in the frame, we first need to label the images. To do that, we used two different labeling methods:
− The manual labeling method.
− The automatic labeling method.
Before diving into the fire labeling process, we need to clarify three main points:
The data labeling tool: since our input data consists entirely of images, we opted to use the Image Labeler app from the Computer Vision Toolbox in MATLAB for the labeling task.
The label class: for this project we chose to detect a single region of interest (ROI), which means having a single label class, which we named "Fire".
The annotation type: since fire does not have a predefined shape and our main aim is to detect and localize fire in a 2-D frame for real-time object detection, we use bounding-box annotations rather than polygon or cuboid annotations, to minimize the labeling time and maximize accuracy.
2.2.2.1 Manual Labeling Method
We used the manual labeling method to annotate the FiSmo dataset (Flickr-Fire) [10], Gaisaid D-Fire [11], the Fire dataset from Kaggle [12], Foggia's dataset [13], and fire images handpicked from the internet [14]. These datasets are very different from each other, with different fire scenarios, so we cannot apply a single automation algorithm to annotate the fire in each image frame: automating the labeling process would certainly save time, but with very low annotation precision, which would result in a low-accuracy trained model. To overcome this challenge, human intervention was needed, and we manually annotated a total of 1,681 image frames. It was a very time-consuming process, in which a labeling mistake or input inaccuracy can lead to wrong predictions and wrong outputs.
For this task we need to:
− Open the Image Labeler app from the Image Processing and Computer Vision tab and load the data directly from the app, as shown in Figure III.4.
− Define the labels we intend to draw directly within the app by creating an ROI label that corresponds to our region of interest, i.e., a 2-D rectangle label named "Fire", as shown in Figure III.4.
− Use the mouse to draw a rectangular ROI around the fire spot, and re-apply the process for the rest of the dataset images, as shown in Figure III.5.
− After labeling all the frames, we export the labeled ground truth to the MATLAB workspace. The labeled ground truth is stored as a groundTruth object, so we can use this object to train our deep-learning-based computer vision algorithm.
Figure III.4: Images loading and Fire label definition in the Image Labeler app.
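Once exported, the groundTruth object can be turned into a training table with MATLAB's objectDetectorTrainingData function; a minimal sketch, assuming the exported variable is named gTruth:

```matlab
% Minimal sketch: convert the exported groundTruth object (assumed to
% be named gTruth) into a table usable by the detector training function.
% Each row holds an image file name and its "Fire" boxes as [x y w h].
trainingData = objectDetectorTrainingData(gTruth);
head(trainingData)   % inspect the first few rows
```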
We used the graph to examine the occurrence of labels across the dataset and to compare the frames, the frequency of labels, and the distribution of ROIs, where:
− The x-axis of the graph displays the numeric ID of each image in the dataset image collection; the total number of images loaded and manually labeled from the dataset is 1,681 image frames.
− The y-axis displays the number of ROIs for each image in the collection, i.e., how many fire bounding boxes are in each single image frame. As we can see in Figure III.5 above, most frames are labeled with 1 to 4 bounding boxes, with rare exceptions of frames labeled with up to 19 bounding boxes.
2.2.2.2 Automatic Labeling Method
Since the FLAME dataset [23] consists of fire images that represent a single scenario, and it is the biggest dataset that we have, labeling it manually would be even more time-consuming. We therefore tried to speed the process up by using an automation algorithm to label the remaining images. For fire labeling, no suitable built-in automation algorithm is provided in the MATLAB toolstrip, so we needed to create our own custom label automation algorithm.
To create this custom automation algorithm, we first need a segmented fire mask: the mask is the main variable that the custom automation algorithm relies on. Using the Color Thresholder app available in MATLAB, we created the necessary binary fire segmentation mask, which we then used to build the automation algorithm for the FLAME dataset labeling.
To create this mask, we loaded a good-quality, clear fire image from the FLAME dataset into the Color Thresholder app. We chose to segment in the RGB color space because it isolates the fire colors better than the other color spaces.
To separate fire from the other elements of the image, we used the point-cloud approach, in which the app converts the 3-D color point cloud into a 2-D representation and activates the polygon ROI tool; we then drew an ROI around the fire we wanted to segment and fine-tuned the segmentation using the R, G, and B color controls.
After checking the binary fire mask, we exported the function, which we named "createMask", along with the code that creates the segmentation, so that we could use it in the custom algorithm.
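For illustration, the exported function has the following general shape; the threshold values below are assumptions for the sketch, not the actual values tuned in our Color Thresholder session:

```matlab
function [BW, maskedRGBImage] = createMask(RGB)
% Sketch of a Color-Thresholder-style RGB mask for fire pixels.
% The channel ranges are illustrative placeholders.
channel1Min = 180; channel1Max = 255;   % R: flames are bright in red
channel2Min = 80;  channel2Max = 220;   % G
channel3Min = 0;   channel3Max = 140;   % B: flames are weak in blue

% Keep pixels that fall inside all three channel ranges.
BW = (RGB(:,:,1) >= channel1Min) & (RGB(:,:,1) <= channel1Max) & ...
     (RGB(:,:,2) >= channel2Min) & (RGB(:,:,2) <= channel2Max) & ...
     (RGB(:,:,3) >= channel3Min) & (RGB(:,:,3) <= channel3Max);

% Zero out every pixel outside the mask.
maskedRGBImage = RGB;
maskedRGBImage(repmat(~BW, [1 1 3])) = 0;
end
```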
To create the custom algorithm, we used two functions: the first is the binary mask function discussed above, and the second is an "object label" function that the algorithm needs in order to turn the black-and-white binary mask input into 2-D bounding box outputs. We completed the custom algorithm and named it "Fire Detection".
To apply the algorithm to the whole FLAME dataset [35], we need to:
− Open the Image Labeler app from the Image Processing and Computer Vision tab and load all the images directly from the app, as shown in Figure III.10 below.
− Define the labels we intend to draw directly within the app by creating an ROI label that corresponds to our region of interest, i.e., a 2-D rectangle label named "Fire".
− Import the custom automation algorithm "Fire Detection" from the MAT files, select all the images, and run the automation.
− After labeling all the selected frames, we accept the annotations and export the labeled ground truth to the MATLAB workspace.
Figure III.11: Sample frames of fire bounding boxes from the automated labeling results.
We again used the graph to examine the occurrence of labels and to compare the frames, the frequency of labels, and the distribution of ROIs generated by the fire detection custom algorithm, where:
− The x-axis of the graph displays the numeric ID of each image in the dataset image collection; the total number of images loaded and automatically labeled from the dataset is 4,911 image frames.
− The y-axis displays the number of ROIs for each image in the collection, i.e., how many fire bounding boxes are in each single image frame. As we can see in Figure III.12, most frames are labeled with 5 to 10 bounding boxes, with rare exceptions of frames labeled with up to 18 bounding boxes.
The bounding boxes are defined in pixel coordinates as an M-by-4 numeric matrix with rows of the form [x y w h], where [42]:
− x and y specify the upper-left corner of the rectangle;
− w specifies the width of the rectangle, which is its length along the x-axis;
− h specifies the height of the rectangle, which is its length along the y-axis.
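For illustration, a hypothetical frame containing two fire spots would carry a ground-truth matrix such as the following (the values are made up):

```matlab
% Two illustrative fire boxes in one frame, one row per box [x y w h]:
% a 64-by-48 box whose upper-left corner is at (120, 85),
% and a 30-by-30 box at (300, 40).
fireBoxes = [120  85  64  48;
             300  40  30  30];
```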
Definition: Ground truth is a term used in statistics and machine learning that means checking
the results of machine learning for accuracy against the real world. The term is borrowed from
meteorology, where "ground truth" refers to information obtained on site.
The accuracy of our trained model will depend on the accuracy of the ground truth used; we therefore need to test it by inputting a raw image from the dataset and inspecting the annotation output.
Figure III.15: Sample frames from the All New Dataset for ground truth testing.
"Garbage in, garbage out" is a commonly used phrase in the machine learning community, meaning that the quality of the training data determines the quality of the model. Having invested 65% of our time and effort in ensuring the quality and quantity of the data, it is now time to ensure the success of the detection model.
With the dataset ready, we need a framework on which to build the computer vision model; for that, we use the YOLOv2 framework.
Compared to traditional two-stage detection algorithms (such as R-CNN, Fast R-CNN, Faster R-CNN, etc.), YOLOv2 directly converts the bounding-box positioning problem into an end-to-end regression solution. Since YOLOv2 avoids generating hundreds of candidate boxes, its execution speed is significantly higher than that of two-stage detection schemes, which makes YOLOv2 an excellent choice for implementing real-world applications. The whole detection flow of the YOLOv2 algorithm can generally be divided into three basic procedures:
Preprocessing of the input image: resizing images of different input resolutions to a uniform size using bilinear interpolation, then subtracting the average brightness of all the images in the dataset to avoid distortion effects such as over-brightness (a sketch of this step follows the list).
The main CNN forward computation: in the YOLOv2 algorithm, the input image is divided into several grid cells according to the frame size, e.g., a 13 × 13 grid for a 416 × 416 input image, and each grid cell is responsible for predicting five anchor boxes with different aspect ratios. For each anchor box, the CNN outputs a 25-dimensional vector: one number for the probability that the box contains an object, four numbers representing the bounding box coordinates relative to the anchor box, and a 20-dimensional probability vector over the categories in the training dataset.
Post-processing: extracting the bounding box coordinates and category information from the CNN output, then filtering them through non-maximum suppression (NMS) to get the best detection result. The bounding boxes of the objects are finally drawn and displayed for the user [26].
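As a small illustration of the preprocessing step (the file name and the mean brightness value are assumed placeholders, not statistics we computed):

```matlab
% Sketch of YOLOv2-style input preprocessing.
I = imread('fire_frame.jpg');             % hypothetical input frame
I = imresize(I, [416 416], 'bilinear');   % uniform network input size
I = im2single(I);                         % scale pixels to [0, 1]
meanBrightness = 0.45;                    % assumed dataset average
I = I - meanBrightness;                   % remove average brightness
```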
Each frame is divided into a grid of s_x × s_y cells. In each grid cell, up to B bounding boxes are searched for, and every bounding box is assigned a confidence score. The Confidence score expresses the probability that an object is contained inside a bounding box and how accurate the prediction is; this confidence builds on p(Object). Thus, each bounding box consists of five values: X, Y, W, H, and Confidence. Furthermore, for each grid cell, the probabilities of the C conditional classes are computed. In this way, it is possible to obtain N bounding boxes for the entire image, divided into grid cells. This leads to [25]:
− p(Class_i | Object): given a grid cell containing at least one object, the conditional probability that it belongs to the i-th class;
− p(Object) · IOU_pred^truth: the Confidence of each bounding box inside a grid cell.
In each cell, only the classes with the highest Confidence are kept [25].
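At test time, these two quantities are multiplied, giving the class-specific confidence score used by the YOLO family:

p(Class_i | Object) · p(Object) · IOU_pred^truth = p(Class_i) · IOU_pred^truth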
In this work we built, using MATLAB 2020, a YOLOv2 model layer by layer from scratch, based on the original YOLOv2 model: starting with an input layer, followed by the detection subnetwork containing a series of convolutional, batch normalization, and ReLU layers. These layers are then connected to MATLAB's built-in yolov2TransformLayer and yolov2OutputLayer.
yolov2TransformLayer transforms the raw CNN output into the form required to produce object detections, while yolov2OutputLayer defines the anchor box parameters and implements the loss function used to train the detector. Using the network analyzer, we can visualize the layer graph of the built model (Figure III.19).
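A condensed sketch of this construction; the anchor boxes and filter counts are illustrative assumptions, and the real network repeats the convolution/batch-normalization/ReLU pattern many more times:

```matlab
% Illustrative anchors in [height width] pixels (assumed values).
anchorBoxes = [32 32; 64 64; 128 96];
numClasses  = 1;                                % single "Fire" class
numFilters  = size(anchorBoxes, 1) * (numClasses + 5);

layers = [
    imageInputLayer([416 416 3], 'Name', 'input')
    convolution2dLayer(3, 32, 'Padding', 'same', 'Name', 'conv1')
    batchNormalizationLayer('Name', 'bn1')
    reluLayer('Name', 'relu1')
    % ... further conv/batch-norm/ReLU (and pooling) blocks go here ...
    convolution2dLayer(1, numFilters, 'Name', 'yolov2Conv')
    yolov2TransformLayer(size(anchorBoxes, 1), 'Name', 'transform')
    yolov2OutputLayer(anchorBoxes, 'Name', 'output')];

lgraph = layerGraph(layers);
analyzeNetwork(lgraph)    % visualize the layer graph
```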
Once the network layers were ready and fully built, we moved on to the training phase, where we trained our model on the previously built dataset.
For the training and testing process, we first need to feed our network the annotated fire detection dataset, which we split into a 75% training set and a 25% testing set, as shown in the table below.
[Table: training/testing split of the dataset (number of images per set).]
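A sketch of this split, assuming the labeled data lives in the trainingData table built during labeling; the 75/25 proportions are the ones stated above:

```matlab
% Randomly split the labeled data: 75% training, 25% testing.
rng(0);                                   % fixed seed for reproducibility
n      = height(trainingData);
idx    = randperm(n);
nTrain = round(0.75 * n);

trainSet = trainingData(idx(1:nTrain), :);
testSet  = trainingData(idx(nTrain+1:end), :);
```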
The training stage is where the deep learning algorithm learns by being fed datasets; that is where the learning takes place.
Deep learning neural network models learn to map inputs to outputs given a training dataset of examples, consisting of sample outputs and the corresponding sets of input data that influence those outputs. All neurons of a given layer generate an output, but these outputs do not carry the same weight for the next layer of neurons: if a neuron observes a pattern that means little for the overall picture, its output will be partially or completely muted. This is what is called weighting: a large weight means that the input is important, and a small weight means it should largely be ignored. Every connection between neurons has an associated weight, and this is the magic of neural network adaptability: the weights are adjusted over the course of training to fit the objective we have set (recognizing fire in an image frame). In simple terms, training a neural network means finding the appropriate weights of the neural connections, thanks to a feedback loop called gradient backpropagation.
This training process is iterative, meaning that it progresses step by step, with small updates to the model weights at each iteration and, in turn, a change in the performance of the model at each iteration. This iterative process, called "model fitting", solves an optimization problem that searches for the parameters (model weights) that result in a minimum error, or loss, when evaluating the examples in the training dataset.
Supervised learning is possible when the training data contains both the input and output values.
Each set of data that has the inputs and the expected output is called a supervisory signal. The
training is done based on the deviation of the processed result from the documented result when
the inputs are fed into the model.
For training a network, we always provide the algorithm with a set of training options; these options govern how the network should learn.
Once an error gradient has been estimated, the derivative of the error can be calculated and used
to update each parameter. There may be statistical noise in the training dataset and in the
estimate of the error gradient. Also, the depth of the model (number of layers) and the fact that model parameters are updated separately mean that it is hard to calculate exactly how much to change each model parameter to best move the whole model down the error gradient.
Instead, a small portion of the update to the weights is performed each iteration. A
hyperparameter called the “learning rate” controls how much to update model weights and, in
turn, controls how fast a model learns on the training dataset.
− Learning Rate: The amount that each model parameter is updated per cycle of the learning
algorithm.
The training process must be repeated many times until a good or good enough set of model
parameters is discovered. The total number of iterations of the process is bounded by the
number of complete passes through the training dataset after which the training process is
terminated. This is referred to as the number of training "epochs."
− Epochs: The number of complete passes through the training dataset before the training
process is terminated.
There are many extensions to the learning algorithm, although hyperparameters such as these generally control the learning algorithm for deep learning neural networks.
Once the network layers were ready, we moved on to our model's training options. The solver, learning rate, mini-batch size, and number of epochs discussed above are some of the important training options to consider when deciding the learning speed of the network and how many data samples the network should train on in each round of training. For the YOLOv2 network training, we set these options based on the size of the dataset.
The training was conducted using a system with the following specifications: Intel® Xeon® CPU E5-1650 v2 @ 3.50 GHz with 64 GB onboard memory, with the required environment and tools (MATLAB 2020). The network weights were initialized randomly, and the other training options were set as discussed above.
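A hedged sketch of this setup: the solver, learning rate, and mini-batch size below are plausible placeholders rather than the exact values of our experiments; only the 50 epochs are stated in this chapter:

```matlab
% Training options (illustrative values except MaxEpochs).
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-3, ...        % assumed learning rate
    'MiniBatchSize',    16, ...          % assumed mini-batch size
    'MaxEpochs',        50, ...
    'Shuffle',          'every-epoch', ...
    'Plots',            'training-progress');

% Train the YOLOv2 detector on the labeled training set.
[detector, info] = trainYOLOv2ObjectDetector(trainSet, lgraph, options);
```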
The YOLOv2 training stages are carried out to obtain the best model, which is then used to recognize and detect fire patterns. The training process in this study takes about 2 to 5 minutes per epoch; the training time is affected by the batch size and the number of epochs. For 50 epochs, the computation took around 3 hours and 33 minutes to complete the model training.
Figure III.22 and Figure III.23 below show the training progress through each iteration, with mini-batch and base learning rate information.
3.4 Training Evaluation:
In order to evaluate the training process, an error function must be chosen, often called the objective function, cost function, or loss function. Typically, a specific probabilistic framework for inference is chosen, called Maximum Likelihood; under this framework, the commonly chosen loss functions are cross-entropy for classification problems and mean squared error for regression problems [43].
a) Loss Function: The function used to estimate the performance of a model with a specific
set of weights on examples from the training dataset.
The search or optimization process requires a starting point from which to begin model updates.
The starting point is defined by the initial model parameters or weights. Because the error
surface is non-convex, the optimization algorithm is sensitive to the initial starting point. As
such, small random values are chosen as the initial model weights [25].
From a very simplified perspective, the loss function (J) can be defined as a function which
takes in two parameters [43]:
− Predicted Output.
− True Output.
This function essentially calculates how poorly our model is performing by comparing the model's prediction with the actual value it is supposed to output, i.e., our ground truth. If the predicted output Y_pred is very far from the true output Y, the loss value will be very high; if the two values are close, the loss value will be very low.
If the loss is very high, this large value will propagate through the network during training, and the weights will be adjusted more than usual; if it is small, the weights will not change much, since the network is already doing a good job.
The graph shows that while the total loss was 164 at the start of training, it decreased to nearly 0.00 by the end of training, at around step 8,460. Since the loss had fallen below the 0.02 level, the inference graph was frozen and exported to detect fire sources, indicating that the network is efficiently trained and ready to be tested.
4. Conclusion:
This chapter was devoted to the process of building the fire detection deep learning model, from collecting and labeling the necessary high-quality data to designing the YOLOv2 network layers and training a reasonably accurate model. After evaluating the training, we move on to the next step, the testing phase. The next chapter is dedicated to this important phase, where we test our trained model on the test set, evaluate it, and finally present the test and evaluation of the model for real-time detection.
Chapter IV: Real-Time Model
Evaluation and Testing
1. Introduction:
Fire classification and localization is a supervised learning approach in which the target variable, "fire", is discrete (categorical); for that reason, evaluating a deep learning model is considered as important as building and training it. In this chapter we proceed to the next phase of our project, the testing phase, where we run our fire detection model on new, previously unseen realistic data and evaluate the results across three model scenarios. Finally, we cover real-time detection, where the model must sense the environment and parse the scene while balancing detection performance against real-time requirements.
Deep learning model testing is an important and critical step in the detection process that we can neither neglect nor skip; this part of the process enables us to quickly identify any shortcomings or edge cases on which the model does not perform well, and thus to evaluate it based on its detection test performance.
According to these test results, we can re-train our model using annotations from images depicting those edge cases, or at least be aware of the specific environments in which the model performs best.
Therefore, once we have built the YOLOv2 model using the prepared training data, we can test it by feeding in new, real wildfire scenario images (the test set) and seeing whether the model classifies and localizes the fire accurately. While this is not an automated or quantified test, it was conducted using the same environment specifications used for training: Intel® Xeon® CPU E5-1650 v2 @ 3.50 GHz with 64 GB onboard memory, with the required environment and tools (MATLAB 2020).
The primary reason for feeding already-labeled data to the trained model is to be able to evaluate the model later by comparing the existing ground truth with the predicted outputs across a range of different metrics; the test data should therefore be representative of the original data and left unmodified for an unbiased evaluation.
a) Scenario 1: the first detector was trained only on aerial images obtained from the FLAME dataset [35], discussed in Chapter 3, where we fed the model 1,126 fire-annotated images.
b) Scenario 2: the second detector was trained only on terrestrial images obtained from the FiSmo [36] and Gaisaid [37] datasets, discussed in Chapter 3, where we fed the model 850 fire-annotated images.
c) Scenario 3: the last detector is the one whose mixed dataset (All New Dataset [41]), training process, and training evaluation were discussed in Chapter 3.
Note that all the above scenarios were run in the same system environment; each training dataset was preprocessed in the same way and fed to the same YOLOv2 network designed in Chapter 3.
We adopted this testing approach to demonstrate the efficacy, and simulate the performance, of each trained detector once deployed on UAV-based forest fire monitoring and detection missions in real situations.
[Figures IV.2–IV.4: sample detection results of the three scenario detectors on the aerial test set.]
3. Model Evaluation:
An object detection model produces output with three components: the class label, the bounding box, and the confidence score.
Evaluating an object detection model means measuring whether the model found all the objects, together with a way to verify that the found objects belong to the correct class. This implies that, in our case, the fire detection model needs to accomplish two things:
1. Classification: identify whether a fire spot is present in the image, and its class.
2. Localization: predict the coordinates of the bounding box around the fire when it is present in the image. Here we compare the coordinates of the ground truth and predicted bounding boxes.
To evaluate our model, we need to assess the performance of both the classification and the localization of bounding boxes in the image using evaluation metrics; but before exploring the metrics, we need to cover an important parameter: IoU.
− Intersection over Union (IoU): we use Intersection over Union (IoU) as a similarity measure; it computes the intersection (overlap) over the union of two bounding boxes, the ground truth bounding box and the predicted bounding box, as shown in the figure, where red represents the ground truth bounding box and green represents the predicted bounding box (a MATLAB sketch follows the list below).
An IoU of 1 implies that the predicted and ground-truth bounding boxes overlap perfectly. The threshold is a key parameter for various evaluation metrics; changing its value drastically changes the value of the metric. We can set a threshold value on the IoU to determine whether a detection is valid or not:
• If IoU ≥ 0.5, the detection is valid and classified as a True Positive (TP).
• If IoU < 0.5, it is a wrong detection, classified as a False Positive (FP).
• When a ground truth is present in the image and the model fails to detect the fire, it is classified as a False Negative (FN).
• True Negative (TN): every part of the image where we did not predict fire. This last notion is not useful for object detection in general, hence we ignore TN.
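In MATLAB, this overlap measure is available directly through bboxOverlapRatio; a small sketch with made-up box values:

```matlab
% IoU between a ground-truth box and a predicted box, both [x y w h].
gtBox   = [100 100 80 60];                % illustrative ground truth
predBox = [110 105 75 60];                % illustrative prediction

iou = bboxOverlapRatio(gtBox, predBox);   % intersection over union
isTruePositive = iou >= 0.5;              % the threshold used in this work
```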
3.1 Evaluation Metrics:
To evaluate our YOLOv2 Fire detector, we will use three main evaluation metrics that we will
discuss below:
1) Mean Average Precision (mAP):
This is the most commonly used metric for evaluating an object detection model; it is calculated from the following essential elements:
o Precision measures how good our model is when the prediction is positive [46]:
Precision = TP / (TP + FP)    (1)
o Recall measures how good our model is at correctly predicting positive classes [46]:
Recall = TP / (TP + FN)    (2)
The focus of precision is the positive predictions: it indicates how many of the positive predictions are true. The focus of recall is the actual positive classes: it indicates how many of the positive classes the model is able to predict correctly.
We count the accumulated TP and the accumulated FP and compute the precision/recall at each row; the Average Precision (AP) is then computed as the average of the precision values at 11 equally spaced recall levels. The Mean Average Precision (mAP) is the AP averaged over all object categories, and since in our case we have only one object category, "Fire", we can write:
mAP = AP    (3)
2) Log Average Miss Rate (LAMR):
The miss rate mr and the number of false positives per image fppi, as functions of the confidence threshold c, are defined as:
mr(c) = FN / (TP + FN)    (4)
fppi(c) = FP / #img    (5)
We first plot the miss rate against the number of false positives per image in log-log plots, all for a given confidence value c, such that only detections with a confidence value greater than or equal to c are taken into account. As commonly applied in object detection evaluation, the confidence threshold c is used as a control variable: by decreasing c, more detections are taken into account for evaluation, resulting in more possible true or false positives, and possibly fewer false negatives. We define the log average miss rate (LAMR) as shown in equation (6) [46]:
LAMR = exp( (1/9) · Σ_f log( mr( argmax_{fppi(c) ≤ f} fppi(c) ) ) )    (6)
where f ranges over nine reference fppi values evenly spaced in log space. Following the common choice in object detection evaluation, we use an IoU threshold of 0.5.
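Both metrics can be computed in MATLAB from the detector's output on the test set; a sketch, assuming the detector and the testSet table from the training phase (first column image file names, remaining columns fire boxes):

```matlab
% Run the trained detector over the test images and collect results.
testImgs = imageDatastore(testSet.imageFilename);
detectionResults = detect(detector, testImgs, 'MiniBatchSize', 8);

% Average precision; with a single "Fire" class, mAP equals AP.
[ap, recall, precision] = evaluateDetectionPrecision(detectionResults, testSet(:, 2:end));

% Log average miss rate versus false positives per image.
[lamr, fppi, missRate] = evaluateDetectionMissRate(detectionResults, testSet(:, 2:end));
```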
a) Scenario 1 Results:
[Figures IV.6 and IV.7: Average Precision and Log Average Miss Rate of the first scenario.]
b) Scenario 2 Results:
Figure IV.8: Average Precision and Log Average Miss Rate of the second scenario.
74
Chapter 4: Real-Time Model Evaluation and Testing
c) Scenario 3 Results:
Figure IV.9: Average Precision and Log Average Miss Rate of the third scenario.
a) Scenario 1 Results:
Figures IV.6 and IV.7 show that the first trained model achieved an average precision of 0.72 and an average miss rate of 0.49. The precision is high, but the miss rate is not satisfying compared to the precision, and this is due to the high false positive (FP) rate. We can say that the first trained model has almost fulfilled the object detection requirements, despite the false positives from the tree spots that were detected as fire (Figure IV.2); the training data plays an important role in this, as it consists of relatively low-complexity scenes.
b) Scenario 2 Results:
Figure IV.8 shows that the second trained model achieved an average precision of 0.6 and an average miss rate of 0.52, a high miss rate related to a high false negative (FN) rate. These values are directly connected to the difference between the test set and the training set: the dataset the second model was trained on is composed of multiple fire scenes, but they are entirely terrestrial images, while the test dataset is composed of purely aerial scenes. This is clearly visible in Figure IV.3, where the detector was incapable of detecting the fire spots whose areas were smaller and less distinct.
c) Scenario 3 Results:
Figure IV.9 shows that the third trained model achieved an average precision of 0.87 and an average miss rate of 0.16, a high precision and a low miss rate. These results reflect the low false positive (FP) and false negative (FN) rates, which can be seen clearly in Figure IV.4, where almost all of the fire spots were successfully detected. These detection results prove that the personalized dataset we built overcame the dataset challenges by gathering rich aerial fire scenes, and that its detector satisfies the requirements of an object detection model.
The table below summarizes the fire detection test results of the three detectors. From these results, we can say that mixing the datasets and training the model on our own personalized dataset was the right choice.
To evaluate the detector's performance in real time, a UAV test would be required, in which we deploy our detector on a camera-equipped UAV and test its detection and robustness while the UAV flies over a fire region, the real scenario this system was built for. Unfortunately, due to a lack of resources and materials, we could not carry out the real test, so we imitated the scenario as closely as possible by replacing the UAV camera with a phone camera and attempting detection from a height of approximately 9.5 meters.
This real-time test was conducted in an environment with the following specifications:
• For processing, we used an Intel® Core™ i7-4600U CPU @ 2.10 GHz (up to 2.70 GHz) with 64 GB onboard memory.
Figure IV.10: MATLAB live processing window for detection at 27.28 FPS.
Figure IV.11: MATLAB live processing window for detection at 41.08 FPS.
Figures IV.10, IV.11 and IV.12 above show the real-time test results, in which we can clearly see the detected fire spot with an average score of 0.6. Besides the accuracy, the final results show rapid detection that reached 58 frames per second (FPS), despite the modest computation environment we tested with. This speed is due to the designed YOLOv2 network model, which uses predefined anchors covering spatial positions, scales, and aspect ratios across the image; hence, no extra branch is needed for extracting region proposals. Since all computations take place in a single network, it is likely to run faster than two-stage detectors.
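A minimal sketch of such a live detection loop, assuming a webcam-style source (the webcam function comes from the MATLAB Support Package for USB Webcams):

```matlab
cam    = webcam();                         % phone/USB camera as video source
player = vision.DeployableVideoPlayer;

keepRunning = true;
while keepRunning
    frame = snapshot(cam);
    tic;
    [bboxes, scores] = detect(detector, frame, 'Threshold', 0.5);
    fps = 1 / toc;                         % instantaneous processing rate
    if ~isempty(bboxes)
        frame = insertObjectAnnotation(frame, 'rectangle', bboxes, scores);
    end
    frame = insertText(frame, [10 10], sprintf('%.2f FPS', fps));
    step(player, frame);
    keepRunning = isOpen(player);          % stop when the window is closed
end
```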
5. Conclusion:
In this chapter we tested our built model on an aerial fire image test set. We evaluated our detector by comparing it to differently trained models using evaluation metrics, and the results showed a mean average precision of 0.87. For further evaluation in a real-time environment, we conducted a real-time detection test that ended with very good results: our detector showed rapid and robust detection reaching 58 FPS. These parameters fully satisfy the requirements of a real-time object detector.
Conclusion and Perspectives
Forest fires are a major problem: when a fire occurs, it seriously threatens people's lives and causes major losses. A great deal of computer-vision-based work has been done to detect these fires by exploiting UAVs for aerial forest monitoring.
After defining the treated problem, which concerns fire classification and localization, we presented the state of the art on existing work on the same problem. We chose to solve this problem using deep learning, more specifically the YOLOv2 Convolutional Neural Network architecture.
To build this deep learning computer-vision-based fire detector, we had to provide two essential elements: the dataset and the trained model. In deep learning the dataset is crucial, and in crafting this element we encountered several challenges. The first was the absence of a benchmark dataset for forest fire detection, so we had to search through various resources to collect the necessary amount of data. The second was that most of the collected data were not labeled or annotated, and the labeling process in MATLAB was very time- and energy-consuming.
For model crafting we used a single main approach: building the model "from scratch", which requires designing the network architecture and training it. The advantage of this approach is the possibility of customizing the layers and the training options as we wish, up to the validation of the model. We tried to keep a balance between detection performance and real-time requirements: generally, when the real-time requirements are met, we see a drop in performance, and vice versa, so balancing both aspects was a big challenge.
Finally, the obtained results showed a mean average precision of 0.87 and rapid detection reaching 58 FPS; these parameters fully satisfy the requirements of a real-time object detector.
Perspectives:
− Future work could consider developing a fire spot localization strategy that can obtain the fire position in real-world coordinates.
− We will look for more effective approaches, based on several models, to resolve the remaining false positives.
− We would like to increase the size of the forest fire benchmark and make finer-grained annotations, e.g., using a computer graphics engine to create synthetic data and generate pixel-wise annotations.
− We plan to mount the proposed fire detection system on UAVs for real-world forest fire detection by realizing an embedded system comprising a camera and a microcontroller board such as a Raspberry Pi, or a GPU-based board such as the NVIDIA Jetson.
References
Bibliography
[1] Barmpoutis, P., Papaioannou, P., Dimitropoulos, K., & Grammalidis, N. (2020). A review
on early forest fire detection systems using optical remote sensing. Sensors, 20(22), 6442.
[2] Alkhatib, A. A. (2014). A review on forest fire detection techniques. International Journal
of Distributed Sensor Networks, 10(3), 597368.
[3] Kim, B., & Lee, J. (2019). A video-based fire detection using deep learning models. Applied
Sciences, 9(14), 2862.
[4] Yuan, C., Liu, Z., & Zhang, Y. (2017). Aerial images-based forest fire detection for
firefighting using optical remote sensing techniques and unmanned aerial vehicles. Journal of
Intelligent & Robotic Systems, 88(2), 635-654.
[5] Pan, H., Badawi, D., & Cetin, A. E. (2020). Computationally efficient wildfire detection
method using a deep convolutional network pruned via Fourier analysis. Sensors, 20(10), 2891.
[6] Center of fire statistics CTIF REPORT World Fire Statistics 2020 N °25.
[7] Celik, T., Demirel, H., Ozkaramanli, H., & Uyguroglu, M. (2007). Fire detection using statistical color model in video sequences. Journal of Visual Communication and Image Representation, 18(2), 176-185.
[8] Chen, T. H., Wu, P. H., & Chiou, Y. C. (2004, October). An early fire-detection method
based on image processing. In 2004 International Conference on Image Processing, 2004.
ICIP'04. (Vol. 3, pp. 1707-1710). IEEE.
[9] Wang, T., Shi, L., Yuan, P., Bu, L., & Hou, X. (2017, October). A new fire detection method
based on flame color dispersion and similarity in consecutive frames. In 2017 Chinese
Automation Congress (CAC) (pp. 151-156). IEEE.
[10] Borges, P. V. K., & Izquierdo, E. (2010). A probabilistic approach for vision-based fire
detection in videos. IEEE transactions on circuits and systems for video technology, 20(5), 721-
731.
[11] Foggia, P., Saggese, A., & Vento, M. (2015). Real-time fire detection for video-
surveillance applications using a combination of experts based on color, shape, and
motion. IEEE TRANSACTIONS on circuits and systems for video technology, 25(9), 1545-
1556.
[12] Günay, O., & Çetin, A. E. (2015, September). Real-time dynamic texture recognition using
random sampling and dimension reduction. In 2015 IEEE International Conference on Image
Processing (ICIP) (pp. 3087-3091). IEEE.
[13] Zhao, Y., Ma, J., Li, X., & Zhang, J. (2018). Saliency detection and deep learning-based
wildfire identification in UAV imagery. Sensors, 18(3), 712.
[14] Frizzi, S., Kaabi, R., Bouchouicha, M., Ginoux, J. M., Moreau, E., & Fnaiech, F. (2016,
October). Convolutional neural network for video fire and smoke detection. In IECON 2016-
42nd Annual Conference of the IEEE Industrial Electronics Society (pp. 877-882). IEEE.
[15] Zhang, Q., Xu, J., Xu, L., & Guo, H. (2016, January). Deep convolutional neural networks
for forest fire detection. In Proceedings of the 2016 international forum on management,
education and information technology application. Atlantis Press.
[16] Muhammad, K., Ahmad, J., Lv, Z., Bellavista, P., Yang, P., & Baik, S. W. (2018).
Efficient deep CNN-based fire detection and localization in video surveillance
applications. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 49(7), 1419-
1434.
[17] Hochreiter, S., & Schmidhuber, J. (1997). LSTM can solve hard long time lag
problems. Advances in neural information processing systems, 473-479.
[18] Hu, C., Tang, P., Jin, W., He, Z., & Li, W. (2018, July). Real-time fire detection based on
deep convolutional long-recurrent networks and optical flow method. In 2018 37th Chinese
Control Conference (CCC) (pp. 9061-9066). IEEE.
[19] Jijitha, R., & Shabin, P. (2019). A Review on Forest Fire Detection. Research and
Applications: Embedded System, 2(3).
[20] Meddour-Sahar, O., & Bouisset, C. (2013). Les grands incendies de forêt en Algérie :
problèmes humains et politiques publiques dans la gestion des risques. Méditerranée. Revue
géographique des pays méditerranéens/Journal of Mediterranean geography, (121), 33-40.
[21] O’Mahony, N., Campbell, S., Carvalho, A., Harapanahalli, S., Hernandez, G. V.,
Krpalkova, L., ... & Walsh, J. (2019, April). Deep learning vs. traditional computer vision.
In Science and Information Conference (pp. 128-144). Springer, Cham.
[22] Redmon, J., & Farhadi, A. (2017). YOLO9000: better, faster, stronger. In Proceedings of
the IEEE conference on computer vision and pattern recognition (pp. 7263-7271).
[23] Shamsoshoara, A., Afghah, F., Razi, A., Zheng, L., Fulé, P. Z., & Blasch, E. (2021). Aerial
Imagery Pile burn detection using Deep Learning: the FLAME dataset. Computer
Networks, 193, 108001.
[24] Cazzolato, M. T., Avalhais, L. P., Chino, D. Y., Ramos, J. S., de Souza, J. A., Rodrigues-
Jr, J. F., & Traina, A. J. (2017). Fismo: A compilation of datasets from emergency situations
for fire and smoke analysis. In Brazilian Symposium on Databases-SBBD (pp. 213-223). SBC.
[25] Giuffrida, G., Meoni, G., & Fanucci, L. (2019). A YOLOv2 convolutional neural network-
based human–machine interface for the control of assistive robotic manipulators. Applied
Sciences, 9(11), 2243.
[26] Wang, Z., Xu, K., Wu, S., Liu, L., Liu, L., & Wang, D. (2020). Sparse-YOLO:
Hardware/Software co-design of an FPGA accelerator for YOLOv2. IEEE Access, 8, 116569-
116585
Webography
[27] “A Gentle Introduction to Object Recognition with Deep Learning’ at
https://machinelearningmastery.com/object-recognition-with-deep-learning/
https://towardsdatascience.com/everything-you-ever-wanted-to-know-about-computer-vision-heres-a-look-why-it-s-so-awesome-e8a58dfb641e
https://www.mathworks.com/discovery/convolutional-neural-network-matlab.html
https://towardsdatascience.com/what-is-deep-learning-and-how-does-it-work-2ce44bb692ac
https://machinelearningmastery.com/what-is-computer-vision/
[33] "Everything You Wanted to Know About Machine Learning but Were Too Afraid to Ask" at
https://medium.com/swlh/everything-you-wanted-to-know-about-machine-learning-but-were-too-afraid-to-ask-d7d92021038
https://nptel.ac.in/content/storage2/courses/109101004/downloads/Lecture-19%20&%2020.pdf
[35] The FLAME dataset on IEEE-Dataport at
https://ieee-dataport.org/open-access/flame-dataset-aerial-imagery-pile-burn-detection-using-drones-uavs
[38] FIRE Dataset: outdoor fire images and non-fire images for computer vision tasks. 2018. At
https://www.kaggle.com/phylake1337/fire-dataset
https://towardsdatascience.com/introduction-to-object-detection-model-evaluation-3a789220a9bf
https://towardsdatascience.com/evaluating-performance-of-an-object-detection-model-137a349c517b
https://eurocity-dataset.tudelft.nl/eval/benchmarks/detection