Journal (AI&ML)
FIRE AND SMOKE DETECTION USING CNN
A Journal Paper
Prepared by
HARI KRISHNAN K (953622205021)
HARISH S (953622205022)
MOHAMED ISLAAM K A (953622205025)
RAM PANDIAN G (953622205034)
Keywords
Fire detection, Convolutional Neural Networks (CNN), Dense Neural Networks (DNN), Deep Learning, Image Processing, Real-Time Systems, Early Warning, Artificial Intelligence
Introduction
Fire detection is a crucial aspect of safety, aimed at preventing loss of
life, property damage, and environmental destruction. Traditional systems, such
as smoke detectors and heat sensors, have been in use for decades but often face
limitations, including false alarms and slow response times. In outdoor
environments or large spaces, these systems struggle with timely detection, and
environmental conditions like fog, dust, or steam can further degrade their
performance. This has driven the need for more advanced solutions, capable of
detecting fire accurately and rapidly across a wide variety of settings. Video-
based fire detection systems, which leverage real-time surveillance cameras,
have emerged as an alternative, offering the ability to monitor larger areas and
detect both flames and smoke. These systems, when enhanced with machine
learning, can further improve fire detection accuracy and response times.
Related Work
The landscape of fire detection systems has evolved dramatically, particularly with
the integration of advanced machine learning and deep learning techniques. Traditionally, fire
detection relied heavily on physical sensors such as smoke detectors and thermal sensors,
which have inherent limitations, including susceptibility to false alarms and delayed response
times. However, the advent of image processing and deep learning has paved the way for
more reliable and efficient fire detection methodologies. For instance, LeCun et al. (2015)
outlined the potential of deep learning techniques, particularly Convolutional Neural
Networks (CNNs), in image classification tasks. Their findings have been foundational in
recognizing that CNNs are adept at automatically learning and extracting features from
images, making them particularly well-suited for detecting visual anomalies such as fire and
smoke.
Several recent studies have built upon these foundational theories, further enhancing
fire detection capabilities. Gagliardi and Saponara (2020a) proposed a distributed video
antifire surveillance system leveraging Internet of Things (IoT) embedded computing nodes.
This approach emphasized real-time monitoring and responsiveness, showcasing how
distributed networks can improve the accuracy and speed of fire detection systems. By
utilizing IoT technologies, the proposed system enables remote surveillance and the
integration of multiple sensor modalities, thereby enhancing situational awareness in diverse
environments.
In addition, the work by Rafiee et al. (2011) demonstrated the application of wavelet
analysis in conjunction with disorder characteristics for fire and smoke detection. Their study
highlighted how analyzing the frequency domain of images can provide valuable insights into
the presence of fire, which can be particularly useful in scenarios where traditional sensors
may fail. By focusing on the characteristics of fire and smoke, this approach reduces false
positives and improves detection accuracy. Moreover, Vijayalakshmi and Muruganand
(2017) utilized background subtraction methods in video images, further refining detection
techniques by enabling the system to differentiate between normal environmental changes
and potential fire incidents. This method, when combined with machine learning algorithms,
allowed for real-time detection and provided a substantial improvement over traditional fire
detection systems.
The literature also explores various other methodologies, including sensor fusion
techniques that combine multiple data sources to enhance fire detection reliability. Saponara
et al. (2014) presented an early video smoke detection system designed for rolling stock,
emphasizing the necessity of tailored solutions for specific environments. Their work
illustrated that context-specific adaptations could significantly improve detection efficacy. In
parallel, Celik et al. (2007) explored image processing-based approaches for fire and smoke
detection without relying on traditional sensors. Their research highlighted the importance of
visual data analysis and the capability of deep learning models to understand complex visual
information, indicating a shift towards a more data-driven approach to fire safety.
Proposed Methodology
The proposed fire detection system leverages two deep learning architectures:
Convolutional Neural Networks (CNNs) and Dense Neural Networks (DNNs). Each model is
designed, trained, and evaluated for performance in identifying fire in real-time video or
image data. The following subsections describe the detailed steps involved in this approach:
1. Dataset Collection and Preparation
The dataset consists of fire and non-fire images sourced from publicly available databases
and custom image sets. The dataset includes a variety of fire scenarios: controlled
fires, wildfires, and indoor fires, as well as a range of non-fire images to reduce false positives.
Factors such as lighting conditions, smoke presence, environmental factors, and diverse
camera angles are considered to ensure robustness across different conditions.
Fire Images: Collected from fire emergency services, real-world videos, and simulation data.
Non-Fire Images: Random outdoor and indoor settings without fire.
The images are preprocessed by resizing them to a standard resolution, typically 224x224
pixels, and normalized for training the models. Data augmentation techniques such as
rotation, scaling, flipping, and brightness variation are applied to improve model
generalization.
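As an illustration, a minimal preprocessing and augmentation pipeline along these lines could be built with Keras' ImageDataGenerator. The directory layout (fire/ and non_fire/ subfolders under data/train/), the batch size, and the exact augmentation ranges are assumptions made for this sketch, not values specified in the paper.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)   # standard input resolution used in this study
BATCH_SIZE = 32         # assumed batch size for illustration

# Rescaling normalizes pixel values to [0, 1]; the rotation, zoom (scaling),
# flip, and brightness ranges below are illustrative augmentation choices.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    zoom_range=0.15,
    horizontal_flip=True,
    brightness_range=(0.7, 1.3),
    validation_split=0.2,
)

# Assumed layout: data/train/fire/*.jpg and data/train/non_fire/*.jpg
train_gen = train_datagen.flow_from_directory(
    "data/train", target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="categorical", subset="training",
)
val_gen = train_datagen.flow_from_directory(
    "data/train", target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="categorical", subset="validation",
    shuffle=False,  # keep order fixed so labels align with later predictions
)
```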
2. CNN Model Architecture
Convolutional Neural Networks are well-suited for image classification tasks due to their
ability to capture spatial hierarchies in image data. The CNN architecture designed for this
study includes several layers:
Convolutional Layers: The CNN model uses a series of convolutional layers with varying filter
sizes (3x3 or 5x5) to extract low- and high-level features. Each convolutional layer is followed
by a ReLU (Rectified Linear Unit) activation function.
Pooling Layers: Max pooling is applied after each convolutional block to reduce the spatial
dimensions of the feature maps and minimize overfitting. Pooling layers help maintain key
features while reducing computational complexity.
Batch Normalization and Dropout: Batch normalization layers are inserted after each
convolutional layer to stabilize and accelerate training. Dropout layers are used to reduce
overfitting by randomly setting a fraction of the activations to zero during training.
Fully Connected Layers: The final convolutional layers are flattened and passed through fully
connected layers to produce the final classification output.
Output Layer: A softmax activation function is applied to the output layer to classify whether
the input image contains fire or not.
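A minimal Keras sketch of a CNN along these lines is shown below; the number of convolutional blocks, the filter counts (32, 64, 128), and the dense-layer width are assumptions chosen for illustration rather than values taken from the paper.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(224, 224, 3), num_classes=2):
    """Stacked Conv-BN-ReLU blocks with max pooling, dropout, and a softmax head."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Three convolutional blocks with increasing filter counts (assumed: 32, 64, 128).
    for filters in (32, 64, 128):
        model.add(layers.Conv2D(filters, (3, 3), padding="same"))
        model.add(layers.BatchNormalization())   # stabilize and accelerate training
        model.add(layers.Activation("relu"))
        model.add(layers.MaxPooling2D((2, 2)))   # halve the spatial dimensions
        model.add(layers.Dropout(0.25))          # reduce overfitting
    # Flatten and classify through fully connected layers.
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

cnn_model = build_cnn()
```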
The CNN is trained using categorical cross-entropy loss, optimized using the Adam optimizer
with an adaptive learning rate. The model is evaluated on validation data using accuracy,
precision, recall, and F1-score as performance metrics.
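In Keras terms, this training configuration might look as follows; cnn_model refers to the sketch above, and the initial learning rate of 1e-3 is an assumed default rather than a value reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

# Categorical cross-entropy with the Adam optimizer, which adapts its step sizes
# per parameter; precision and recall are tracked alongside accuracy.
cnn_model.compile(
    optimizer=Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=[
        "accuracy",
        tf.keras.metrics.Precision(name="precision"),
        tf.keras.metrics.Recall(name="recall"),
    ],
)
```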
3. DNN Model Architecture
Dense Neural Networks, or fully connected networks, are also evaluated for comparison.
Unlike CNNs, which specialize in spatial feature learning, DNNs rely on fully connected
layers to process image data.
Input Layer: The preprocessed image data is flattened into one-dimensional vectors.
Hidden Layers: The DNN consists of multiple hidden layers, each followed by ReLU
activations. Dropout is used to regularize the network and prevent overfitting.
Batch Normalization: Batch normalization is applied to each hidden layer to improve
convergence speed and ensure model stability during training.
Output Layer: A softmax activation function is used in the final layer to output probabilities
for fire and non-fire classes.
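For comparison, a corresponding DNN sketch in the same style is given below; the hidden-layer widths (assumed 512, 256, 128) and dropout rate are illustrative choices, not values specified in the paper.

```python
from tensorflow.keras import layers, models

def build_dnn(input_shape=(224, 224, 3), num_classes=2):
    """Fully connected baseline: flattened pixels passed through dense hidden layers."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    model.add(layers.Flatten())                    # flatten the image to a 1-D vector
    for units in (512, 256, 128):                  # assumed hidden-layer sizes
        model.add(layers.Dense(units))
        model.add(layers.BatchNormalization())     # improve convergence and stability
        model.add(layers.Activation("relu"))
        model.add(layers.Dropout(0.3))             # regularization against overfitting
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

dnn_model = build_dnn()
```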
4. Model Training and Hyperparameter Tuning
Both CNN and DNN models are trained using backpropagation and the Adam optimizer. The
models undergo hyperparameter tuning to optimize their performance. Key hyperparameters
include the learning rate, batch size, and number of epochs. Early stopping is implemented to
prevent overfitting by halting training when the validation loss starts to increase.
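A minimal sketch of this training procedure with early stopping is shown below; the epoch budget and patience are assumed hyperparameters, train_gen and val_gen refer to the generators sketched earlier, and the same call applies to dnn_model once it has been compiled in the same way.

```python
from tensorflow.keras.callbacks import EarlyStopping

# Halt training when the validation loss stops improving and keep the best weights.
early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

history = cnn_model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=50,              # assumed upper bound; early stopping usually ends sooner
    callbacks=[early_stop],
)
```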
5. Evaluation Metrics
Model performance is assessed on held-out validation data using accuracy, precision, recall, and F1-score. Precision indicates how many of the images flagged as fire actually contain fire (controlling false alarms), recall indicates how many true fire images are detected (controlling missed detections), and the F1-score balances the two.
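These metrics can be computed from the model's predictions with scikit-learn, as in the sketch below; it assumes the validation generator from the earlier sketch (created with shuffle=False, so val_gen.classes aligns with the predictions) and an assumed class mapping of 0 = non-fire, 1 = fire.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Predicted class probabilities and hard labels on the validation set.
y_prob = cnn_model.predict(val_gen)
y_pred = np.argmax(y_prob, axis=1)
y_true = val_gen.classes              # ground-truth labels from the generator

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```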
6. Real-Time Implementation
For real-time fire detection, the trained CNN model is integrated into a surveillance system
where it processes video frames continuously. The model analyzes each frame, classifies it as
fire or non-fire, and triggers an alarm in case of fire detection. The system is designed to
handle real-time constraints by utilizing hardware acceleration through GPUs or specialized
AI inference hardware.
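A minimal sketch of such a frame-by-frame inference loop using OpenCV is given below; the video source, the confidence threshold, and the alarm action are assumptions for illustration, and cnn_model is the trained model from the earlier sketches.

```python
import cv2
import numpy as np

FIRE_CLASS = 1          # assumed index of the "fire" class
THRESHOLD = 0.8         # assumed confidence threshold before raising an alarm

cap = cv2.VideoCapture(0)   # 0 = default camera; a CCTV stream URL could be used instead
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Match the training preprocessing: convert BGR to RGB, resize to 224x224,
    # and scale pixel values to [0, 1].
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img = cv2.resize(rgb, (224, 224)).astype("float32") / 255.0
    prob_fire = cnn_model.predict(img[np.newaxis, ...], verbose=0)[0][FIRE_CLASS]
    if prob_fire > THRESHOLD:
        print(f"FIRE DETECTED (p={prob_fire:.2f})")  # placeholder for an alarm hook
cap.release()
```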
Conclusion
In this paper, we presented a fire and smoke detection approach based on two deep
learning architectures, a Convolutional Neural Network (CNN) and a Dense Neural Network
(DNN), trained on a curated dataset of fire and non-fire images. The CNN's convolutional
and pooling layers allow it to learn spatial features such as flame and smoke patterns
directly from images, whereas the DNN processes flattened pixel vectors through fully
connected layers and therefore discards much of that spatial structure. This architectural
difference makes the CNN the more suitable choice for image-based fire detection, and it is
the model integrated into the proposed real-time surveillance pipeline, where hardware-
accelerated inference enables frame-by-frame classification and alarm triggering.
These observations emphasize the importance of choosing architectures designed for
spatial data when dealing with image classification tasks such as fire detection. Future work
can focus on further optimizing these models, exploring hybrid architectures, or employing
transfer learning to enhance performance. Additionally, improving model robustness under
challenging conditions such as fog, steam, and poor lighting, and reducing false alarms, will
be crucial for deploying these systems in real-world environments.