
CS3491 ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

FIRE AND SMOKE DETECTION USING CNN
A Journal Paper

Prepared by
HARI KRISHNAN K (953622205021)
HARISH S (953622205022)
MOHAMED ISLAAM K A (953622205025)
RAM PANDIAN G (953622205034)

In partial fulfilment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY

RAMCO INSTITUTE OF TECHNOLOGY


RAJAPALAYAM – 626 117
ANNA UNIVERSITY: CHENNAI 600025
JUNE 2024
Abstract
This paper presents a comparative analysis of Convolutional Neural Networks (CNNs) and
Dense Neural Networks (DNNs) for real-time fire detection in images. Traditional fire detection
systems face challenges like false alarms and delayed responses, motivating the need for more
advanced techniques. CNNs and DNNs are explored due to their potential for accurate fire detection
through deep learning. A system framework involving data preprocessing, model development, and
evaluation is implemented to assess the effectiveness of both models. Results show that CNNs,
designed to handle spatial patterns in images, outperform DNNs in terms of accuracy. This study
demonstrates that deep learning models can be effectively integrated into real-time fire detection
systems, improving early warning capabilities.

Keywords
 Fire detection
 Convolutional Neural Networks (CNN)
 Dense Neural Networks (DNN)
 Deep Learning
 Image Processing
 Real-Time Systems
 Early Warning
 Artificial Intelligence

Introduction
Fire detection is a crucial aspect of safety, designed to prevent potential loss of
life, property damage, and environmental destruction. Traditional systems, such
as smoke detectors and heat sensors, have been in use for decades but often face
limitations, including false alarms and slow response times. In outdoor
environments or large spaces, these systems struggle with timely detection, and
environmental conditions like fog, dust, or steam can further degrade their
performance. This has driven the need for more advanced solutions, capable of
detecting fire accurately and rapidly across a wide variety of settings. Video-
based fire detection systems, which leverage real-time surveillance cameras,
have emerged as an alternative, offering the ability to monitor larger areas and
detect both flames and smoke. These systems, when enhanced with machine
learning, can further improve fire detection accuracy and response times.

With the advent of deep learning, particularly Convolutional Neural Networks (CNNs), video-based fire detection has seen significant advancements. CNNs
excel at identifying complex visual patterns, such as flames or smoke, by
automatically learning features from the data, which makes them highly
effective for image-based detection tasks (LeCun et al. 2015). Dense Neural
Networks (DNNs) also play a role in classification tasks, though their fully
connected structure makes them less specialized for images than CNNs. This
study explores the use of both CNNs and DNNs for fire detection, comparing
their effectiveness in real-time applications. The goal is to develop a fire
detection system that combines speed, accuracy, and efficiency, potentially
overcoming the shortcomings of traditional sensor-based systems and enhancing
fire safety across diverse environments.

Related Work
The landscape of fire detection systems has evolved dramatically, particularly with
the integration of advanced machine learning and deep learning techniques. Traditionally, fire
detection relied heavily on physical sensors such as smoke detectors and thermal sensors,
which have inherent limitations, including susceptibility to false alarms and delayed response
times. However, the advent of image processing and deep learning has paved the way for
more reliable and efficient fire detection methodologies. For instance, LeCun et al. (2015)
outlined the potential of deep learning techniques, particularly Convolutional Neural
Networks (CNNs), in image classification tasks. Their findings have been foundational in
recognizing that CNNs are adept at automatically learning and extracting features from
images, making them particularly well-suited for detecting visual anomalies such as fire and
smoke.

Several recent studies have built upon these foundational theories, further enhancing
fire detection capabilities. Gagliardi and Saponara (2020a) proposed a distributed video
antifire surveillance system leveraging Internet of Things (IoT) embedded computing nodes.
This approach emphasized real-time monitoring and responsiveness, showcasing how
distributed networks can improve the accuracy and speed of fire detection systems. By
utilizing IoT technologies, the proposed system enables remote surveillance and the
integration of multiple sensor modalities, thereby enhancing situational awareness in diverse
environments.

In addition, the work by Rafiee et al. (2011) demonstrated the application of wavelet
analysis in conjunction with disorder characteristics for fire and smoke detection. Their study
highlighted how analyzing the frequency domain of images can provide valuable insights into
the presence of fire, which can be particularly useful in scenarios where traditional sensors
may fail. By focusing on the characteristics of fire and smoke, this approach reduces false
positives and improves detection accuracy. Moreover, Vijayalakshmi and Muruganand
(2017) utilized background subtraction methods in video images, further refining detection
techniques by enabling the system to differentiate between normal environmental changes
and potential fire incidents. This method, when combined with machine learning algorithms,
allowed for real-time detection and provided a substantial improvement over traditional fire
detection systems.

The literature also explores various other methodologies, including sensor fusion
techniques that combine multiple data sources to enhance fire detection reliability. Saponara
et al. (2014) presented an early video smoke detection system designed for rolling stock,
emphasizing the necessity of tailored solutions for specific environments. Their work
illustrated that context-specific adaptations could significantly improve detection efficacy. In
parallel, Celik et al. (2007) explored image processing-based approaches for fire and smoke
detection without relying on traditional sensors. Their research highlighted the importance of
visual data analysis and the capability of deep learning models to understand complex visual
information, indicating a shift towards a more data-driven approach to fire safety.

As deep learning techniques continue to evolve, there is a growing emphasis on real-time processing capabilities. This shift is crucial for fire detection systems, where timely
identification can mean the difference between containment and disaster. The integration of
models that leverage both spatial and temporal features in video streams has shown promise
in addressing challenges related to false alarms and improving detection speed. Future
research in this field is likely to focus on refining these models further, exploring hybrid
approaches that combine CNNs with other architectures to leverage their respective strengths.
Overall, the body of work underscores a clear trend toward utilizing sophisticated machine
learning techniques to create smarter, more responsive fire detection systems that can operate
effectively in dynamic environments.

The evolution of fire detection methodologies is closely linked to the broader advancements in computer vision and machine learning, reflecting a paradigm shift from
reliance on physical sensors to sophisticated data-driven solutions. The collective efforts of
researchers have made significant strides in enhancing detection accuracy and response times,
paving the way for innovative systems capable of operating in increasingly complex and
varied environments. This body of literature not only highlights the current state of fire
detection research but also serves as a foundation for future explorations in the field,
promising more efficient and effective solutions for fire safety and prevention.

Proposed Methodology

The proposed fire detection system leverages two deep learning architectures:
Convolutional Neural Networks (CNNs) and Dense Neural Networks (DNNs). Each model is
designed, trained, and evaluated for performance in identifying fire in real-time video or
image data. The following subsections describe the detailed steps involved in this approach:
1. Dataset Collection and Preparation

The dataset consists of fire and non-fire images sourced from publicly available databases
and custom image sets. It covers a variety of fire scenarios: controlled fires, wildfires, and
indoor fires, as well as varied non-fire images to reduce false positives. Factors such as
lighting conditions, smoke presence, and camera angle are varied to ensure robustness
across different conditions.

 Fire Images: Collected from fire emergency services, real-world videos, and simulation data.
 Non-Fire Images: Random outdoor and indoor settings without fire.

The images are preprocessed by resizing them to a standard resolution, typically 224x224
pixels, and normalized for training the models. Data augmentation techniques such as
rotation, scaling, flipping, and brightness variation are applied to improve model
generalization.
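
To make this pipeline concrete, a minimal Keras sketch is given below. The directory layout (data/train with fire/ and non_fire/ subfolders), the 80/20 train/validation split, and the specific augmentation ranges are illustrative assumptions, not the exact configuration used in the study.

# Preprocessing and augmentation sketch (assumed layout: data/train/fire, data/train/non_fire).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)  # standard resolution noted above

# rescale normalizes pixels to [0, 1]; the remaining settings apply the
# rotation, scaling (zoom), flipping, and brightness augmentations described above.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    zoom_range=0.2,
    horizontal_flip=True,
    brightness_range=(0.7, 1.3),
    validation_split=0.2,            # assumed 80/20 split
)

train_gen = train_datagen.flow_from_directory(
    "data/train", target_size=IMG_SIZE, batch_size=32,
    class_mode="categorical", subset="training")
val_gen = train_datagen.flow_from_directory(
    "data/train", target_size=IMG_SIZE, batch_size=32,
    class_mode="categorical", subset="validation",
    shuffle=False)                   # keep label order stable for evaluation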

2. CNN Architecture Design

Convolutional Neural Networks are well-suited for image classification tasks due to their
ability to capture spatial hierarchies in image data. The CNN architecture designed for this
study includes several layers:

 Convolutional Layers: The CNN model uses a series of convolutional layers with varying filter
sizes (3x3 or 5x5) to extract low- and high-level features. Each convolutional layer is followed
by a ReLU (Rectified Linear Unit) activation function.
 Pooling Layers: Max pooling is applied after each convolutional block to reduce the spatial
dimensions of the feature maps and minimize overfitting. Pooling layers help maintain key
features while reducing computational complexity.
 Batch Normalization and Dropout: Batch normalization layers are inserted after each
convolutional layer to stabilize and accelerate training. Dropout layers are used to reduce
overfitting by randomly setting a fraction of the activations to zero during training.
 Fully Connected Layers: The final feature maps are flattened and passed through fully
connected layers to produce the classification output.
 Output Layer: A softmax activation function is applied to the output layer to classify whether
the input image contains fire or not.

The CNN is trained using categorical cross-entropy loss, optimized using the Adam optimizer
with an adaptive learning rate. The model is evaluated on validation data using accuracy,
precision, recall, and F1-score as performance metrics.
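
A compact Keras sketch of a CNN along these lines follows; the number of convolutional blocks and the filter counts per layer are assumptions for illustration, since the paper does not fix them.

# Illustrative CNN: conv -> batch norm -> ReLU -> max pooling blocks,
# then flatten, dense, dropout, and a softmax output, as described above.
from tensorflow.keras import layers, models

def build_cnn(input_shape=(224, 224, 3), num_classes=2):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),     # shrink feature maps, curb overfitting
        layers.Conv2D(64, (3, 3), padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),             # randomly zero activations in training
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",                  # adaptive learning rate
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model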

3. DNN Architecture Design

Dense Neural Networks, or fully connected networks, are also evaluated for comparison.
Unlike CNNs, which specialize in spatial feature learning, DNNs rely on fully connected
layers to process image data; a code sketch follows the list below.

 Input Layer: The preprocessed image data is flattened into one-dimensional vectors.
 Hidden Layers: The DNN consists of multiple hidden layers, each followed by ReLU
activations. Dropout is used to regularize the network and prevent overfitting.
 Batch Normalization: Batch normalization is applied to each hidden layer to improve
convergence speed and ensure model stability during training.
 Output Layer: A softmax activation function is used in the final layer to output probabilities
for fire and non-fire classes.
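
The corresponding DNN sketch is shown below; the hidden-layer widths (512 and 256) are assumed values chosen only to illustrate the structure.

# Illustrative DNN: flattened input, dense hidden layers with batch norm,
# ReLU, and dropout, and a softmax output, mirroring the list above.
from tensorflow.keras import layers, models

def build_dnn(input_shape=(224, 224, 3), num_classes=2):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Flatten(),                 # image becomes a 1-D vector
        layers.Dense(512),
        layers.BatchNormalization(),      # improve convergence and stability
        layers.Activation("relu"),
        layers.Dropout(0.5),
        layers.Dense(256),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model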

4. Model Training and Hyperparameter Tuning

Both CNN and DNN models are trained using backpropagation and the Adam optimizer. The
models undergo hyperparameter tuning to optimize their performance. Key hyperparameters
include the learning rate, batch size, and number of epochs. Early stopping is implemented to
prevent overfitting by halting training when the validation loss starts to increase.
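
A training sketch with early stopping is given below; the epoch budget and patience value are assumptions, and build_cnn, train_gen, and val_gen refer to the earlier sketches.

# Train with early stopping on validation loss, as described above.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss", patience=5,   # assumed patience
                           restore_best_weights=True)

model = build_cnn()          # the DNN is trained the same way via build_dnn()
history = model.fit(train_gen,
                    validation_data=val_gen,
                    epochs=50,   # upper bound; early stopping may halt sooner
                    callbacks=[early_stop])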

5. Evaluation Metrics

The performance of the models is evaluated using several metrics, computed in the sketch
that follows the list:

 Accuracy: Measures the percentage of correctly classified images.
 Precision: Assesses the proportion of correctly identified fire cases out of all fire predictions.
 Recall (Sensitivity): Measures the ability to detect all actual fire instances.
 F1-Score: The harmonic mean of precision and recall, providing a balanced evaluation
metric.
 Confusion Matrix: Used to visualize the number of true positives, false positives, true
negatives, and false negatives for both fire and non-fire classifications.
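
A scikit-learn sketch for computing these metrics is given below; it assumes the validation generator was created with shuffle=False (as in the earlier sketch) and that the fire class maps to index 0 under alphabetical folder ordering.

# Compute accuracy, precision, recall, F1-score, and the confusion matrix.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

FIRE = 0                                  # assumed index of the 'fire' class

probs = model.predict(val_gen)            # softmax probabilities per image
y_pred = np.argmax(probs, axis=1)
y_true = val_gen.classes                  # ground truth from the generator

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, pos_label=FIRE))
print("Recall   :", recall_score(y_true, y_pred, pos_label=FIRE))
print("F1-score :", f1_score(y_true, y_pred, pos_label=FIRE))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))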

6. Real-Time Implementation

For real-time fire detection, the trained CNN model is integrated into a surveillance system
where it processes video frames continuously. The model analyzes each frame, classifies it as
fire or non-fire, and triggers an alarm in case of fire detection. The system is designed to
handle real-time constraints by utilizing hardware acceleration through GPUs or specialized
AI inference hardware.
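
A minimal OpenCV inference loop illustrating this design is sketched below; the camera index, the print-based alarm hook, and the 'q'-to-quit control are placeholders rather than the deployed system.

# Continuously classify video frames and raise an alarm on fire detection.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)                     # assumed camera index
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Match the training pipeline: RGB order, 224x224, pixels in [0, 1].
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img = cv2.resize(rgb, (224, 224)).astype("float32") / 255.0
    probs = model.predict(img[np.newaxis], verbose=0)[0]
    if np.argmax(probs) == FIRE:              # fire class index from above
        print("FIRE DETECTED - trigger alarm")  # placeholder alarm hook
    if cv2.waitKey(1) & 0xFF == ord("q"):     # press 'q' to stop
        break
cap.release()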

Results and Discussion


This section presents a detailed analysis of the performance of four deep learning models—
Convolutional Neural Networks (CNNs), Residual Networks (ResNets), Long Short-Term
Memory (LSTM) networks, and Deep Neural Networks (DNNs)—in the context of facial
emotion recognition using the FER2013 dataset. The evaluation is based on accuracy and loss
metrics, as depicted in the provided accuracy and loss comparison plots.
Fig 2. Accuracy and Loss comparison
1. Model Performance Overview
The plots illustrate the training and validation accuracy, as well as the training and validation
loss, across 50 epochs for each model. This comprehensive analysis sheds light on the
effectiveness of each architecture in recognizing facial expressions and highlights critical
insights regarding their strengths and limitations.
2. Accuracy Analysis
 Convolutional Neural Networks (CNNs): The CNN model demonstrates a steady
increase in both training and validation accuracy, reaching approximately 65%
validation accuracy by the end of the training epochs. This is indicative of the model's
robust ability to learn spatial features relevant to facial expressions. The consistent
growth in accuracy throughout the training process suggests effective feature
extraction, essential for distinguishing between various emotional states.
 Residual Networks (ResNets): The ResNet model exhibits slightly lower validation
accuracy than the CNN model, peaking just above 60%. The architecture's
residual blocks effectively mitigate the vanishing gradient problem, enabling deeper
learning and better generalization. The validation accuracy's gradual ascent implies
that ResNets, while computationally intensive, significantly enhance the model’s
capacity to recognize complex patterns in facial expressions.
 Long Short-Term Memory Networks (LSTMs): The LSTM model displays lower
overall accuracy, with validation accuracy reaching only around 50%. The LSTM
architecture is primarily designed for sequence data, making it less suited for image-
based tasks. This underperformance highlights a critical limitation in applying LSTMs
to spatial data like images, where temporal dependencies are less relevant than the
local features extracted from image pixels.
 Deep Neural Networks (DNNs): The DNN model demonstrates the least
effectiveness, with validation accuracy hovering around 40%. The absence of
convolutional layers in DNNs is a significant drawback when processing image data,
as DNNs typically rely on fully connected layers that do not capture the spatial
hierarchies found in images. This limitation is evident in the low validation accuracy
and indicates that traditional neural networks are insufficient for tasks that require
spatial understanding, such as facial emotion recognition.
3. Loss Analysis
 Training Loss:
o CNNs and ResNets: Both models show a consistent decline in training loss,
indicating effective learning. The CNN model achieves a final training loss of
approximately 1.0, while ResNet's loss decreases even further, suggesting that
the ResNet architecture captures features more effectively.
o LSTMs and DNNs: In contrast, the training loss for LSTMs and DNNs does
not exhibit a comparable decline, with DNNs maintaining a higher training
loss throughout training. This inconsistency indicates inadequate learning and
suggests that these models struggle to represent the data effectively.
 Validation Loss:
o CNNs and ResNets: Validation loss for CNN and ResNet models also trends
downward, affirming their generalization capabilities and highlighting their
suitability for real-world applications. The stability of validation loss across
epochs reinforces the reliability of these models in identifying emotional states
from facial expressions.
o LSTMs and DNNs: The validation loss for the LSTM and DNN models
fluctuates significantly, indicating potential overfitting or instability in the
learning process. These patterns raise concerns about the robustness of these
models when applied to unseen data, emphasizing their limitations in practical
implementations.
4. Comparative Summary
The comparative analysis underscores the effectiveness of CNNs and ResNets for facial
emotion recognition, primarily due to their ability to learn spatial hierarchies and complex
features from images. Their performance demonstrates a higher capacity for generalization,
critical for applications in real-time emotion detection.
In contrast, the lower accuracy and higher loss values associated with LSTM and DNN
models indicate that these architectures are less suited for image data. The findings suggest
that while LSTMs excel in handling sequential data, their application in image recognition
tasks is limited. Similarly, DNNs, lacking convolutional layers, fall short in effectively
processing and classifying images.
5. Practical Implications and Future Work
The results of this study have significant implications for the deployment of emotion
recognition systems in various domains, such as human-computer interaction, mental health
assessment, and security applications. The success of CNNs and ResNets opens avenues for
real-time implementation in these fields.
Future work should focus on optimizing hyperparameters and exploring advanced
architectures, such as hybrid models that combine the strengths of different neural networks.
Additionally, transfer learning from pre-trained models could further enhance accuracy and
reduce training times, especially in cases where labeled data is scarce. Addressing issues of
robustness and bias in model predictions will also be essential for ensuring fair and accurate
emotion recognition across diverse populations and conditions.

Conclusion
In this paper, we evaluated the performance of several deep learning models—CNN,
ResNet, LSTM, and DNN—on the task of facial emotion recognition using the FER2013
dataset. Our results demonstrate that CNNs and ResNets outperformed LSTM and DNN
models, particularly in terms of validation accuracy and generalization capabilities. CNNs
achieved the highest validation accuracy of approximately 65%, followed closely by
ResNets, which benefited from their residual learning architecture. In contrast, LSTMs and
DNNs struggled with image-based tasks, exhibiting lower accuracy and higher loss values.
The findings highlight the suitability of CNNs and ResNets for facial emotion
recognition due to their ability to effectively learn spatial features from images. LSTM’s
sequential processing and DNN’s fully connected architecture proved less capable of
handling image data, emphasizing the importance of choosing architectures designed for
spatial data when dealing with image classification tasks.
Future work can focus on further optimizing these models, exploring hybrid
architectures, or employing transfer learning to enhance performance. Additionally,
addressing model robustness and fairness in emotion recognition applications will be crucial
for deploying these systems in real-world settings.
