Natural disasters like earthquakes and floods, along with man-made crises such as industrial accidents, have long posed significant challenges to societies worldwide. Among these, fire is one of the most dangerous and unpredictable. Fire can spread incredibly fast, turning a small flame into a large, uncontrollable blaze within minutes and destroying everything in its path. This not only threatens people’s lives and homes but also causes long-term environmental damage, leading to deforestation, air pollution, and the release of harmful chemicals. Fires also have serious economic impacts, with billions of dollars lost every year to fire-related damage [1]. Fires can have lasting effects on communities, forcing people to leave their homes, destroying jobs, and causing deep psychological harm [2]. These effects can last for generations, changing the way entire regions function socially and economically [1,3]. Given how rapidly fires spread, early detection is essential; it can mean the difference between successfully controlling a fire and allowing it to escalate into a major disaster. Traditional tools such as smoke detectors and heat sensors are helpful, but they often do not detect a fire until heat or smoke physically reaches the sensor, especially in large or open areas. These systems also struggle in complex environments such as thick forests or large industrial sites, where early detection is crucial [4]. To better prevent fire disasters, more advanced and accurate systems are needed that can spot fires just as they are starting, allowing for quicker intervention.
Image-based systems hold promise, but existing models often lack the necessary precision in critical situations [10]. These systems face significant challenges, including low detection accuracy under changing lighting conditions, difficulty distinguishing between fire and non-fire objects, and inefficiency in processing real-time data [11]. In high-stakes scenarios where every second counts, these shortcomings can have devastating consequences. False alarms pose a serious problem because they can desensitize response teams, delaying the response to actual emergencies [12]. Moreover, in areas where fire detection is crucial, such as densely populated urban centers or remote forest regions, the failure of these systems can have catastrophic outcomes. The increasing threat posed by fires, combined with the limitations of current detection technologies, highlights the need for a more advanced solution. As climate change intensifies and urbanization continues to expand, the frequency and severity of fire incidents are expected to rise. This reality underscores the urgency of developing a fire detection system that is not only accurate but also robust enough to operate effectively in diverse environments. In response to this critical need, we have developed a state-of-the-art deep learning approach tailored to meet these demands. By leveraging computer vision, our solution offers a more precise and reliable method of fire detection, helping ensure that fires are identified and managed before they cause significant harm.
Previous research has explored various frameworks such as YOLOv3, YOLOv5, R-CNN, vanilla CNN, and dual CNN models for fire detection [2]. However, these models often face challenges related to accuracy and speed, particularly in high-pressure situations [2,13]. Several studies have pointed out the limitations of these fire detection models. For instance, a YOLOv3-based fire detection method was introduced and adapted for real-time high-speed detection on a Banana Pi M3 board [2]. The researchers used data augmentation techniques, such as rotating labeled images and adding fire-like images to the dataset, to enhance training, and they employed independent logistic classifiers with binary cross-entropy loss for class predictions [2]. However, the method sometimes misclassified non-fire objects such as neon signs and headlights as fires, especially at night, and nighttime blurring made it difficult to distinguish actual fires from other light sources [2]. A YOLOv2-based model was used for real-time fire and smoke detection in various environments [1]. That research drew on indoor and outdoor fire and smoke image sets, labeled with a Ground Truth Labeler app, and was deployed on a low-cost embedded device, the Jetson Nano, for real-time processing [1]. However, background objects with similar color properties caused false detections, and performance may be limited by the capabilities of the embedded device, according to the research [1]. Additionally, a fire detection method based on an improved YOLOv4 network was adapted for real-time monitoring on a Banana Pi M3 board [4]. The dataset was expanded with image transformation techniques to improve detection accuracy. Despite these improvements, the method produced false positives, especially around fire-like lights, and larger image sizes increased processing time, a limitation for real-time applications [4].

CNN-based approaches face similar difficulties. One study proposed a fire and smoke detection method using dilated convolutional neural networks to enhance feature extraction and reduce false alarms, training and evaluating on a custom dataset of fire and smoke images. Despite this, the method struggled with early-stage detection when fire and smoke pixel values were similar to the background, especially in cloudy weather [14]. In another study, a dual deep learning framework used two deep CNNs to extract image-based features such as color, texture, and edges alongside motion-based features like optical flow; the researchers applied superpixel segmentation to smoke regions and combined the features from both streams with a support vector machine for classification. Even so, the method struggled in environmental scenarios such as fog, clouds, and sandstorms, and the dataset was limited, according to the researchers [15]. These inconsistencies make it difficult to rely on such models in critical situations where reliability is always necessary. Furthermore, in a different study, researchers used aerial 360-degree cameras to capture wide-field-of-view images, applied DeepLab V3+ networks for flame and smoke segmentation, and implemented an adaptive method that reduces false positives by analyzing environmental appearance. However, the method was affected by weather conditions, the aerial device required frequent recharging, and the system could not detect fires at night [3].

In other studies, researchers combined a multifunctional AI framework with the Direct-MQTT protocol to enhance fire detection accuracy and minimize data transfer delays, applying a CNN for visual intelligence and using the Fire Dynamics Simulator for testing; however, the approach did not consider sensor failures and relied on static thresholds [16]. Another study used the ELASTIC-YOLOv3 algorithm to quickly and accurately detect fire candidate areas, verified the candidates with a random forest classifier, and employed a temporal fire-tube with a bag-of-features histogram to capture the dynamic characteristics of nighttime flames. However, the approach faced real-time processing limitations due to the computational demands of combining a CNN with an RNN or LSTM, and it struggled to distinguish fire from fire-like objects in nighttime urban environments [17]. An Intermediate Fusion VGG16 model and the Enhanced Consumed Energy-Leach protocol were used in a study on early detection of forest fires, with drones capturing RGB and IR images for processing by the VGG16 model; the study was limited by the lack of real-world testing and by resource constraints that hindered comprehensive evaluation [10]. In another computer vision-based study, researchers built a YOLOv5 fire detection algorithm on an attention-enhanced ghost model, mixed convolutional pyramids, and flame-center detection, incorporating Ghost bottlenecks, SECSP attention modules, GSConv convolution, and the SIoU loss function to enhance accuracy; its limitations included potential challenges for real-time detection due to high computational complexity and the need for further validation in diverse environments [11]. In a different CNN-based study, researchers modified a CNN for forest fire recognition, integrating transfer learning and a feature fusion algorithm to enhance detection accuracy, training and testing on a diverse dataset of fire and non-fire images; however, the small sample size of the dataset and the need for further validation in real-world scenarios limited its robustness and generalization. Finally, in a study on fire and smoke detection, researchers used a capacitive particle-analyzing smoke detector for very early fire detection, employing a multiscale smoke particle concentration detection algorithm based on capacitive detection of cell structures and time-frequency domain analysis to calculate particle concentration. This method, however, had difficulty distinguishing particle types and struggled with false alarms in complex environments.
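Several of the surveyed detectors (e.g., the YOLOv3 variant discussed above) score classes with independent logistic classifiers trained under binary cross-entropy rather than a softmax, so a single region can be flagged as both fire and smoke. A minimal NumPy sketch of that scoring step, with made-up logits for illustration only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def independent_class_scores(logits):
    # One logistic classifier per class: scores need not sum to 1,
    # so a region can score high for "fire" and "smoke" simultaneously.
    return sigmoid(logits)

def binary_cross_entropy(probs, targets, eps=1e-7):
    # Per-class BCE averaged over classes; eps guards against log(0).
    probs = np.clip(probs, eps, 1 - eps)
    return -np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))

# Hypothetical logits for one region over the classes [fire, smoke]
logits = np.array([2.0, 0.5])
probs = independent_class_scores(logits)  # roughly [0.88, 0.62]
loss = binary_cross_entropy(probs, np.array([1.0, 1.0]))
```

Because each class gets its own sigmoid, lowering the "fire" score never forces the "smoke" score up, which is what makes multi-label predictions possible in these detectors.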
In our pursuit of a more reliable solution, we turned to the robust capabilities of YOLOv8, a model renowned for its superior object detection abilities. Our goal was to enhance fire detection performance by optimizing the architecture of YOLOv8 to better identify fire-specific visual cues. Through extensive training on a comprehensive fire and smoke image dataset, the modified YOLOv8 model demonstrated improved accuracy. This advanced model excels at detecting not only flames but also smoke, which is often an early indicator of larger fires, thereby providing an early warning that can prevent a small incident from escalating into a full-blown disaster.
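The early-warning behavior described above can be expressed as simple post-processing over per-frame detector output: smoke alone triggers an early warning, while a confirmed flame raises a full alarm. The sketch below illustrates this logic only; the class names and confidence thresholds are assumptions for illustration, not values from our system.

```python
# Illustrative post-processing over detector output. Class names
# ("fire", "smoke") and thresholds are assumed, not taken from the paper.

def assess_frame(detections, smoke_thresh=0.5, fire_thresh=0.6):
    """detections: list of (class_name, confidence) pairs for one frame."""
    smoke = any(c == "smoke" and p >= smoke_thresh for c, p in detections)
    fire = any(c == "fire" and p >= fire_thresh for c, p in detections)
    if fire:
        return "ALARM"          # flames confirmed
    if smoke:
        return "EARLY_WARNING"  # smoke often precedes visible flames
    return "CLEAR"

print(assess_frame([("smoke", 0.72)]))                # EARLY_WARNING
print(assess_frame([("smoke", 0.9), ("fire", 0.8)]))  # ALARM
```

In practice such a rule would be smoothed over several consecutive frames to suppress single-frame false positives, but the two-tier response is the key point: smoke detection buys response time before flames appear.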