
CHAPTER I

INTRODUCTION

In recent years, the devastating impact of wildfires has become increasingly evident worldwide.
These natural disasters pose a significant threat to life, property, and the environment. Detecting fires at
their early stages is crucial for effective and timely response, minimizing the potential damage and
facilitating swift firefighting efforts. With the advancements in machine learning, researchers and
engineers have turned to this technology to develop early fire detection systems that can improve the
efficiency and accuracy of fire detection. Machine learning, a subfield of artificial intelligence, empowers
computers to learn from data and make predictions or take actions without being explicitly programmed.
By analyzing large volumes of data, machine learning algorithms can identify patterns and make informed
decisions based on the acquired knowledge. This capability has been harnessed to develop intelligent fire
detection systems that leverage various data sources, such as imagery, temperature readings, and
environmental factors, to identify and predict the occurrence of fires.
Traditional fire detection systems often rely on human observation or on fixed sensors in
specific locations. These methods can be limited in their effectiveness, especially in vast or remote areas
where immediate human intervention may not be feasible. Machine learning-based fire detection systems
offer several advantages over traditional approaches. They can process real-time data from diverse sources,
including satellite imagery, thermal sensors, and weather data, enabling early detection of fires and
enhancing situational awareness. One of the primary advantages of using machine learning for fire
detection is the ability to detect fires at their nascent stages. By training algorithms on historical fire data,
these systems can learn to recognize patterns and anomalies associated with fire outbreaks. The algorithms
can then generalize this knowledge to detect new fires, even in areas with different environmental
conditions or vegetation types. This adaptability makes machine learning-based fire detection systems
valuable tools in mitigating the impact of wildfires across various landscapes.

1.1 JUPYTER
Jupyter, previously known as IPython Notebook, is a web-based, interactive development
environment. Originally developed for Python, it has since expanded to support over 40 other
programming languages, including Julia and R.
Jupyter allows notebooks to be written that contain text, live code, images, and equations. These
notebooks can be shared, and can even be hosted on GitHub for free. For each section of this tutorial, you

can download a Jupyter notebook that allows you to edit and experiment with the code and examples for
each topic. Jupyter is part of the Anaconda distribution; it can be started from the command line using the
jupyter command:
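For example, from a terminal (assuming Anaconda has placed jupyter on your PATH):

```shell
# Start the Jupyter Notebook server in the current directory;
# a browser tab opens with the notebook dashboard
jupyter notebook
```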

1.2 MACHINE LEARNING


We will now move on to the task of machine learning itself. In the following sections we will
describe how to use some basic algorithms, and perform regression, classification, and clustering on some
freely available medical datasets concerning breast cancer and diabetes, and we will also take a look at a
DNA microarray dataset.

SCIKIT-LEARN
SciKit-Learn provides a standardised interface to many of the most commonly used machine learning
algorithms, and is the most popular and frequently used library for machine learning for Python. As well as
providing many learning algorithms, SciKit-Learn has a large number of convenience functions for
common preprocessing tasks (for example, normalisation or k-fold cross validation). SciKit-Learn is a very
large software library.
CLUSTERING
Clustering algorithms focus on grouping data together. In general, clustering algorithms
are unsupervised—they require no y response variable as input. That is to say, they attempt to find groups
or clusters within data where you do not know the label for each sample. SciKit-Learn has many

clustering algorithms, but in this section we will demonstrate hierarchical clustering on a DNA expression
microarray dataset using an algorithm from the SciPy library.
We will plot a visualisation of the clustering using what is known as a dendrogram, also using the
SciPy library. The goal is to cluster the data properly in logical groups, in this case into the cancer types
represented by each sample’s expression data. We do this using agglomerative hierarchical clustering,
using Ward’s linkage method:
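Although the microarray expression matrix itself is not reproduced here, the workflow can be sketched with SciPy as follows (the random matrix below is a stand-in for the real expression data, and the shape is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Placeholder for the expression matrix: 20 samples x 50 genes
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))

# Agglomerative hierarchical clustering with Ward's linkage method
Z = linkage(X, method="ward")

# Visualise the clustering as a dendrogram
dendrogram(Z)
plt.xlabel("Sample index")
plt.ylabel("Distance")
plt.savefig("dendrogram.png")
```

The linkage matrix Z records which clusters were merged at each step; the dendrogram simply draws those merges as a tree.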

1.3 CLASSIFICATION

We will work on the Wisconsin breast cancer dataset, split it into a training set and a test set, train
a Support Vector Machine with a linear kernel, and test the trained model on an unseen dataset. The
Support Vector Machine model should be able to predict whether a new sample is malignant or benign based on
the features of a new, unseen sample:
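The steps above can be sketched with SciKit-Learn, using the copy of the Wisconsin dataset bundled with the library (the split proportion and random seed are illustrative choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the Wisconsin breast cancer dataset bundled with SciKit-Learn
X, y = load_breast_cancer(return_X_y=True)

# Hold back an unseen test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Train a Support Vector Machine with a linear kernel
model = SVC(kernel="linear")
model.fit(X_train, y_train)

# Evaluate the trained model on the unseen samples
accuracy = model.score(X_test, y_test)
print("Test accuracy:", accuracy)

# Precision, recall, F1 score and support for each class
print(classification_report(y_test, model.predict(X_test),
                            target_names=["malignant", "benign"]))
```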

You will notice that the SVM model performed very well at predicting the malignancy of new,
unseen samples from the test set—this can be quantified nicely by printing a number of metrics using the
classification report function. Here, the precision, recall, and F1 score
(F1 = 2 · precision · recall / (precision + recall)) for each class are shown. The support column is a count of the number of
samples for each class. Support Vector Machines are a very powerful tool for classification. They work
well in high-dimensional spaces, even when the number of features is higher than the number of samples.
However, their running time is quadratic in the number of samples, so large datasets can become difficult to
train. Quadratic means that if you increase a dataset in size by 10 times, it will take 100 times longer to
train. Lastly, you will notice that the breast cancer dataset consisted of 30 features. This makes it difficult to
visualize or plot the data.

DIMENSIONALITY REDUCTION
Another important method in machine learning, and data science in general, is dimensionality
reduction. For this example, we will look at the Wisconsin breast cancer dataset once again. The dataset
consists of over 500 samples, where each sample has 30 features. The features relate to images of a fine
needle aspirate of breast tissue, and the features describe the characteristics of the cells present in the
images. All features are real values. The target variable is a discrete value (either malignant or benign) and
is therefore a classification dataset. You will recall from the Iris example in Sect. 7.3 that we plotted a
scatter matrix of the data, where each feature was plotted against every other feature in the dataset to look
for potential correlations (Fig. 3). By examining this plot you could probably find features which would
separate the dataset into groups. Because the dataset only had 4 features we were able to plot each feature
against each other relatively easily. However, as the numbers of features grow, this becomes less and less
feasible, especially if you consider the gene expression example in Sect. 9.4 which had over 6000 features.
One method that is used to handle highly dimensional data is Principal Component Analysis, or
PCA. PCA is an unsupervised algorithm for reducing the number of dimensions of a dataset. For example,
for plotting purposes you might want to reduce your data down to 2 or 3 dimensions, and PCA allows you
to do this by generating components, which are combinations of the original features, that you can then use
to plot your data. You supply PCA with your data, X, and you specify the
number of components you wish to reduce its dimensionality to. This is known as transforming the data:
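A minimal sketch of transforming the breast cancer data down to two components:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA

X, y = load_breast_cancer(return_X_y=True)

# Reduce the 30 original features down to 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)  # (569, 2)
```

The two columns of X_reduced can then be used as the axes of a 2-D scatter plot, coloured by class.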

1.3.1 NEURAL NETWORKS AND DEEP LEARNING
In this section we will use Keras to build a simple neural network to classify the Wisconsin breast
cancer dataset that was described earlier. Often, deep learning algorithms and neural networks are used to
classify images—convolutional neural networks are especially used for image-related classification.
However, they can of course be used for text or tabular data as well. Here we will build a standard
feed-forward, densely connected neural network and classify a text-based cancer dataset in order to
demonstrate the framework's usage.
In this example we are once again using the Wisconsin breast cancer dataset, which consists of 30 features
and 569 individual samples. To make it more challenging for the neural network, we will use a training set
consisting of only 50% of the entire dataset, and test our neural network on the remaining 50% of the data.
Note: Keras is not installed as part of the Anaconda distribution; to install it, use pip:
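A minimal install command (assuming pip is available on the command line) might be:

```shell
# Install Keras from PyPI
pip install keras
```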

Keras additionally requires either Theano or TensorFlow to be installed. In the examples in this
chapter we are using Theano as a backend; however, the code will work identically with either backend. You
can install Theano using pip, but it has a number of dependencies that must be installed first. Refer to the
Theano and TensorFlow documentation for more information [12]. Keras is a modular API. It allows you
to create neural networks by building a stack of modules, from the input of the neural network, to the
output of the neural network, piece by piece until you have a complete network. Also, Keras can be
configured to use your Graphics Processing Unit, or GPU. This makes training neural networks far faster
than if we were to use a CPU. We begin by importing Keras:
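A minimal sketch of such a network follows. It uses the modern Keras API with its default backend rather than the Theano setup described above, and the layer sizes, optimizer and number of epochs are illustrative choices, not the report's exact configuration:

```python
from keras import Input
from keras.layers import Dense
from keras.models import Sequential
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Wisconsin breast cancer data, scaled so the network trains well
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# 50/50 split, as described above, to make the task harder
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42)

# Stack modules piece by piece: input -> hidden layers -> output
model = Sequential([
    Input(shape=(30,)),
    Dense(16, activation="relu"),
    Dense(8, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)

# Evaluate on the held-out 50%
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print("Test accuracy:", accuracy)
```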

We may want to view the network’s accuracy on the test (or its loss on the training set) over time
(measured at each epoch), to get a better idea how well it is learning. An epoch is one complete cycle
through the training data.
Fortunately, this is quite easy to plot as Keras’ fit function returns a history object which we can
use to do exactly this:
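A self-contained sketch of recording and plotting the history (the model architecture and epoch count are illustrative):

```python
import matplotlib.pyplot as plt
from keras import Input
from keras.layers import Dense
from keras.models import Sequential
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42)

model = Sequential([Input(shape=(30,)),
                    Dense(16, activation="relu"),
                    Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# fit() returns a History object recording loss and accuracy per epoch
history = model.fit(X_train, y_train, epochs=20, verbose=0,
                    validation_data=(X_test, y_test))

# Plot training-set and test-set loss over the epochs
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="test loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.savefig("training_history.png")
```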

This will result in a plot similar to that shown. Often you will also want to plot the loss on the test
set and training set, and the accuracy on the test set and training set.
Plotting the loss and accuracy can be used to see if you are overfitting (you experience tiny loss on the
training set, but large loss on the test set) and to see when your training has plateaued.

1.4 OBJECTIVE STATEMENT

In the proposed system, XGBoost and Random Forest are used to compare the performance of different ML
tools. In this case we used test-data accuracy to determine which model performs better.

SCOPE OF THE PROJECT:

The advanced big data implementation collects data, and ML models like XGBoost and
Random Forest, together with environment-related data, aid in building water quality prediction models. This
paper analyses two prediction models developed using AI and big data techniques and their experimental
results of water prediction and evaluation. The performance of both models is discussed and compared.

PROBLEM STATEMENT:
The problem addressed in this paper is the need for an early fire detection system that can
accurately and reliably detect fires at an early stage. The existing fire detection systems, which primarily
rely on smoke and heat sensors, may have limitations in terms of accuracy, response time, and false alarm
rates. Therefore, there is a need to explore the application of machine learning techniques to develop an
intelligent fire detection system that can overcome these limitations and contribute to the reduction of fire
hazards and promotion of safety in buildings and public spaces. The specific objectives include data
acquisition, pre-processing, feature extraction, classifier design, and testing stages of building a robust and
reliable fire detection system.
OBJECTIVE:
The objective is to develop an intelligent fire detection system using machine learning techniques.
This involves acquiring a diverse dataset of fire-related data, preprocessing the data to improve its quality,
extracting relevant features, designing a classifier, and testing the system's performance. The goal is to
accurately detect fires at an early stage and contribute to reducing fire hazards and promoting safety in
buildings and public spaces.

CHAPTER 2

SURVEY OF TECHNOLOGIES

2.1 LITERATURE SURVEY

TITLE: Experimental Study on Kitchen Fire Accidents in Different Scenarios

Author: Xiaoyuan Xu; Pengfei Wang; Nianhao Yu; Hongya Zhu

YEAR: 2019

ABSTRACT:

In this paper, a real-sized fire test platform for a home kitchen is built, and oil pan fire, kitchen flue
fire and cabinet fire tests are carried out on this platform. The evolution characteristics of different fire
accidents in a home kitchen are studied through the change and development of temperature, smoke and the fire
situation. The following conclusions are drawn from the experiment: the times to ignite 0.5 L, 1.0 L and
2.5 L of cooking oil using a gas stove flame are 200, 480 and 742 s, respectively; when the range hood is open,
the range hood can be ignited by a cooking oil fire above 1.0 L, thus causing a range hood fire. It is very
difficult to extinguish cooking oil fires and kitchen flue fires with a dry powder fire extinguisher, and they
re-ignite very easily; a water-based fire extinguisher specially designed for cooking oil fires is needed. A kitchen flue
fire can ignite all combustibles in the kitchen within 3 minutes, reaching the state of flashover. The results
of this study can provide a reference for kitchen fire prevention, detection, disposal and other technologies.

TITLE: Research on Image Fire Detection Based on Support Vector Machine

Author: Ke Chen; Yanying Cheng; Hui Bai; Chunjie Mou; Yuchun Zhang

YEAR: 2020

ABSTRACT:
Traditional temperature and smoke fire detectors are vulnerable to environmental factors such as the
height of the monitored space, air velocity and dust, which hampers timely and effective detection and
alarming of early fires. An image fire detection algorithm based on a support vector machine is therefore
proposed by studying the features of fire in digital images. Firstly, the motion region is extracted by the
inter-frame difference method and regarded as the suspected fire area. Then, it is resampled to a uniform
size. Finally, the flame colour moment feature and texture feature are extracted and input into the support
vector machine for classification and recognition. Datasets were formed by collecting Internet resources
and fire videos taken by the authors, and the trained support vector machine was tested. The test results
showed that the algorithm can detect early fire more accurately.

TITLE: Texture Analysis of Smoke for Real-Time Fire Detection

Author: Yu Chunyu; Zhang Yongming; Fang Jun; Wang Jinjun

YEAR: 2021

ABSTRACT:

Since texture is an important feature of smoke, a novel method of texture analysis is proposed
for real-time fire smoke detection. The texture analysis is based on gray level co-occurrence matrices
(GLCM) and can distinguish smoke features from other non-fire disturbances. For the realization of real-
time fire detection, a block processing technique is adopted and the computation of texture features is done
for every block of the image. A neural network is used to classify smoke texture features from non-smoke
features, and the fire alarm trigger is set according to the total number of smoke blocks in one frame. The accuracy of
the method is discussed as a function of frames at the end.

TITLE: The Application of Water Mist Fire Extinguishing System in Bus

Author: Shuchao Li; Dongxing Yu; Zongyu Ling; Wei Ding

YEAR: 2022

Abstract:

Based on the characteristics of bus fires, the applicability of water mist for extinguishing bus fires was
analyzed. The structures of the self-contained water mist fire extinguishing system and the pump-supplied system
were summarized. Taking a 12-meter bus as an example, the application of a pump-supplied water mist fire
extinguishing system in the bus cabin was introduced in detail. The fire extinguishing efficiency of
water mist used in buses was verified by a full-scale fire test. The flame was extinguished 11 seconds after
the system started and the average temperature of the cabin was 39.9° 58 seconds later. Technical guidance for
the application and design of water mist systems in buses is provided in this paper.

CHAPTER 3

3.1 HARDWARE REQUIREMENTS


The hardware requirements may serve as the basis for a contract for the implementation of the system
and should therefore be a complete and consistent specification of the whole system. They are used by
software engineers as the starting point for the system design. They show what the system does, not how it
should be implemented.
PROCESSOR : Intel I5
RAM : 4GB
HARD DISK : 500 GB

3.2 SOFTWARE REQUIREMENTS


The software requirements document is the specification of the system. It should include both a
definition and a specification of requirements. It states what the system should do rather than how it
should do it. The software requirements provide a basis for creating the software requirement
specification. It is useful in estimating cost, planning team activities, performing tasks and tracking the
team's progress throughout the development activity.

PYTHON IDE : Anaconda, Jupyter Notebook

PROGRAMMING LANGUAGE : Python

3.3 PROPOSED SYSTEM
Both decision tree and random forest algorithms provide a flexible and effective approach for fire
detection, capturing complex patterns and improving accuracy. They can handle a variety of features and
adapt to different fire scenarios. The system evaluates their performance, such as accuracy and other
evaluation metrics, to ensure reliable fire detection.

3.3.1 ADVANTAGES OF PROPOSED SYSTEM


1. Interpretability.
2. Improved accuracy.

3.4 EXISTING SYSTEM


In the existing fire detection systems, traditional methods such as smoke and heat sensors are commonly
employed. These systems detect fires based on changes in smoke density or the presence of high temperatures.

3.4.1 DISADVANTAGES OF EXISTING SYSTEM
1. Limited Accuracy.
2. Response Time.
3. Maintenance and False Alarms.

3.5 GENERAL

3.5.1 ANACONDA
It is a free and open-source distribution of the Python and R programming languages for
scientific computing (data science, machine learning applications, large-scale data processing, predictive
analytics, etc.) that aims to simplify package management and deployment.
Anaconda distribution comes with more than 1,500 packages as well as the Conda package and
virtual environment manager. It also includes a GUI, Anaconda Navigator, as a graphical alternative to the
Command Line Interface (CLI). The big difference between Conda and the pip package manager is in how
package dependencies are managed, which is a significant challenge for Python data science and the
reason Conda exists. Pip installs all Python package dependencies required, whether or not those conflict
with other packages you installed previously.
So your working installation of, for example, Google Tensorflow, can suddenly stop
working when you pip install a different package that needs a different version of the Numpy library. More
insidiously, everything might still appear to work but now you get different results from your data science,
or you are unable to reproduce the same results elsewhere because you didn't pip install in the same order.

Anaconda Navigator is a desktop Graphical User Interface (GUI) included in Anaconda
distribution that allows users to launch applications and manage conda packages, environments and
channels without using command-line commands. Navigator can search for packages on Anaconda Cloud
or in a local Anaconda Repository, install them in an environment, run the packages and update them. It is
available for Windows, macOS and Linux.
The following applications are available by default in Navigator:
 JupyterLab
 Jupyter Notebook
 QtConsole
 Spyder
 Glueviz
 Orange
 Rstudio
 Visual Studio Code


3.5.2 PYTHON:
 Python is a powerful multi-purpose programming language created by Guido van Rossum.
 It has a simple, easy-to-use syntax, making it the perfect language for someone trying to learn computer
programming for the first time.

3.6 FEATURES OF PYTHON:

1. Easy to code: Python is a high-level programming language. It is very easy to learn
compared to other languages like C, C#, JavaScript and Java. It is very easy to write Python code,
and anybody can learn the basics of Python in a few hours or days. It is also a developer-
friendly language.

2. Open source:
Python is freely available at the official website, where you can download it by clicking the
Download Python link. Since it is open source, the source code is also available to the public, so you
can download it, use it and share it.

3. Object-oriented:
One of the key features of Python is object-oriented programming. Python supports object-
oriented concepts such as classes, objects and encapsulation.

4. GUI programming:
Graphical user interfaces can be made using modules such as PyQt5, PyQt4, wxPython or Tk in
Python. PyQt5 is the most popular option for creating graphical apps with Python.

5. High-level language:
Python is a high-level language. When we write programs in Python, we do not need to remember
the system architecture, nor do we need to manage the memory.

6. Extensible:
Python is an extensible language. We can write parts of a Python program in C or C++ and
compile that code as C/C++.

7. Portable:
Python is also a portable language. For example, if we have Python code for Windows
and we want to run it on another platform such as Linux, Unix or Mac, we do not need to
change it; we can run the same code on any platform.

3.7 APPLICATIONS OF PYTHON:

3.7.1 WEB APPLICATIONS

 You can create scalable web apps using frameworks and CMSs (Content Management Systems) that are
built on Python. Some of the popular platforms for creating web apps are Django, Flask, Pyramid,
Plone and Django CMS.
 Sites like Mozilla, Reddit, Instagram and PBS are written in Python.

3.7.2 SCIENTIFIC AND NUMERIC COMPUTING


 There are numerous libraries available in Python for scientific and numeric computing. Libraries
like SciPy and NumPy are used in general-purpose computing, and there are domain-specific
libraries like EarthPy for earth science, AstroPy for astronomy, and so on.
 Also, the language is heavily used in machine learning, data mining and deep learning.

3.7.3 CREATING SOFTWARE PROTOTYPES


 Python is slow compared to compiled languages like C++ and Java. It might not be a good choice if
resources are limited and efficiency is a must.
 However, Python is a great language for creating prototypes. For example, you can use Pygame (a library
for creating games) to create your game's prototype first. If you like the prototype, you can use a language
like C++ to create the actual game.

3.7.4 GOOD LANGUAGE TO TEACH PROGRAMMING


Python is used by many companies to teach programming to kids. It is a good language with a lot
of features and capabilities. Yet, it's one of the easiest languages to learn because of its simple, easy-to-use
syntax.

3.7.5 SOFTWARE DEVELOPMENT

Python is a popular option for software development. Companies like Google,
Netflix, and Reddit all use Python. The language offers features like:
 Platform independence
 Inbuilt libraries and frameworks to provide ease of development.

 Enhanced code reusability and readability
 High compatibility

CHAPTER 4

PROJECT DESCRIPTION
4.1 INTRODUCTION
Machine learning, a subfield of artificial intelligence, empowers computers to learn from data and
make predictions or take actions without being explicitly programmed. By analyzing large volumes of
data, machine learning algorithms can identify patterns and make informed decisions based on the acquired
knowledge. This capability has been harnessed to develop intelligent fire detection systems that leverage
various data sources, such as imagery, temperature readings, and environmental factors, to identify and
predict the occurrence of fires.
Traditional fire detection systems often rely on human observation or on fixed sensors in
specific locations. These methods can be limited in their effectiveness, especially in vast or remote areas
where immediate human intervention may not be feasible. Machine learning-based fire detection systems
offer several advantages over traditional approaches. They can process real-time data from diverse sources,
including satellite imagery, thermal sensors, and weather data, enabling early detection of fires and
enhancing situational awareness.
One of the primary advantages of using machine learning for fire detection is the ability to detect
fires at their nascent stages. By training algorithms on historical fire data, these systems can learn to
recognize patterns and anomalies associated with fire outbreaks. The algorithms can then generalize this
knowledge to detect new fires, even in areas with different environmental conditions or vegetation types.
This adaptability makes machine learning-based fire detection systems valuable tools in mitigating the
impact of wildfires across various landscapes.
Furthermore, machine learning algorithms can incorporate multiple parameters and factors to
improve the accuracy of fire detection. For example, they can analyze infrared images to detect the
presence of hotspots or anomalies in temperature patterns. They can also consider meteorological data,
such as wind speed and humidity, to predict the fire's behavior and spread. By combining these inputs and
utilizing sophisticated algorithms, early fire detection systems can provide more comprehensive and
reliable information for firefighting efforts.

4.2 DETAILED DIAGRAM

4.2.1 BACK END MODULE DIAGRAMS:

The backend module diagram shows the modules used in diagrammatic form. These are the
dataset processing module, the data splitting module, the training module and the testing module.

4.2.2 FRONT END MODULE DIAGRAMS:

The front end module diagram shows the modules used in diagrammatic form. The main
processing in this phase is getting input from the user and producing output for it, so there are input and
output modules.

4.2.3 USE CASE DIAGRAM:

Use case diagrams are a way to capture the system's functionality and requirements in UML
diagrams. A use case diagram captures the dynamic behavior of a live system and consists of use cases
and actors.

4.2.4 STATE DIAGRAM
A state diagram, also known as a state machine diagram or statechart diagram, is an illustration of
the states an object can attain, as well as the transitions between those states, in the Unified Modeling
Language. All of the possible states are placed in relation to the beginning and the end.

4.2.5 ACTIVITY DIAGRAM:
Activity diagrams describe how activities are coordinated to provide a service, which can be at
different levels of abstraction. Typically, an event needs to be achieved by some operations, particularly
where the operation is intended to achieve a number of different things that require coordination.

4.2.6 SEQUENCE DIAGRAM:

A sequence diagram is a type of interaction diagram because it describes how and in what order a
group of objects works together. These diagrams are used by software developers and business
professionals to understand requirements for a new system or to document an existing process.

4.2.7 DATA FLOW DIAGRAM:
Data flow diagrams are used to graphically represent the flow of data in a business information
system. A DFD describes the processes that are involved in a system to transfer data from the input to
the output.
4.2.8 E-R DIAGRAM:
E-R Diagram stands for Entity Relationship Diagram, also known as ERD is a diagram that
displays the relationship of entity sets stored in a database. In other words, ER diagrams help to explain the
logical structure of databases. ER diagrams are created based on three basic concepts: entities, attributes
and relationships.ER Diagrams contain different symbols that use rectangles to represent entities, ovals to
define attributes and diamond shapes to represent relationships.

4.2.9 SYSTEM ARCHITECTURE

4.3 SCOPE OF THE PROJECT:
The scope of the project includes the development and implementation of an early fire detection
system using machine learning techniques, specifically focusing on decision tree and random forest
algorithms.

4.3.1 RANDOM FOREST ALGORITHMS:


The Random Forest algorithm is a machine learning algorithm that combines multiple decision
trees to obtain an efficient outcome. The algorithm creates decision trees based
on data samples and selects the best solution by means of voting.
Random Forest algorithms are used for classification as well as regression. The algorithm creates trees for the
data and makes predictions based on them. Random Forest can be used on large datasets and can
produce consistent results even when large sets of record values are missing. The trees generated can be
saved so that they can be used on other data. In random forest there are two stages: first,
create the random forest; then, make a prediction using the random forest classifier created in the first stage.

The random forest is a supervised learning algorithm that randomly creates and merges multiple decision
trees into one “forest.” The goal is not to rely on a single learning model, but rather on a collection of decision
models to improve accuracy. The primary difference between this approach and the standard decision tree
algorithm is that the feature splits at the root nodes are generated randomly.
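As a sketch, the two stages look like this with SciKit-Learn's RandomForestClassifier. The breast cancer dataset used earlier stands in for the project's fire dataset, which is not reproduced here, and the hyperparameters are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in dataset; the project's fire dataset would be loaded here instead
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Stage 1: create the random forest from randomised decision trees
forest = RandomForestClassifier(n_estimators=100, random_state=1)
forest.fit(X_train, y_train)

# Stage 2: predict with the trained classifier (majority vote of the trees)
accuracy = forest.score(X_test, y_test)
print("Test accuracy:", accuracy)
```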
4.3.2 PERFORMANCE METRICS:

Data was divided into two portions, training data and testing data, comprising 70%
and 30% of the data respectively. Both algorithms were applied to the same dataset using Enthought
Canopy, and results were obtained.

Prediction accuracy is the main evaluation parameter that we used in this work. Accuracy is the overall
success rate of the algorithm and can be defined as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

4.3.3 CONFUSION MATRIX:
The confusion matrix is the most commonly used evaluation metric in predictive analysis, mainly because
it is very easy to understand and it can be used to compute other essential metrics such as accuracy, recall
and precision. It is an N×N matrix that describes the overall performance of a model when used on some
dataset, where N is the number of class labels in the classification problem.

Accuracy is all true-positive and true-negative predictions divided by all positive and negative instances.
The True Positive (TP), True Negative (TN), False Negative (FN) and False Positive (FP) counts predicted
by the algorithms are presented in the table.
Many measures are available for assessing system performance; the following four counts are the building
blocks for evaluating the fire-detection classifier.

• True Positive (TP): The number of fire instances correctly classified as fire.

• True Negative (TN): The number of normal (no-fire) instances correctly classified as normal.

• False Positive (FP): The number of normal instances incorrectly classified as fire.

• False Negative (FN): The number of fire instances incorrectly classified as normal.
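These four counts can be read directly off a scikit-learn confusion matrix. The label vectors below are made-up examples, with 1 standing for fire and 0 for normal:

```python
# Computing TP, TN, FP, FN and accuracy for a binary fire / no-fire problem.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # ground truth (illustrative values)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model output (illustrative values)

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(tp, tn, fp, fn)  # 3 3 1 1
print(accuracy)        # 0.75
```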

4.4 MODULE DESCRIPTION:
• Data Collection
• Data pre-processing

• Feature Selection
• Splitting data
• Training and testing models

DATA COLLECTION:
Data collection is a very basic module and the initial step towards the project. It generally deals with the
collection of the right dataset. The dataset that is to be used in the crop and fertilizer prediction has to be used to
be filtered based on various aspects. Data collection also complements to enhance the dataset by adding more
data that are external. Our data mainly consists of the previous year sales details. Initially, we will be analysing
the Kaggle dataset and according to the accuracy, we will be using the model with the data to analyse the
predictions accurately.
DATA PRE-PROCESSING
Data pre-processing is a part of machine learning that involves transforming raw data into a more
coherent format. Raw data is usually inconsistent or incomplete and often contains many errors. Pre-processing
involves checking for missing values, handling categorical values, splitting the dataset into training and test
sets, and finally applying feature scaling to limit the range of variables so that they can be compared on a
common scale. In this work we used the isnull() method for checking null values and LabelEncoder()
for converting categorical data into numerical data.
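A minimal sketch of these two pre-processing calls on a toy frame (the column names here are illustrative, not the project's actual schema):

```python
# Checking for missing values and encoding a categorical column.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'Temperature[C]': [20.0, None, 21.5],
                   'Status': ['safe', 'fire', 'safe']})  # hypothetical column

print(df.isnull().sum())  # missing-value count per column

encoder = LabelEncoder()  # maps each category string to an integer code
df['Status'] = encoder.fit_transform(df['Status'])
print(df['Status'].tolist())  # [1, 0, 1]  ('fire' -> 0, 'safe' -> 1)
```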
FEATURE CLEANING
Cleaning data is the eradication or restoration of unfinished or empty data. There may also be
incomplete occurrences of data which do not carry the information that you think you'd like to lever may need
to eliminate these occurrences. In addition, there are attributes which carry sensitive information and that the
attributes are likely to be omitted.
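As a sketch of this step, where `user_id` stands in for a hypothetical sensitive attribute:

```python
# Drop incomplete rows, then omit a sensitive attribute entirely.
import pandas as pd

df = pd.DataFrame({'reading': [1.2, None, 3.4],
                   'user_id': ['a', 'b', 'c']})  # hypothetical sensitive column

df = df.dropna()                   # remove incomplete occurrences
df = df.drop(columns=['user_id'])  # omit the sensitive attribute
print(df.shape)                    # (2, 1)
```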

SPLITTING DATA
This step includes training and testing of input data. The loaded data is divided into two sets, such as
training data and test data, with a division ratio of 80% or 20%, such as 0.8 or 0.2. In a learning set, a
classifier is used to form the available input data. In this step, create the classifier's support data and
preconceptions to approximate and classify the function. During the test phase, the data is tested. The final
data is formed during preprocessing and is processed by the machine learning module.
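The split described above can be sketched with `train_test_split` on synthetic data; the implementation in Chapter 5 performs the same 70/30 stratified split on the sensor dataset:

```python
# 70/30 train/test split, stratified so both sets keep the class balance.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, random_state=0)  # stand-in data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=40)

print(X_train.shape, X_test.shape)  # (70, 20) (30, 20)
```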

TRAINING AND TESTING MODELS
Algorithms learn from data. They find relationships, develop understanding, make decisions, and
evaluate their confidence from the training data they’re given. And the better the training data is, the better
the model performs. In fact, the quality and quantity of your training data has as much to do with the
success of your data project as the algorithms themselves.
Now, even if you’ve stored a vast amount of well-structured data, it might not be labeled in a way
that actually works for training your model. For example, autonomous vehicles don’t just need pictures of
the road, they need labeled images where each car, pedestrian, street sign and more are annotated;
sentiment analysis projects require labels that help an algorithm understand when someone’s using slang
or sarcasm; catboats need entity extraction and careful syntactic analysis, not just raw language. In other
words, the data you want to use for training usually needs to be enriched or labeled. Because if you’re
trying to make a great model, you need great training data. And we know a thing or two about that. After
all, we’ve labeled over 5 billion rows of data for some of the most innovative companies in the world.
Whether it’s images, text, audio, or, really, any other kind of data, we can help create the training set that
makes your models successful.

CHAPTER 5

CODE IMPLEMENTATION

BACK END:

# List of useful imports used throughout the notebook
%matplotlib inline

import os
import random
import pickle
from glob import glob

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# import cv2

from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_curve

# NOTE: the dataset path below is incomplete in the source document
data = pd.read_csv(r'C:\Users\ELCOT\Documents\PROJECT\FIRE(SMOKE)')

The first five rows of the loaded smoke-sensor dataset (output of data.head()):

   Unnamed: 0         UTC  Temperature[C]  Humidity[%]  TVOC[ppb]  eCO2[ppm]  Raw H2  Raw Ethanol  Pressure[hPa]  PM1.0  PM2.5  NC0.5  NC1.0  NC2.5  CNT  Fire Alarm
0           0  1654733331          20.000        57.36          0        400   12306        18520        939.735   0.00   0.00   0.00  0.000  0.000    0           0
1           1  1654733332          20.015        56.67          0        400   12345        18651        939.744   0.00   0.00   0.00  0.000  0.000    1           0
2           2  1654733333          20.029        55.96          0        400   12374        18764        939.738   0.00   0.00   0.00  0.000  0.000    2           0
3           3  1654733334          20.044        55.28          0        400   12390        18849        939.736   0.00   0.00   0.00  0.000  0.000    3           0
4           4  1654733335          20.059        54.69          0        400   12403        18921        939.744   0.00   0.00   0.00  0.000  0.000    4           0


data.info()

data.describe()

data.isnull().sum()

data.isnull().any()

# Checking each column's correlation


data.corr()

data['Fire Alarm'].value_counts()

sns.set(style="whitegrid")

plt.figure(figsize=(10, 5))

ax = sns.countplot(x="Fire Alarm", data=data, palette=sns.color_palette("cubehelix", 4))

plt.xticks(rotation=90)

plt.title("Class Label Counts", {"fontname":"fantasy", "fontweight":"bold", "fontsize":"medium"})

plt.ylabel("count", {"fontname": "serif", "fontweight":"bold"})

plt.xlabel("Class", {"fontname": "serif", "fontweight":"bold"})

#Removing duplicates
data = data.drop_duplicates()
data.shape
from sklearn.preprocessing import LabelEncoder

columns=data.columns
label_encoder=LabelEncoder()

for cols in columns:
    # print(cols)
    if isinstance(data[cols].values[0], str):
        data[cols] = label_encoder.fit_transform(data[cols].values)

data

data['Fire Alarm'].value_counts()

from sklearn.utils import resample

# Separate majority and minority classes

df_majority = data[data['Fire Alarm']== 1]

df_minority = data[data['Fire Alarm']== 0]

# Downsample majority class and upsample the minority class

df_minority_upsampled = resample(df_minority, replace=True,n_samples=5000,random_state=100)

df_majority_downsampled = resample(df_majority, replace=False,n_samples=5000,random_state=100)

# Combine the upsampled minority class with the downsampled majority class

df_balanced = pd.concat([df_minority_upsampled, df_majority_downsampled])

# Display new class counts

df_balanced['Fire Alarm'].value_counts()

data.head()

data = data.drop(columns=['Unnamed: 0'])

data

x = data.drop(columns=['Fire Alarm'])

y = data['Fire Alarm']

x.head()

y.tail()

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.30,stratify=y ,random_state=40)

x_train

y_test

#RANDOM FOREST CLASSIFIER

from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import roc_auc_score

from sklearn.model_selection import GridSearchCV

from sklearn.metrics import roc_curve

dept = [1, 5, 10, 50, 100, 500, 1000]

n_estimators = [20, 40, 60, 80, 100, 120]

param_grid={'n_estimators':n_estimators , 'max_depth':dept}

clf = RandomForestClassifier()

model = GridSearchCV(clf,param_grid,scoring='accuracy',n_jobs=-1,cv=3)

model.fit(x_train,y_train)

print("optimal n_estimators",model.best_estimator_.n_estimators)

print("optimal max_depth",model.best_estimator_.max_depth)

optimal_max_depth = model.best_estimator_.max_depth

optimal_n_estimators = model.best_estimator_.n_estimators

# Output: optimal n_estimators 20
# Output: optimal max_depth 10

from sklearn.metrics import accuracy_score

# training our model with the optimal max_depth and n_estimators found above

clf = RandomForestClassifier(max_depth = optimal_max_depth,n_estimators = optimal_n_estimators)

clf.fit(x_train,y_train)

import pickle

# 'filename' is illustrative; the actual model filename is not given in the source
filename = 'fire_detection_model.pkl'
pickle.dump(clf, open(filename, 'wb'))

pred_test2 =clf.predict(x_test)

test_accuracy1 = accuracy_score(y_test, pred_test2)

pred_train = clf.predict(x_train)

train_accuracy1 =accuracy_score(y_train,pred_train)

print("Accuracy on Test data is " + str(accuracy_score(y_test, pred_test2)))

print("Accuracy on Train data is " + str(accuracy_score(y_train, pred_train)))

print("---------------------------")

# Code for drawing seaborn heatmaps

class_names = ['False', 'True']  # label 0 (no fire) first, matching confusion_matrix row order

df_heatmap = pd.DataFrame(confusion_matrix(y_test, pred_test2.round()), index=class_names,

columns=class_names )

fig = plt.figure( )

heatmap = sns.heatmap(df_heatmap, annot=True, fmt="d")

SCREENSHOTS:

Front end:

CHAPTER 6

CONCLUSION AND REFERENCES

CONCLUSION:
Early fire detection is a critical aspect of fire safety, and machine learning techniques can significantly
enhance the capabilities of fire detection systems. By acquiring relevant data, preprocessing it effectively,
extracting informative features, designing a suitable classifier, and thoroughly testing the system, an
intelligent fire detection system can be developed. Such a system has the potential to reduce fire hazards,
minimize property damage, and save lives.

REFERENCES

1. https://ieeexplore.ieee.org/document/652749

2. https://www.researchgate.net/publication/309758635_A_symbolic_distributed_event_detection_scheme_for_Wireless_Sensor_Networks

3. http://agritech.tnau.ac.in/agriculture/agri_majorareas_disastermgt_forestfire.html

4. Yashwant Singh, Suman Saha, Urvashi Chugh and Chhavi Gupta, "Distributed Event Detection in Wireless
Sensor Networks for Forest Fires," Department of Computer Science and Engineering, Jaypee University
of Information Technology, Waknaghat, Solan-173245, India. {yashwant.singh & suman.saha}@juit.ac.in

5. Aditi Kansal, Yashwant Singh, Nagesh Kumar and Vandana Mohindru, "Detection of Forest Fires using
Machine Learning Technique: A Perspective," Department of Computer Science & Engineering, Jaypee
University of Information Technology, Waknaghat, Solan-173234, (H.P.), India.

