
MACHINE LEARNING BASED SMART CITY

AIR QUALITY PREDICTION SYSTEM

A PROJECT REPORT

Submitted by

AMANCHARLA VISHNUPRIYA 111719106005


ANUVARSHINI S S 111719106007
DEVADHARSHINI S 111719106029

in partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING

in

ELECTRONICS AND COMMUNICATION ENGINEERING
R.M.K. ENGINEERING COLLEGE


(An Autonomous Institution)
R.S.M. Nagar, Kavaraipettai-601 206.

ANNA UNIVERSITY: CHENNAI 600 025


APRIL 2023
R.M.K. ENGINEERING COLLEGE
(An Autonomous Institution)
R.S.M. Nagar, Kavaraipettai-601 206.

BONAFIDE CERTIFICATE

Certified that this project report “MACHINE LEARNING BASED SMART

CITY AIR QUALITY PREDICTION SYSTEM” is the bonafide work of

AMANCHARLA VISHNUPRIYA (111719106005), ANUVARSHINI S S

(111719106007), and DEVADHARSHINI S (111719106029), who carried out the

project work under my supervision.

SIGNATURE                                    SIGNATURE
Dr. T. Suresh, M.E., Ph.D.,                  Mr. T. Joel, M.E., (Ph.D.)
HEAD OF THE DEPARTMENT                       SUPERVISOR
                                             Associate Professor
Electronics and Communication                Electronics and Communication
Engineering                                  Engineering
R.M.K. Engineering College                   R.M.K. Engineering College
Kavaraipettai-601206.                        Kavaraipettai-601206.

Submitted for the Project Viva - Voce held on …………….

INTERNAL EXAMINER EXTERNAL EXAMINER

ii
ACKNOWLEDGEMENT

We would like to express our heartfelt thanks to the Almighty and our beloved
parents for their blessings and wishes for successfully doing this project.

We convey our thanks to our Chairman, Thiru R.S. Munirathinam, and Vice
Chairman, Thiru R.M. Kishore, who took a keen interest in us, encouraged us
throughout the course of study, and offered us their kind attention and
valuable suggestions. We express our sincere gratitude to our Principal, Dr. K.
A. Mohamed Junaid M.E., Ph.D., for fostering an excellent climate in which to excel.

We are extremely thankful to Dr. T. Suresh M.E., Ph.D, Professor and Head,
Department of Electronics and Communication Engineering, for having
permitted us to carry out this project effectively.

We are extremely thankful to Dr. S. Joshua Kumaresan M.E., Ph.D.,


Professor for his valued ideas in effectively carrying out this project.

We convey our sincere thanks to our mentor, skillful and efficient supervisor,
Mr. T. Joel M.E., (Ph.D.), Associate Professor, for his extremely valuable
guidance throughout the course of the project.

We are grateful to our Project Coordinators and all the department staff
members for their immense support.

iii
ABSTRACT

Air pollution in smart cities around the world has been increasing drastically,
and the rising concentration of particulate matter in the air is a threat to
countries and their citizens, as it can lead to severe consequences such as
cardiovascular disease and worsened asthma. PM2.5 is a deadly air pollutant:
a mixture of solid and liquid fine particles with a diameter of 2.5
micrometres or less.
In India, traffic congestion has been the main contributor to air pollution in
smart cities such as Delhi and Bombay. The systematic prediction of air
pollution using machine learning has been widely studied globally over the
years, and many machine learning algorithms have been studied and tested
to find a solution to air pollution. However, very few approaches have been
attempted in India to predict air pollution using machine learning methods.
This project aims to implement machine learning algorithms and to evaluate
the accuracy of the prediction of particulate matter and air pollution in the
smart cities of India. To test the implementation of machine learning in this
prediction, the AdaBoost algorithm is chosen and applied to a smart city air
pollution dataset. The outcome of this work is that AdaBoost gave the best
accuracy in predicting the particulate matter and Air Pollution Index, using
Raspberry Pi Pico hardware, in the smart cities of India. The project is
implemented as a real-time hardware setup with a Raspberry Pi Pico and an
IoT-supporting node.

iv
TABLE OF CONTENTS

CHAPTER NO. TITLE PAGE NO.

ACKNOWLEDGEMENT iii

ABSTRACT iv

LIST OF FIGURES ix

LIST OF ABBREVIATIONS x

1   INTRODUCTION                                    1
    1.1   OBJECTIVE                                 6
    1.2   EXISTING METHOD                           6
    1.3   DISADVANTAGES OF EXISTING METHOD          6
    1.4   PROPOSED METHOD                           6
    1.5   ADVANTAGES OF PROPOSED METHOD             7
    1.6   BLOCK DIAGRAM                             8
    1.7   HARDWARE/SOFTWARE REQUIREMENTS            9
          1.7.1   HARDWARE REQUIREMENTS             9
          1.7.2   SOFTWARE REQUIREMENTS             9

2   LITERATURE SURVEY                               10
3   HARDWARE DESCRIPTION                            14

3.1 POWER SUPPLY 14

3.1.1 GENERAL DESCRIPTION 14

3.1.2 PRODUCT DESCRIPTION 14

3.1.3 CIRCUIT DESCRIPTION 16

3.1.4 FEATURES 18

3.1.5 APPLICATION 19

3.2 MQ135 AIR QUALITY SENSOR 19

3.2.1 GENERAL DESCRIPTION 19

3.2.2 PIN CONFIGURATIONS 21

3.2.3 SPECIFICATIONS & FEATURES 23

3.2.4 APPLICATIONS 24

3.3 MQ05 GAS SENSOR 24

3.3.1 GENERAL DESCRIPTION 24

3.3.2 FEATURES 25

3.3.3 APPLICATIONS 25

3.4 MQ2 SENSOR 25

3.4.1 GENERAL DESCRIPTION 25

3.4.2 SPECIFICATIONS 27

3.5 RASPBERRY PI PICO 28

3.5.1 GENERAL DESCRIPTION 28

3.6 ADABOOST ALGORITHM 30

4 SOFTWARE DESCRIPTION 36

4.1 PYTHON 36

4.1.1 ABOUT PYTHON 36

4.1.2 HISTORY OF PYTHON 37

4.1.3 PYTHON FEATURES 37

4.1.4 DEVELOPMENT 38

5 SOFTWARE SPECIFICATIONS 41

5.1 ANACONDA 41

5.2 KEY FEATURES 41

5.3 WHAT IS ANACONDA 42

5.4 CREATING VIRTUAL ENVIRONMENT 44

5.5 ANACONDA NAVIGATOR 44

5.6 FEASIBILITY STUDY 45

5.6.1 ECONOMIC FEASIBILITY 45

6 SYSTEM TESTING AND RESULT 47

6.1 INTRODUCTION 47

6.2 TYPES OF TESTING 47

6.3 RESULT 51

6.3.1 HARDWARE PART 51

6.3.2 PREDICTION DATA 51

6.3.3 PREDICTION GRAPH 53

7 CONCLUSION & FUTURE SCOPE 54

7.1 CONCLUSION 54

7.2 FUTURE SCOPE 54

REFERENCES 55

viii
LIST OF FIGURES

Figure No.   Description

3.1.2        Block Diagram of Basic Power Supply
3.1.3        Circuit Diagram of the Power Supply
3.2.1        MQ135 Air Quality Sensor
3.2.2        MQ135 Sensor Module
3.3.1        MQ05 Gas Sensor
3.4.1        MQ2 Sensor
3.4.2        Board Schematic Diagram
3.5.1        Raspberry Pi Pico

ix
LIST OF ABBREVIATIONS

Abbreviation   Expansion

IoT            Internet of Things
LCD            Liquid Crystal Display
LED            Light Emitting Diode
AQI            Air Quality Index
PM             Particulate Matter
LSTM           Long Short-Term Memory
UART           Universal Asynchronous Receiver Transmitter

x
CHAPTER 1

INTRODUCTION

Air pollution is currently one of the most pressing issues
facing humanity on a global scale, particularly in highly developed cities.
An air pollutant is any material, in liquid, solid, or gaseous form, released
into the atmosphere from any source that is capable of causing damage,
altering the typical properties of the atmosphere, increasing the health risk
to living things, or throwing the environment and ecosystem out of balance.
PM2.5 is the air pollutant that the United States Environmental Protection
Agency (US-EPA) identifies as being the most dangerous to human health and a
leading cause of death worldwide. The concentration of air pollutants,
particularly PM2.5 during the haze season, is the primary factor that
determines the API readings in Malaysia. PM2.5 consists of fine particles with
a diameter of 2.5 micrometers or less, small enough to diffuse into the
respiratory system and damage the human lungs. Vehicle emissions and
industrialization are the two primary sources of PM2.5. Cities that use smart
city technology can be observed experiencing this stage of the PM2.5 problem,
and air pollution problems are plaguing a number of smart cities.
According to Ameer, a "smart city" is an urban municipality
that uses information and communication technologies (ICT) to provide its
residents with adequate health, transportation, and energy-related facilities, and that also
assists the government in making efficient use of its available resources
for the benefit of its people. In terms of information and communications
technology (ICT) and the number of people living in urban areas, Kuala
Lumpur and Johor Bahru are two of the developed smart cities in
Malaysia. Both the process of industrialization and the movement of
people from rural to urban areas have contributed to the rapid growth of
urban populations in the modern world. The rise in the city's population
has resulted in an increase in the number of people who use various modes
of transportation and consume various forms of energy, both of which
have contributed to the expansion of the city's industrial capacity and its
vehicle population. As a result, the findings of a number of empirical
studies have led researchers to the conclusion that the issue of air quality
in smart cities has been one of the city's primary challenges, and that
machine learning has provided a better and more strategic solution to the
problem of air quality prediction. In contrast to the rest of the world, the
application of machine learning in Malaysia to the forecasting of air
pollutants and air pollution has not been widely recognized. Since there
has been significant development in the prediction of air pollution all over
the world over the course of the last few decades, it is possible that the
concentration of air pollutants in smart cities in Malaysia that are predicted
using ML techniques will be accurate.
According to the World Health Organization (WHO), air pollution
is a contributing factor in approximately 1.3 million deaths each year
around the world. The release of pollutants into the atmosphere has many
negative effects, one of which is a deterioration in the quality of the air.
Other negative effects, such as acid rain, global warming, the production
of aerosols, and photochemical smog, have also worsened over the course
of the past few decades. Many researchers have been motivated to
investigate the underlying pollution-related conditions that are
contributing to the COVID-19 pandemic in different countries as a result of
the recent rapid spread of COVID-19. Air pollution has been linked to
significantly higher COVID-19 death rates, and patterns in COVID-19
death rates mimic patterns both in areas with a high population density
and in areas with high PM2.5 exposure; this is evidenced by several
pieces of circumstantial evidence. Because of everything that has been
discussed up to this point, it is absolutely necessary to forecast and prepare
for changes in pollution levels in order to assist communities and
individuals in becoming more effective at mitigating the harmful effects of
air pollution. Evaluation of air quality is an important factor in both
the monitoring and the regulation of pollution levels in the atmosphere.
The Environmental Protection Agency (EPA) monitors common
pollutants such as ground-level ozone (O3), sulphur dioxide (SO2),
particulate matter (PM10 and PM2.5), carbon monoxide (CO), carbon
dioxide (CO2), and nitrogen dioxide (NO2). The Air Quality Index
(AQI) is an index that is commonly used to indicate how clean or polluted
the air is currently or how polluted the air is forecasted to become in
certain areas. These substances are included in the composition of the
AQI. As the Air Quality Index (AQI) rises, a greater proportion of the
population will be subjected to the impacted conditions. Different
countries have their own air quality indices, which correspond to different
air quality standards in those countries. Lead, ozone, particulate matter 10,
particulate matter 2.5, nitrogen dioxide, and sulphur dioxide are the six
pollutants that the United States Environmental Protection Agency (EPA)
tracks at more than 4000 locations across the country.
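The per-pollutant AQI described above is computed by linear interpolation between tabulated concentration breakpoints. A minimal sketch for PM2.5, assuming the pre-2024 US-EPA breakpoint values shown (they should be checked against the current standard before use):

```python
# Sketch of the EPA AQI calculation for PM2.5. The breakpoint table is an
# assumption taken from the pre-2024 US-EPA values; verify before relying on it.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),       # Good
    (12.1, 35.4, 51, 100),    # Moderate
    (35.5, 55.4, 101, 150),   # Unhealthy for sensitive groups
    (55.5, 150.4, 151, 200),  # Unhealthy
]

def pm25_aqi(conc: float) -> int:
    """Linearly interpolate within the breakpoint range containing conc (ug/m3)."""
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= conc <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo)
    raise ValueError("concentration outside tabulated range")

print(pm25_aqi(35.0))  # a 24-h mean of 35.0 ug/m3 falls in the Moderate band
```

The same interpolation is applied to each monitored pollutant, and the reported AQI is the maximum of the per-pollutant indices.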

3
1.1 OBJECTIVE

To design and utilise machine learning algorithms to assess the
feasibility of predicting particulate matter in air pollution in India's smart
cities.

1.2 EXISTING METHOD

The existing prediction methods for air quality forecasting include
deterministic methods, statistical methods, machine learning, and
deep learning methods. Multiple wireless and wired sensors, a sensor
node, a gateway, and ESP8266 WiFi modules are utilised in the creation of
an air quality monitoring system. The manual air quality monitoring
stations, however, measure air pollutants only once every 6 days.
Traditional statistical methods have been widely used for air quality
forecasting problems.

1.3 DISADVANTAGES OF EXISTING METHOD

 Time consuming
 Biased results
 Reliance on historical data

1.4 PROPOSED METHOD

In the proposed system, we are overcoming the drawbacks of the


existing system. The purpose of this project is to use machine learning
algorithms to determine how well particulate matter can be predicted in air
pollution in India's smart cities.

The Adaboost algorithm is selected, and the Smart City Air


Pollution dataset is used, to evaluate the use of machine learning in this
prediction. This algorithm is a boosting method, part of ensemble learning,
in which multiple individual models combine to create a master model. It is a
sequential learning process in which each model depends on the previous one.
This study found that AdaBoost's prediction of PM2.5 and the
Air Quality Index, using Raspberry Pi Pico hardware in Indian smart cities,
was the most accurate. A Raspberry Pi Pico and an Internet of Things (IoT)
node are used in the real-world hardware implementation of this project.
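The training step described above can be sketched with scikit-learn's AdaBoost regressor. The feature names and the synthetic data below are placeholders for the smart city air pollution dataset, which is not reproduced here:

```python
# Minimal sketch of AdaBoost regression for PM2.5, using scikit-learn.
# The three features and the synthetic target are illustrative assumptions.
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Hypothetical sensor features: temperature, humidity, traffic density
X = rng.uniform(0, 1, size=(500, 3))
# Synthetic PM2.5 target loosely driven by the features, plus noise
y = 40 * X[:, 2] + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(0, 2, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Sequential ensemble: each weak learner focuses on the errors of the last
model = AdaBoostRegressor(n_estimators=100, learning_rate=0.5, random_state=0)
model.fit(X_train, y_train)

mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE on held-out data: {mae:.2f} ug/m3")
```

In the deployed system, the trained model would be queried with live readings forwarded by the Pico through the IoT node rather than with synthetic features.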

1.5 ADVANTAGES OF PROPOSED SYSTEM

 Unbiased results
 Time Efficient
 Flexible Algorithm

5
1.6 BLOCK DIAGRAM:

Fig 1.6.1 Air Quality Monitoring Section

Fig 1.6.2 Server with Machine Learning Section


6
1.7 HARDWARE/SOFTWARE REQUIREMENTS

1.7.1 HARDWARE REQUIREMENTS:

 RASPBERRY PI PICO

 RSPM SENSOR

 MQ135 SENSOR

 MQ05 SENSOR

 IOT MODULE

 LCD DISPLAY

 POWER SUPPLY

1.7.2 SOFTWARE REQUIREMENTS:

 PYTHON IDE

 PYTHON LANGUAGE

7
CHAPTER 2

LITERATURE SURVEY

[1] “A deep learning model for air quality prediction in smart cities, 2017”-
IEEE International Conference on Big Data (Big Data)

ABSTRACT:

In recent years, Internet of Things (IoT) concept has become a


promising research topic in many areas including industry, commerce and
education. Smart cities employ IoT based services and applications to
create a sustainable urban life. By using information and communication
technologies, IoT enables smart cities to make city stakeholders more
aware, interactive and efficient. With the increase in number of IoT based
smart city applications, the amount of data produced by these applications
is increased tremendously. Governments and city stakeholders take early
precautions to process these data and predict future effects to ensure
sustainable development. In prediction context, deep learning techniques
have been used for several forecasting problems in big data. This inspires
us to use deep learning methods for prediction of IoT data. Hence, in this
paper, a novel deep learning model is proposed for analyzing IoT smart
city data. We propose a novel model based on Long Short-Term Memory
(LSTM) networks to predict future values of air quality in a smart city.
The evaluation results of the proposed model are found to be promising
and they show that the model can be used in other smart city prediction
problems as well.

AUTHORS: İbrahim Kök , Mehmet Ulvi Şimşek

8
[2] “Comparative Analysis of Machine Learning Techniques for
Predicting Air Quality in Smart Cities,2019”-IEEE Access

ABSTRACT:

Dealing with air pollution presents a major environmental challenge in


smart city environments. Real-time monitoring of pollution data enables
local authorities to analyze the current traffic situation of the city and make
decisions accordingly. Deployment of the Internet of Things-based sensors
has considerably changed the dynamics of predicting air quality. Existing
research has used different machine learning tools for pollution prediction;
however, comparative analysis of these techniques is required to have a
better understanding of their processing time for multiple datasets. In this
paper, we have performed pollution prediction using four advanced
regression techniques and present a comparative study to determine the
best model for accurately predicting air quality with reference to data size
and processing time. We have conducted experiments using Apache Spark
and performed pollution estimation using multiple datasets. The Mean
Absolute Error (MAE) and Root Mean Square Error (RMSE) have been
used as evaluation criteria for the comparison of these regression models.
Furthermore, the processing time of each technique through standalone
learning and through fitting the hyperparameter tuning on Apache Spark
has also been calculated to find the best-fit model in terms of processing
time and lowest error rate.

AUTHORS: Saba Ameer, Munam Ali Shah.

9
[3] “A Machine Learning Model for Air Quality Prediction for Smart
Cities, 2019”-International Conference on Wireless
Communications Signal Processing and Networking (WiSPNET)

ABSTRACT:

The air quality of a certain region can be used as one of the major factors
determining the pollution index, as well as how well the city's industries and
population are managed. Urban air quality monitoring has been a constant
challenge with the advent of industrialization. Air pollution has remained a
major challenge for the public and the government all over the world. Air
pollution causes noticeable damage to the environment as well as to
human health, resulting in acid rain, global warming, heart disease and
skin cancer. This paper addresses the challenge of predicting
the Air Quality Index (AQI), with the aim to minimize the pollution before
it gets adverse, using two Machine Learning Algorithms: Neural Networks
and Support Vector Machines. The air pollution databases were extracted
from the Central Pollution Control Board (CPCB), Ministry of
Environment, Forest and Climate change, Government of India. The
proposed Machine Learning (ML) model is promising in prediction
context for the Delhi AQI. The results show improvement of the
prediction accuracy and suggest that the model can be used in other smart
cities as well.

AUTHORS: Usha Mahalingam, Kirthiga Elangovan

10
[4] “Air Quality Prediction in Smart Cities Using Machine Learning
Technologies Based on Sensor Data: A Review,2020”-MDPI
ABSTRACT:

The influence of machine learning technologies is rapidly increasing


and penetrating almost in every field, and air pollution prediction is not
being excluded from those fields. This paper covers the revision of the
studies related to air pollution prediction using machine learning
algorithms based on sensor data in the context of smart cities. Using the
most popular databases and executing the corresponding filtration, the
most relevant papers were selected. After thoroughly reviewing those
papers, the main features were extracted, which served as a base to link
and compare them to each other. As a result, we can conclude that: (1)
instead of using simple machine learning techniques, currently, the authors
apply advanced and sophisticated techniques, (2) China was the leading
country in terms of a case study, (3) Particulate matter with diameter equal
to 2.5 micrometers was the main prediction target, (4) in 41% of the
publications the authors carried out the prediction for the next day, (5)
66% of the studies used data with an hourly rate, (6) 49% of the papers
used open data and since 2016 it had a tendency to increase, and (7) for
efficient air quality prediction it is important to consider the external
factors such as weather conditions, spatial characteristics, and temporal
features.

AUTHORS: Ditsuhi Iskandaryan, Francisco Ramos.

11
CHAPTER 3

HARDWARE DESCRIPTION

INTRODUCTION

Computer hardware is the collection of physical components
that constitute a computer system: the physical parts of a computer,
such as the monitor, keyboard, computer data storage, graphics card,
sound card, motherboard and so on. All of these are tangible objects.
By contrast, software is the set of instructions that can be stored and
run by hardware.

3.1 POWER SUPPLY


3.1.1 GENERAL DESCRIPTION:

A power supply (sometimes known as a power supply unit or PSU)


is a device or system that supplies electrical or other types of energy to an
output load or group of loads. The term is most commonly applied to
electrical energy supplies, less often to mechanical ones, and rarely to
others.

3.1.2 PRODUCT DESCRIPTION:

The transformer steps up or steps down the input line voltage and isolates
the power supply from the power line. The RECTIFIER section converts
the alternating current input signal to a pulsating direct current. However,
as you proceed in this chapter you will learn that pulsating dc is not
desirable. For this reason a FILTER section is used to convert pulsating dc
to a purer, more desirable form of dc voltage.
The final section, the REGULATOR, does just what the name implies. It
maintains the output of the power supply at a constant level in spite of
large changes in load current or input line voltages. Now that you know
what each section does, let's trace an ac signal through the power supply.
At this point you need to see how this signal is altered within each section
of the power supply. Later on in the chapter you will see how these
changes take place. In view B of figure 4-1, an input signal of 115 volts ac
is applied to the primary of the transformer. The transformer is a step-up
transformer with a turns ratio of 1:3. You can calculate the output of this
transformer by multiplying the input voltage by the ratio of secondary turns
to primary turns; therefore, 115 volts ac × 3 = 345 volts ac at the output.
Because each diode in the rectifier section conducts for 180 degrees of the
360-degree input, the output of the rectifier will be one-half, or
approximately 173 volts of
pulsating dc. The filter section, a network of resistors, capacitors, or
inductors, controls the rise and fall time of the varying signal;
consequently, the signal remains at a more constant dc level. You will see
the filter process more clearly in the discussion of the actual filter circuits.
The output of the filter is a signal of 110 volts dc, with ac ripple riding on
the dc. The reason for the lower voltage (average voltage) will be
explained later in this chapter. The regulator maintains its output at a
constant 110-volt dc level, which is used by the electronic equipment
(more commonly called the load).
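The voltage arithmetic traced through the supply above can be checked in a few lines of Python; the ~173 V figure is simply the 345 V secondary output halved by the rectifier:

```python
# Worked version of the numbers in the paragraph above
# (115 V ac input, 1:3 step-up transformer, each diode conducting 180 deg).
line_v = 115.0       # ac input applied to the primary
turns_ratio = 3.0    # step-up transformer, 1:3

secondary_v = line_v * turns_ratio  # 345 V ac at the secondary
rectified_v = secondary_v / 2       # half of the input appears as pulsating dc

print(secondary_v, round(rectified_v, 1))  # 345.0 172.5 (~173 V pulsating dc)
```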

13
Simple 5V power supply for digital circuits
 Brief description of operation: Gives out well regulated +5V output,
output current capability of 100 mA
 Circuit protection: Built-in overheating protection shuts down output when
regulator IC gets too hot
 Circuit complexity: Very simple and easy to build
 Circuit performance: Very stable +5V output voltage, reliable operation
 Availability of components: Easy to get, uses only very common basic
components
 Design testing: Based on datasheet example circuit, I have used this circuit
successfully as part of many electronics projects
 Applications: Part of electronics devices, small laboratory power supply
 Power supply voltage: Unregulated DC 8-18V power supply
 Power supply current: Needed output current + 5 mA
 Component costs: Few dollars for the electronics components + the input
transformer cost

Fig 3.1.2 Block diagram of basic Power supply

3.1.3 CIRCUIT DESCRIPTION:

This circuit is a small +5V power supply, which is useful when
experimenting with digital electronics. Small inexpensive wall
transformers with variable output voltage are available from any
electronics shop and supermarket. Those transformers are easily available,
but usually their voltage regulation is very poor, which makes them not
very usable for digital circuit experiments unless better regulation can
be achieved in some way. The following circuit is the answer to the
problem.

Fig 3.1.3 Circuit diagram of the power supply

This circuit can give a +5V output at about 150 mA current, but this can be
increased to 1 A when good cooling is added to the 7805 regulator chip. The
circuit has overload and thermal protection. The capacitors must have a
sufficiently high voltage rating to safely handle the input voltage fed to the
circuit. The circuit is very easy to build, for example on a piece of
Veroboard.

15
Pinout of the 7805 regulator IC
 Unregulated voltage in
 Ground
 Regulated voltage out

Component list
 7805 regulator IC
 100 uF electrolytic capacitor, at least 25V voltage rating
 10 uF electrolytic capacitor, at least 6V voltage rating
 100 nF ceramic or polyester capacitor

Fig 3.1.3 Equivalent Circuit diagram of the power supply

3.1.4 FEATURES:

• Output current:1A
• Supply voltage: 220-230VAC
• Output voltage: 12VDC
• Reduced costs
16
3.1.5 APPLICATIONS:

• SMPS applications

3.2 MQ135 AIR QUALITY SENSOR

3.2.1 GENERAL DESCRIPTION:

A device that is used to detect, measure, or monitor gases such as
ammonia, benzene, sulfur compounds, carbon dioxide, smoke, and other harmful
gases is called an air quality gas sensor. The MQ135 air quality sensor,
which belongs to the MQ series of gas sensors, is widely used to detect
harmful gases and smoke in ambient air. This section gives a brief
description of how to measure and detect gases using an MQ135 air
quality sensor.
The alternatives for the MQ135 air quality sensor/detector are MQ-2
(methane, LPG, butane, and smoke), MQ-3 (alcohol, smoke, and ethanol),
MQ-4 (CNG gas and methane), MQ-5 (natural gas, and LPG), MQ-6
(butane and LPG), MQ-7 (CO), MQ-8 (Hydrogen), MQ-9 (CO, and
flammable gases), MQ131 (ozone), MQ136 (Hydrogen sulfide gas),
MQ137 (ammonia), MQ138 (benzene, alcohol, propane, toluene,
formaldehyde gas, and hydrogen), MQ214 (methane, and natural gas),
MQ303A (alcohol, smoke, ethanol), MQ306A (LPG and butane),
MQ307A (CO), and MQ309A (CO and flammable gases). An MQ135 air quality
sensor is one type of MQ gas sensor used to detect, measure, and monitor
a wide range of gases present in air, such as ammonia, alcohol, benzene,
smoke, carbon dioxide, etc. It operates from a 5V supply with 150 mA
consumption. Preheating of 20 seconds is required before operation, to
obtain accurate output.
It is a semiconductor sensor suitable for air quality monitoring
applications. It is highly sensitive to NH3, NOx, CO2, benzene, smoke,
and other dangerous gases in the atmosphere, and it is available at a low
cost for harmful gas detection and monitoring applications.
If the concentration of gases exceeds the threshold limit in the air, then the
digital output pin goes high. The threshold value can be varied by using
the potentiometer of the sensor. The analog output voltage is obtained
from the analog pin of the sensor, which gives the approximate value of
the gas level present in the air.

Fig 3.2.1 MQ135 Air Quality Sensor
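The threshold and analog-output behaviour described above can be sketched in plain Python. The 16-bit ADC scale, the 3.3 V reference (as on the Raspberry Pi Pico's `read_u16`), and the 1.8 V threshold are assumed values for illustration, not datasheet figures:

```python
# Sketch of interpreting the MQ135 outputs. ADC scale, reference voltage,
# and alarm threshold are illustrative assumptions.
ADC_MAX = 65535  # 16-bit reading, as returned by the Pico's read_u16()
V_REF = 3.3      # ADC reference voltage (volts)

def adc_to_volts(raw: int) -> float:
    """Convert a raw ADC count to the analog-out voltage."""
    return raw * V_REF / ADC_MAX

def gas_alarm(raw: int, threshold_v: float = 1.8) -> bool:
    """Mimic the digital-out pin: goes high once the level crosses the threshold."""
    return adc_to_volts(raw) >= threshold_v

print(gas_alarm(20000), gas_alarm(50000))  # -> False True
```

On the real module the threshold comparison is done in hardware by the onboard comparator and potentiometer; this sketch only reproduces the logic in software.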

18
3.2.2 PIN CONFIGURATION:

The MQ135 air quality sensor is a 4-pin sensor module that features both
analog and digital output from the corresponding pins.

Fig 3.2.2 MQ135 Air Quality Sensor Module

MQ135 Air Quality Sensor Pin Configuration:

Pin 1: VCC: This pin refers to the positive 5V power supply that powers
up the MQ135 sensor module.
Pin 2: GND (Ground): This is a reference potential pin, which connects
the MQ135 sensor module to the ground.
Pin 3: Digital Out (Do): This pin refers to the digital output pin that gives
the digital output by adjusting the threshold value with the help of a
potentiometer. This pin is used to detect and measure any one particular
gas and makes the MQ135 sensor work without a microcontroller.

Pin 4: Analog Out (Ao): This pin generates the analog output signal of 0V
to 5V and it depends on the gas intensity. This analog output signal is
proportional to the gas vapor concentration, which is measured by the
MQ135 sensor module. This pin is used to measure the gases in PPM. It is
driven by TTL logic, operates at 5V, and is mostly interfaced with
microcontrollers.
H-pins: There are 2 H-pins, where one is connected to the voltage supply
and the other is connected to the ground.
A-pins: Here A-pins and B-pins can be interchanged. These are connected
to the voltage supply.
B-pins: Here A-pins and B-pins can be interchanged. One pin is used to
generate output while the other pin is connected to the ground.

Fig 3.2.2 MQ135 Air Quality Sensor

20
3.2.3 SPECIFICATIONS AND FEATURES:

The MQ135 air quality sensor specifications and features are listed below.
 It has a wide detection scope.
 High sensitivity and faster response.
 Long life and stability.
 The operating voltage: +5V.
 Measures and detects NH3, alcohol, NOx, Benzene, CO2, smoke etc.
 Range of analog output voltage: 0V-5V.
 Range of digital output voltage: 0V-5V (TTL logic).
 Duration of preheating: 20 seconds.
 Used as an analog or digital sensor.
 The potentiometer is used to vary the sensitivity of the digital pin.
 Heating Voltage: 5V±0.1.
 Load resistance is adjustable.
 Heater resistance: 33ohms±5%.
 Heating consumption:<800mW.
 Operating temperature: -10°C to 45°C.
 Storage temperature: -20°C to 70°C.
 Related humidity: <95%Rh.
 Oxygen concentration: 21% (affects the sensitivity).
 Sensing resistance: 30kiloohms to 200kiloohms.
 Concentration slope rate: ≤0.65.
 Preheat time: over 24 hrs.
 Simple drive circuit.

21
3.2.4 APPLICATIONS :

The applications of the MQ135 quality sensor are:


 Used in the detection of excess or leakage of gases like nitrogen oxide,
ammonia,alcohol, aromatic compounds, smoke, and sulfide.
 Used as air quality monitors.
 Used in air quality equipment for offices and buildings.
 Used as a domestic air pollution detector.
 Used as an industrial air pollution detector.
 Works as a portable air pollution detector.

3.3 MQ05 GAS SENSOR:

3.3.1 GENERAL DESCRIPTION:

The Grove - Gas Sensor (MQ5) module is useful for gas leakage detection
(in homes and industry). It is suitable for detecting H2, LPG, CH4, CO, and
alcohol. Due to its high sensitivity and fast response time, measurements
can be taken quickly. The sensitivity of the sensor can be adjusted by using
the potentiometer. The sensor value only reflects the approximate trend of
gas concentration within a permissible error range; it does NOT represent
the exact gas concentration. Detecting certain components in the air
usually requires a more precise and costly instrument, which cannot be
achieved with a single gas sensor. If your project is aimed at obtaining the
gas concentration at a very precise level, then we do not recommend this
gas sensor.

22
3.3.2 FEATURES:
• Wide detecting scope
• Stable and long life
• Fast response and High sensitivity

Fig 3.3.1 MQ05 Gas Sensor

3.3.3 APPLICATIONS:

• Gas leakage detection.


• Toys.

3.4 MQ2 SENSOR:

3.4.1 GENERAL DESCRIPTION:

The MQ2 gas sensor can be used to detect the presence of LPG, propane, and
hydrogen, and can also be used to detect methane and other combustible
vapours. It is low-cost and suitable for different applications. The sensor
is sensitive to flammable gas and smoke. The smoke sensor is powered with
5 volts and indicates smoke by the voltage that it outputs: the more smoke,
the higher the output. A potentiometer is provided to adjust the sensitivity.
SnO2 is the sensing material, which has low conductivity when the air is
clean. When smoke exists, the sensor provides an analog resistive output
based on the concentration of smoke. The circuit has a heater, powered
through VCC and GND from the power supply, and a variable resistor. The
resistance across the pins depends on the smoke in the air at the sensor:
the resistance falls as the smoke content rises, and the voltage between
the sensor and the load resistor increases.

Fig 3.4.1 MQ2 Sensor

The MQ2 has an electrochemical sensor that changes its resistance for
different concentrations of various gases. The sensor is connected in series
with a variable resistor to form a voltage divider circuit, and the variable
resistor is used to change the sensitivity. When one of the above gases
comes in contact with the sensor after heating, the sensor's resistance
changes. The change in resistance changes the voltage across the sensor,
and this voltage can be read by a microcontroller. The voltage value can be
used to find the resistance of the sensor by knowing the reference voltage
and the other resistor's resistance. The sensor has different sensitivities
for different types of gases.

3.4.2 SPECIFICATIONS:

• Power Supply: 4.5V to 5V DC


• High sensitivity to Propane, Smoke, LPG and Butane
• Wide range high sensitivity to Combustible gases
• Long life and low cost
• Analog and Digital output available
• Onboard visual indicator (LED) for indicating alarm
• Compact design and easily mountable
• Simple 4 PIN header interface
• Drive circuit is simple.
• Sensor Type: Semiconductor
• Concentration: 300-10000ppm (Combustible gas)
• Supply voltage: 5 V

APPLICATIONS:
• Safety of home
• Control of air quality
• Measurement of gas level

WORKING PRINCIPLE:

Fig 3.4.2 Board Schematic Diagram

3.5 RASPBERRY PI PICO:

3.5.1.GENERAL DESCRIPTION:

The Raspberry Pi Pico is a miniature Raspberry Pi board that features
the RP2040 microcontroller. It has been developed to be a low-cost and
flexible development platform for the RP2040. Its features include:
• Two megabytes of on-board flash memory alongside the RP2040 microcontroller
• A micro-USB type B connector for power and data (and for
reprogramming the flash)
• A 40-pin, 21 mm × 51 mm "DIP"-style PCB with 0.1" through-hole pins and edge
castellations, 1 mm thick
• 26 3.3 V GPIO pins that can be used for a variety of purposes

• Designed to be mounted on a flat surface like a module
• Easily runs from micro-USB, external power, or batteries
• High availability, low price, and excellent quality
• Dual-core Arm Cortex-M0+ running at up to 133 MHz
• Core frequency can be adjusted thanks to the on-chip PLL
• High-performance, multi-bank SRAM with 264 kilobytes of storage
• External Quad-SPI flash memory with a 16 kB on-chip cache
• Built on a high-performance, full-crossbar bus fabric
• 23 digital GPIO pins plus 3 analog-to-digital converter (ADC) inputs

Fig 3.5.1 RASPBERRY PI PICO

3.6 ADABOOST ALGORITHM:

AdaBoost algorithm, short for Adaptive Boosting, is a Boosting technique


used as an Ensemble Method in Machine Learning. It is called Adaptive
Boosting as the weights are re-assigned to each instance, with higher
weights assigned to incorrectly classified instances. Boosting is used to
reduce bias as well as variance for supervised learning. It works on the
principle of learners growing sequentially. Except for the first, each
subsequent learner is grown from previously grown learners. In simple
words, weak learners are converted into strong ones. The AdaBoost
algorithm works on the same principle as boosting with a slight difference.
First, let us discuss how boosting works. It makes 'n' decision
trees during the training period. As the first decision tree/model is
made, the incorrectly classified records from the first model are given
priority, and only these records are sent as input to the second model. The
process goes on until the specified number of base learners has been
created. Remember, repetition of records is allowed with all boosting
techniques.
This figure shows how the first model is made and errors from the first
model are noted by the algorithm. The record which is incorrectly
classified is used as input for the next model. This process is repeated until
the specified condition is met. As you can see in the figure, there are ‘n’
number of models made by taking the errors from the previous model.
This is how boosting works.
The models 1, 2, 3, …, N are individual models, which can be known as
decision trees. All types of boosting models work on the same principle.
Since we now know the boosting principle, it will be easy to understand
the AdaBoost algorithm. Let's dive into AdaBoost's working. When a
random forest is used, the algorithm makes 'n' trees: proper trees that
consist of a start node with several leaf nodes. Some trees might be
bigger than others, but there is no fixed depth in a random forest. With
AdaBoost, however, the algorithm only makes a node with two leaves, known
as a stump.
The figure here represents the stump: a single node with two leaves.
These stumps are weak learners, which is what boosting techniques prefer.
The order of stumps is very important in AdaBoost, since the error of the
first stump influences how the other stumps are made. Let's understand
this with an example.
Here’s a sample dataset consisting of only three features where the output
is in categorical form. The image shows the actual representation of the
dataset. As the output is in binary/categorical form, it becomes a
classification problem. In real life, the dataset can have any number of
records and features in it. Let us consider 5 datasets for explanation
purposes. The output is in categorical form, here in the form of Yes or No.
All these records will be assigned a sample weight. The formula used for
this is ‘W=1/N’ where N is the number of records. In this dataset, there
are only 5 records, so the sample weight becomes 1/5 initially. Every
record gets the same weight. In this case, it’s 1/5.

Step 1 – Creating the First Base Learner

To create the first base learner, the algorithm takes the first feature,
i.e., feature 1, and creates the first stump, f1. It creates the same
number of stumps as there are features; in the case below, it creates 3
stumps, as there are only 3 features in this dataset. From these stumps,
it creates three decision trees. This process can be called the stump
(base learner) model.
Out of these 3 models, the algorithm selects only one. Two properties are
considered while selecting a base learner: Gini and entropy, calculated
the same way as for decision trees. The stump with the least value becomes
the first base learner. In the figure below, all 3 stumps can be made from
the 3 features; the numbers below the leaves represent the correctly and
incorrectly classified records, from which the Gini or entropy index is
calculated. The stump with the least entropy or Gini is selected as the
base learner. Let's assume that the entropy index is least for stump 1, so
we take stump 1, i.e., feature 1, as our first base learner.
Here, feature f1 has classified 2 records correctly and 1 incorrectly. The
row in the figure that is marked red is incorrectly classified. For this,
we will be calculating the total error.
Step 2 – Calculating the Total Error (TE)

The total error is the sum of the sample weights of all incorrectly
classified records. In our case, there is only 1 error, so Total Error
(TE) = 1/5.

Step 3 – Calculating the Performance of the Stump

The formula for calculating the performance of the stump is:

Performance = (1/2) ln((1 − TE) / TE)

where ln is the natural logarithm and TE is the Total Error. In our case,
TE is 1/5; substituting this into the formula gives a performance of
(1/2) ln(4) ≈ 0.693. Why is it necessary to calculate the TE and the
performance of a stump? Because we must update the sample weights before
proceeding to the next model or stage; if the same weights were applied,
the output would simply be that of the first model. In boosting, the
incorrectly classified records are given more preference than the
correctly classified records.
Thus, in plain boosting only the wrong records from the decision
tree/stump are passed on to the next stump, whereas in AdaBoost both kinds
of records are passed on, with the wrong records repeated more often than
the correct ones. We must increase the weight of the wrongly classified
records and decrease the weight of the correctly classified records. In
the next step, we will update the weights based on the performance of the
stump.

Step 4 – Updating Weights


For incorrectly classified records, the formula for updating weights is:

New Sample Weight = Sample Weight × e^(Performance)

In our case, Sample Weight = 1/5, so 1/5 × e^(0.693) ≈ 0.399.

For correctly classified records, we use the same formula with the
performance value negated, which reduces the weight of correctly
classified records relative to the incorrectly classified ones:

New Sample Weight = Sample Weight × e^(−Performance)

Putting in the values, 1/5 × e^(−0.693) ≈ 0.100.

The updated weight for all the records can be seen in the figure. The
total sum of all the weights should be 1, but here the updated weights sum
to 0.799, not 1. To bring the sum to 1, every updated weight is divided by
the total sum of the updated weights; for example, the updated weight
0.399 divided by 0.799 gives 0.399/0.799 ≈ 0.50.

This 0.50 is known as the normalized weight. In the figure below, we can
see all the normalized weights, and their sum is approximately 1.
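Steps 2–4 above can be sketched in a few lines, reproducing the worked example's numbers (5 records, one misclassified):

```python
import math

def update_weights(weights, correct):
    """One AdaBoost round: total error over misclassified records,
    stump performance, then reweighting and normalization."""
    te = sum(w for w, ok in zip(weights, correct) if not ok)   # Step 2
    performance = 0.5 * math.log((1 - te) / te)                # Step 3
    new = [w * math.exp(-performance if ok else performance)   # Step 4
           for w, ok in zip(weights, correct)]
    total = sum(new)
    return performance, [w / total for w in new]

weights = [1 / 5] * 5                        # every record starts at W = 1/N
correct = [True, False, True, True, True]    # record 2 is misclassified
perf, norm = update_weights(weights, correct)
print(round(perf, 3))                        # → 0.693
print([round(w, 3) for w in norm])           # → [0.125, 0.5, 0.125, 0.125, 0.125]
```

The normalized weights match the figure's values up to rounding (0.125 ≈ 0.13).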
Step 5 – Creating a New Dataset
Now, it’s time to create a new dataset from our previous one. In the new
dataset, the frequency of incorrectly classified records will be more than
the correct ones. The new dataset has to be created using and considering
the normalized weights. It will probably select the wrong records for
training purposes. That will be the second decision tree/stump. To make a
new dataset based on normalized weight, the algorithm will divide it into
buckets.
So, our first bucket is from 0 to 0.13, the second from 0.13 to 0.63
(0.13 + 0.50), the third from 0.63 to 0.76 (0.63 + 0.13), and so on. The
algorithm then runs 5 iterations to select records from the older dataset.
Suppose in the first iteration the algorithm draws a random value, say
0.46; it finds which bucket that value falls into and selects the
corresponding record for the new dataset. It then draws another random
value, finds its bucket, and selects that record. The same process is
repeated 5 times.
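The bucket lookup can be sketched with a cumulative sum (normalized weights as computed in step 4; the example's 0.46 draw lands in the second bucket, i.e., the misclassified row):

```python
import bisect
import itertools

def pick_row(norm_weights, r):
    """Return the index of the cumulative-weight bucket that the
    random draw r (in [0, 1)) falls into."""
    buckets = list(itertools.accumulate(norm_weights))  # e.g. [0.125, 0.625, ...]
    return bisect.bisect_left(buckets, r)

norm_weights = [0.125, 0.5, 0.125, 0.125, 0.125]
print(pick_row(norm_weights, 0.46))  # → 1 (the incorrectly classified row)
```

Repeating the draw N times with uniform random values yields the resampled dataset described above.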
There is a high probability of the wrong records being selected several
times, and these selections form the new dataset. It can be seen in the
image below that row number 2 has been selected multiple times from the
older dataset, as that row was incorrectly classified previously.
Based on this new dataset, the algorithm creates a new decision tree/stump
and repeats the same process from step 1, sequentially passing through
stumps until the error is much smaller than at the initial stage.
In Python, coding the AdaBoost algorithm takes only 3-4 lines and is easy.

We must import the AdaBoost classifier from the scikit-learn library.
Before applying AdaBoost to any dataset, the data should be split into
train and test sets. After the split, the training data, which has both
the inputs and the outputs, is used to train the AdaBoost model. The
algorithm then tries to predict the result on the test data, which
consists only of inputs; the model does not see the test outputs. Accuracy
can be checked by comparing the actual output of the test data with the
output predicted by the model. This shows how the model is performing; how
much accuracy is acceptable depends on the problem statement. For a
medical problem, accuracy should be above 90%, while 70% accuracy is
usually considered good. Accuracy also depends on factors apart from the
type of model. The figure below shows the code used to implement
AdaBoost.
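A minimal version of that code can be sketched as follows, using a synthetic dataset as a stand-in (the project's actual air quality features and model parameters may differ):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic 3-feature classification data standing in for the AQI dataset.
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# The default base estimator is a depth-1 decision tree, i.e. a stump.
model = AdaBoostClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(accuracy)
```

The `fit`/`predict`/`accuracy_score` trio here is the "3-4 lines" referred to above.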
Adaptive Boosting is a good ensemble technique and can be used for both
classification and regression problems, though in most cases it is used
for classification. Its benefit to model accuracy can be checked by
proceeding in sequence: one can first try decision trees, then a random
forest, and finally AdaBoost; accuracy keeps increasing as we follow this
sequence. The weight-reassignment technique after every iteration makes
the AdaBoost algorithm different from all other boosting algorithms, and
that is the best thing about it.

CHAPTER 4
SOFTWARE DESCRIPTION

4.1 .PYTHON:
4.1.1. ABOUT PYTHON:
 Python is an interpreted, high-level, general-purpose programming
language. Python is dynamically typed and garbage-collected. It
supports multiple programming paradigms, including procedural,
object-oriented, and functional programming. Python is often described
as a "batteries included" language due to its comprehensive standard
library. Python interpreters are available for many operating systems. A
global community of programmers develops and maintains CPython, an
open-source reference implementation. A non-profit organization, the
Python Software Foundation, manages and directs resources for Python and
CPython development.
 Python is a high-level, interpreted, interactive and object-oriented
scripting language. Python is designed to be highly readable. It uses
English keywords frequently whereas other languages use punctuation,
and it has fewer syntactical constructions than other languages.
 Python is Interpreted − Python is processed at runtime by the
interpreter. You do not need to compile your program before executing
it. This is similar to Perl and PHP.
 Python is Interactive − You can actually sit at a Python prompt and
interact with the interpreter directly to write your programs.
 Python is Object-Oriented − Python supports Object-Oriented style or
technique of programming that encapsulates code within objects.

 Python is a Beginner's Language − Python is a great language for
beginner-level programmers and supports the development of a wide range
of applications, from simple text processing to WWW browsers to games.
4.1.2 History of Python:
 Python was developed by Guido van Rossum in the late eighties and
early nineties at the National Research Institute for Mathematics and
Computer Science in the Netherlands.
 Python is derived from many other languages, including ABC, Modula-3,
C, C++, Algol-68, Smalltalk, and the Unix shell and other scripting
languages.
 Python is copyrighted. Like Perl, Python source code is now available
under the GNU General Public License (GPL).
 Python is now maintained by a core development team at the institute,
although Guido van Rossum still holds a vital role in directing its progress.
4.1.3 Python Features:
Python's features include −
 Easy-to-learn − Python has few keywords, a simple structure, and a
clearly defined syntax. This allows the student to pick up the language
quickly.
 Easy-to-read − Python code is more clearly defined and visible to the
eyes.
 Easy-to-maintain − Python's source code is fairly easy-to-maintain.
 A broad standard library − The bulk of Python's library is very portable
and cross-platform compatible on UNIX, Windows, and Macintosh.
 Interactive Mode − Python has support for an interactive mode which
allows interactive testing and debugging of snippets of code.

 Portable − Python can run on a wide variety of hardware platforms and
has the same interface on all platforms.

 Extendable − You can add low-level modules to the Python interpreter.
These modules enable programmers to add to or customize their tools to
be more efficient.
 Databases − Python provides interfaces to all major commercial
databases.
 GUI Programming − Python supports GUI applications that can be
created and ported to many system calls, libraries and windows systems,
such as Windows MFC, Macintosh, and the X Window system of Unix.
 Scalable − Python provides better structure and support for large
programs than shell scripting.
 Apart from the above-mentioned features, Python has a big list of good
features, few are listed below -
 It supports functional and structured programming methods as well as
OOP.
 It can be used as a scripting language or can be compiled to byte-code
for building large applications.
 It provides very high-level dynamic data types and supports dynamic
type checking.
 It supports automatic garbage collection.
 It can be easily integrated with C, C++, COM, ActiveX, CORBA, and
Java.
 Python is available on a wide variety of platforms, including Linux and
Mac OS X. Let's understand how to set up our Python environment.

4.1.4 Development:
 Python's development is conducted largely through the Python
Enhancement Proposal (PEP) process, the primary mechanism for
proposing major new features, collecting community input on issues, and
documenting Python design decisions. Python coding style is covered in
PEP 8. Outstanding PEPs are reviewed and commented on by the Python
community and the steering council.
 Enhancement of the language corresponds with development of the
CPython reference implementation. The mailing list python-dev is the
primary forum for the language's development. Specific issues are
discussed in the Roundup bug tracker maintained at python.org.
Development originally took place on a self-hosted source-code repository
running Mercurial, until Python moved to GitHub in January 2017.
 CPython's public releases come in three types, distinguished by which
part of the version number is incremented:
 Backward-incompatible versions, where code is expected to break and
needs to be manually ported. The first part of the version number is
incremented. These releases happen infrequently; for example, version 3.0
was released 8 years after 2.0.
 Major or "feature" releases, about every 18 months, are largely compatible
but introduce new features. The second part of the version number is
incremented. Each major version is supported by bugfixes for several years
after its release.
 Bugfix releases, which introduce no new features, occur about every 3
months and are made when a sufficient number of bugs have been fixed
upstream since the last release. Security vulnerabilities are also patched in
these releases. The third and final part of the version number is
incremented.
 Many alpha, beta, and release candidates are also released as previews
and for testing before final releases. Although there is a rough schedule
for each release, releases are often delayed if the code is not ready.
Python's development team monitors the state of the code by running the
large unit test suite during development, and by using the Buildbot
continuous integration system.

 The community of Python developers has also contributed over
86,000 software modules (as of 20 August 2016) to the Python Package
Index (PyPI), the official repository of third-party Python libraries.
 The major academic conference on Python is PyCon. There are also
special Python mentoring programmes, such as PyLadies.
 Python features a dynamic type system and automatic memory
management and supports multiple programming paradigms, including
object-oriented, imperative, functional, and procedural styles. It has a
large and comprehensive standard library.

CHAPTER 5
SOFTWARE SPECIFICATION

5.1. ANACONDA:
5.1.1 ABOUT ANACONDA:

Anaconda is an amazing collection of scientific Python packages,


tools, resources, and IDEs. This package includes many important tools
that a Data Scientist can use to harness the incredible force of Python.
Anaconda individual edition is free and open source. This makes working
with Anaconda accessible and easy. Anaconda has grown an exceptionally
large community. Anaconda makes it easy to connect to several different
scientific, Machine Learning and Data Science packages.

5.2 The key features:

 Neural Networks

 Machine Learning

 Predictive Analytics

 Data Visualization


5.3 What is Anaconda?

 Anaconda is a free, open-source data science tool that focuses on the
distribution of the R and Python programming languages for data science
and machine learning tasks. Anaconda aims at simplifying package
management and deployment.
 Anaconda is a powerful data science platform for data scientists.
Anaconda's package manager is conda, which manages package versions.
 Anaconda is a tool that offers all the required package involved in data
science at once. The programmers choose Anaconda for its ease of use.
 Anaconda is written in Python, and a noteworthy point about it is that,
unlike pip in Python, its package manager checks the requirements of the
dependencies and installs them if required. More importantly, warning
signs are given if the dependencies already exist.
 Anaconda very quickly installs dependencies along with frequent
updates. It facilitates creation and loading with equal speed, along with
easy environment switching.
 The installation of Anaconda is very easy and is most preferred by
non-programmers who are data scientists.
 Anaconda is pre-built with more than 1500 Python or R data science
packages. Anaconda has specific tools to collect data using machine
learning and artificial intelligence.

 Anaconda is indeed a tool used for developing, testing and training in
one single system. The tool can be used with any project, as the
environment is easily manageable.
 Anaconda is great for deep models and neural networks. You can build
models, deploy them, and integrate with leading technologies in the
subject. Anaconda is optimized to run efficiently for machine learning
tasks and will save you time when developing great algorithms. Over 250
packages are included in the distribution. You can install other third-party
packages through the Anaconda terminal with conda install. With over
7500 data science and machine learning packages available in their cloud-
based repository, almost any package you need will be easily accessible.
Anaconda offers individual, team, and enterprise editions. Included also is
support for the R programming language.
 The Anaconda distribution comes with packages that can be used on
Windows, Linux, and macOS. The individual edition includes popular
package names like numpy, pandas, scipy, sklearn, tensorflow, pytorch,
matplotlib, and more. The Anaconda Prompt and PowerShell make working
within the filesystem easy and manageable. Also, the GUI interface of
Anaconda Navigator makes working with everything exceptionally smooth.
Anaconda is an excellent choice if you are looking for a thriving
community of data scientists and ever-growing support in the industry.
Conducting data science projects is an increasingly simpler task with the
help of great tools like this.

5.4 Creating a virtual environment:
 Like many other languages, Python requires a different version for
different kinds of applications. An application may need to run on a
specific version of the language because it requires a certain dependency
that is present in older versions but changed in newer versions.
 Virtual environments make it easy to cleanly separate different
applications and avoid problems with conflicting dependencies. Using
virtual environments we can switch between applications easily and get
them to run.
 There are multiple ways of creating an environment, e.g., using
virtualenv, venv, or conda.
 The conda command is the preferred interface for managing installations
and virtual environments with the Anaconda Python distribution.
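A typical conda session looks like the following (the environment name and package list here are placeholders for illustration):

```shell
# Create an isolated environment with a specific Python version.
conda create -n aqi-project python=3.9

# Switch into it (and back out again with `conda deactivate`).
conda activate aqi-project

# Install packages into the active environment.
conda install numpy pandas scikit-learn

# List the environments known to conda.
conda env list
```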

5.5 Anaconda Navigator:

 Anaconda Navigator is a desktop graphical user interface (GUI) included
in the Anaconda distribution that allows users to launch applications and
manage conda packages, environments and channels without using
command-line commands.
 Navigator can search for packages on Anaconda Cloud or in a local
Anaconda Repository, install them in an environment, run the packages and
update them. It is available for Windows, macOS and Linux.

The following applications are available by default in Navigator:

 Jupyter Lab
 Jupyter Notebook
 Qt Console

 Glue
 Orange
 RStudio
 Visual Studio Code
 Spyder

5.6 FEASIBILITY STUDY:

 The feasibility of the project is analyzed in this phase, and a business
proposal is put forth with a very general plan for the project and some
cost estimates. During system analysis, the feasibility study of the
proposed system is carried out.
 This is to ensure that the proposed system is not a burden to the company.
For feasibility analysis, some understanding of the major requirements for
the system is essential. Three key considerations involved in the feasibility
analysis are:
5.6.1.Economic Feasibility:

 This study is carried out to check the economic impact the system will
have on the organization. The amount of funds that the company can pour
into the research and development of the system is limited, so the
expenditures must be justified.
 The developed system is well within the budget, which was achieved
because most of the technologies used are freely available; only the
customized products had to be purchased.

5.6.2.Technical Feasibility:

 This study is carried out to check the technical feasibility, that is,
the technical requirements of the system. Any system developed must not
place a high demand on the available technical resources, as this would
lead to high demands being placed on the client.
 The developed system must have modest requirements, as only minimal or
no changes are required for implementing this system.

5.6.3. Operational Feasibility:

 The aim of this study is to check the level of acceptance of the system
by the user. This includes the process of training the user to use the
system efficiently. The user must not feel threatened by the system, but
must instead accept it as a necessity.
 The level of acceptance by the users solely depends on the methods that
are employed to educate the user about the system and to make him
familiar with it.
 His level of confidence must be raised so that he is also able to make
some constructive criticism, which is welcomed, as he is the final user of
the system.

CHAPTER 6

SYSTEM TESTING AND RESULT

6.1 INTRODUCTION:
The purpose of testing is to discover errors. Testing is the process of
trying to discover every conceivable fault or weakness in a work product.
It provides a way to check the functionality of components,
sub-assemblies, assemblies and/or a finished product. It is the process of
exercising software with the intent of ensuring that the software system
meets its requirements and user expectations and does not fail in an
unacceptable manner. There are various types of tests; each test type
addresses a specific testing requirement.

6.2 TYPES OF TESTING:


6.2.1 Unit testing:
 Unit testing involves the design of test cases that validate that the
internal program logic is functioning properly, and that program inputs
produce valid outputs. All decision branches and internal code flow
should be validated. It is the testing of individual software units of
the application and is done after the completion of an individual unit,
before integration. This is structural testing that relies on knowledge
of the unit's construction and is invasive.
 Unit tests perform basic tests at component level and test a specific
business process, application, and/or system configuration. Unit tests
ensure that each unique path of a business process performs accurately
to the documented specifications and contains clearly defined inputs
and expected results.
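As an illustration in this project's context, here is a minimal unit test for a hypothetical AQI-category helper; both the function and its thresholds are assumptions for the example, loosely following India's CPCB bands:

```python
import unittest

def aqi_category(aqi):
    """Map an AQI value to a coarse category (illustrative thresholds)."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    if aqi <= 50:
        return "Good"
    if aqi <= 100:
        return "Satisfactory"
    if aqi <= 200:
        return "Moderate"
    return "Poor"

class TestAqiCategory(unittest.TestCase):
    def test_boundaries(self):
        self.assertEqual(aqi_category(0), "Good")
        self.assertEqual(aqi_category(50), "Good")
        self.assertEqual(aqi_category(51), "Satisfactory")
        self.assertEqual(aqi_category(150), "Moderate")

    def test_invalid_input(self):
        with self.assertRaises(ValueError):
            aqi_category(-1)
```

Such tests are run with `python -m unittest` against the module containing them.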

6.2.2 Integration testing:
 Integration tests are designed to test integrated software components to
determine if they actually run as one program. Testing is event driven
and is more concerned with the basic outcome of screens or fields.
Integration tests demonstrate that although the components were
individually satisfactory, as shown by successful unit testing, the
combination of components is correct and consistent. Integration testing
is specifically aimed at exposing the problems that arise from the
combination of components.
6.2.3 Functional test:
 Functional tests provide systematic demonstrations that functions tested
are available as specified by the business and technical requirements,
system documentation, and user manuals.

 Functional testing is centred on the following items:

Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.

6.2.4 Systems/Procedures:
 Interfacing systems or procedures must be invoked.
 Organization and preparation of functional tests is focused on
requirements, key functions, or special test cases. In addition,
systematic coverage pertaining to identifying business process flows,
data fields, predefined processes, and successive processes must be
considered for testing.

6.2.5 System Test:
 System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable
results. An example of system testing is the configuration-oriented
system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.

6.2.6 White Box Testing:


 White box testing is testing in which the software tester has
knowledge of the inner workings, structure and language of the
software, or at least its purpose. It is used to test areas that
cannot be reached from a black box level.

6.2.7 Black Box Testing:


 Black box testing is testing the software without any knowledge of the
inner workings, structure or language of the module being tested. Black
box tests, as most other kinds of tests, must be written from a
definitive source document, such as a specification or requirements
document.
 It is a form of testing in which the software under test is treated as
a black box: you cannot "see" into it. The test provides inputs and
responds to outputs without considering how the software works.

6.2.8 Unit Testing:
 Unit testing is usually conducted as part of a combined code and unit test
phase of the software lifecycle, although it is not uncommon for coding
and unit testing to be conducted as two distinct phases.
6.2.8.1 Test strategy and approach:
 Field testing will be performed manually and functional tests will be
written in detail.

6.2.8.2 Test objectives:


 All field entries must work properly.
 Pages must be activated from the identified link.
 The entry screen, messages and responses must not be delayed.
6.2.8.3 Features to be tested:
 Verify that the entries are of the correct format
 No duplicate entries should be allowed
 All links should take the user to the correct page.

6.2.9 Integration Testing:

 Software integration testing is the incremental integration testing of
two or more integrated software components on a single platform to
produce failures caused by interface defects.
 The task of the integration test is to check that components or software
applications, e.g. components in a software system or – one step up –
software applications at the company level – interact without error.
6.2.9.1 Test Results:
All the test cases mentioned above passed successfully. No defects
encountered.

6.2.9.2 Acceptance Testing:
 User Acceptance Testing is a critical phase of any project and requires
significant participation by the end user. It also ensures that the system
meets the functional requirements.

6.3 RESULT

6.3.1 Hardware part of the project

We used a step-down transformer to power the gas sensors and the
Raspberry Pi Pico. The Pico acts as the system's brain, and a trio of
MQ135, MQ5 and MQ2 sensors measures the concentration of gases in the
air; the results are saved in a database.
Each sensor changes its resistance with the concentration of the gases
it detects. The sensor is connected in series with a variable resistor to
form a voltage divider circuit, and the variable resistor is used to adjust
the sensitivity.
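As a minimal sketch of the voltage-divider arithmetic described above (not the project's actual firmware): the supply voltage, load resistance, baseline resistance R0 and the 12-bit ADC width are assumed values for illustration.

```python
# Minimal sketch (assumed values, not the project's actual firmware) of the
# voltage-divider maths for an MQ-series gas sensor on a Raspberry Pi Pico.

V_SUPPLY = 3.3        # divider supply voltage (volts), assumed
R_LOAD = 10_000.0     # load resistor in the divider (ohms), assumed
ADC_MAX = 4095        # RP2040 ADC resolution is 12-bit

def adc_to_voltage(raw):
    """Convert a raw ADC count to the voltage across the load resistor."""
    return V_SUPPLY * raw / ADC_MAX

def sensor_resistance(v_out):
    """Solve the divider equation v_out = V_SUPPLY * R_LOAD / (Rs + R_LOAD) for Rs."""
    return R_LOAD * (V_SUPPLY - v_out) / v_out

def rs_r0_ratio(raw, r0=10_000.0):
    """Rs/R0 is the quantity the MQ datasheet curves map to gas concentration."""
    return sensor_resistance(adc_to_voltage(raw)) / r0
```

On the device itself, `raw` would come from MicroPython's `machine.ADC` reading (scaled to 12 bits), and the Rs/R0 ratio would then be looked up against the datasheet curve for each target gas.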

6.3.2 Software Code

6.3.3 Graph of Prediction

Algorithms such as modified AdaBoost have been proposed for the
Chennai, India air quality dataset in order to forecast the AQI. The
performance of the proposed algorithm has been compared with that of
standard AdaBoost, with the evaluation based on accuracy. The accuracy
for each method was determined by factoring in the mean recall and mean
precision across all classes. Based on empirical evidence, the proposed
classifiers outperform the standard AdaBoost classifiers on accuracy
measures, and the error rates of these algorithms are relatively low.
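The evaluation described above (mean recall and mean precision factored in across all classes, alongside overall accuracy) can be computed as in the following sketch; the label names and predictions are made-up examples, not the project's data.

```python
# Illustrative sketch of macro-averaged evaluation metrics from a list of
# (true label, predicted label) pairs. Labels shown are assumed examples.

def macro_metrics(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return {
        "mean_precision": sum(precisions) / len(labels),
        "mean_recall": sum(recalls) / len(labels),
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
    }

# Example with hypothetical AQI classes:
m = macro_metrics(["Good", "Moderate", "Poor", "Good"],
                  ["Good", "Moderate", "Good", "Good"])
```

Averaging per-class precision and recall (rather than pooling all predictions) prevents a frequent class such as "Good" from masking poor performance on rarer AQI classes.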

6.3.4 Prediction Graph

CHAPTER 7

CONCLUSION

7.1 CONCLUSION
Since air pollution is such a serious problem in urban areas, this project
explores the performance of machine learning models for air quality
prediction in smart cities. AdaBoost produced the most precise air quality
forecasts. Further research can be done to develop a more effective
algorithm for addressing this issue, and predictions of other air
pollutants can be tested as well. In smart cities, where forecasts are made
using machine learning, factors such as temperature can be used to evaluate
the accuracy of air pollution forecasts.

7.2 FUTURE SCOPE

Our study fills a gap in the comprehensive research that has been done
on air quality prediction with machine learning. It employs an innovative
bibliometric technique to determine the major progress and new insights in
this field and to identify the current hot zones and trends. The results
revealed a surge of interest in air quality prediction with machine
learning models.

REFERENCES:
[1]. J. Sentian, F. Herman, C. Y. Yin and J. C. H. Wui, "Long-term air
pollution trend analysis in Malaysia," International Journal of
Environmental Impacts, vol. 2, no. 4, pp. 309-324, 2019.
[2]. Department of Environment, "Air Pollutant Index," [Online]. Available:
http://apims.doe.gov.my/public_v2/aboutapi.html.
[3]. United States Environmental Protection Agency, "Particulate Matter
(PM) Pollution," 1 October 2020. [Online]. Available:
https://www.epa.gov/pmpollution/particulate-matter-pm-basics.
[4]. S. Ameer, M. Ali Shah, A. Khan, H. Song, C. Maple, S. U. Islam
and M. N. Asghar, "Comparative Analysis of Machine Learning
Techniques for Predicting Air Quality in Smart Cities," Urban Computing
and Intelligence, vol. 7, p. 128325, 2017.
[5], [6]. U. Mahalingam, K. Elangovan, H. Dobhal, C. Valiappa, S. Shresta
and G. Kedam, "A Machine Learning Model to Air Quality Prediction for
Smart Cities," vol. 19, p. 452, 2019.
[7]. R. M. Espana, A. B. Crespo, I. Timon, J. Soto, A. Munoz and J. M.
Cecilia, "Air Pollution in Smart Cities through Machine Learning
Methods," Universal Computer Science, vol. 24, 2017.
[8]. W. Yi, K. Lo, T. Mak, K. Leung, Y. Leung and M. Meng, "A survey of
wireless sensor network based air pollution monitoring systems," Sensors,
vol. 15, no. 12, pp. 31392-31427, 2015.
[9]. Y. Xing, Y. Xu, M. Shi and Y. Lian, "The impact of PM2.5 on the
human respiratory system," vol. 8, no. 1, pp. 69-74, 2016.
[10]. M. M. Rathore, A. Paul, A. Ahmad and S. Rho, "US CR," Comput.
Networks, 2015.
[11]. M. Asgari, M. Farnaghi and Z. Ghaemi, "Predictive mapping of urban
air pollution using Apache Spark on a Hadoop cluster," in Proceedings of
the 2017 International Conference on Cloud and Big Data Computing,
pp. 89-93, ACM, 2017.
[12]. D. Zhu, C. Cai, T. Yang and X. Zhou, "A Machine Learning Approach
for Air Quality Prediction: Model Regularization and Optimization,"
pp. 1-14, 2017.
[13]. R. W. Gore, "An Approach for Classification of Health Risks Based
on Air Quality Levels," pp. 58-61, 2017.
[14]. K. G. Ri, R. Manimegalai, G. D. M. Si, R. Si, U. Ki and R. B. Ni,
"Air Pollution Analysis Using Enhanced K-Means Clustering Algorithm
for Real Time Sensor Data," pp. 1945-1949, 2016.
[15]. N. Zimmerman et al., "Closing the gap on lower cost air quality
monitoring: machine learning calibration models to improve low-cost
sensor performance," pp. 1-36, 2017.
[16]. I. Bougoudis, K. Demertzis and L. Iliadis, "HISYCOL a hybrid
computational intelligence system for combined machine learning: the
case of air pollution modeling in Athens," Neural Comput. Appl., vol. 27,
no. 5, pp. 1191-1206, 2016.
[17]. C. Yan, S. Xu, Y. Huang, Y. Huang and Z. Zhang, "Two-Phase Neural
Network Model for Pollution Concentrations Forecasting," in Proc. 5th
Int. Conf. Adv. Cloud Big Data (CBD 2017), pp. 385-390, 2017.
