Batch 3 Report
A PROJECT REPORT
Submitted by
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
Dr. T. Suresh, M.E., Ph.D. Mr. T. Joel, M.E., (Ph.D.)
ACKNOWLEDGEMENT
We would like to express our heartfelt thanks to the Almighty and our beloved
parents for their blessings and wishes for the successful completion of this project.
We are extremely thankful to Dr. T. Suresh M.E., Ph.D, Professor and Head,
Department of Electronics and Communication Engineering, for having
permitted us to carry out this project effectively.
We convey our sincere thanks to our mentor and skilful, efficient supervisor,
Mr. T. Joel, M.E., (Ph.D.), Associate Professor, for his extremely valuable
guidance throughout the course of the project.
We are grateful to our Project Co-ordinators and all the department staff
members for their constant support.
ABSTRACT
Air pollution in smart cities across the world has been increasing drastically,
and the rising concentration of particulate matter in the air is a threat to
countries and their citizens, as it can lead to severe consequences such as
cardiovascular disease and worsened asthma. PM2.5 is a deadly air pollutant
consisting of a mixture of solid and liquid fine particles with a diameter of
2.5 micrometres or less.
In India, traffic congestion has been the main contributor to air pollution in
smart cities such as Delhi and Bombay. The systematic prediction of air
pollution using machine learning has been widely studied globally over the
years, and many machine learning algorithms have been tested to find solutions
to air pollution. However, very few such approaches have been applied in India
to predict air pollution using machine learning methods. This project aims to
implement machine learning algorithms to determine the accuracy of predicting
particulate matter, and hence air pollution, in the smart cities of India. To
test this, the AdaBoost algorithm is chosen and applied to a smart-city air
pollution dataset. The outcome of this work is that AdaBoost gave the best
accuracy in predicting the particulate matter and Air Pollution Index,
deployed on Raspberry Pi Pico hardware in the smart cities of India. The
project is implemented as a real-time hardware setup with a Raspberry Pi Pico
and a supporting IoT node.
TABLE OF CONTENTS
ACKNOWLEDGEMENT iii
ABSTRACT iv
LIST OF FIGURES ix
LIST OF ABBREVIATIONS x
1 INTRODUCTION 1
1.1 OBJECTIVES 6
1.2 EXISTING METHOD 6
1.7 HARDWARE/SOFTWARE REQUIREMENT 9
3.1 POWER SUPPLY 14
3.1.4 FEATURES 18
3.1.5 APPLICATION 19
3.2.4 APPLICATIONS 24
3.3.2 FEATURES 25
3.3.3 APPLICATIONS 25
3.4.2 SPECIFICATIONS 27
3.6 ADABOOST ALGORITHM 30
4 SOFTWARE DESCRIPTION 36
4.1 PYTHON 36
4.1.4 DEVELOPMENT 38
5 SOFTWARE SPECIFICATIONS 41
5.1 ANACONDA 41
6.1 INTRODUCTION 47
6.3 RESULT 51
7 CONCLUSION & FUTURE SCOPE 54
7.1 CONCLUSION 54
REFERENCES 55
LIST OF FIGURES
Figure No. Description
3.6.1 Raspberry Pi
LIST OF ABBREVIATIONS
Abbreviation Expansion
PM Particulate Matter
CHAPTER 1
INTRODUCTION
for the benefit of its people. In terms of information and communications
technology (ICT) and the number of people living in urban areas, Kuala
Lumpur and Johor Bahru are two of the developed smart cities in
Malaysia. Both the process of industrialization and the movement of
people from rural to urban areas have contributed to the rapid growth of
urban populations in the modern world. The rise in the city's population
has resulted in an increase in the number of people who use various modes
of transportation and consume various forms of energy, both of which
have contributed to the expansion of the city's industrial capacity and its
vehicle population. As a result, the findings of a number of empirical
studies have led researchers to the conclusion that the issue of air quality
in smart cities has been one of the city's primary challenges, and that
machine learning has provided a better and more strategic solution to the
problem of air quality prediction. In contrast to the rest of the world, the
application of machine learning in Malaysia to the forecasting of air
pollutants and air pollution has not been widely recognized. Since there
has been significant development in the prediction of air pollution all over
the world over the course of the last few decades, it is possible that the
concentration of air pollutants in smart cities in Malaysia that are predicted
using ML techniques will be accurate.
According to the World Health Organization (WHO), air pollution
is a contributing factor in approximately 1.3 million deaths each year
around the world. The release of pollutants into the atmosphere has many
negative effects, one of which is a deterioration in the quality of the air.
Other negative effects, such as acid rain, global warming, the production
of aerosols, and photochemical smog, have also worsened over the course
of the past few decades. Many researchers have been motivated to
investigate the underlying pollution-related conditions that are
contributing to COVID-19 pandemics in different countries as a result of
the recent rapid spread of COVID-19. Air pollution has been linked to
significantly higher COVID-19 death rates, and patterns in COVID-19
death rates mimic patterns in both areas with a high population density
and areas with a high PM 2.5 exposure. This is evidenced by several
pieces of circumstantial evidence. Because of everything that has been
discussed up to this point, it is absolutely necessary to forecast and prepare
for changes in pollution levels in order to assist communities and
individuals in becoming more effective at mitigating the harmful effects of
air pollution. Evaluation of the air's quality is an important factor in both
the monitoring and the regulation of pollution levels in the atmosphere.
The Environmental Protection Agency (EPA) monitors common
pollutants such as ground-level ozone (O3), sulphur dioxide (SO2),
particulate matter (PM10 and PM2.5), carbon monoxide (CO), carbon
dioxide (CO2), and nitrogen dioxide (NO2). The Air Quality Index
(AQI) is an index that is commonly used to indicate how clean or polluted
the air is currently or how polluted the air is forecasted to become in
certain areas. These substances are included in the composition of the
AQI. As the Air Quality Index (AQI) rises, a greater proportion of the
population will be subjected to the impacted conditions. Different
countries have their own air quality indices, which correspond to different
air quality standards in those countries. Lead, ozone, particulate matter 10,
particulate matter 2.5, nitrogen dioxide, and sulphur dioxide are the six
pollutants that the United States Environmental Protection Agency (EPA)
tracks at more than 4000 locations across the country.
1.1 OBJECTIVES
• Time consuming
• Biased results
• Approach of using historical data
pollution in India's smart cities.
• Unbiased results
• Time efficient
• Flexible algorithm
1.6 BLOCK DIAGRAM:
RASPBERRY PI PICO
RSPM SENSOR
MQ135 SENSOR
MQ05 SENSOR
IOT MODULE
LCD DISPLAY
POWER SUPPLY
PYTHON IDE
PYTHON LANGUAGE
CHAPTER 2
LITERATURE SURVEY
[1] “A deep learning model for air quality prediction in smart cities, 2017”-
IEEE International Conference on Big Data (Big Data)
ABSTRACT:
[2] “Comparative Analysis of Machine Learning Techniques for
Predicting Air Quality in Smart Cities,2019”-IEEE Access
ABSTRACT:
[3] “A Machine Learning Model for Air Quality Prediction for Smart
Cities, 2019”-International Conference on Wireless
Communications Signal Processing and Networking (WiSPNET)
ABSTRACT:
The air quality of a region can be used as one of the major factors
determining its pollution index, as well as how well the city's industries and
population are managed. Urban air quality monitoring has been a constant
challenge since the advent of industrialization, and air pollution has
remained a major challenge for the public and governments all over the world.
Air pollution causes noticeable damage to the environment as well as to
human health, resulting in acid rain, global warming, heart disease, and
skin cancer. This paper addresses the challenge of predicting
the Air Quality Index (AQI), with the aim to minimize the pollution before
it gets adverse, using two Machine Learning Algorithms: Neural Networks
and Support Vector Machines. The air pollution databases were extracted
from the Central Pollution Control Board (CPCB), Ministry of
Environment, Forest and Climate change, Government of India. The
proposed Machine Learning (ML) model is promising in prediction
context for the Delhi AQI. The results show improvement of the
prediction accuracy and suggest that the model can be used in other smart
cities as well.
[4] “Air Quality Prediction in Smart Cities Using Machine Learning
Technologies Based on Sensor Data: A Review,2020”-MDPI
ABSTRACT:
CHAPTER 3
HARDWARE DESCRIPTION
INTRODUCTION
The transformer steps up or steps down the input line voltage and isolates
the power supply from the power line. The RECTIFIER section converts
the alternating current input signal to a pulsating direct current. However,
as you proceed in this chapter you will learn that pulsating dc is not
desirable. For this reason a FILTER section is used to convert pulsating dc
to a purer, more desirable form of dc voltage.
The final section, the REGULATOR, does just what the name implies. It
maintains the output of the power supply at a constant level in spite of
large changes in load current or input line voltages. Now that you know
what each section does, let's trace an ac signal through the power supply.
At this point you need to see how this signal is altered within each section
of the power supply. Later on in the chapter you will see how these
changes take place. In view B of figure 4-1, an input signal of 115 volts ac
is applied to the primary of the transformer. The transformer is a step-up
transformer with a turns ratio of 1:3. You can calculate the output for this
transformer by multiplying the input voltage by the ratio of turns in the
secondary to the turns in the primary; therefore, 115 volts ac × 3 =
345 volts ac (peak-to-peak) at the output. Because each diode in the
rectifier section conducts for 180 degrees of the 360-degree input, the
output of the rectifier will be one-half, or approximately 173 volts of
pulsating dc. The filter section, a network of resistors, capacitors, or
inductors, controls the rise and fall time of the varying signal;
consequently, the signal remains at a more constant dc level. You will see
the filter process more clearly in the discussion of the actual filter circuits.
The output of the filter is a signal of 110 volts dc, with ac ripple riding on
the dc. The reason for the lower voltage (average voltage) will be
explained later in this chapter. The regulator maintains its output at a
constant 110-volt dc level, which is used by the electronic equipment
(more commonly called the load).
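The voltage figures traced above can be checked with a short calculation (a sketch only; the 115 V input and 1:3 turns ratio are the example values from the text):

```python
def transformer_out(v_in, turns_ratio):
    """Secondary voltage of an ideal step-up transformer."""
    return v_in * turns_ratio

# Example values from the signal trace above
v_secondary = transformer_out(115, 3)   # 345 V ac at the secondary
v_rectified = v_secondary / 2           # each diode conducts for half the cycle
print(v_secondary, v_rectified)         # 345 and 172.5 (~173 V pulsating dc)
```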
Simple 5V power supply for digital circuits
Brief description of operation: Gives out well regulated +5V output,
output current capability of 100 mA
Circuit protection: Built-in overheating protection shuts down output when
regulator IC gets too hot
Circuit complexity: Very simple and easy to build
Circuit performance: Very stable +5V output voltage, reliable operation
Availability of components: Easy to get, uses only very common basic
components
Design testing: Based on datasheet example circuit, I have used this circuit
successfully as part of many electronics projects
Applications: Part of electronics devices, small laboratory power supply
Power supply voltage: Unregulated DC 8-18V power supply
Power supply current: Needed output current + 5 mA
Component costs: Few dollars for the electronics components + the input
transformer cost
This circuit can give a +5V output at about 150 mA current, but this can be
increased to 1 A when good cooling is added to the 7805 regulator chip. The
circuit has overload and thermal protection. The capacitors must have a high
enough voltage rating to safely handle the input voltage fed to the circuit.
The circuit is very easy to build, for example on a piece of Veroboard.
Pinout of the 7805 regulator IC
Unregulated voltage in
Ground
Regulated voltage out
Component list
7805 regulator IC
100 uF electrolytic capacitor, at least 25V voltage rating
10 uF electrolytic capacitor, at least 6V voltage rating
100 nF ceramic or polyester capacitor
3.1.4 FEATURES:
• Output current:1A
• Supply voltage: 220-230VAC
• Output voltage: 12VDC
3.1.5 APPLICATIONS:
SMPS applications
MQ307A (CO), and MQ309A (CO and flammable gas). An MQ135 air quality
sensor is one type of MQ gas sensor used to detect, measure, and monitor
a wide range of gases present in the air, such as ammonia, alcohol, benzene,
smoke, and carbon dioxide. It operates from a 5V supply with 150mA
consumption. Preheating of 20 seconds is required before operation to
obtain an accurate output.
It is a semiconductor air quality check sensor suitable for monitoring
applications of air quality. It is highly sensitive to NH3, NOx, CO2,
benzene, smoke, and other dangerous gases in the atmosphere. It is
available at a low cost for harmful gas detection and monitoring
applications.
If the concentration of gases exceeds the threshold limit in the air, then the
digital output pin goes high. The threshold value can be varied by using
the potentiometer of the sensor. The analog output voltage is obtained
from the analog pin of the sensor, which gives the approximate value of
the gas level present in the air.
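As a rough sketch of how the analog pin can be read in software (the 10-bit ADC resolution and the 2.5 V threshold below are illustrative assumptions, not values from the sensor's datasheet):

```python
# Converting a raw ADC count from the MQ135 analog pin into a voltage.
ADC_MAX = 1023          # 10-bit converter (assumed resolution)
V_REF = 5.0             # the sensor's analog output spans 0-5 V

def adc_to_volts(raw):
    return raw * V_REF / ADC_MAX

def over_threshold(raw, threshold_volts=2.5):
    # Mimics the digital-out comparator; the 2.5 V threshold is illustrative,
    # on hardware it is set with the on-board potentiometer.
    return adc_to_volts(raw) > threshold_volts

print(round(adc_to_volts(512), 2))   # mid-scale reading, about 2.5 V
```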
3.2.2 PIN CONFIGURATION:
The MQ135 air quality sensor is a 4-pin sensor module that features both
analog and digital output from the corresponding pins.
Pin 1: VCC: This pin refers to the positive 5V power supply that powers
up the MQ135 sensor module.
Pin 2: GND (Ground): This is a reference potential pin, which connects
the MQ135 sensor module to the ground.
Pin 3: Digital Out (Do): This pin refers to the digital output pin that gives
the digital output by adjusting the threshold value with the help of a
potentiometer. This pin is used to detect and measure any one particular
gas and makes the MQ135 sensor work without a microcontroller.
Pin 4: Analog Out (Ao): This pin generates the analog output signal of 0V
to 5V and it depends on the gas intensity. This analog output signal is
proportional to the gas vapor concentration, which is measured by the
MQ135 sensor module. This pin is used to measure the gases in PPM. It is
driven by TTL logic, operates with 5V, and is mostly interfaced with
microcontrollers.
H-pins: There are 2 H-pins, where one is connected to the voltage supply
and the other is connected to the ground.
A-pins: Here A-pins and B-pins can be interchanged. These are connected
to the voltage supply.
B-pins: Here A-pins and B-pins can be interchanged. One pin is used to
generate output while the other pin is connected to the ground.
3.2.3 SPECIFICATIONS AND FEATURES:
The MQ135 air quality sensor specifications and features are listed below.
It has a wide detection scope.
High sensitivity and faster response.
Long life and stability.
The operating voltage: +5V.
Measures and detects NH3, alcohol, NOx, Benzene, CO2, smoke etc.
Range of analog output voltage: 0V-5V.
Range of digital output voltage: 0V-5V (TTL logic).
Duration of preheating: 20 seconds.
Used as an analog or digital sensor.
The potentiometer is used to vary the sensitivity of the digital pin.
Heating Voltage: 5V±0.1.
Load resistance is adjustable.
Heater resistance: 33ohms±5%.
Heating consumption:<800mW.
Operating temperature: -10°C to 45°C.
Storage temperature: -20°C to 70°C.
Related humidity: <95%Rh.
Oxygen concentration: 21% (affects the sensitivity).
Sensing resistance: 30kiloohms to 200kiloohms.
Concentration slope rate: ≤0.65.
Preheat time: over 24 hrs.
Simple drive circuit.
3.2.4 APPLICATIONS :
The Grove - Gas Sensor (MQ5) module is useful for gas leakage detection
(in home and industry). It is suitable for detecting H2, LPG, CH4, CO,
Alcohol. Due to its high sensitivity and fast response time, measurements
can be taken quickly. The sensitivity of the sensor can be adjusted by using
the potentiometer. The sensor value only reflects the approximate trend of gas
concentration within a permissible error range; it does NOT represent the
exact gas concentration. The detection of certain components in the air
usually requires a more precise and costly instrument, which cannot be
achieved with a single gas sensor. If your project is aimed at obtaining the
gas concentration at a very precise level, then we do not recommend this gas
sensor.
3.3.2 FEATURES:
• Wide detecting scope
• Stable and long life
• Fast response and High sensitivity
3.3.3 APPLICATION:
The MQ2 gas sensor can be used to detect the presence of LPG, propane, and
hydrogen, and can also be used to detect methane and other combustible
vapours. It is low cost and suitable for different applications. The sensor is
sensitive to flammable gas and smoke. The smoke sensor is powered with 5 volts
and indicates smoke by the voltage it outputs: the more smoke, the higher the
output. A potentiometer is provided to adjust the sensitivity. SnO2 is the
sensing material, which has low conductivity when the air is clean. When smoke
exists, the sensor provides an analog resistive output based on the
concentration of smoke. The circuit has a heater, powered by VCC and GND from
the power supply, and a variable resistor. The resistance across the pins
depends on the smoke in the air inside the sensor: the resistance is lowered
if the smoke content is higher, and the voltage measured across the load
resistor increases.
The MQ2 has an electrochemical sensor, which changes its resistance for
different concentrations of varied gasses. The sensor is connected in series
with a variable resistor to form a voltage divider circuit, and the variable
resistor is used to change the sensitivity. When one of the above gaseous
elements comes in contact with the sensor after heating, the sensor's
resistance changes. The change in resistance changes the voltage across
the sensor, and this voltage can be read by a microcontroller. The voltage
value can be used to find the resistance of the sensor by knowing the
reference voltage and the other resistor’s resistance. The sensor has
different sensitivity for different types of gasses.
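The voltage-divider relationship described above can be inverted to recover the sensor's resistance from the measured voltage. A minimal sketch, assuming a 5 V reference and a hypothetical 10 kΩ load resistor:

```python
# For a sensor in series with a load resistor RL, the voltage read across RL is
#   Vout = Vref * RL / (Rs + RL)
# so the sensor resistance can be recovered as
#   Rs = RL * (Vref - Vout) / Vout
def sensor_resistance(v_out, v_ref=5.0, r_load=10_000):
    """Sensor resistance (ohms) from the voltage-divider reading."""
    return r_load * (v_ref - v_out) / v_out

# Lower Rs (more smoke) corresponds to a higher voltage across the load:
print(round(sensor_resistance(2.5)))   # 10000 ohms at the mid-point
```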
3.4.2 SPECIFICATIONS:
Fig 3.4.2 Board Schematic Diagram
3.5.1.GENERAL DESCRIPTION:
• Designed to be put on a flat surface like a module
• Easily run the device on micro-USB, external power, or batteries.
3.6 ADABOOST ALGORITHM:
To create the first learner, the algorithm takes the first feature, i.e.,
feature 1, and creates the first stump, f1. It will create the same number of
stumps as the number of features. In the case below, it will create 3 stumps,
as there are only 3 features in this dataset. From these stumps, it will
create three decision trees. This process can be called the stump (base
learner) model. Out of these 3 models, the algorithm selects only one. Two
properties are considered while selecting a base learner: Gini and entropy. We
must calculate the Gini index or entropy the same way it is calculated for
decision trees. The stump with the least value will be the first base learner.
In the figure below, all 3 stumps can be made with the 3 features. The number
below the leaves represents the correctly and incorrectly classified records.
Using these records, the Gini or entropy index is calculated. The stump that
has the least entropy or Gini value is selected as the base learner. Let's
assume that the entropy index is least for stump 1. So, let's take stump 1,
i.e., feature 1, as our first base learner.
Here, feature f1 has classified 2 records correctly and 1 incorrectly. The
row in the figure that is marked red is incorrectly classified. For this, we
will be calculating the total error.
Step 2 – Calculating the Total Error (TE)
The total error is the sum of the sample weights of all the misclassified
records. In our case, there is only 1 error, so Total Error (TE) = 1/5.
Step 3 – Calculating the Performance of the Stump
The formula for calculating the performance of the stump is:
Performance = 1/2 ln((1 - TE) / TE)
where ln is the natural log and TE is the Total Error.
In our case, TE is 1/5. By substituting the value of the total error in the
above formula and solving it, we get the value for the performance of the
stump as 0.693. Why is it necessary to calculate the TE and the performance of
a stump? The answer is that we must
update the sample weights before proceeding to the next model or stage,
because if the same weights are applied, the output received will be the same
as from the first model. In boosting, incorrectly classified records get more
preference than correctly classified ones: generally, only the wrong records
from the decision tree/stump are passed on to the next stump, whereas in
AdaBoost both kinds of records are allowed to pass, and the wrong records are
repeated more often than the correct ones. We must increase the weights of the
wrongly classified records and decrease the weights of the correctly
classified records. In the next step, we will update the weights based on the
performance of the stump.
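The numbers quoted in steps 2 and 3, and the weight update that follows, come from the standard AdaBoost formulas: performance α = ½·ln((1 − TE)/TE), and new weight = old weight × e^(±α). A quick check for the 5-record example:

```python
import math

# Reproducing the worked example above: 5 records, 1 misclassified.
TE = 1 / 5                                   # total error from step 2
alpha = 0.5 * math.log((1 - TE) / TE)        # performance of the stump
print(round(alpha, 3))                       # 0.693

w = 1 / 5                                    # initial sample weight
w_wrong = w * math.exp(alpha)                # misclassified record -> 0.4
w_correct = w * math.exp(-alpha)             # each correct record -> 0.1
total = w_wrong + 4 * w_correct              # 0.8 before normalizing
print(round(w_wrong / total, 3), round(w_correct / total, 3))  # 0.5 and 0.125
```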
0.50 is the normalized weight of the misclassified record. In the figure
below, we can see all the normalized weights; their sum is approximately 1.
Step 5 – Creating a New Dataset
Now, it’s time to create a new dataset from our previous one. In the new
dataset, the frequency of incorrectly classified records will be more than
the correct ones. The new dataset has to be created using the normalized
weights, so the wrongly classified records are more likely to be selected for
training. This will feed the second decision tree/stump. To make a new dataset
based on the normalized weights, the algorithm divides them into buckets.
So, our first bucket is from 0 – 0.13, second will be from 0.13 –
0.63(0.13+0.50), third will be from 0.63 – 0.76(0.63+0.13), and so on.
After this, the algorithm runs 5 iterations to select records from the old
dataset. Suppose in the 1st iteration the algorithm takes a random value, say
0.46; it sees which bucket that value falls into and selects the corresponding
record for the new dataset. It then selects another random value, sees which
bucket it is in, and selects that record for the new dataset. The same process
is repeated 5 times.
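The bucket-selection procedure can be sketched in a few lines (the weights below are the rounded values from this example, with the last entries adjusted slightly so they sum to 1):

```python
import random

# Step 5 as code: resample by normalized weight using cumulative "buckets".
# The wrong record carries 0.50; the correct ones carry ~0.13 each.
weights = [0.13, 0.50, 0.13, 0.12, 0.12]

def pick_record(weights, r):
    """Index of the bucket that a random value r in [0, 1) falls into."""
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1

# r = 0.46 falls in the second bucket (0.13-0.63), i.e. the wrong record:
print(pick_record(weights, 0.46))   # 1

# Drawing 5 values builds the new dataset; heavy records repeat often.
random.seed(0)
new_dataset = [pick_record(weights, random.random()) for _ in range(5)]
```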
There is a high probability for wrong records to get selected several times.
This will form the new dataset. It can be seen in the image below that row
number 2 has been selected multiple times from the older dataset as that
row is incorrectly classified in the previous one.
Based on this new dataset, the algorithm will create a new decision tree/stump
and repeat the same process from step 1, sequentially passing through the
stumps until the error is small compared to what we had in the initial stage.
In Python, coding the AdaBoost algorithm takes only 3-4 lines and is easy.
We must import the AdaBoost classifier from the scikit-learn library.
Before applying AdaBoost to any dataset, one should split the data into
train and test. After splitting the data into train and test, the training data is
ready to train the AdaBoost model. This data has both the input as well as
output. After training the data, our algorithm will try to predict the result
on the test data. Test data consists of only the inputs. The output of test
data is not known by the model. Accuracy can be checked by comparing
the actual output of the test data and the output predicted by the model.
This can help us conclude how our model is performing and what level of
accuracy is acceptable, depending on the problem statement. If it's a medical
problem, then accuracy should be above 90%; usually, 70% accuracy is
considered good. Accuracy also depends on factors apart from the type of
model. The figure below shows the code used to implement
AdaBoost.
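Since the figure with the code is not reproduced here, the following is a minimal sketch of the workflow just described, using scikit-learn's `AdaBoostClassifier` on a synthetic 3-feature dataset rather than the project's smart-city data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy stand-in for the air pollution dataset: 3 features, binary target.
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=42)

# Split into train and test before fitting, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Decision stumps are the default base learner, matching section 3.6.
model = AdaBoostClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Predict on the held-out inputs and compare with the actual outputs.
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(acc)
```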
Adaptive Boosting is a good ensemble technique and can be used for both
classification and regression problems, though in most cases it is used for
classification. Its value can be seen by trying models in sequence: one can
first try a decision tree, then a random forest, and finally AdaBoost;
accuracy typically keeps increasing along this sequence. The weight-assigning
technique applied after every iteration is what distinguishes AdaBoost from
other boosting algorithms, and it is the best thing about it.
CHAPTER 4
SOFTWARE DESCRIPTION
4.1 PYTHON:
4.1.1. ABOUT PYTHON:
Python is an interpreted, high-level, general-purpose programming
language. Python is dynamically typed and garbage-collected. It supports
multiple programming paradigms, including procedural, object-oriented, and
functional programming. Python is often described as a "batteries included"
language due to its comprehensive standard library. Python interpreters are
available for many operating systems. A global community of programmers
develops and maintains CPython, an open-source reference implementation. A
non-profit organization, the Python Software Foundation, manages and directs
resources for Python and CPython development.
Python is a high-level, interpreted, interactive and object-oriented
scripting language. Python is designed to be highly readable. It uses
English keywords frequently, whereas other languages use punctuation, and it
has fewer syntactical constructions than other languages.
Python is Interpreted − Python is processed at runtime by the
interpreter. You do not need to compile your program before executing
it. This is similar to Perl and PHP.
Python is Interactive − You can actually sit at a Python prompt and interact
with the interpreter directly to write your programs.
Python is Object-Oriented − Python supports Object-Oriented style or
technique of programming that encapsulates code within objects.
Python is a Beginner's Language − Python is a great language for the
beginner-level programmer and supports the development of a wide range of
applications, from simple text processing to WWW browsers to games.
4.1.2 History of Python:
Python was developed by Guido van Rossum in the late eighties and
early nineties at the National Research Institute for Mathematics and
Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++,
Algol-68, Smalltalk, the Unix shell, and other scripting languages.
Python is copyrighted. Its source code is available under the Python Software
Foundation License, which is GPL-compatible.
Python is now maintained by a core development team at the institute,
although Guido van Rossum still holds a vital role in directing its progress.
4.1.3 Python Features:
Python's features include −
Easy-to-learn − Python has few keywords, a simple structure, and a clearly
defined syntax. This allows the student to pick up the language quickly.
Easy-to-read − Python code is more clearly defined and visible to the
eyes.
Easy-to-maintain − Python's source code is fairly easy-to-maintain.
A broad standard library − The bulk of Python's library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode − Python has support for an interactive mode which
allows interactive testing and debugging of snippets of code.
Portable − Python can run on a wide variety of hardware platforms and has the
same interface on all platforms.
4.1.4 Development:
Python's development is conducted largely through the Python
Enhancement Proposal (PEP) process, the primary mechanism for
proposing major new features, collecting community input on issues and
documenting Python design decisions. Python coding style is covered in
PEP 8.
Outstanding PEPs are reviewed and commented on by the Python
community and the steering council.
Enhancement of the language corresponds with development of the
CPython reference implementation. The mailing list python-dev is the
primary forum for the language's development. Specific issues are
discussed in the Roundup bug tracker maintained at python.org.
Development originally took place on a self-hosted source-code repository
running Mercurial, until Python moved to GitHub in January 2017.
CPython's public releases come in three types, distinguished by which part of
the version number is incremented:
Backward-incompatible versions, where code is expected to break and needs to
be manually ported. The first part of the version number is incremented.
These releases happen infrequently; for example, version 3.0 was released 8
years after 2.0.
Major or "feature" releases, about every 18 months, are largely compatible
but introduce new features. The second part of the version number is
incremented. Each major version is supported by bugfixes for several years
after its release.
Bugfix releases, which introduce no new features, occur about every 3
months and are made when a sufficient number of bugs have been fixed
upstream since the last release. Security vulnerabilities are also patched in
these releases. The third and final part of the version number is
incremented.
Many alpha, beta, and release-candidates are also released as previews and
for testing before final releases. Although there is a rough schedule for
each release, they are often delayed if the code is not ready. Python's
development team monitors the state of the code by running the large unit
test suite during development, and using the Build Bot continuous
integration system.
CHAPTER 5
SOFTWARE SPECIFICATION
5.1 ANACONDA:
5.1.1 ABOUT ANACONDA:
Neural Networks
Machine Learning
Predictive Analytics
Data Visualization
5.3 What is Anaconda?
Anaconda is a free, open-source data science tool that focuses on the
distribution of the R and Python programming languages for data science and
machine learning tasks. Anaconda aims to simplify package management and
deployment.
Anaconda is a powerful data science platform for data scientists. Anaconda's
package manager is conda, which manages package versions.
Anaconda is a tool that offers all the required packages involved in data
science at once. Programmers choose Anaconda for its ease of use. Anaconda is
written in Python, and notably, unlike pip, its package manager checks the
requirements of the dependencies and installs them if they are required. More
importantly, warning signs are given if the dependencies already exist.
Anaconda installs dependencies very quickly, along with frequent updates. It
facilitates creation and loading with equal speed, along with easy environment
switching.
The installation of Anaconda is very easy and most preferred by non-
programmers who are data scientists.
Anaconda is pre-built with more than 1,500 Python and R data science packages.
Anaconda has specific tools to collect data using machine learning and
artificial intelligence.
Anaconda is a tool for developing, testing, and training in one single
system. It can be used with any project, as its environments are easily
manageable.
Anaconda is great for deep models and neural networks. You can build
models, deploy them, and integrate them with leading technologies in the
field. Anaconda is optimized to run efficiently for machine learning
tasks and will save you time when developing algorithms. Over 250
packages are included in the distribution, and other third-party packages
can be installed through the Anaconda terminal with conda install. With
over 7,500 data science and machine learning packages available in its
cloud-based repository, almost any package you need is easily accessible.
Anaconda offers individual, team, and enterprise editions, and it also
includes support for the R programming language.
The Anaconda distribution comes with packages that can be used on
Windows, Linux, and macOS. The individual edition includes popular
packages such as numpy, pandas, scipy, sklearn, tensorflow, pytorch,
matplotlib, and more. The Anaconda Prompt and PowerShell make working
within the filesystem easy and manageable, and the GUI of Anaconda
Navigator makes working with everything exceptionally smooth. Anaconda is
an excellent choice if you are looking for a thriving community of data
scientists and ever-growing support in the industry. Conducting data
science projects is an increasingly simple task with the help of tools
like this.
5.4 Creating a virtual environment:
Like many other languages, Python requires different versions for
different kinds of applications: an application may need to run on a
specific version of the language because it requires a certain dependency
that is present in older versions but changed in newer versions.
Virtual environments make it easy to cleanly separate different
applications and avoid problems with conflicting dependencies. Using
virtual environments, we can switch between applications easily and get
them to run.
There are multiple ways of creating an environment, for example using
virtualenv or anaconda.
The conda command is the preferred interface for managing installations
and virtual environments with the Anaconda Python distribution.
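As an illustration of the steps above (the environment name and package list here are placeholders, not taken from this report), a conda environment could be created and switched to as follows:

```shell
# Create an isolated environment named "airquality" with a fixed Python version
conda create --name airquality python=3.9

# Activate the environment before installing project dependencies
conda activate airquality

# Install the data science packages the project needs into this environment
conda install numpy pandas scikit-learn matplotlib

# List all environments to confirm the new one exists
conda env list

# Switch back to the default environment when done
conda deactivate
```

Because each environment keeps its own package versions, two applications with conflicting dependencies can run side by side on the same machine.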
5.5 Anaconda Navigator:
Anaconda Navigator bundles the following applications:
Jupyter Lab
Jupyter Notebook
Qt Console
Glue
Orange
RStudio
Visual Studio Code
Spyder
5.6.1 Economic Feasibility:
This study is carried out to check the economic impact the system will
have on the organization. The amount of funds that the company can pour
into the research and development of the system is limited, so the
expenditures must be justified. The developed system is well within the
budget, which was achieved because most of the technologies used are
freely available. Only the customized products have to be purchased.
5.6.2 Technical Feasibility:
This study is carried out to check the technical feasibility, that is, the
technical requirements of the system. Any system developed must not place
a high demand on the available technical resources, as this would lead to
high demands being placed on the client. The developed system must have
modest requirements, so that only minimal or no changes are needed for
implementing it.
CHAPTER 6
6.1 INTRODUCTION:
The purpose of testing is to discover errors. Testing is the process of
trying to discover every conceivable fault or weakness in a work product.
It provides a way to check the functionality of components,
sub-assemblies, assemblies, and/or a finished product. It is the process
of exercising software with the intent of ensuring that the software
system meets its requirements and user expectations and does not fail in
an unacceptable manner. There are various types of tests, and each test
type addresses a specific testing requirement.
6.2.2 Integration testing:
Integration tests are designed to test integrated software components to
determine whether they actually run as one program. Testing is
event-driven and is more concerned with the basic outcome of screens or
fields. Integration tests demonstrate that although the components were
individually satisfactory, as shown by successful unit testing, the
combination of components is correct and consistent. Integration testing
is specifically aimed at exposing problems that arise from the
combination of components.
6.2.3 Functional test:
Functional tests provide systematic demonstrations that functions tested
are available as specified by the business and technical requirements,
system documentation, and user manuals.
6.2.4 Systems/Procedures:
Interfacing systems or procedures must be invoked.
6.2.5 System Test:
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable
results. An example of system testing is the configuration-oriented
system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.
6.2.8 Unit Testing:
Unit testing is usually conducted as part of a combined code and unit test
phase of the software lifecycle, although it is not uncommon for coding
and unit testing to be conducted as two distinct phases.
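As a concrete illustration of unit testing in this project's language, the sketch below (the function, its thresholds, and all names are hypothetical, not taken from the project code) exercises each branch of a small PM2.5 classification helper in isolation:

```python
import unittest


def air_quality_category(pm25):
    """Classify a PM2.5 reading (ug/m3) into a simple band (illustrative thresholds)."""
    if pm25 < 0:
        raise ValueError("PM2.5 cannot be negative")
    if pm25 <= 30:
        return "Good"
    if pm25 <= 60:
        return "Satisfactory"
    if pm25 <= 90:
        return "Moderate"
    return "Poor"


class TestAirQualityCategory(unittest.TestCase):
    """Unit tests: each test checks one branch of the function in isolation."""

    def test_good_band(self):
        self.assertEqual(air_quality_category(12.0), "Good")

    def test_moderate_band(self):
        self.assertEqual(air_quality_category(75.0), "Moderate")

    def test_negative_reading_rejected(self):
        with self.assertRaises(ValueError):
            air_quality_category(-1.0)
```

Such tests are run with the standard library runner, for example `python -m unittest`, during the combined code-and-unit-test phase.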
6.2.8.1 Test strategy and approach:
Field testing will be performed manually and functional tests will be
written in detail.
6.2.9.2 Acceptance Testing:
User Acceptance Testing is a critical phase of any project and requires
significant participation by the end user. It also ensures that the system
meets the functional requirements.
6.3 RESULT
6.3.2 Software Code
6.3.4 PREDICTION GRAPH
CHAPTER 7
CONCLUSION
7.1 CONCLUSION
Since air pollution is such a serious problem in urban areas, this project
explores the performance of machine learning models for air pollution
prediction in smart cities in India. AdaBoost's smart air quality forecasts
were the most precise. Further research can be done to develop a more
effective algorithm for addressing this issue, and predictions of other
air pollutants can be tested as well. In smart cities, where forecasts are
made using machine learning, factors such as temperature can be used to
evaluate the accuracy of air-pollution forecasts.
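To make the AdaBoost evaluation concrete, the following is a minimal sketch, not the project's actual pipeline: it fits scikit-learn's AdaBoostRegressor on synthetic PM2.5-like data (the features, coefficients, and thresholds are invented for illustration) and reports the mean absolute error on a held-out split.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Hypothetical predictors: temperature, humidity, and a traffic index
X = rng.uniform(0.0, 1.0, size=(500, 3))

# Synthetic PM2.5 target: mostly driven by traffic, plus Gaussian noise
y = 80 * X[:, 2] + 20 * X[:, 0] - 10 * X[:, 1] + rng.normal(0, 5, 500)

# Hold out a test split so the error estimate is on unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = AdaBoostRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Mean absolute error: {mae:.2f}")
```

On real data, the same train/predict/score loop would be repeated for each candidate algorithm so that their errors can be compared on an equal footing.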
Our study fills a gap in the comprehensive research that has been done on
air quality prediction and machine learning. It employs an innovative
bibliometric technique to determine the major progress and new insights in
this field and to identify the current hot spots and trends. The results
revealed evidence of a surge of interest in air quality prediction with
machine learning models.
REFERENCES:
[1]. J. Sentian, F. Herman, C. Y. Yin and J. C. H. Wui, "Long-term air
pollution trend analysis in Malaysia," International Journal of
Environmental Impacts, vol. 2, no. 4, pp. 309-324, 2019.
[2]. Department of Environment, "Air Pollutant Index," [Online]. Available:
http://apims.doe.gov.my/public_v2/aboutapi.html.
[3]. United States Environmental Protection Agency, "Particulate Matter
(PM) Pollution," 1 October 2020. [Online]. Available:
https://www.epa.gov/pmpollution/particulate-matter-pm-basics.
[4]. S. Ameer, M. Ali Shah, A. Khan, H. Song, C. Maple, S. U. Islam
and M. N. Asghar, "Comparative Analysis of Machine Learning
Techniques For Predicting Air Quality in Smart Cities," Urban Computing
and Intelligence, vol. 7, p. 128325, 2017.
[5]. U. Mahalingam, K. Elangovan, H. Dobhal, C. Valiappa, S. Shresta and
G. Kedam, "A Machine Learning Model to Air Quality Prediction for Smart
Cities," vol. 19, p. 452, 2019.
[7]. R. M. Espana, A. B. Crespo, I. Timon, J. Soto, A. Munoz and J. M.
Cecilia, "Air Pollution in Smart Cities through Machine Learning
Methods," Universal Computer Science, vol. 24, 2017.
[8]. Yi, Wei, Kin Lo, Terrence Mak, Kwong Leung, Yee Leung, and
Mei Meng. "A survey of wireless sensor network based air pollution
monitoring systems." Sensors 15, no. 12 (2015): 31392-31427.
[9]. Y. Xing, Y. Xu, M. Shi, and Y. Lian, "The impact of PM2.5 on the
human respiratory system," vol. 8, no. 1, pp. 69-74, 2016.
[10]. M. M. Rathore, A. Paul, A. Ahmad, and S. Rho, "US CR," Comput.
Networks, 2016.
[11]. Asgari, Marjan, Mahdi Farnaghi, and Zeinab Ghaemi. "Predictive
mapping of urban air pollution using Apache Spark on a Hadoop cluster."
In Proceedings of the 2017 International Conference on Cloud and Big
Data Computing, pp. 89-93. ACM, 2017.
[12]. D. Zhu, C. Cai, T. Yang, and X. Zhou, "A Machine Learning
Approach for Air Quality Prediction: Model Regularization and
Optimization," pp. 1-14, December 2017.
[13]. R. W. Gore, "An Approach for Classification of Health Risks Based
on Air Quality Levels," pp. 58-61, 2017.
[14]. K. G. Ri, R. Manimegalai, G. D. M. Si, R. Si, U. Ki, and R. B. Ni,
"Air Pollution Analysis Using Enhanced K-Means Clustering Algorithm
for Real Time Sensor Data," pp. 1945-1949, 2016.
[15]. N. Zimmerman et al., "Closing the gap on lower cost air quality
monitoring: machine learning calibration models to improve low-cost
sensor performance," no. 2, pp. 1-36, 2017.
[16]. I. Bougoudis, K. Demertzis, and L. Iliadis, "EANN HISYCOL: a
hybrid computational intelligence system for combined machine learning:
the case of air pollution modeling in Athens," Neural Comput. Appl., vol.
27, no. 5, pp. 1191-1206, 2016.
[17]. C. Yan, S. Xu, Y. Huang, Y. Huang, and Z. Zhang, "Two-Phase
Neural Network Model for Pollution Concentrations Forecasting," Proc. -
5th Int. Conf. Adv. Cloud Big Data, CBD 2017, pp. 385-390, 2017.