
RAINFALL PREDICTION

CONTENT

S.NO TITLE

1. INTRODUCTION

1.1 ABOUT THE PROJECT

1.2 INTRODUCTION

1.3 MODULE DESCRIPTION

2. SYSTEM STUDY

2.1 EXISTING SYSTEM

2.1.1 DRAWBACK OF EXISTING SYSTEM

2.2 PROPOSED SYSTEM

2.2.1 ADVANTAGE OF PROPOSED SYSTEM

3. SYSTEM SPECIFICATION

3.1 HARDWARE SPECIFICATION

3.2 SOFTWARE SPECIFICATION

3.2.1 ABOUT THE SOFTWARE

4. SYSTEM DESIGN

4.1 DATA FLOW DIAGRAM

4.2 DATABASE DESIGN

4.3 INPUT DESIGN

4.4 OUTPUT DESIGN

5. SYSTEM TESTING & IMPLEMENTATION


5.1 SYSTEM TESTING

5.2 SYSTEM IMPLEMENTATION

6. CONCLUSION AND FUTURE ENHANCEMENT

BIBLIOGRAPHY

APPENDIX
A) SCREENSHOTS
B) REPORT
C) SAMPLE CODING

1. INTRODUCTION

1.1 ABOUT THE PROJECT

Rainfall prediction is important because heavy rainfall can lead to many disasters, and
accurate forecasts help people take preventive measures. There are two types of prediction:
short-term and long-term rainfall prediction. Short-term prediction usually gives accurate
results, so the main challenge is to build a model for long-term rainfall prediction.
Predicting heavy precipitation is a major problem for meteorological departments because it
is closely associated with the economy and with human life. Heavy rainfall causes natural
disasters such as floods and droughts, which people across the world face every year. The
accuracy of rainfall forecasts is especially important for countries like India, whose
economy largely depends on agriculture. Because of the dynamic nature of the atmosphere,
statistical techniques fail to provide good accuracy for precipitation forecasting, so
machine learning techniques, such as regression, may be applied instead. The intention of
this project is to give non-experts easy access to the techniques and approaches used in the
field of precipitation prediction, and to provide a comparative study of the various machine
learning techniques.

1.2 INTRODUCTION

Rainfall prediction remains a serious concern and has attracted the attention of
governments, industries, risk management entities, as well as the scientific community.
Rainfall is a climatic factor that affects many human activities like agricultural production,
construction, power generation, forestry and tourism, among others. To this extent, rainfall
prediction is essential, since this variable is the one with the highest correlation with adverse
natural events such as landslides, flooding, mass movements and avalanches. These incidents
have affected society for years.
Therefore, having an appropriate approach for rainfall prediction makes it possible to
take preventive and mitigation measures for these natural phenomena. These predictions also
facilitate the supervision of activities in agriculture, construction, tourism, transport, and
health, among others. For agencies responsible for disaster prevention, accurate
meteorological predictions can support decision-making in the face of possible natural
events. A number of methods exist for achieving these predictions, ranging from naive
methods to those that use more complex techniques such as artificial intelligence (AI), with
artificial neural networks (ANNs) being among the most valuable and attractive methods for
forecasting tasks. In prediction, ANNs, as opposed to traditional methods in meteorology, are
based on self-adaptive mechanisms that learn from examples and capture functional
relationships between data, even if the relationships are unknown or difficult to describe.
Over the last few years, Deep Learning has been used as a successful mechanism in ANNs
for solving complex problems.
Deep Learning is a general term used to refer to a series of multilayer architectures
that are trained using unsupervised algorithms. The main improvement is learning a compact,
valid, and non-linear representation of data via unsupervised methods, with the hope that the
new data representation contributes to the prediction task at hand. This approach has been
successfully applied to fields like computer vision, image recognition, natural language
processing, and bioinformatics.
Deep learning has shown promise for modeling time-series data through techniques
such as the Restricted Boltzmann Machine (RBM), Conditional RBM, Autoencoders,
Recurrent neural networks, Convolution and pooling, and Hidden Markov Models. In this
experimental study we use data gathered from a meteorological station located in a central
area of Manizales, Colombia. The data comprises more than a decade of measurements taken
in real time and stored by the Instituto de Estudios Ambientales (IDEA) of Universidad
Nacional de Colombia, also located in the same city. In order to perform forecasts, a deep
architecture combining an autoencoder and a multilayer perceptron is used. For testing the
validity of the proposed model, we optimized the parameters of the deep architecture and
tested the resulting network via a series of error measurement criteria. The results show that
the proposed architecture outperforms the state of the art in the task of predicting the
accumulated daily precipitation for the next day.

1.3 MODULE DESCRIPTION

MODULE

 DATA COLLECTION

 DATA PRE-PROCESSING
 FEATURE EXTRACTION

 MODEL EVALUATION

Module Description:

DATA COLLECTION

The data used in this project is a set of historical rainfall and weather records. This step is
concerned with selecting the subset of all available data that you will be working with. ML
problems start with data, preferably lots of data (examples or observations) for which you
already know the target answer. Data for which you already know the target answer is called
labelled data.

DATA PRE-PROCESSING

Organize your selected data by formatting, cleaning and sampling from it.

Three common data pre-processing steps are:

1. Formatting

2. Cleaning

3. Sampling

Formatting: The data you have selected may not be in a format that is suitable for you to
work with. The data may be in a relational database and you would like it in a flat file, or the
data may be in a proprietary file format and you would like it in a relational database or a text
file.

Cleaning: Cleaning data is the removal or fixing of missing data. There may be data
instances that are incomplete and do not carry the data you believe you need to address the
problem. These instances may need to be removed. Additionally, there may be sensitive
information in some of the attributes, and these attributes may need to be anonymized or
removed from the data entirely.

Sampling: There may be far more selected data available than you need to work with. More
data can result in much longer running times for algorithms and larger computational and
memory requirements. You can take a smaller representative sample of the selected data that
may be much faster for exploring and prototyping solutions before considering the whole
dataset.
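
A minimal sketch of these three pre-processing steps using pandas follows; the file name
"rainfall.csv" and the column names are assumptions chosen for illustration.

import pandas as pd

# Formatting: read the source data into a workable DataFrame.
data = pd.read_csv("rainfall.csv")

# Cleaning: drop rows whose target value is missing and remove a
# hypothetical sensitive attribute before further processing.
data = data.dropna(subset=["RainToday"])
data = data.drop(columns=["StationOperatorName"], errors="ignore")

# Sampling: work with a 10% random sample while prototyping.
sample = data.sample(frac=0.1, random_state=42)
print(sample.shape)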

FEATURE EXTRACTION

The next step is feature extraction, which is an attribute reduction process. Unlike feature
selection, which ranks the existing attributes according to their predictive significance,
feature extraction actually transforms the attributes. The transformed attributes, or features,
are linear combinations of the original attributes. Finally, our models are trained using a
classifier algorithm on the labelled dataset gathered; the rest of our labelled data is used to
evaluate the models. Machine learning algorithms were used to classify the pre-processed
data, and the chosen classifier was Random Forest, an algorithm that is very popular in
classification tasks.
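
A hedged sketch of this module follows, using PCA as one common transform that produces
linear combinations of the original attributes, then training the Random Forest classifier. The
file name and the assumption that all attributes are already numeric are illustrative.

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

data = pd.read_csv("rainfall.csv")        # hypothetical labelled dataset
X = data.drop(columns=["RainToday"])      # original attributes (numeric)
y = data["RainToday"]                     # target labels

# Transform the attributes into linear combinations (extracted features).
pca = PCA(n_components=10)
X_features = pca.fit_transform(X)

# Train the chosen classifier on the extracted features.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_features, y)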

MODEL EVALUATION

Model evaluation is an integral part of the model development process. It helps to find the
best model that represents our data and shows how well the chosen model will work in the
future. Evaluating model performance with the data used for training is not acceptable in data
science because it can easily produce overoptimistic and overfitted models. There are two
methods of evaluating models in data science, Hold-Out and Cross Validation; to avoid
overfitting, both methods use a test set (not seen by the model) to evaluate model
performance. The performance of each classification model is estimated based on its
averaged results, and the result is presented in visualized form as graphs of the classified
data. Accuracy is defined as the percentage of correct predictions for the test data. It can be
calculated easily by dividing the number of correct predictions by the number of total
predictions.
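
A minimal sketch of both evaluation strategies with scikit-learn follows; the file name and
column names are placeholders for the project's dataset.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

data = pd.read_csv("rainfall.csv")        # hypothetical labelled dataset
X = data.drop(columns=["RainToday"])
y = data["RainToday"]
clf = RandomForestClassifier(random_state=42)

# Hold-Out: keep a test set the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
clf.fit(X_train, y_train)
print("Hold-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Cross Validation: average accuracy over five train/test folds.
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print("5-fold CV accuracy:", scores.mean())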

While working with data, it can be difficult to truly understand your data when it’s just in
tabular form. To understand what exactly our data conveys, and to better clean it and select
suitable models for it, we need to visualize it or represent it in pictorial form. This helps
expose patterns, correlations, and trends that cannot be obtained when data is in a table or
CSV file.

The process of finding trends and correlations in our data by representing it pictorially is
called data visualization. To perform data visualization in Python, we can use various
modules such as Matplotlib, Seaborn, and Plotly. The following chart types are covered, with
a short sketch after the list:

• Data Visualization in Python

• Matplotlib and Seaborn

• Line Charts

• Bar Graphs

• Histograms

• Scatter Plots

• Heat Maps
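
A short sketch of a few of these chart types follows, assuming a DataFrame with
hypothetical columns; the file name is a placeholder.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv("rainfall.csv")        # hypothetical dataset

# Histogram: distribution of daily rainfall.
plt.hist(data["Rainfall"].dropna(), bins=30)
plt.title("Daily rainfall distribution")
plt.show()

# Scatter plot: humidity against rainfall.
plt.scatter(data["Humidity3pm"], data["Rainfall"])
plt.xlabel("Humidity at 3pm")
plt.ylabel("Rainfall (mm)")
plt.show()

# Heat map: pairwise correlations between the numeric variables.
sns.heatmap(data.corr(numeric_only=True), cmap="coolwarm")
plt.show()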

2. SYSTEM STUDY

2.1 EXISTING SYSTEM


The existing system required a lot of human effort and time, and the cost of hiring experts
is high. The outcome of rainfall had to be guessed using one's own knowledge and
experience, and a wrong guess about rainfall leads to inconvenience.

2.1.1 DRAWBACK OF EXISTING SYSTEM

1. A lot of research work must be done manually.

2. Users face more mental pressure in the existing system.

2.2 PROPOSED SYSTEM

1. The proposed system uses previous rainfall data, which is given to the system as input,
and predicts the rainfall for us.

2. Machine learning uses the previous rainfall data given by the user/developer to predict the
future outcome of the rainfall forecast.

3. The prediction of this system may or may not come true, but it assists the user with
rainfall forecasting.

2.2.1 ADVANTAGE OF PROPOSED SYSTEM

1. It helps us to predict rainfall in advance.

2. Benefits of ML:

 Fast and Accurate


 Perfect For the New World of Social Recruiting
 Customizes to your Needs
 Gets Smarter

3. SYSTEM SPECIFICATION

3.1 HARDWARE SPECIFICATION

Processor: Intel Pentium 4

Speed: 1.2 GHz


RAM: 256MB

Hard disk: 80GB

Keyboard: Samsung

Mouse: Logitech

Printer: Lexmark inkjet printer

Monitor: 17 inch Samsung

3.2 SOFTWARE SPECIFICATION

Front end: Jupyter Notebook using Python

Back end: DataSet(CSV)

Operating system: Windows 7/8/10/11

3.2.1 ABOUT THE SOFTWARE

Python is an interpreted, object-oriented, high-level programming language with


dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and
dynamic binding, make it very attractive for Rapid Application Development, as well as for
use as a scripting or glue language to connect existing components together. Python's simple,
easy to learn syntax emphasizes readability and therefore reduces the cost of program
maintenance. Python supports modules and packages, which encourages program modularity
and code reuse. The Python interpreter and the extensive standard library are available in
source or binary form without charge for all major platforms, and can be freely distributed.

Often, programmers fall in love with Python because of the increased productivity it
provides. Since there is no compilation step, the edit-test-debug cycle is incredibly fast.
Debugging Python programs is easy: a bug or bad input will never cause a segmentation
fault. Instead, when the interpreter discovers an error, it raises an exception. When the
program doesn't catch the exception, the interpreter prints a stack trace. A source level
debugger allows inspection of local and global variables, evaluation of arbitrary expressions,
setting breakpoints, stepping through the code a line at a time, and so on. The debugger is
written in Python itself, testifying to Python's introspective power. On the other hand, often
the quickest way to debug a program is to add a few print statements to the source: the fast
edit-test-debug cycle makes this simple approach very effective.

• Python 3.7: Python is an interpreted, high-level, general-purpose programming
language. Its formatting is visually uncluttered, and it often uses English keywords where
other languages use punctuation. It provides a vast library for data mining and predictions.

• Jupyter Notebook / Spyder / PyCharm: These are open-source, cross-platform integrated
development environments (IDEs) for scientific programming in the Python language.
Spyder integrates with a number of prominent packages as well as other open-source
software.

• NumPy: NumPy was used for numerical computation and array handling in the system.

• Pandas: Pandas was used for the data pre-processing and statistical analysis of data.

• Matplotlib: Matplotlib was used for the graphical representation of our prediction.

WHAT IS DATA SCIENCE

Machine learning is an application of artificial intelligence (AI) that gives systems the
ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning centers on the development of computer programs that can
access data and use it to learn for themselves. The process of learning begins with
observations or data, such as examples, direct experience, or instruction, in order to look for
patterns in data and make better decisions in the future based on the examples that we
provide. The essential point is to allow computers to learn automatically, without human
intervention or assistance, and adjust their actions accordingly.

a. Machine Learning Methods

Machine learning algorithms are often categorized as supervised or unsupervised.
Supervised machine learning algorithms can apply what has been learned in the past to new
data, using labelled examples to predict future events. Starting from the analysis of a known
training dataset, the learning algorithm produces an inferred function to make predictions
about the output values. The system can provide targets for any new input after sufficient
training. The learning algorithm can also compare its output with the correct, intended output
and find errors in order to modify the model accordingly.

i. In contrast, unsupervised machine learning algorithms are used when the information used
to train is neither classified nor labelled. Unsupervised learning studies how systems can
infer a function to describe a hidden structure from unlabelled data. The system does not
figure out the right output, but it explores the data and can draw inferences from datasets to
describe hidden structures in unlabelled data.

ii. Semi-supervised machine learning algorithms fall somewhere in between supervised and
unsupervised learning, since they use both labelled and unlabelled data for training, typically
a small amount of labelled data and a large amount of unlabelled data. Systems that use this
method can considerably improve learning accuracy. Usually, semi-supervised learning is
chosen when the acquired labelled data requires skilled and relevant resources in order to
train from it; otherwise, acquiring unlabelled data generally does not require additional
resources.

iii. Reinforcement machine learning algorithms are a learning method that interacts with its
environment by producing actions and discovering errors or rewards. Trial-and-error search
and delayed reward are the most relevant characteristics of reinforcement learning. This
method enables machines and software agents to automatically determine the ideal behavior
within a specific context in order to maximize performance. Simple reward feedback is
required for the agent to learn which action is best; this is known as the reinforcement signal.
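
As a toy illustration of the contrast described above, the following sketch (on synthetic,
purely illustrative data) trains a supervised classifier on labelled examples and an
unsupervised clustering algorithm on the same data without labels.

import numpy as np
from sklearn.cluster import KMeans                   # unsupervised
from sklearn.ensemble import RandomForestClassifier  # supervised

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # synthetic observations
y = (X[:, 0] > 0).astype(int)            # labels exist: supervised setting

supervised = RandomForestClassifier().fit(X, y)        # learns from labels
unsupervised = KMeans(n_clusters=2, n_init=10).fit(X)  # finds structure only
print(supervised.predict(X[:3]), unsupervised.labels_[:3])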

Machine learning enables the analysis of massive quantities of data. While it generally
delivers faster, more accurate results in order to identify profitable opportunities or
dangerous risks, it may also require additional time and resources to be trained properly.
Combining machine learning with AI and cognitive technologies can make it even more
effective in processing large volumes of data.

PREDICTION WITH MACHINE LEARNING


1) Data Preparation

In order to carry out the study, a set of operations was carried out to prepare the data:

Missing data

The total missing data correspond to 10% of the analyzed data (considering the total of
140,787 samples × 22 variables). Thus, if the samples that do not have data in any of their
variables were eliminated, approximately 50% of the samples would have to be discarded.
That is why the variables for which there are no data were analyzed, separating them by
cities in order not to discard a large amount of data (the total number of cities for which data
was available is 49). The results of the analysis of variables for which there are no data are
as follows:

 There are variables that do not have a single piece of data in some of the cities. It is
considered that this is because the corresponding sensor does not exist in the
meteorological station of that city. For this case, the absent values were replaced by the
monthly average [35] of said variable considering all the cities.

 There are samples for which there is no data for some of the variables. The reason could
be a failure of the sensors or of the communication with them. In this case, it was found
that there are two different situations: data loss for one day only and data loss for several
consecutive days. For this situation, it was decided to substitute the missing values with
the monthly average [35] of said variable in the corresponding city (see the sketch after
this list).

 Finally, in the case of the objective variable, RainToday, the decision was made to
eliminate the samples in which the variable does not exist (an “NA” appears). In this
sense, 1% of the data samples were deleted, leaving a total of 140,787 samples that
contain a value other than “NA” in RainToday.

 Conversion of categorical variables to numeric. It was necessary to carry out this
operation on two sets of variables: the wind direction variables and the Boolean that
indicates whether there is rain or not. In the first case, the variables that indicate the
wind direction (WindGustDir, WindDir9am, and WindDir3pm) are of type “string” and
must be transformed to be used. These variables can take 16 different directions, so, in
order to convert these values to real numbers, it must be taken into account that they
present a circular distribution. That is why each of the variables was split into two, one
with the sine and the other with the cosine of the angle: WindGustDir_Sin,
WindGustDir_Cos, WindDir9am_Sin, WindDir9am_Cos, WindDir3pm_Sin, and
WindDir3pm_Cos. With respect to the second case, the variables of type “string” that
represent Booleans (they take YES/NO) are transformed into the numerical values “1/0”.
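
A minimal sketch of these two operations, the monthly-average imputation and the
sine/cosine encoding of wind direction, is shown below; the file name and the particular
imputed columns are illustrative assumptions.

import numpy as np
import pandas as pd

data = pd.read_csv("rainfall.csv")                 # hypothetical file name
data["Date"] = pd.to_datetime(data["Date"])
data["Month"] = data["Date"].dt.month

# Replace missing numeric values with the monthly average per city.
for col in ["MinTemp", "MaxTemp", "Humidity9am"]:  # illustrative columns
    data[col] = data.groupby(["Location", "Month"])[col].transform(
        lambda s: s.fillna(s.mean()))

# Map the 16 compass directions onto angles, then split into sine/cosine.
directions = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
angles = data["WindGustDir"].map(
    {d: 2 * np.pi * i / 16 for i, d in enumerate(directions)})
data["WindGustDir_Sin"] = np.sin(angles)
data["WindGustDir_Cos"] = np.cos(angles)

# Booleans: the YES/NO strings become the numerical values 1/0.
data["RainToday"] = data["RainToday"].map({"YES": 1, "NO": 0})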
Elimination of variables

The date and location variables were eliminated, since they contain information that
can be explained using other variables (the data corresponding to each location has not been
separated to construct independent subsets, since the data in the region as a whole was to be
studied). For example, there are cities that, depending on their locations, have humidity and
temperature conditions that more or less favor rain. Likewise, on the date the data is
obtained, different meteorological conditions may occur that influence the rain.

Data normalization

A normalization of the data of the mix–max type [35] was carried out so that all the
variables would take values between 0 and 1. In this way, variables taking values of great
magnitudes having a greater influence on the application of machine learning algorithms was
avoided.

Detection of outliers

For this, the “Z-score” formula [36] was used, and all samples with Z > 3 were
discarded. As a result, 7% of the data was removed, reducing the 140,787 samples to
131,086.
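
A minimal sketch of these two steps with pandas follows; the file name is a placeholder, and
the absolute value in the Z-score filter is an assumption following the usual convention.

import pandas as pd

data = pd.read_csv("rainfall.csv")            # hypothetical numeric dataset
numeric = data.select_dtypes(include="number")

# Min-max normalization: rescale every variable to [0, 1].
normalized = (numeric - numeric.min()) / (numeric.max() - numeric.min())

# Z-score outlier removal: drop samples with |Z| > 3 in any variable.
z = (normalized - normalized.mean()) / normalized.std()
filtered = normalized[(z.abs() <= 3).all(axis=1)]
print(len(normalized), "->", len(filtered), "samples kept")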

2) Correlation Analysis

Following the pre-processing of the data, a descriptive analysis of the variables was carried
out in order to find out the form of the data to be analyzed. In particular, the correlation
between the variable “RainToday” and the rest of the variables was studied.

The following relationships with the variables can be observed (in those cases where the
correlation value is greater than 0.2, it was decided to eliminate the variable):

 Rainfall: It is a variable that indicates the rain that has fallen (in mm), so it is a direct
measure that indicates whether it has rained or not. For this reason, it was decided to
eliminate rainfall from the set of variables to be used.

 RISK_MM: It is a variable that indicates the risk of rain. This variable was eliminated
as it is not a measure of something physical since its value has probably been obtained by
applying some non-detailed prediction model in the dataset.
 Humidity (Humidity3pm and Humidity9am): It is reasonable that they are related,
since the higher the humidity, the greater the possibility of rain.

 Cloudiness (Cloud9am and Cloud3pm): It is reasonable that they are related because
the greater the number of clouds, the more likely it is to rain.

In other variables, an inverse relationship is observed: by increasing the values of these
variables, the possibility of rain is reduced. This happens with the variables Temp3pm
and MaxTemp (an increase in temperature does not favor condensation) and Sunshine
(an increase in radiation from the Sun is directly related to a less cloudy day and,
therefore, less rain).
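
A short sketch of this correlation study with pandas follows, assuming the preprocessed
numeric dataset; the file name is a placeholder.

import pandas as pd

data = pd.read_csv("rainfall.csv")        # preprocessed, numeric dataset
corr = data.corr(numeric_only=True)["RainToday"].drop("RainToday")
print(corr.sort_values(ascending=False))

# Variables whose correlation with the target exceeds 0.2 are candidates
# for elimination, per the criterion stated in the text.
print(corr[corr.abs() > 0.2])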

3) Results of Data Preprocessing

The results of the data preprocessing are as follows:

 Initially there were 24 variables (‘Date’, ‘Location’, ‘MinTemp’, ‘MaxTemp’,
‘Rainfall’, ‘Evaporation’, ‘Sunshine’, ‘WindGustDir’, ‘WindGustSpeed’, ‘WindDir9am’,
‘WindDir3pm’, ‘WindSpeed9am’, ‘WindSpeed3pm’, ‘Humidity9am’, ‘Humidity3pm’,
‘Pressure9am’, ‘Pressure3pm’, ‘Cloud9am’, ‘Cloud3pm’, ‘Temp9am’, ‘Temp3pm’,
‘RainToday’, ‘RISK_MM’, ‘RainTomorrow’) with a total of 142,193 samples
corresponding to measurements of rainfall and atmospheric conditions produced in 49
cities in Tamil Nadu over 10 years.

 The following variables were eliminated: Location and Date (eliminated because they
are string variables), Rainfall (eliminated because it is highly related to the RainToday
variable), RISK_MM (an artificial variable obtained to predict the rain), RainTomorrow
(removed because it is an artificial variable obtained from RISK_MM), and the variables
WindGustDir, WindDir9am, and WindDir3pm (each is split into two variables
containing the cosines and sines of the wind direction angles).

 The samples that had “NA” in the RainToday variable were eliminated (reducing the
number from 142,193 samples to 140,787), and the samples that represent outliers were
also eliminated (reducing the number from 140,787 samples to 131,086).

 As a result, 21 variables were obtained (‘MinTemp’, ‘MaxTemp’, ‘Evaporation’,
‘Sunshine’, ‘WindGustSpeed’, ‘WindSpeed9am’, ‘WindSpeed3pm’, ‘Humidity9am’,
‘Humidity3pm’, ‘Pressure9am’, ‘Pressure3pm’, ‘Cloud9am’, ‘Cloud3pm’, ‘Temp9am’,
‘RainToday’, ‘Temp3pm’, ‘WindGustDir_Cos’, ‘WindGustDir_Sin’,
‘WindDir9am_Cos’, ‘WindDir9am_Sin’, ‘WindDir3pm_Cos’, ‘WindDir3pm_Sin’) with
a total of 131,086 samples.

Note that of these 131,086 samples, 75% of the data is used to train the models with the
different algorithms and the remaining 25% to check the effectiveness of these models.
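
A minimal sketch of this 75/25 split using scikit-learn follows; the file name is a placeholder
for the preprocessed dataset.

import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("rainfall_preprocessed.csv")   # hypothetical file
X = data.drop(columns=["RainToday"])
y = data["RainToday"]

# 75% of the samples train the models; the remaining 25% test them.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=42)
print(len(X_train), "training samples,", len(X_test), "test samples")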
4. SYSTEM DESIGN

4.1 DATA FLOW DIAGRAM (BLOCK DIAGRAM)

4.2 DATABASE DESIGN


4.3 INPUT DESIGN

Input design is the process of converting user-originated inputs to a computer-based
format. Input design is one of the most expensive phases of the operation of a computerized
system and is often a major problem area of a system.

In the project, the input design is made in various window forms with various methods.

 RAINFALL DETAIL

4.4 OUTPUT DESIGN

Output design generally refers to the results and information that are generated by the
system for many end-users; output is the main reason for developing the system and the basis
on which they evaluate the usefulness of the application. In any system, the output design
determines the input to be given to the application.

In the project, the output design is made in various window forms with various methods.

 View Graph
 RESULTS
5. SYSTEM TESTING & IMPLEMENTATION

5.1 SYSTEM TESTING

Testing methodologies:

The term ‘system testing’ can be used in a number of ways. In a general sense, it refers
to the testing of the system under artificial conditions to ensure that it performs as expected
and as required.

From a system development perspective, system testing refers to the testing performed
by the development team (the programmers and other technicians) to ensure that the system
works module by module (‘unit testing’) and also as a whole. System testing should ensure
that each function of the system works as expected and that any errors (bugs) are noted and
analyzed. It should additionally ensure that interfaces for export and import routines function
as required. System testing does not concern itself with whether the functionality of the
system is appropriate to meet the needs of the users; having met the criteria of the test plan,
the software may then be passed for user acceptance testing.

The various testing methodologies performed for this system are:


 Unit Testing

 Integration Testing

 White Box Testing

 Black Box Testing

Unit testing

In computer programming, a unit test is a procedure used to validate that a particular
module of source code is working properly. The idea behind unit testing is to write test cases
for all functions and methods so that whenever a change causes a regression, it can be
quickly identified and fixed. Ideally, each test case is separate from the others; constructs
such as mock objects can assist in separating unit tests. This type of testing is mostly done by
the developers and not by end-users.

The goal of unit testing is to isolate each part of the program and show that the individual
parts are correct. Unit testing provides a strict, written contract that the piece of code must
satisfy. As a result, it affords several benefits, and in this project it helped to identify and
correct the following errors:

1. Mixed mode operations

2. Incorrect initialization

3. Incorrect symbolic representation of the expression

4. Simplified integration

5. Facilitated the various changes made to the system
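
A minimal sketch of a unit test in Python's built-in unittest framework follows; normalize()
is a hypothetical helper standing in for one of the project's functions.

import unittest


def normalize(values):
    """Rescale a list of numbers to the [0, 1] range (min-max)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


class TestNormalize(unittest.TestCase):
    def test_range(self):
        # The rescaled values must span exactly [0, 1].
        result = normalize([10, 20, 30])
        self.assertEqual(min(result), 0.0)
        self.assertEqual(max(result), 1.0)

    def test_order_preserved(self):
        # Relative ordering of the inputs must not change.
        self.assertEqual(normalize([1, 2, 3]), [0.0, 0.5, 1.0])


if __name__ == "__main__":
    unittest.main()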

Integration Testing:

Integration testing can proceed in a number of different ways, which can be broadly
characterized as top-down or bottom-up. In top-down integration testing the high-level
control routines are tested first, possibly with the middle-level control structures present only
as stubs. Subprogram stubs are incomplete subprograms which are only present to allow the
higher-level control routines to be tested.

Top down testing can proceed in a depth-first or breadth-first manner. For depth-first
integration each module is tested in increasing detail, replacing more and more levels of
details with actual code rather than stubs. Alternatively breadth-first would proceed by
refining all the modules at the same level of the control throughout the application.

In practice a combination of the two techniques would be used. At the initial stage all
the modules might be only partly functional, possibly being implemented only to deal with
non-erroneous data. These would be tested in breadth-first manner, but over a period of time
each would be replaced with successive refinements which were closer to the full
functionality. This allows depth-first testing of a module to be performed simultaneously with
breadth-first testing of all the modules.

The other major category of integration testing is bottom-up integration testing, where
an individual module is tested from a test harness. Once a set of individual modules has been
tested, they are combined into a collection of modules, known as builds, which are then
tested by a second test harness. This process can continue until the build consists of the
entire application.

This second approach is used in this project, where the individual modules, data
collection, pre-processing, feature extraction, and model evaluation, were first developed and
then integrated into one application and tested for the results.

White Box Testing:

White box testing is testing from the inside--tests that go in and test actual program
structure.

Basis path testing: Very simply, test every statement in the program at least once, so that the
entire execution tree is covered.

Basis path testing is mandatory, so much so that there are software products written
especially to assist in it.
 Profiling: there are a lot of tools--often included with compilers--which show where
the CPU is spending most of its time in a program. Naturally, the busiest parts of the
program are the ones you want to test most.

 Loop tests: Exercise each DO, WHILE, FOR and other repeating statements several
times.

 Input tests: as the old saying goes, garbage in, garbage out. If a procedure receives
the wrong data, it is not going to work. Each procedure should be tested to make
certain that the procedure actually receives the data you sent to it. This will spot type
mismatches, bad pointers, and other such bugs.

Here in this project each decision path is checked, and all the loops are executed separately
to ensure that the program is logically correct and exits at the right time.

Black Box Testing:

Black box testing, also known as functional testing, is used in computer
programming, software engineering and software testing to check that the outputs of a
program, given certain inputs, conform to the functional specification of the program.

The term black box indicates that the internal implementation of the program being
executed is not examined by the tester. For this reason black box testing is not normally
carried out by the programmer. In most real-world engineering firms, one group does design
work while a separate group does the testing.

Boundary value analysis is a technique of black box testing in which input values at
the boundaries of the input domain are tested. It has been widely recognized that input values
at the extreme ends of, and just outside of, input domains tend to cause errors in system
functionality.

In boundary value analysis, values at and just beyond the boundaries of the input domain
are used to generate test cases to ensure the proper functionality of the system.

Advantages of Black Box Testing

 More effective on larger units of code than glass box testing

 Testing needs no knowledge of implementation, including specific programming


language
 Tester and programmer are independent of each other

 Tests are done from a user’s point of view

 Will help to expose any ambiguities or inconsistencies in the specifications

 Test cases can be designed as soon as the specifications are complete

In this project all the functions were tested to check whether they work properly. The
performance was verified by considering response time and speed. The errors identified were
then corrected.

QUALITY ASSURANCE

Quality assurance comprises all those planned and systematic actions necessary to
provide confidence that a structure, system or component will perform satisfactorily in
service.

Quality assurance includes a formal review of care, problem definition, corrective actions to
remedy any deficiencies, and evaluation of the actions to be taken.

The function of software quality assurance is to assure that the standards, processes,
and procedures are appropriate for the project and are correctly implemented. This is an
“umbrella activity” that is applied throughout the engineering process. Quality software is
reasonably bug-free, delivered on time and within budget, meets requirements and/or
expectations, and is maintainable.

The system is developed such that it ensures all levels of quality. It checks whether a
user-friendly environment is provided to the users and that there is a reliable, accurate and
efficient flow of data within the system. The system also checks that it contains the level of
security required for the user. Hence, as long as there is no hardware complaint, there is no
problem with the software.

5.2 SYSTEM IMPLEMENTATION

Plan:

Implementation is the stage in the project where the theoretical design is turned into a
working system. It is the most crucial stage in achieving a successful new system and in
giving users confidence that the new system will work efficiently and effectively. The
system is implemented only after thorough testing is done and it is found to work according
to the specification.

It involves careful planning, investigation of the current system and its constraints on
implementation, design of methods to achieve changeover, and evaluation of the changeover
methods apart from planning. Two major tasks for preparing the implementation are
educating and training the users and testing the system.

Implementation plan preparation

The implementation process begins with the preparation of a plan for implementation.
According to this plan, the other activities are carried out. In this plan, decisions are made
regarding the equipment, the resources and how to test the activities. Thus a clear plan is
prepared for the activities.

Equipment Acquisition

According to the above plan, the necessary equipment has to be acquired to implement
the new system, which would include all the requirements for installing and maintaining
Python, Jupyter Notebook, and the required libraries.

Program code preparation

One of the most important development activities is coding or programming. The system
flowcharts and other charts are converted into modular programs. They have to be compiled,
tested and debugged.

User training and documentation

Once the planning has been completed, the major effort in the computer department is to
ensure that the user department consists of educated and trained staff, as the system becomes
more complex. The success of the system depends upon how it is operated and used.

Thus the quality of training the personnel is connected to the success of the system.
Implementation depends upon the right people being trained at the right time. Education
involves creating the right atmosphere and motivating the user. Staff education should
encourage the participation of all the staff.

Changeover

Changeover is the process of moving over from the old system to the new computerized
system. In order for this to be done, all the files have to be converted to the new format. The
accuracy of the conversion is of utmost importance, both for user confidence in the system
and for effective operation. When the files have been set up on the computer, the changeover
can take place. There are several possible methods of doing this, e.g. direct changeover,
parallel running, pilot running, and staged changeover.

Direct Changeover:

This method is the complete replacement of the old system by the new one, in one move.
When direct changeover is planned, system tests and training should be comprehensive, and
the changeover itself should be planned in detail.

Parallel Running:

Parallel running or operation means processing current data by both the old and new
systems to cross check the results.

The old system is kept alive and operational until the new system has been proved for at
least one system cycle, using full live data in the operational environment of place, people,
equipment and time. It allows the results of the new system to be compared with the old
system before acceptance by the user. Parallel operation does not allow much time for
learning and testing activities.

Staged Changeover:

A staged changeover involves a series of limited-size direct changeovers, with the new
system being introduced piece by piece. After a complete start, a logical section is committed
to the new system while the remaining parts or sections are processed by the old system.

In this project, direct changeover is applied where the entire system is implemented
directly after it has been developed.

SYSTEM MAINTENANCE:

Maintenance
The term “software maintenance” is used to describe software engineering activities.
Maintenance activities involve making enhancements to software products, adapting
products to new environments and correcting problems. Software product enhancements may
involve providing new functional capabilities, improving user displays and modes of
interaction, upgrading external documents and internal documentation, or upgrading the
performance characteristics of a system. Adaptation of software to a new environment may
involve moving the software to a different machine or, for instance, modifying the software
to accommodate a new telecommunication protocol or additional disk drives. Problem
correction involves modification and revalidation of software to correct errors.

Many activities performed during software development enhance the maintainability
of a software product. They are:

Analysis activities:

The analysis phase of software development is concerned with determining customer


requirements and constraints and establishing feasibility of the product.

 Develop standards and guidelines

 Set milestones for supporting documents

 Specify quality assurance procedures

 Identify likely product enhancements

 Determine resources required for maintenance

 Estimate maintenance costs

Architectural Design Activities:

 Emphasize clarity and modularity as design criteria

 Design to ease likely enhancement

 Use standardized notations to document data flows, functions, structure and
interconnections

 Observe the principles of information hiding, data abstraction and top-down


hierarchical decomposition
Detailed Design Activities

 Use standardized notations to specify algorithms, data structures and


procedure interface specifications

 Specify side effects and exception handling for each routine

Implementation activities:

 Use single entry, single exit constructs

 Use standard indentation of constructs

 Use simple, clear coding style

 Use symbolic constants to parameterize routines

 Provide margins on resources

 Provide standard documentation

 Follow standard internal commenting guidelines

Other activities:

 Develop a maintenance guide

 Develop a test suite

 Provide test suite documentation

6. CONCLUSION AND FUTURE ENHANCEMENT

In this work, the applicability of machine learning techniques to the problem of
rainfall forecasting in the specific case of Tamil Nadu was studied. There are previous works
[28–34] that have applied this type of technique in regions other than Tamil Nadu and with
different types of monthly, annual, and other time-period datasets. The locations that were
studied were the regions of Chennai and Coimbatore, and the models used were, generally,
Neural Networks and Random Forest. In line with these previous studies, in this work, a set
of meteorological data from 38 different districts in Tamil Nadu was taken. In the set, there
is a variable (RainToday) that indicates whether or not it has rained on the day of taking the
sample, and there are also other variables that show meteorological properties on the day of
taking the sample, such as cloudiness, wind, sunlight, humidity, pressure, or temperature.
From the preprocessed dataset, several prediction models based on machine learning
techniques were applied to predict rainfall (KNN, Decision Tree, Random Forest, and Neural
Networks). As a result, it was found that the best model to describe this type of phenomenon
is neural networks. Likewise, the applicability of the models to various cities was analyzed
independently; in this case, it was observed that the efficiency of the algorithms was higher.
Finally, the possible improvement of the results by modifying the data used to carry out the
training (quantity of data and actual values) was studied, but without improvement compared
with previous analyses. Therefore, this new study allowed me to conclude several ideas. On
the one hand, machine learning techniques offer possibilities as alternative tools to classical
rain forecasting methods (they also have some advantages over classical forecasting
methods, such as the possibility of estimating the reliability of the results using key
performance indicators, or the possibility of adjusting the performance of the algorithms by
manipulating their input parameters, which allows them to be adapted to particular cases).
Likewise, it can be seen that algorithms based on Neural Networks work quite well to model
nonlinear natural phenomena. Finally, the locality of the phenomenon can be observed, since,
by considering the data independently by city, the algorithms work better and are more
efficient.

The work can be continued in several ways. It would be interesting to check the
results obtained considering the meteorological information from 2019 to the present, as well
as to analyze data from other countries. The latter would allow me to check whether the
efficiency obtained can be extrapolated to other geographical areas and whether there is any
geographical dependence in the results. Finally, another very interesting future study related
to the one described would be to study the problem of predicting several days in advance:
which models are the most interesting, or how many days in advance are optimal for making
a prediction.
BIBLIOGRAPHY

References:

1. Datta, A.; Si, S.; Biswas, S. Complete Statistical Analysis to Weather Forecasting. In
Computational Intelligence in Pattern Recognition; Springer: Singapore, 2020; pp. 751–763.

2. Burlando, P.; Montanari, A.; Ranzi, R. Forecasting of storm rainfall by combined use of
radar, rain gages and linear models. Atmos. Res. 1996, 42, 199–216.

3. Valipour, M. How much meteorological information is necessary to achieve reliable
accuracy for rainfall estimations? Agriculture 2016, 6, 53.

4. Murphy, A.H.; Winkler, R.L. Probability forecasting in meteorology. J. Am. Stat. Assoc.
1984, 79, 489–500.

5. Jolliffe, I.T.; Stephenson, D.B. (Eds.) Forecast Verification: A Practitioner’s Guide in
Atmospheric Science; John Wiley & Sons: Hoboken, NJ, USA, 2012.

6. Wu, J.; Huang, L.; Pan, X. A novel Bayesian additive regression trees ensemble model
based on linear regression and nonlinear regression for torrential rain forecasting. In
Proceedings of the 2010 Third International Joint Conference on Computational Science and
Optimization, Huangshan, China, 28–31 May 2010; Volume 2, pp. 466–470.

7. Tanessong, R.S.; Vondou, D.A.; Igri, P.M.; Kamga, F.M. Bayesian processor of output for
probabilistic quantitative precipitation forecast over central and West Africa. Atmos. Clim.
Sci. 2017, 7, 263.

8. Georgakakos, K.P.; Hudlow, M.D. Quantitative precipitation forecast techniques for use in
hydrologic forecasting. Bull. Am. Meteorol. Soc. 1984, 65, 1186–1200.

9. Migon, H.S.; Monteiro, A.B.S. Rain-fall modeling: An application of Bayesian forecasting.
Stoch. Hydrol. Hydraul. 1997, 11, 115–127.

10. Wu, J. An effective hybrid semi-parametric regression strategy for rainfall forecasting
combining linear and nonlinear regression. In Modeling Applications and Theoretical
Innovations in Interdisciplinary Evolutionary Computation; IGI Global: New York, NY,
USA, 2013; pp. 273–289.

11. Wu, J. A novel nonlinear ensemble rainfall forecasting model incorporating linear and
nonlinear regression. In Proceedings of the 2008 Fourth International Conference on Natural
Computation, Jinan, China, 18–20 October 2008; Volume 3, pp. 34–38.

12. Zhang, C.J.; Zeng, J.; Wang, H.Y.; Ma, L.M.; Chu, H. Correction model for rainfall
forecasts using the LSTM with multiple meteorological factors. Meteorol. Appl. 2020, 27,
e1852.

13. Liguori, S.; Rico-Ramirez, M.A.; Schellart, A.N.A.; Saul, A.J. Using probabilistic radar
rainfall nowcasts and NWP forecasts for flow prediction in urban catchments. Atmos. Res.
2012, 103, 80–95.

14. Koussis, A.D.; Lagouvardos, K.; Mazi, K.; Kotroni, V.; Sitzmann, D.; Lang, J.; Malguzzi,
P. Flood forecasts for urban basin with integrated hydro-meteorological model. J. Hydrol.
Eng. 2003, 8, 1–11.

15. Yasar, A.; Bilgili, M.; Simsek, E. Water demand forecasting based on stepwise multiple
nonlinear regression analysis. Arab. J. Sci. Eng. 2012, 37, 2333–2341.

16. Holmstrom, M.; Liu, D.; Vo, C. Machine learning applied to weather forecasting.
Meteorol. Appl. 2016, 10, 1–5.

17. Singh, N.; Chaturvedi, S.; Akhter, S. Weather forecasting using machine learning
algorithm. In Proceedings of the 2019 International Conference on Signal Processing and
Communication (ICSC), Noida, India, 7–9 March 2019; pp. 171–174.

18. Hasan, N.; Uddin, M.T.; Chowdhury, N.K. Automated weather event analysis with
machine learning. In Proceedings of the 2016 International Conference on Innovations in
Science, Engineering and Technology (ICISET), Dhaka, Bangladesh, 28–29 October 2016;
pp. 1–5.

19. Balamurugan, M.S.; Manojkumar, R. Study of short term rain forecasting using machine
learning based approach. Wirel. Netw. 2021, 27, 5429–5434.

20. Booz, J.; Yu, W.; Xu, G.; Griffith, D.; Golmie, N. A deep learning-based weather forecast
system for data volume and recency analysis. In Proceedings of the 2019 International
Conference on Computing, Networking and Communications (ICNC), Honolulu, HI, USA,
18–21 February 2019; pp. 697–701.
APPENDIX
A) SCREENSHOTS
B) REPORT

C) SAMPLE CODING

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Load the Austin weather dataset and separate features from the target.
data = pd.read_csv("austin_final.csv")
X = data.drop(['PrecipitationSumInches'], axis=1)
Y = data['PrecipitationSumInches'].values.reshape(-1, 1)

day_index = 798                      # a sample day to highlight in red
days = [i for i in range(Y.size)]

# Fit a linear regression model on the full dataset.
clf = LinearRegression()
clf.fit(X, Y)

# A single input sample (one value per feature column of X).
inp = np.array([[74], [60], [45], [67], [49], [43], [33], [45],
                [57], [29.68], [10], [7], [2], [0], [20],
                [4], [31]])
inp = inp.reshape(1, -1)
print('The precipitation in inches for the input is:', clf.predict(inp))

# Plot the precipitation trend, highlighting the chosen day.
print("The precipitation trend graph:")
plt.scatter(days, Y, color='g')
plt.scatter(days[day_index], Y[day_index], color='r')
plt.title("Precipitation level")
plt.xlabel("Days")
plt.ylabel("Precipitation in inches")
plt.show()

# Plot selected weather attributes against the day index.
x_vis = X.filter(['TempAvgF', 'DewPointAvgF', 'HumidityAvgPercent',
                  'SeaLevelPressureAvgInches', 'VisibilityAvgMiles',
                  'WindAvgMPH'], axis=1)
print("Precipitation vs selected attributes graph:")

for i in range(x_vis.columns.size):
    plt.subplot(3, 2, i + 1)
    plt.scatter(days, x_vis[x_vis.columns.values[i]], color='g')
    plt.scatter(days[day_index],
                x_vis[x_vis.columns.values[i]][day_index],
                color='r')
    plt.title(x_vis.columns.values[i])

plt.show()
