e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:04/Issue:03/March-2022 Impact Factor- 6.752 www.irjmets.com

Rainfall Prediction System


Shravani Kadam *1, Dhanesh Salgaonkar *2, Bhishmesh Chaudhari *3, Santosh Tamboli*4
*1 *2 *3 Student, Department of Information Technology, Vidyalankar Institute of Technology, Mumbai,
Maharashtra, India
*4 Asst. Prof., Department of Information Technology, Vidyalankar Institute of Technology, Mumbai,
Maharashtra, India

ABSTRACT
Global warming is affecting the entire world, with a major impact on humankind and an accelerating effect on climate
change. As a result, the atmosphere and oceans are warming, sea levels are rising, and floods and droughts are
occurring. One major consequence is uneven rainfall (precipitation). Precipitation forecasting is a difficult task that
is receiving attention from most major authorities worldwide. Precipitation is a climatic factor that affects many
human activities, such as agricultural production, construction, energy production and tourism. For these reasons,
accurate rainfall forecasting is essential. There are many ways to predict rainfall; the approach chosen for this
project is to use historical rainfall data collected over 10 years of measurements to forecast rainfall for the next
day. The project therefore seeks to optimize results and find suitable machine learning models for predicting rainfall.
We also compare the machine learning algorithms used, which include logistic regression, support vector classification,
random forest, CatBoost and XGBoost. Each algorithm is evaluated individually on standard machine learning metrics
such as accuracy.
Keywords: accuracy, forecasting, machine learning algorithms, rainfall.
I. INTRODUCTION
Rainfall forecasting is a challenging and uncertain task that has a major impact on human society. Timely and accurate
forecasts help to proactively reduce human and financial losses. Forecasting heavy rain is a major problem for
meteorological bureaus because it is closely tied to the economy and to people's lives. Irregular rainfall is a cause
of natural disasters such as floods and droughts that people around the world face each year. As global warming
progresses, rainfall detection and prediction has become a major issue in countries where appropriate technology is not
available; done correctly, it can serve a variety of purposes such as agriculture, health and drinking water. The
accuracy of precipitation forecasts is especially important where the economy depends heavily on agriculture. Due to the
dynamic nature of the atmosphere, statistical methods do not provide high accuracy in predicting precipitation. The
non-linearity of precipitation data makes machine learning and AI better-suited techniques. Predictions help people
take precautions, so they must be accurate. The purpose of this project is to give non-experts easy access to the
techniques and approaches used in precipitation prediction, to provide a comparative study of various machine learning
techniques, to provide early precipitation prediction, and to identify the best-performing machine learning algorithm.
The objective of the system is to understand and analyze the rainfall data collected for all the states, to derive
conclusive information and statistics on rainfall patterns across the country, and to design an interactive,
user-friendly web interface that makes the results obtained from the input data easy to understand.
II. LITERATURE SURVEY
A. Kala and Dr. S. Ganesh Vaidyanathan [1] used an Artificial Neural Network to implement their idea. Their procedure
for rainfall prediction consists of gathering the weather data, preprocessing it, building a Feed Forward Neural
Network (FFNN) model with training data, validating it with testing data, and finally evaluating the model by comparing
the desired and actual outputs. They report an accuracy of 0.935 using the Artificial Neural Network. R. Kingsy Grace
and B. Suganya [2] implemented their model using machine learning. They compared various models such as Deep
Convolutional Neural Network, Genetic Programming, ANN, Linear
Regression, Hybrid Neural Network, Likelihood, LSTM and ConvNet. They report 99% accuracy for Multiple Linear
Regression. Hiyam Abobaker Yousif Ahmed and Sondos W. A. Mohamed [3] implemented their model using linear regression.
Their work has two major parts: data collection and selection, followed by data cleaning and transformation. Their
Multiple Linear Regression model gave 85% accuracy. CMAK Zeelan Basha, Nagulla Bhavana, Ponduru Bhavya and Sowmya V. [4]
implemented a model on the same topic using machine learning and deep learning techniques. They used the
Auto-Regressive Integrated Moving Average (ARIMA) model, Artificial Neural Network, Support Vector Machine and Self
Organizing Map. Their results indicate that, in terms of MSE and RMSE, their architecture outperforms other approaches.
B. Vasantha, R. Tamilkodi and L. Venkateswara Kiran [5] forecast rainfall using real-time global climate parameters.
They mainly used convolutional neural networks to predict weather parameters, which in turn yield meaningful patterns
for understanding the forecast. They expect the precision of the results to improve by 70%. Anjali Samad, Bhagyanidhi,
Vaibhav Gautam, Piyush Jain, Sangeeta and Kanishka Sarkar [6] performed precipitation forecasting using a Long Short
Term Memory neural network. Their model is based on an Australian dataset and uses seasonal decomposition methods. They
claim that the model forecasts accurately on test cases, although the error rate increased somewhat when handling
outliers. Rose Ellen N. Macabiog and Jennifer C. Dela Cruz [7] built a rainfall prediction model using classification.
Their steps were data collection, data preprocessing, building the predictive model, model evaluation and selection,
and finally testing and evaluation. They observed that the most accurate results are obtained when all 5 attributes are
used for the model. Coarse KNN, Fine Gaussian SVM and Neural Network gave the best accuracy of 0.811. Arief Bramanto
Wicaksono Putra, Rheo Malani, Bedi Suprapty and Achamad Fanany Onnilita Gaffar [8] implemented a Deep Auto Encoder
using Semi CNN. Their method combines an Auto-Encoder Neural Network (AENN), a Convolution Neural Network (CNN) and an
autoregressive (AR) time series model. This is one of the more recent approaches to rainfall prediction, with a
reported performance of 99%. Yuana Ratna Sari, Esmeralda Contessa Djamal and Fikri Nugraha [9] performed precipitation
forecasting using a 1-D CNN, with preprocessing followed by the convolutional neural network. They observed an accuracy
of 0.9463 on training data and 0.8146 on testing data. They also found that the more layers the model uses, the higher
the accuracy. Eslam Hussein, Mehrdad Ghaziasgar and Christopher Thron [10] applied Support Vector Machine
classification. Their methodology consists of data gathering, preprocessing, classification, comparison between
different SVM inputs, and comparison between regional predictions. Nikhil Oswal [11] used machine learning techniques
to implement his model. His methodology includes data exploration and analysis, data preprocessing, modelling and
evaluation. He concluded that Australian rainfall is uncertain and that there is no particular relation between time
and rainfall. Still, he was able to find certain patterns and develop a high-performance model.
III. PROPOSED SYSTEM
The proposed solution is a short-term precipitation forecasting system with a web interface (GUI) that predicts
precipitation a few days in advance at a particular location. Accurate forecasts help develop better strategies for
agriculture and water storage, and give advance warning of floods so that precautionary measures can be taken. The
prediction system is implemented by comparing various machine learning algorithms such as SVR, ANN and multiple
regression. The data is analyzed and visualized using histograms, graphs, etc. to derive meaningful information from
the patterns in the precipitation data. The results of the various ML algorithms are compared on accuracy. The aim of
the proposed study is to predict rainfall effectively and efficiently, with maximum accuracy and precision.
IV. METHODOLOGY
The first and most important step in the process is data collection. We collected our data from the official website of
the Indian Government, covering the years 1951 to 2015 for all districts and all months. The next step is converting
the data into the correct format for experiments, i.e. preprocessing tasks such as replacing missing and null values
with the mean of the column, and detecting and removing outliers with the help of boxplots. The next step is to analyze
the data and observe variations in rainfall patterns in order to draw conclusions and to determine the correlation
between the different parameters. After that, we visualize the analyzed data using bar graphs, heat maps, histograms,
etc. so that the information obtained is easier to understand. We then try to predict the average rainfall by
separating the data into training and testing datasets. We apply various statistical and machine
learning approaches such as SVR, KNN and logistic regression for prediction and analyze the different approaches.
Finally, we intend to create a simple, user-friendly and interactive webpage so that users can input different
parameters and get predictions accordingly. Figure 1 depicts the methodology.

Figure 1: Methodology

Figure 2 shows the basic data flow of the entire methodology and each of the individual components of our approach.
The components are data collection, data cleaning and analysis, data visualization, splitting into training and testing
sets, implementing all the mentioned ML models, and entering the state, month and other values through a web interface
to predict the rainfall.

Figure 2: Flow of the system

V. IMPLEMENTATION AND RESULTS
Dataset selection and acquisition:
We decided that the dataset should have an ample number of rows (tuples) to support good and accurate prediction. It
should also have the necessary attributes so that we can determine, from their correlation, how many can be used
effectively for prediction. After comparing different datasets, we concluded that the one available on Kaggle under the
title “Rain in Australia – Predict next-day rain in Australia”, owned by Joe Young, would be ideal for our prediction
model. It has 23 distinct attributes and 145460 tuples of data. Of the 23 attributes, 16 contain numerical values and
the remaining 7 contain categorical values. The 16 numerical attributes can be further subdivided into 2 discrete and
14 continuous attributes.

Data pre-processing:
After loading the dataset, we checked whether it contained null values. As with most datasets, it did: 21 of the 23
attributes had at least some null values. On checking the percentage of null values, we found 4 attributes in which
more than 35% of rows had no data. Replacing that much data with a single value such as the mean or median reduces the
variance, because every filled value sits at the centre of the distribution, and it also alters the visualizations
considerably. So, for those 4 attributes, we defined a function that fills each null or NaN with a randomly chosen
value. For the remaining columns with continuous numerical values, we replaced null values with the median of the
column. Null values in categorical attributes were replaced with distinct dummy values. At this point there were zero
null values in the entire dataset.
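
The two numeric imputation rules above can be sketched as follows. This is a minimal, dependency-free illustration and
not the authors' function: it assumes the random fill draws from the values already observed in the same column, and
the column names and data are made up for the example.

```python
import random
import statistics

def impute_column(values, null_share_cutoff=0.35, rng=None):
    """Fill None entries in one numeric column.

    Columns with more than null_share_cutoff missing values get random draws
    from the observed values (assumption: this is how the random fill works);
    all other columns get the column median.
    """
    rng = rng or random.Random(0)
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    null_share = 1 - len(observed) / len(values)
    if null_share > null_share_cutoff:
        # Heavily missing column: sampling keeps the spread of the observed data
        return [v if v is not None else rng.choice(observed) for v in values]
    median = statistics.median(observed)
    return [v if v is not None else median for v in values]

# Hypothetical columns, for illustration only
sunshine = [7.2, None, None, 9.1, None, 3.4, None, 8.8]      # 50% missing
humidity = [71.0, 44.0, None, 38.0, 82.0, 55.0, 60.0, 49.0]  # 12.5% missing

sunshine_filled = impute_column(sunshine)
humidity_filled = impute_column(humidity)
```

Sampling from the observed values keeps the filled column's spread close to the original, which is the concern raised
above about filling a large share of rows with one central value.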
Our next aim was to convert categorical values into numerical values for later use in prediction. Categorical values
such as yes and no are replaced with one and zero respectively; attributes such as location and wind direction are
assigned distinct representative numbers. After making the dataset completely numerical, we checked the attributes for
outliers using boxplots. Outliers can provide useful insight into the data under investigation, but they can also
distort statistical results, and inspecting them can help find inconsistencies and errors in the data. For outlier
handling we first define a range using the interquartile range; a value inside this range is not considered an
outlier. A value outside the range is changed so that it does not disturb the prediction: attribute values below the
lower bound of the range are replaced with the lower bound, and values above the upper bound are replaced with the
upper bound. After pre-processing, we also drew graphs and plots alongside the boxplots, such as histograms, distplots,
countplots and subplots, for visualization.
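
The encoding and outlier-capping steps above can be sketched as follows. This is an illustrative sketch only: the
multiplier k = 1.5 for the interquartile range is the conventional choice and is an assumption here, since the text
does not state the value used, and the sample values are invented.

```python
import statistics

def encode_yes_no(values):
    """Map 'Yes'/'No' labels to 1/0."""
    return [1 if v == "Yes" else 0 for v in values]

def encode_categories(values):
    """Give each distinct category an integer code, in order of first appearance."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]

def cap_outliers(values, k=1.5):
    """Clamp values to the range [Q1 - k*IQR, Q3 + k*IQR] and return the bounds used."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Values outside the range are replaced by the nearest bound, not dropped
    return [min(max(v, low), high) for v in values], (low, high)

# Made-up sample data
rain_today = encode_yes_no(["No", "Yes", "No"])
wind_dir = encode_categories(["N", "SW", "N", "E"])
rainfall_mm = [1, 2, 3, 4, 5, 6, 7, 8, 100]
capped, (low, high) = cap_outliers(rainfall_mm)
```

Capping keeps every row in the dataset, which matters here because removing rows would also remove their labels.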

Model validation:
A validation process involves evaluating a trained model on a test set of similar data. The testing set is different
from the training set on which the model was trained. Model validation follows model training and is used mainly to
check the generalization capacity of the trained model, with the aim of finding the model with the best performance.
The first thing we did was a train-test split, because a model cannot be evaluated fairly on the same data it was
trained on; it needs to be evaluated with fresh data. We split the total dataset 80-20: 80 percent is used for training
and the remaining 20 percent for testing.
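
The 80-20 split can be sketched with the standard library alone. The function name, seed and toy data below are
assumptions made for the example; in practice a library routine such as scikit-learn's train_test_split with
test_size=0.2 does the same job.

```python
import random

def split_train_test(rows, labels, test_share=0.2, seed=42):
    """Shuffle the row indices once, then hold out the last test_share of them."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(round(len(indices) * (1 - test_share)))
    train_idx, test_idx = indices[:cut], indices[cut:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

# Toy data: 100 rows with two features each and a binary label
rows = [[float(i), float(i % 7)] for i in range(100)]
labels = [i % 2 for i in range(100)]
X_train, X_test, y_train, y_test = split_train_test(rows, labels)
```

Shuffling before cutting avoids a test set made up only of the most recent or last-listed rows.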
Another problem we consider here is handling imbalanced data. An imbalanced dataset is one where one class has a very
high number of observations while the other has far fewer. A standard approach to dealing with an imbalanced dataset
is to resample the data. There are two kinds of resampling: oversampling and undersampling. Oversampling is more
commonly used than undersampling, because undersampling tends to remove instances that may contain valuable
information. The Synthetic Minority Oversampling Technique (SMOTE) generates synthetic samples for the minority class,
thereby reducing the overfitting problem posed by random oversampling. In this method, new instances are created in
feature space by interpolating between minority-class (positive) instances that lie close together. We therefore
applied this technique and obtained resampled data. Then we applied this data to seven different
algorithms to check which gives the best results. The seven algorithms are CatBoost Classifier, Random Forest
Classifier, Logistic Regression, Gaussian Naïve Bayes, K Neighbours Classifier, XGBoost Classifier and Support Vector
Classifier. After considering each algorithm's accuracy, ROC curve and AUC score, we decided that CatBoost would be the
most appropriate algorithm for prediction. Table 1 lists the accuracy and AUC score for each of the algorithms.
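
The interpolation idea behind the SMOTE resampling step described above can be sketched in a few lines. This is a
simplified illustration, not the implementation used in the project (which is not stated); the neighbour count k, the
seed and the sample points are assumptions. A maintained implementation is available as the SMOTE class in the
imbalanced-learn package.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Made-up minority-class points (for example, rainy days) in a 2-feature space
rainy_days = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(rainy_days, n_new=6)
```

Because each synthetic point is a blend of two real minority points, it is not an exact copy of any existing row,
which is what distinguishes this from random oversampling.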
Table 1. Accuracy and AUC score of the algorithms
Sr. No.   Algorithm                    Accuracy   AUC Score
1.        CatBoost Classifier          0.9112     0.89
2.        Random Forest Classifier     0.8614     0.87
3.        Logistic Regression          0.7686     0.85
4.        Gaussian Naïve Bayes         0.7501     0.82
5.        K Neighbours Classifier      0.7491     0.79
6.        XGBoost Classifier           0.9081     0.88
7.        Support Vector Classifier    0.7770     0.85
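
The two metrics reported in Table 1 measure different things, which is why the rankings they give need not agree (the
Support Vector Classifier has a higher accuracy than Logistic Regression but the same AUC score). The sketch below
computes both from scratch on invented labels and scores; the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented labels (1 = rain tomorrow) and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen threshold, while AUC summarizes how well the scores order the two classes across all
thresholds.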

Feature selection:
Feature selection is vital to the process, because the crucial features are determined during this step. By selecting
the relevant features, we not only get rid of the unimportant ones but also improve the performance of our model. We
use the mutual_info_classif method in our project. This method selects features based on mutual information (entropy)
gain and is applicable to classification problems. It is a univariate filter: each feature is scored separately rather
than in groups. As a result, top-ranked variables may perform poorly when combined, which can lead to the selection of
suboptimal features. Nevertheless, univariate filtering methods are relatively quick and can be used for screening,
which leads to better performance and less training time. We therefore computed the mutual information value of every
attribute, arranged the values in decreasing order (the higher the value, the stronger the dependency) and selected the
top 10 attributes, which were well suited for prediction. Taking these 10 attributes and performing resampling, we used
the CatBoost Classifier for the prediction model. Figure 3 shows the mutual information values from which we chose the
best 10 attributes.

Figure 3: Mutual Information Score
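
The ranking step described above can be illustrated with a small from-scratch calculation. This sketch handles
discrete features only and uses plain empirical frequencies; scikit-learn's mutual_info_classif, which the project
uses, estimates mutual information for continuous features with a nearest-neighbour method, so its scores will differ.
The feature names and data are invented.

```python
import math
from collections import Counter

def mutual_information(feature, target):
    """I(X;Y) in nats for two discrete sequences, from empirical frequencies."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    px, py = Counter(feature), Counter(target)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )

def top_k_features(columns, target, k):
    """Score each column independently (univariate) and keep the k best."""
    scores = {name: mutual_information(col, target) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Invented binary features and target (1 = rain tomorrow)
rain_tomorrow = [1, 1, 0, 0, 1, 0, 1, 0]
columns = {
    "humid_afternoon": [1, 1, 0, 0, 1, 0, 1, 0],  # identical to the target
    "cloudy":          [1, 1, 0, 0, 1, 0, 0, 1],  # agrees on 6 of 8 rows
    "weekday_even":    [0, 1, 0, 1, 0, 1, 1, 0],  # independent of the target
}
selected, scores = top_k_features(columns, rain_tomorrow, k=2)
```

Each feature is scored on its own, so two highly ranked features that carry the same information would both be kept;
this is the limitation of univariate filtering noted above.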

Interface building:
A web interface has been built to help users understand the importance of building prediction models. To create the web
interface, we used Flask, a Python web framework that makes it easy to create web applications. One of the important
parts of this interface is the analysis of the dataset, which we did using Power BI. Power BI
is a business intelligence tool that allows users to visualize raw data, analyze it and present it in the form of
actionable insights. We used line charts, heatmaps, bar graphs, decomposition trees, area charts and radial charts to
analyze our data. We also used slicers for periodic or time-oriented analysis.
Figures 4 and 5 show our Power BI analysis dashboard, on which all the charts and plots can be seen. There are two
dedicated predictor pages for making predictions from the user's data. On one page, we predict using the limited set of
attributes obtained through feature selection. On the other, the user can enter data for all 22 attributes for
prediction.

Figure 4: Power BI Analysis Dashboard 1

Figure 5: Power BI Analysis Dashboard 2
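
The logic shared by the two predictor pages described above can be sketched independently of the web framework. The
field names, the shortened attribute lists and the stand-in model below are all hypothetical; in a Flask application a
function like this would be called from a view with the submitted form data.

```python
# Hypothetical field lists: the real pages use the top 10 and all 22 attributes
TOP_FEATURES = ["humidity_3pm", "sunshine", "cloud_3pm"]
ALL_FEATURES = TOP_FEATURES + ["min_temp", "max_temp", "wind_gust_speed"]

def parse_form(form, required):
    """Turn submitted form fields into an ordered numeric feature vector,
    or return an error message describing what is wrong."""
    missing = [name for name in required if form.get(name, "").strip() == ""]
    if missing:
        return None, "missing fields: " + ", ".join(missing)
    try:
        return [float(form[name]) for name in required], None
    except ValueError:
        return None, "all fields must be numeric"

def predict_page(form, required, model):
    """Validate the input, then ask the model for a next-day rain prediction."""
    vector, error = parse_form(form, required)
    if error:
        return {"status": 400, "error": error}
    return {"status": 200, "rain_tomorrow": bool(model(vector))}

def dummy_model(vector):
    """Placeholder standing in for the trained classifier."""
    return vector[0] > 70

ok = predict_page(
    {"humidity_3pm": "82", "sunshine": "3.1", "cloud_3pm": "7"},
    TOP_FEATURES, dummy_model,
)
bad = predict_page({"humidity_3pm": "82"}, TOP_FEATURES, dummy_model)
```

Passing the list of required fields as a parameter lets the same handler serve both the reduced-attribute page and the
full-attribute page.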

VI. CONCLUSION


In this paper, we explored and applied several preprocessing steps and implemented various machine learning algorithms.
We then carried out a comparative study to understand the overall performance of the algorithms and to select the most
accurate and effective one to implement. Of the seven algorithms implemented, the CatBoost classifier proved to be the
best-performing ML model, with an accuracy of 91.12% and an AUC score of 0.89. We also developed a user-friendly
front-end system that accepts different input parameters, to observe how the input data affects the model prediction
and how accurately it predicts rainfall for the next day. A well-developed front end and interactive dashboard make the
data and the built system easier to understand. We concluded that Australia's rainfall pattern is irregular and
uncertain. Using feature selection and data visualization through the Power BI
dashboard, we were able to recognize relationships in the data and identify the factors that have a significant effect
on the rainfall received in the Australian regions.

VII. FUTURE SCOPE


Further changes and ideas can be adopted, such as integrating the system with a flood prediction and alert model
for rivers. The system could also include another prediction model that predicts whether it will rain tomorrow
in major cities, based on that day's weather data. Additional features such as real-time updates and major
alerts from around the country could be displayed on the website to support precautionary measures.

REFERENCES
[1] Kala, A., & Vaidyanathan, S. G. (2018, July). Prediction of rainfall using artificial neural network. 2018
International Conference on Inventive Research in Computing Applications (ICIRCA).
[2] Grace, R. K., & Suganya, B. (2020, March). Machine Learning based Rainfall Prediction. 2020 6th International
Conference on Advanced Computing and Communication Systems (ICACCS).
[3] Ahmed, H. A. Y., & Mohamed, S. W. A. (2021, February 26). Rainfall Prediction using Multiple Linear Regressions
Model. 2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE).
[4] Basha, C. Z., Bhavana, N., Bhavya, P., & V, S. (2020, July). Rainfall Prediction using Machine Learning & Deep
Learning Techniques. 2020 International Conference on Electronics and Sustainable Communication Systems
(ICESC).
[5] Vasantha, B., Tamilkodi, R., & Kiran, L. V. (2019, March). Rainfall pattern prediction using real time global climate
parameters through machine learning. 2019 International Conference on Vision Towards Emerging Trends in
Communication and Networking (ViTECoN).
[6] Samad, A., Bhagyanidhi, Gautam, V., Jain, P., Sangeeta, & Sarkar, K. (2020, October 30). An approach for rainfall
prediction using long short term memory neural network. 2020 IEEE 5th International Conference on Computing
Communication and Automation (ICCCA).
[7] Macabiog, R. E. N., & Dela Cruz, J. C. (2019, November). Rainfall Predictive Approach for La Trinidad, Benguet
using Machine Learning Classification. 2019 IEEE 11th International Conference on Humanoid, Nanotechnology,
Information Technology, Communication and Control, Environment, and Management (HNICEM).
[8] Wicaksono Putra, A. B., Malani, R., Suprapty, B., & Onnilita Gaffar, A. F. (2020, July). A deep auto encoder semi
convolution neural network for yearly rainfall prediction. 2020 International Seminar on Intelligent Technology
and Its Applications (ISITIA).
[9] Sari, Y. R., Djamal, E. C., & Nugraha, F. (2020, September 15). Daily rainfall prediction using one dimensional
convolutional neural networks. 2020 3rd International Conference on Computer and Informatics Engineering
(IC2IE).
[10] Hussein, E., Ghaziasgar, M., & Thron, C. (2020, July). Regional rainfall prediction using support vector machine
classification of large-scale precipitation maps. 2020 IEEE 23rd International Conference on Information Fusion
(FUSION).
[11] Oswal, N. (2021). Predicting Rainfall using Machine Learning Techniques. Institute of Electrical and Electronics
Engineers (IEEE).
