Standardization of Rainfall Prediction in Bangladesh Using Machine Learning Approach


Nushrat Jahan Ria, Jannatul Ferdous Ani, Mirajul Islam, Abu Kaisar Mohammad Masum
Dept. of CSE, Daffodil International University, Dhaka, Bangladesh

Abstract— Rainfall has a significant impact on human life in many areas, including agriculture and natural disasters such as droughts, floods, and landslides. Artificial intelligence technologies have a high chance of succeeding in these fields. The goal of this research is to develop a rain prediction model using machine learning. The model is built on a dataset of 2,391 records collected from the website Bangladesh Jatiyo Tottho Batayon. We used a total of five models for the study. Each model was trained with eight input features and then validated on its rainfall predictions. All of the models performed well, but Random Forest predicted the dataset most accurately, so it is the best model for this study. The Random Forest classifier delivered the highest Accuracy, Recall, Precision, and F1 Score.

Keywords— Machine learning, Rainfall, Climate, Random Forest, Confusion matrix.

I. INTRODUCTION

Rainfall is the most important component of the hydrological cycle, and it is the principal source of water that can be used and regulated directly to meet consumption requirements. Water is the most valuable natural resource and plays a significant role in social and economic development. Water resources are generally under stress or facing a mismatch between supply and demand in numerous regions around the globe. This is due to a rise in demand, a shortage of accessible water, or poor management of available water resources. In fact, proper water planning and management are the main options for reducing water stress or closing the gap between supply and demand. Numerous steps have been taken around the world not only to predict rainfall but also to establish the interaction between rainfall and runoff, in order to adapt water management procedures so that water is accessible whenever it is needed [1].

Bangladesh's rainfall varies with season and region. Winter (November to February) is extremely dry, accounting for only around 4% of the yearly rainfall. During this season, westerly disturbances that approach the country from the northwest cause rainfall ranging from 20 mm in the west and south to 40 mm in the northeast. Bangladesh's average annual rainfall ranges from 1500 mm in the west-central regions to over 3000 mm in the northeast and southeast. Rainfall in the Surma Valley and the surrounding hills is extremely high: it averages 4180 mm in Sylhet, 5330 mm at the foot of the steep Meghalaya Plateau in Sunamganj, and 6400 mm in Lalakhal, the highest in Bangladesh.

Scientists and researchers all over the world are working to improve rainfall forecasting accuracy. Several statistical and numerical models have been developed for forecasting rainfall. In the realm of forecasting, however, the importance of numerical models is fading due to ongoing developments in soft computing methodologies. Artificial neural networks have emerged in recent decades as an inductive strategy for hydrological and climatological forecasting, owing to their ability to model both linear and non-linear systems without any prior knowledge of catchment behavior and flow processes. For early warning of landslide occurrence, a daily rainfall forecasting system based on the backpropagation neural network (BPNN) model can be used; one study provides four backpropagation neural networks with various topologies capable of predicting the next day's rainfall intensity [2]. Rainfall thresholds and antecedent rainfall records were among the input features. Support vector machine (SVM), another machine learning strategy, has proven effective for handling non-linear and complicated machine learning problems. Non-linear regression problems are solved using support vector regression (SVR), a modified form of SVM. Comparing several modern methodologies and algorithms helps to improve rainfall forecasting accuracy. Another study evaluates the accuracy of monthly rainfall forecasts generated using time-series analysis approaches, with antecedent rainfall intensity as the single input parameter [3]. Linear regression, BPNN, SVR, and the long short-term memory (LSTM) recurrent neural network are the machine learning approaches used there to make predictions. To determine the likelihood of a landslide, rainfall estimates for a given site can be compared with the appropriate landslide-triggering rainfall thresholds.

In this study, we make rainfall predictions based on daily climate data. Five machine learning algorithms have been trained and tested using the data: Decision Tree, K Neighbors, Logistic Regression, Multinomial NB, and Random Forest. The remainder of the article is organized as follows. The related work section summarizes the relevant articles. The methodology section describes the dataset and the algorithms used in our paper. The result and discussion section goes over the findings and outcomes of all the models. The final section summarizes our work.

II. RELATED WORK

This section discusses some of the significant studies related to our research topic that have been completed by other
authors. Rani et al. [4] presented a flood detection and alerting system using machine learning techniques. For rainfall prediction, they used Linear Regression, Support Vector Machines, and Artificial Neural Networks. Kader et al. [5] applied Particle Swarm Optimization (PSO) and Multi Layer Perceptron (MLP) techniques for rainfall forecasting. They experimented with four factors: high temperature, low temperature, humidity, and wind speed. Misra et al. [6] used a machine learning approach to anticipate average rainy season rainfall in the Indian state of Odisha. In their study, a maximum accuracy of 91% was achieved by Random Forest regression. Zhang et al. [7] developed a model based on Support Vector Regression and Multilayer Perceptron for maximum rainfall prediction in both monsoon and non-monsoon seasons. The parameters used in their research are average monthly temperature, wind velocity, humidity, and cloud cover. Adnan et al. [8] examined the applicability of four machine learning methods, including ANFIS-PSO, ANFIS-FCM, and MARS, and compared the results with those of EBA4SUB, a physically event-based technique. Anwar et al. [9] used historical meteorological data to build a rain prediction model using a rule-based machine learning technique. In their research, the decision tree model had the highest accuracy. Anwar et al. [10] used the Extreme Gradient Boosting model to predict rainfall. Their dataset consisted of weather data obtained by a weather station over the previous seven years. The 8 attributes of their dataset are Minimum temperature, Maximum temperature, Average temperature, Average Humidity, Sun exposure time, Maximum wind speed, Average wind speed, and Rainfall.

Singh et al. [11] proposed a method and built an application using a Raspberry Pi 3 B. This program uses real-time data from humidity, temperature, and pressure sensors to forecast whether it will rain today. Park et al. [12] used different machine learning algorithms to estimate the probability of flooding in South Korea's coastal districts. Dikshit et al. [13] predicted temporal drought trends in New South Wales, Australia's south-eastern state, using machine learning approaches. Machine learning techniques are also used to predict the Standardized Streamflow Index for hydrological drought in [14]. Klompenburg et al. [15] studied crop yield prediction using machine learning techniques. Their research differs from others in terms of scale, geological location, and crop. To forecast drought in Pakistan, researchers [16] employed Support Vector Machine (SVM), Artificial Neural Network (ANN), and k-Nearest Neighbour (KNN). In these studies, the authors worked on a specific region and used few parameters. In our paper, we used data from three different regions. Moreover, we used eight parameters for our dataset.

III. RESEARCH METHODOLOGY

This section goes through the data collection and preparation process, including the procedure and approaches used in this study.

A. Dataset

The dataset is the most significant and crucial aspect of our research study. We collected the data from the official website of the Government of Bangladesh, Bangladesh Jatiyo Tottho Batayon [17], mainly from the Climate database [18] section of the Bangladesh Rice Research Institute part of this site. Our dataset contains climate data recorded on a daily basis at three district stations, Gazipur, Rangpur, and Barishal, from 2016 to 2019.

Fig. 1. Dataset creation procedure.

The process of creating a dataset after collecting the raw data files from the website is shown in Figure 1. Out of the total 9 attributes, 8 are input attributes, and their details are shown in Table I. There are 2,391 records in the dataset, with 1,660 records indicating no rain and 731 records indicating rain. Of these, 80% of the data have been used for model training and 20% for model testing. Because rainy days are comparatively few in a year, our dataset is imbalanced, as seen in Figure 2. On our dataset, we trained and tested five of the most popular machine learning models.

TABLE I. DATASET ATTRIBUTES IN DETAILS

Attribute                     Description                                            Type
Max Temp                      Maximum temperature of a day, in Celsius               Numeric
Min Temp                      Minimum temperature of a day, in Celsius               Numeric
Actual Evaporation            Actual evaporation of a day, in millimeters (mm)       Numeric
Relative Humidity (9.00 am)   Relative humidity recorded at 9.00 am, in percent (%)  Numeric
Relative Humidity (2.00 pm)   Relative humidity recorded at 2.00 pm, in percent (%)  Numeric
Sunshine                      Total hours of sunshine in a day (hrs/day)             Numeric
Cloudy                        Total hours cloudy in a day (hrs/day)                  Numeric
Solar Radiation               Total solar radiation in one day (cal/cm2)             Numeric
Rainfall                      0 = No Rain, 1 = Rain                                  Nominal
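The 80/20 partition described in this section can be illustrated with a short sketch. It is a minimal standard-library illustration, not the authors' pipeline: the label list is synthetic (only the class counts 1,660 and 731 come from the text), and the shuffling scheme, the seed, and the helper name `train_test_indices` are assumptions made for the example. Rounding the test share up yields 479 test records, which agrees with the test-set size reported in the result analysis.

```python
import math
import random


def train_test_indices(n_records, test_fraction=0.2, seed=42):
    """Shuffle record indices and split them into train and test parts."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Round the test share up, so 20% of 2,391 records becomes 479.
    n_test = math.ceil(n_records * test_fraction)
    return indices[n_test:], indices[:n_test]


# Synthetic labels with the class counts reported for the dataset.
labels = [0] * 1660 + [1] * 731  # 0 = No Rain, 1 = Rain
train_idx, test_idx = train_test_indices(len(labels))

rain_share = sum(labels) / len(labels)
print(len(train_idx), len(test_idx))    # 1912 479
print(f"rain share: {rain_share:.1%}")  # rain share: 30.6%
```

A plain random split like this one does not guarantee that the 69/31 class ratio is preserved in both parts; a stratified split would be one way to enforce that, if it matters for the comparison.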
[Chart: Rain 31%, No Rain 69%]
Fig. 2. Output class ratio.

TABLE II. DATASET FOR THIS RESEARCH

Max Temp  Min Temp  Actual Evaporation  Relative Humidity (9.00 am)  Relative Humidity (2.00 pm)  Sunshine  Cloudy  Solar Radiation  Rainfall
10.5      3.2       1                   92                           60                           5.3       5.4     243.44           0
16.4      11.6      1.4                 93                           70                           6.1       4.7     272.67           1
37.4      25.8      5                   76                           42                           6.6       5.9     319.06           0
36        27        4.2                 84                           74                           5.7       7.2     307.18           1
35.3      21.8      4                   86                           40                           7.7       4.1     407.63           0

Fig. 3. Dataset histogram representation.

Figure 3 shows a histogram representation of the data distribution for each of the nine attributes. The correlation matrix for our dataset is shown in Figure 4. A correlation matrix is a table that shows the correlation coefficients between pairs of variables.

Fig. 4. Correlation matrix.

B. Classification algorithms

• Decision Tree: The decision tree is a powerful and widely used tool for classification and prediction. A decision tree is a flowchart-like tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) stores a class label.

• K Neighbors: The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases the input consists of the k closest training examples in the data set. The output depends on whether k-NN is used for classification or regression: for classification it is the class most common among the k neighbors, and for regression it is the average of their values.

• Logistic Regression: Logistic regression is a statistical model that, in its most basic form, uses a logistic function to model a binary dependent variable, though many more advanced variants exist. In regression analysis, logistic regression (or logit regression) is a method of estimating the parameters of a logistic model.

• Multinomial NB: The multinomial Naive Bayes classifier is effective for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts, although fractional counts, such as TF-IDF, may also work in practice. The smoothing parameter alpha is a float with a default value of 1.0. The probability formula of Naive Bayes is:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

where P(A) denotes the prior probability of A, P(B|A) denotes the likelihood of B occurring if A occurs, P(A|B) denotes the likelihood of A occurring if B occurs, and P(B) denotes the likelihood of occurrence of B.

• Random Forest: Random forest is a versatile, easy-to-use machine learning technique that, in most cases, gives excellent results even without hyper-parameter adjustment. Because of its simplicity and versatility, it is also one of the most widely used algorithms (it can be used for both classification and regression tasks).
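To make the k-nearest-neighbour rule from the list above concrete, here is a minimal sketch of k-NN classification by majority vote, written with the Python standard library only. It is an illustration under stated assumptions, not the implementation used in the study: the function name `knn_predict`, the Euclidean distance, the choice k = 3, and the six toy points (humidity in percent, sunshine in hours) are all made up for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    # Euclidean distance from the query to every training example.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_X, train_y)
    ]
    # Keep the k closest examples and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up points: (relative humidity %, sunshine hours); 1 = Rain, 0 = No Rain.
train_X = [(92, 2.0), (90, 3.1), (88, 1.5), (60, 8.2), (55, 9.0), (65, 7.4)]
train_y = [1, 1, 1, 0, 0, 0]

print(knn_predict(train_X, train_y, (89, 2.5)))  # 1
print(knn_predict(train_X, train_y, (58, 8.5)))  # 0
```

Because the vote depends on raw distances, features measured on very different scales (for example solar radiation in cal/cm2 against evaporation in mm) would dominate one another unless they are rescaled first; the sketch leaves that step out.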
IV. RESULT ANALYSIS AND DISCUSSION

This section provides a comparative analysis of the supervised machine learning algorithms used to predict rainfall from daily climate data. The comparison is carried out using the Confusion Matrix and four performance evaluation metrics: Recall, Precision, F1 Score, and Accuracy.

TABLE III. PREDICTION RESULTS

Model                True Positive (TP)  True Negative (TN)  False Positive (FP)  False Negative (FN)
Decision Tree        293                 96                  38                   52
K Neighbors          306                 85                  49                   39
Logistic Regression  306                 79                  55                   38
Multinomial NB       280                 86                  48                   65
Random Forest        319                 101                 33                   26

The Confusion Matrix of the five machine learning algorithms is shown in Table III. Confusion matrices summarize classifier performance in terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) [19]. The MultinomialNB classifier made more erroneous predictions than the others: 366 accurate predictions and 113 incorrect predictions out of the 479 records in the test set. The Random Forest classifier, on the other hand, made the highest number of accurate predictions in our study, 420.

TABLE IV. CLASSIFICATION REPORT

Model                Recall (%)  Precision (%)  F1 Score (%)  Accuracy (%)
Decision Tree        84.93       88.52          86.69         81.21
K Neighbors          88.70       86.20          87.43         81.63
Logistic Regression  88.70       84.76          86.69         80.38
Multinomial NB       81.16       85.37          83.21         76.41
Random Forest        92.46       90.63          91.54         87.68

Table IV shows the prediction performance of the algorithms as measured by the evaluation metrics. The Random Forest classifier performed best, obtaining the maximum classification accuracy of 87.68% (420 of 479 test records), compared with 76.41%, 80.38%, 81.21%, and 81.63% for MultinomialNB, Logistic Regression, Decision Tree, and KNeighbors respectively. Random Forest also achieves precision, recall, and F1 Score of 90.63%, 92.46%, and 91.54% respectively, all of which are superior to the other algorithms.

[Chart: Recall, Precision, F1 Score, and Accuracy (%) for each model]
Fig. 5. Prediction performance visualization.

V. CONCLUSION

A rain forecast model is very helpful because it can make everyday life easier. This research proposes rainfall prediction using machine learning models with weather datasets. In this study, 69% of the records are no-rain data and 31% are rain data. Raw data is often incomplete and cannot be passed directly to a model, so we preprocessed the dataset and divided it into two parts: train and test. The predictions depend on eight features: Max Temp, Min Temp, Actual Evaporation, Relative Humidity (9.00 am), Relative Humidity (2.00 pm), Sunshine, Cloudy, and Solar Radiation. We used five machine learning models in this study. In the Confusion Matrix, the Random Forest classifier obtained the most TP and TN of all the models. The results demonstrate that Random Forest is capable of producing accurate predictions of daily rainfall. The Random Forest model achieved the highest accuracy, 87.68%. We used data from 2016 to 2019. To improve the accuracy, the data volume needs to be increased to at least ten years. In the future, we will work with the last 10 years of climate data from all of Bangladesh's weather stations.

REFERENCES

[1] Y. Tikhamarine, D. Souag-Gamane, A. N. Ahmed, S. S. Sammen, O. Kisi, Y. F. Huang, and A. El-Shafie, “Rainfall-runoff modelling using improved machine learning methods: Harris hawks optimizer vs. particle swarm optimization,” Journal of Hydrology, vol. 589, p. 125133, 2020.
[2] S. R. Devi, C. Venkatesh, P. Agarwal, and P. Arulmozhivarman, “Daily rainfall forecasting using artificial neural networks for early warning of landslides,” 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2014.
[3] S. Srivastava, N. Anand, S. Sharma, S. Dhar, and L. K. Sinha, “Monthly Rainfall Prediction Using Various Machine Learning Algorithms for Early Warning of Landslide Occurrence,” 2020 International Conference for Emerging Technology (INCET), 2020.
[4] D. S. Rani, G. N. Jayalakshmi, and V. P. Baligar, “Low Cost IoT based Flood Monitoring System Using Machine Learning and Neural Networks: Flood Alerting and Rainfall Prediction,” 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), 2020.
[5] H. Abdel-Kader, M. A.-E. Salam, and M. Mohamed, “Hybrid Machine Learning Model for Rainfall Forecasting,” Journal of Intelligent Systems and Internet of Things, 2021.
[6] R. K. Misra, P. K. Panda, A. K. Sahu, S. Sahoo, and D. P. Behera, “Rainfall Prediction using Machine Learning Approach: A Case Study for the State of Odisha,” Indian Journal of Natural Sciences, 2020.
[7] X. Zhang, S. N. Mohanty, A. K. Parida, S. K. Pani, B. Dong, and X. Cheng, “Annual and Non-Monsoon Rainfall Prediction Modelling Using SVR-MLP: An Empirical Study From Odisha,” IEEE Access, vol. 8, pp. 30223–30233, 2020.
[8] R. M. Adnan, A. Petroselli, S. Heddam, C. A. G. Santos, and O. Kisi, “Short term rainfall-runoff modelling using several machine learning methods and a conceptual event-based model,” Stochastic Environmental Research and Risk Assessment, vol. 35, no. 3, pp. 597–616, 2020.
[9] M. T. Anwar, S. Nugrohadi, V. Tantriyati, and V. A. Windarni, “Rain Prediction Using Rule-Based Machine Learning Approach,” Advance Sustainable Science, Engineering and Technology, vol. 2, no. 1, 2020.
[10] M. T. Anwar, E. Winarno, W. Hadikurniawati, and M. Novita, “Rainfall prediction using Extreme Gradient Boosting,” Journal of Physics: Conference Series, vol. 1869, no. 1, p. 012078, 2021.
[11] N. Singh, S. Chaturvedi, and S. Akhter, “Weather Forecasting Using Machine Learning Algorithm,” 2019 International Conference on Signal Processing and Communication (ICSC), 2019.
[12] S.-J. Park and D.-K. Lee, “Prediction of coastal flooding risk under climate change impacts in South Korea using machine learning algorithms,” Environmental Research Letters, vol. 15, no. 9, p. 094052, 2020.
[13] A. Dikshit, B. Pradhan, and A. M. Alamri, “Temporal Hydrological Drought Index Forecasting for New South Wales, Australia Using Machine Learning Approaches,” Atmosphere, vol. 11, no. 6, p. 585, 2020.
[14] S. Shamshirband, S. Hashemi, H. Salimi, S. Samadianfard, E. Asadi, S. Shadkani, K. Kargar, A. Mosavi, N. Nabipour, and K.-W. Chau, “Predicting Standardized Streamflow index for hydrological drought using machine learning models,” Engineering Applications of Computational Fluid Mechanics, vol. 14, no. 1, pp. 339–350, 2020.
[15] T. V. Klompenburg, A. Kassahun, and C. Catal, “Crop yield prediction using machine learning: A systematic literature review,” Computers and Electronics in Agriculture, vol. 177, p. 105709, 2020.
[16] N. Khan, D. Sachindra, S. Shahid, K. Ahmed, M. S. Shiru, and N. Nawaz, “Prediction of droughts over Pakistan using machine learning algorithms,” Advances in Water Resources, vol. 139, p. 103562, 2020.
[17] “Bangladesh Jatiyo Tottho Batayon” - Government of the People's Republic of Bangladesh. [Online]. Available: https://bangladesh.gov.bd/index.php. [Accessed: 23-May-2021].
[18] “Bangladesh Rice Research Institute” - Government of the People's Republic of Bangladesh. [Online]. Available: http://www.brri.gov.bd/site/page/ [Accessed: 23-May-2021].
[19] K. M. Zubair Hasan and M. Zahid Hasan, “Performance Evaluation of Ensemble-Based Machine Learning Techniques for Prediction of Chronic Kidney Disease,” Advances in Intelligent Systems and Computing, vol. 882, 2019.
