Volume 11, Issue III, March 2023

https://doi.org/10.22214/ijraset.2023.49852
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue III Mar 2023- Available at www.ijraset.com

Prediction Rainfall with Regression Analysis


Sumit Sarkar1, Ayush Srivastava2, Er. Avneet Kaur3
Department of Computer Science and Engineering, Chandigarh University Gharuan-140413

Abstract: Weather forecasting is one of the most widely used applications of artificial intelligence. Forecasting
precipitation is one of the most popular research topics because heavy rainfall causes a great deal of property damage and
numerous fatalities. Large-scale flooding affects a variety of social and practical spheres, including agriculture and
disaster preparedness. Even with the most advanced mathematical techniques, older, widely used precipitation prediction
models were unable to achieve high classification rates. This article introduces a technique for forecasting monthly
precipitation that makes use of linear regression analysis. It uses quantitative data about the state of the atmosphere to
forecast when it will rain. Some machine learning systems can learn a complex mapping between inputs and outputs from a
small number of samples. Because the atmosphere can change quickly, it is challenging to anticipate precipitation with
absolute confidence. The variation in conditions from the previous year is used to forecast the likelihood of
precipitation. We propose applying linear regression to several factors, such as temperature, humidity, and wind. Because
the suggested model estimates precipitation from historical data for a specific geographic area, its forecast should be
more accurate. Compared with well-known methods for precipitation prediction, the model performs more accurately.
Keywords: Weather, Linear Regression, Accuracy, Parameters, Rainfall Prediction system, Machine Learning, Dataset,
Classification algorithms

I. INTRODUCTION
Precipitation, one of the most significant characteristics of the natural environment, has an impact on a variety of things,
including agriculture, water supply, and climate change. Decision-making in a range of businesses depends on accurate
precipitation forecasts. Regression analysis is a statistical method for modeling the relationship between variables, and it
may be used to forecast precipitation.
A regression study seeks the best-fit line or curve that describes the connection between two or more variables. The
variables that matter for predicting rainfall include the amount of rain, the passing of time, and other meteorological factors
such as temperature, humidity, and wind speed. The first stage in using regression analysis to forecast rainfall is obtaining
the pertinent data: past rainfall patterns, their timing, and these meteorological factors. A regression model is then built
from this historical data and used to forecast future precipitation patterns, for instance probable weekly or monthly
precipitation. This information is useful for many industries, including agriculture.[2][6] Farmers can use it to plan planting
and harvesting operations. Although it can anticipate precipitation patterns, regression analysis is not a perfect method.
Forecast accuracy is affected by a number of factors, including changing climatic patterns and the limitations of the data
used to build the model.
Linear regression was applied to categorize the input data and forecast when it would rain. The suggested model may be used
to forecast precipitation, lessen different social effects, and proactively plan for disaster aid. Both the categorization of images
and the forecasting of precipitation employ a linear regression methodology.[4][8] The remainder of this paper is organized as
follows: Section II contains the literature review, Section III surveys related techniques, and Section V presents the
experimental results. The report's conclusion outlines significant future research that might be incorporated into or added to
the suggested study.
II. LITERATURE REVIEW
During the past 20 years, several researchers have worked to increase the precision of the machine learning algorithms used
in weather forecasting. A few pertinent research articles are mentioned here. An ANN-based technique for forecasting
atmospheric conditions was presented in [18]. The dataset used for forecasting included several meteorological variables,
such as humidity, temperature, and wind speed.

©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 1934

Hu (1964) was the first to apply an ANN, a crucial soft computing method, to weather forecasting.[1][9] During the
past two decades, significant improvements in the variety of ANNs have led to the development of novel techniques for
forecasting environmental occurrences (Gardner and Dorling, 1998; Hsieh and Tang, 1998). Michaelides et al. (1995)
compared the efficacy of ANNs with multiple linear regression in estimating the missing precipitation data for Cyprus.
Kalogirou et al. (1997) employed an ANN to reconstruct the precipitation time series in Cyprus. Lee et al. (1998) split the
available data into homogeneous subpopulations for the purpose of forecasting rainfall. Wong et al. (1999) developed a
fuzzy rule base using backpropagation neural networks and SOM.[14] The rule base was then used to develop a
spatially interpolated precipitation forecast model for Switzerland. Toth et al. (2000) compared models for forecasting
short-term precipitation in order to predict floods in real time. Several structures of autoregressive moving average (ARMA)
models, ANNs, and nearest neighbor methods with lead times of 1 to 6 hours were used to forecast storms over the Sieve river
basin in Italy between 1992 and 1996. The ANN approach was found to be accurate for lead times greater than 3 hours
but insufficient to reproduce low rainfall. [17] Using information from weather stations, radars, satellites, and the Asian
Spectral Model of the Japan Meteorological Agency (JMA), Koizumi (1999) created an ANN model and trained it on one year of
data. The ANN was found to predict precipitation better than the persistence forecast (after 3 hours), linear regression,
and numerical models. As the ANN model was trained with only 1 year of data, the results were
limited. The author predicted that as more training data became available, neural network performance would increase. It
is still unknown how much each predictor contributed to the forecast and how much recent data would improve it.
Abraham et al. (2001) used an ANN trained with the scaled conjugate gradient algorithm (ANN-SCGA) and an evolving fuzzy
neural network (EfuNN) to predict precipitation time series. In this work, the training model's input data set included monthly
precipitation. The authors analyzed 87 years of precipitation data for Kerala, a state in the southern part of the Indian
peninsula. According to the empirical results, neuro-fuzzy systems outperform pure neural network techniques in terms of run
time and error rate (5). Precipitation, however, is one of the most complicated and challenging components of the
hydrological cycle to comprehend and predict because it varies so widely over a large range of
geographic and temporal scales (French et al., 1992).
"Research on Precipitation Prediction in Chennai Using Multiple Regression Analysis" by S. Sivasankari and M. Punithavallis
(2019): In this work, precipitation in Chennai, India, was predicted using multiple regression analysis. The authors'
three-variable multiple regression model, which takes into account the monsoon, the southwest monsoon, and the northeast
monsoon, produced the best results when used with data from the Indian Meteorological Department.
"Precipitation Forecasting Using Multiple Linear Regression and Artificial Neural Networks" by J. C. Olaniyan and O.A. Ajayis
(2020): This study tested the efficacy of multiple linear regression and artificial neural networks in forecasting
precipitation in Nigeria. In terms of prediction, the authors found that the artificial neural network model performed better
than the multiple linear regression model.
"Prediction of Rainfall" by D.K. Singh and A. Kumar (2019) [11]: In this study, rainfall in the Indian state of Uttar Pradesh
was predicted using artificial neural networks and linear regression analysis. According to the authors, although both models
predicted well, linear regression did not perform significantly better than the artificial neural network model.
"Rainfall Prediction Using Regression Analysis: A Case Study of Nigeria" by O.A. Adeoye and A.A. Olawale (2018): Rainfall in
Nigeria was modeled in this paper using regression analysis. According to the authors, when comparing the individual
performances of the different models, the cubic model fared better than the linear and quadratic models.
III. RELATED WORK
Both meteorology and hydrology have employed regression analysis to forecast precipitation. Several investigations have been
carried out in order to develop an accurate and trustworthy precipitation forecast model.
The following are some relevant efforts in this area:
Researchers have proposed a variety of machine learning approaches. Artificial
neural networks were used in research by Deepak Ranjan Nayak [15] to forecast rain. Akash D. Dubey [16] forecast rain at
Pondicherry. The quantity of tumor cells in the liver was determined using data from a study by Rui Lu et al. [15].
Tumor cell borders were identified using CT images. Although this technique requires a lot of computing time, it is believed
to be quite effective at segmenting tumor cells and determining their volume. Kostas Haris proposed a hybrid image
segmentation method in [16] that blends watershed morphological procedures with edge-based and region-based
methodologies.

The rainfall dataset is used as input during the pre-processing stage of the linear regression model.
The features are extracted using a linear regression model. [20] The main goal of this research is to evaluate the several
methodologies presented by the authors in order to create a real-time rainfall forecast system that corrects the flaws in earlier
approaches and provides the most accurate solution.

Fig. 4. (ii). Architecture of Proposed Model

We locate, classify, and then list the model's frequently occurring values or item sets. These models may be used to
access a wide range of data, such as the local temperature, humidity, and rainfall. The overall layout and flow of the
suggested model are depicted in Figure 2.

To apply this formula, we first compute the difference between each predicted value and the corresponding observed value and
then square it. Predictions that do not match the observed values therefore contribute a positive squared difference. The
average of these squared differences is called the "mean squared error".
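
For reference, the standard definition of the mean squared error is written out below. The symbols are notation introduced here for illustration and are not taken from the original formula: n is the number of samples, y_i is the observed rainfall, and \hat{y}_i is the predicted rainfall for sample i.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```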

3) Random Forest: Random Forest is a supervised learning technique that builds decision trees from data samples for
classification and regression.
Step 1: Select random samples from the given dataset.
Step 2: Build a decision tree for each data sample, and use each decision tree to make a prediction.
Step 3: Hold a vote over the predicted results.
Step 4: Choose the predicted result that earns the most votes.
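
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: each "tree" is reduced to a one-split stump, and the humidity readings, the labels, and all function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a one-split decision tree (stump) on (humidity, label) pairs."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        left = [y for x, y in samples if x <= threshold]
        right = [y for x, y in samples if x > threshold]
        if not left or not right:
            continue
        # Each side of the split predicts its majority label.
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # No valid split (all inputs identical): always predict the majority.
        label = Counter(y for _, y in samples).most_common(1)[0][0]
        return (float("inf"), label, label)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

def random_forest_predict(data, x, n_trees=25, seed=0):
    rng = random.Random(seed)
    votes = []
    for _ in range(n_trees):
        # Step 1: draw a random sample of the dataset (with replacement).
        sample = [rng.choice(data) for _ in data]
        # Step 2: build a tree on the sample and let it predict.
        votes.append(predict_stump(train_stump(sample), x))
    # Steps 3 and 4: count the votes and return the most common prediction.
    return Counter(votes).most_common(1)[0][0]

# Hypothetical (humidity %, outcome) observations.
data = [(30, "dry"), (35, "dry"), (40, "dry"), (45, "dry"), (50, "dry"),
        (70, "rain"), (75, "rain"), (80, "rain"), (85, "rain"), (90, "rain")]

print(random_forest_predict(data, 88))
print(random_forest_predict(data, 32))
```

For regression, the vote in Steps 3 and 4 is usually replaced by averaging the trees' numeric predictions.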

4) Decision Tree: The decision tree algorithm is a classification method that works on both categorical and numerical
data. It organizes the data into a tree-like structure and analyzes the information in a graph that resembles a tree. This
method splits the data into two or more related groups based on the most important indicators. To split the data
on the predictor with the largest information gain (lowest entropy), we first calculate the entropy of each attribute.
The results are easy to read and comprehend. This method outperforms others in terms of accuracy since it examines the
dataset in a tree-like graph. The decision tree method is applied to both classification and regression in machine learning.
In a decision tree, each internal node stands for a decision, whereas leaf nodes signify an outcome. Decision
trees handle both categorical and continuous variables effectively, although our target variable (rainfall) in the
current study is a binary categorical variable. Decision trees can be built using the methods C5.0, Chi-squared
Automatic Interaction Detection (CHAID), ID3, QUEST, Classification and Regression Trees (CART), and C4.5.[12]
C5.0 was picked for the current study and used with the three training-to-testing ratios. The C5.0 algorithm is a
more advanced version of the ID3 and C4.5 algorithms.
5) Multiple Linear Regression: This is the process of creating a regression model to forecast the dependent variable
(rainfall) using several independent variables. The model assumes a linear relationship between the independent factors and
the dependent variable. The model's coefficients show the magnitude and direction of the connection between each independent
variable and rainfall. Statistics such as R-squared and adjusted R-squared can be used to evaluate the model's
accuracy.
6) Polynomial Regression: Using this technique, a polynomial equation may be fitted to the rainfall data by adding
polynomial terms to the linear regression model. The higher-order terms can reflect more complex relationships
between the independent variables and rainfall. When too many polynomial terms are included, the model overfits: it
may perform well on training data but poorly on test data.
7) Time Series Analysis: This method looks at previous trends in rainfall data over time to predict future
precipitation. Time-series models can detect seasonality, trends, and cyclical patterns in the data. These models may be
evaluated using metrics such as mean absolute error (MAE), mean square error (MSE), and root mean square error
(RMSE).
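
The entropy and information-gain calculation mentioned under Decision Tree above can be sketched as follows. This shows the plain ID3-style information gain on a hypothetical four-row table; it is not the C5.0 algorithm selected in the study, and the attribute names and values are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on feature."""
    parent = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    weighted = sum(len(group) / len(rows) * entropy(group)
                   for group in groups.values())
    return parent - weighted

# Hypothetical observations: does it rain, given humidity and wind?
rows = [
    {"humidity": "high", "wind": "weak", "rain": "yes"},
    {"humidity": "high", "wind": "strong", "rain": "yes"},
    {"humidity": "low", "wind": "weak", "rain": "no"},
    {"humidity": "low", "wind": "strong", "rain": "no"},
]

gains = {f: information_gain(rows, f, "rain") for f in ("humidity", "wind")}
best_feature = max(gains, key=gains.get)
print(gains, best_feature)
```

In this toy table humidity separates the classes perfectly while wind carries no information, so a tree built this way would split on humidity first.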
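
The error metrics named under time series analysis above (MAE, MSE, and RMSE) can be computed as shown below. The forecast used here is a seasonal naive baseline, which simply repeats the value from the same month one year earlier; it is a stand-in chosen for brevity, not a model from the study, and the monthly rainfall figures are hypothetical.

```python
import math

def seasonal_naive(series, season):
    """Predict each value as the value observed one season earlier."""
    return series[:-season]  # lines up with series[season:]

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))

# Hypothetical monthly rainfall (mm) for two consecutive years.
rainfall = [10, 12, 20, 35, 60, 150, 300, 280, 180, 60, 20, 12,
            12, 10, 25, 30, 70, 160, 290, 300, 170, 50, 25, 10]

actual = rainfall[12:]                     # second year
predicted = seasonal_naive(rainfall, 12)   # first year reused as the forecast

print(f"MAE={mae(actual, predicted):.2f}  "
      f"MSE={mse(actual, predicted):.2f}  "
      f"RMSE={rmse(actual, predicted):.2f}")
```

RMSE is expressed in the same unit as the rainfall values (mm), which makes it easier to interpret than MSE, and it penalizes large errors more heavily than MAE.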

D. Evaluation
1) Accuracy: It is the ratio of the number of correct outputs to the total number of input samples.
2) Precision: It is the number of correct positive results divided by the number of positive results predicted by the
classifier.
The efficacy of various algorithms may be examined using a wide range of assessment criteria. [18] The current study focuses
on the confusion matrix, which serves as the foundation for accuracy, precision,
recall, and f-measure. The measures are characterized as follows:

E. Confusion Matrix
The confusion matrix summarizes the effectiveness of the model, as shown in Table 1, where:
1) TN (true negative) is the total quantity of negative data that has been correctly classified as "negative".
2) FN (false negative) is the total quantity of positive data that has been incorrectly classified as
"negative".
3) FP (false positive) is the total quantity of negative data that has been incorrectly classified as "positive".
4) TP (true positive) is the total quantity of positive data that has been correctly classified as "positive".

Table 1. Class of Confusion Matrix

The mathematical formulas for the assessment measures recall, precision, accuracy, and the F1 score follow from equations 7,
8, 9, and 10.
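
As a sketch of how these measures are derived from the four confusion-matrix counts, the snippet below uses their standard definitions. It is not a reproduction of the paper's numbered equations, and the counts passed in are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a rain / no-rain classifier on 100 test days.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```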

V. EXPERIMENTAL RESULTS
The experimental findings for the provided model were acquired using Jupyter Notebook. Using historical rainfall data for the
years FY 2018-2022, the model was trained. It also included a number of other weather-related details from various years.

The model-building procedure is divided into five steps, which take place in this order:
1) Choosing the input and output data for the supervised learning.
2) Normalizing the data at both the input and the output.
3) Training the linear regression model using the normalized data.
4) Assessing the model's degree of fit.
5) Comparing the desired outcome with the predicted result.
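
The steps above can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the notebook used in the study: the temperature, humidity, and rainfall values are hypothetical, min-max scaling is assumed for the normalization step, the fit is obtained from the normal equations, and R-squared and adjusted R-squared are assumed as the measures of fit.

```python
def min_max(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.
    Assumes the system is non-singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_least_squares(rows, y):
    """Coefficients minimizing squared error, from the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Step 1: choose inputs (temperature in C, humidity in %) and output (rainfall in mm).
temperature = [30, 28, 26, 25, 24, 22]
humidity = [40, 55, 60, 75, 80, 90]
rainfall = [6.2, 9.0, 10.5, 13.4, 14.8, 16.7]

# Step 2: normalize inputs and output.
t, h, r = min_max(temperature), min_max(humidity), min_max(rainfall)

# Step 3: train the linear regression (the leading 1.0 is the intercept term).
rows = [[1.0, a, b] for a, b in zip(t, h)]
coef = fit_least_squares(rows, r)

# Step 4: assess the degree of fit.
predicted = [sum(c * v for c, v in zip(coef, row)) for row in rows]
mean_r = sum(r) / len(r)
ss_res = sum((a - p) ** 2 for a, p in zip(r, predicted))
ss_tot = sum((a - mean_r) ** 2 for a in r)
r2 = 1 - ss_res / ss_tot
n, p = len(r), 2  # number of samples and number of predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Step 5: compare the desired outcome with the predicted result.
for target, estimate in zip(r, predicted):
    print(f"target={target:.3f}  predicted={estimate:.3f}")
print(f"R^2={r2:.4f}  adjusted R^2={adj_r2:.4f}")
```

In practice the fit would be assessed on data held out from training; the sketch scores the training data only to keep the example short.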

Fig. 5. (i). Rainfall Prediction in terms of Humidity

Each approach has its own benefits and drawbacks, and the choice of technique depends on the specifics of the research
question and the qualities of the data. In addition to regression analysis, machine learning approaches such as
Artificial Neural Networks (ANN), Decision Trees, and Random Forest may also be used to forecast rainfall. These algorithms
have been widely applied in meteorology because they have been shown to be effective at forecasting rainfall in diverse regions.
Overall, forecasting rainfall with regression analysis or other machine learning algorithms is a crucial
field of study with implications for agriculture, water resource management, and disaster planning.

REFERENCES
[1] M.J.C. Hu, "Application of ADALINE system to weather forecasting", Technical Report, Stanford Electron, 1964. PP- 2
[2] Michaelides, S. C., Neocleous, C. C. & Schizas, C. N., "Artificial neural networks and multiple linear regression in estimating missing rainfall data." In:
Proceedings of the DSP95 International Conference on Digital Signal Processing, Limassol, Cyprus. 1995. PP- 1
[3] Kalogirou, S. A., Neocleous, C., Constantinos, C. N., Michaelides, S. C. & Schizas, C. N., "A time series construction of precipitation records using
artificial neural networks." In: Proceedings of EUFIT '97 Conference, 8-11 September, Aachen, Germany. 1997. PP- 5
[4] Lee, S., Cho, S. & Wong, P. M., "Rainfall prediction using artificial neural network.", J. Geog. Inf. Decision Anal. 1998. PP- 2
[5] Wong, K. W., Wong, P. M., Gedeon, T. D. & Fung, C. C., "Rainfall Prediction Using Neural Fuzzy Technique." 1999. PP- 3
[6] Koizumi, K., "An objective method to modify numerical model forecasts with newly given weather data using an artificial neural network",
Weather Forecast., 1999. PP- 1
[7] Ben Krose and Patrick van der Smagt, "An introduction to neural networks", Eighth edition, November 1996. PP- 5
[8] Ajith Abraham, Dan Steinberg and Ninan Sajeeth Philip, "Rainfall Forecasting Using Soft Computing Models and Multivariate Adaptive
Regression Splines", 2001. PP- 2
[9] Paras, Sanjay Mathur, Avinash Kumar, and Mahesh Chandra, "A feature based on weather prediction using ANN", World Academy of
Science, Engineering and Technology 2007. PP- 2
[10] E. Toth, A. Brath, A. Montanari, "Comparison of short-term rainfall prediction models for real-time flood forecasting", Journal of Hydrology 239
(2000). PP- 3
[11] L. L. Lai, H. Braun, Q. P. Zhang, Q. Wu, Y. N. Ma, W. C. Sun, and L. Yang, "Intelligent weather forecast," in Proc. IEEE 2004 International
Conference on Machine Learning and Cybernetics, 2004. PP- 2
[12] N. Hasan, M. T. Uddin, and N. K. Chowdhury, "Automated weather event analysis with machine learning," in Proc. IEEE 2016 International
Conference on Innovations in Science, Engineering and Technology (ICISET), 2016. PP- 2
[13] Rahman, M. M., Bhattacharya, P., & Desai, B. C. (2007). A framework for medical image retrieval using machine learning and statistical
similarity matching techniques with relevance feedback. IEEE Transactions on Information Technology in Biomedicine, 11(1). PP- 6
[14] Delhi Weather Data. [Online]. Available: https://www.kaggle.com/mahirkukreja/delhi-weatherdata/home. PP- 3
[15] Morales, M., Tapia, L., Pearce, R., Rodriguez, S., & Amato, N. M. (2004). A machine learning approach for feature-sensitive motion planning. In
Algorithmic Foundations of Robotics VI. Springer, Berlin, Heidelberg. PP- 4
[16] Ireland, G., Volpi, M., & Petropoulos, G. P. (2015). Examining the capability of supervised machine learning classifiers in extracting flooded
areas from Landsat TM imagery: a case study from a Mediterranean flood. Remote Sensing, 7(3). PP- 5
[17] N. Q. Hung, M. S. Babel, S. Weesakul, and N. K. Tripathi, "An Artificial Neural Network Model for Rainfall Forecasting in Bangkok, Thailand",
Hydrol. Earth Syst. Sci., 2009. PP- 6
[18] Kyaw Kyaw Htike and Othman O. Khalifa, "Research paper on ANN model using focused time delay learning", International Conference on
Computer and Communication Engineering (ICCCE 2010), 11-13 May 2010, Kuala Lumpur. PP- 3
[19] Dr S. Santosh Baboo and I. Khadar Shareef, "An efficient Weather Forecasting Model using Artificial Neural Network", International Journal of
Environmental Science and Development, Vol. 1, No. 4, October 2010. PP- 2
[20] Enireddy Vamsidhar et al., "Prediction of rainfall Using Backpropagation Neural Network Model", International Journal on Computer Science
and Engineering Vol. 02, No. 04, 2010. PP- 3
[21] A. G. Salman, B. Kanigoro, and Y. Heryadi, "Weather forecasting using deep learning techniques," in Proc. IEEE 2015 International Conference
on Advanced Computer Science and Information Systems (ICACSIS), 2015. PP- 3
