International Journal of Computer Applications (0975 – 8887)

Volume 120 – No.11, June 2015

An Integrated Approach for Weather Forecasting based on Data Mining and Forecasting Analysis
G. Vamsi Krishna
Research Scholar, Department of CSE,
GITAM University

ABSTRACT
Weather prediction is a challenging real-time problem that the world has faced over the last decade. Prediction is becoming more complex because weather conditions change constantly. Many models have been proposed for predicting weather data that treat the related attributes as independent variables. To analyze the weather effectively, it is necessary to understand the various factors that cause weather changes. It is therefore necessary to identify the relationships between these attributes to better understand the weather data. In this article, a weather prediction model is presented that is based on the spatial and temporal dependencies among the climatic variables, together with forecasting analysis.

Keywords
Weather data, forecasting, spatial data, dependent and independent variables.

1. INTRODUCTION
Weather forecasting is considered one of the most challenging problems the world has faced in the last decade, and this has indirectly affected the effective prediction of weather data. Recent technological advances have increased the capacity to retrieve and store data. As a result, massive amounts of meteorological data are available in different formats. This data is generated both by surface observation stations and by aerial observation stations. As the number of weather stations increases, huge amounts of data become available on a daily, weekly, monthly and yearly basis, and the volume of stored data grows exponentially [1]. This data is stored and made available for weather prediction, catastrophe forecasting and use by other departments. In the last decade, advances in science and technology have led to both empirical and dynamical approaches for weather prediction. In these models, weather data is analyzed with time series methods that evaluate the data using only a few variables, called attributes, while neglecting the importance of the others. Many meteorologists have made significant strides in forecasting the weather with models based on time series. However, data mining techniques play a vital role in analyzing the related data within this massive volume. For effective prediction, it is necessary to identify the correlations between the weather attributes, which indirectly play a role in weather changes. Hence, this article proposes a model for effective weather prediction that considers various attributes and their correlations together with data mining techniques.

2. RELATED WORK
A substantial body of literature is based on the neural network approach [2], [3], [4], [5], [6]. However, these approaches failed to identify abnormal weather patterns. Models based on Support Vector Machines and on stochastic approaches have also been proposed in the literature [7], [8]. Approaches combining Genetic Algorithms with Neural Networks have also been proposed [9], [10], [11]. Statistical models that combine Genetic Algorithms with ARIMA models are also presented in the literature [12], [13].

The models suggested by researchers [2], [3], [4], [5], [6], [7], [8] could not achieve effective forecasting because weather data is complex. It mixes categorical and continuous patterns, it is noisy, and it has high dimensionality. Therefore, more effective models for predicting the weather need to be analyzed.

In this paper, a weather forecasting model is built to predict the weather effectively. The model takes weather data in the form of a time series and converts it into information. Knowledge is then derived from this information using data mining techniques. Data mining techniques are used to unearth hidden patterns and to link the various attributes associated with the weather conditions. In reality, meteorological data forms a series over a period of time, so weather prediction can be analyzed well using time series mining. This article uses the autoregressive integrated moving average (ARIMA) model to forecast future values. In this model, a mathematical model is first built from an ordered set of data. Prediction is then carried out with that model using the current values and the previous data. Autocorrelation and partial autocorrelation models are also considered for performing the intervention analysis.

The rest of the paper is organized as follows. Section 3 presents an insight into weather forecasting using time series and autoregression models. Section 4 briefly describes the methodology of the proposed model. Section 5 presents the results together with the conclusions.

3. TIME SERIES AND AUTOCORRELATION MODELS
Time series analysis is a methodology for building models from the values of variables observed at regular intervals [8]. Studying time series data helps to uncover hidden patterns in the data, and fitting a model to the data supports better analysis and effective forecasting. Time series models and forecasting methods are classified into two groups, univariate and multivariate. The models are differentiated by the way the observations are arranged. Time series methods generally assume that the data is stationary. Least squares and linear regression are among the methods used in time series analysis. In these methods, the dependent and independent variables are numeric and are treated as time series. An autoregressive model expresses the current value of a series as a linear regression on its own previous values.
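The following Python sketch makes the autoregressive idea concrete. It is an illustration only, not the author's code. It fits a first-order autoregressive model, Y(t) = c + φ·Y(t−1), by ordinary least squares on pairs of consecutive observations. The function names `fit_ar1` and `predict_next` and the noise-free toy series are hypothetical.

```python
from statistics import mean


def fit_ar1(series):
    """Fit Y(t) = c + phi * Y(t-1) by ordinary least squares.

    Hypothetical helper for illustration; returns the pair (c, phi).
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    lagged = series[:-1]   # Y(t-1), the regressor
    current = series[1:]   # Y(t), the response
    mx, my = mean(lagged), mean(current)
    sxx = sum((x - mx) ** 2 for x in lagged)
    if sxx == 0:
        raise ValueError("lagged values are constant; phi is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(lagged, current))
    phi = sxy / sxx
    c = my - phi * mx
    return c, phi


def predict_next(series, c, phi):
    """One-step-ahead prediction from the most recent observation."""
    return c + phi * series[-1]


# Toy series generated exactly by Y(t) = 2 + 0.5 * Y(t-1), starting at 10,
# so least squares should recover c = 2 and phi = 0.5.
toy = [10.0]
for _ in range(9):
    toy.append(2 + 0.5 * toy[-1])

c, phi = fit_ar1(toy)
print(c, phi, predict_next(toy, c, phi))
```

Real weather series contain noise, so on such data the estimates would only approximate the underlying coefficients.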


The ARIMA procedure analyzes and forecasts equally spaced univariate time series data, transfer function data and intervention data using the Autoregressive Integrated Moving-Average (ARIMA) or autoregressive moving-average (ARMA) model. An ARIMA model predicts a value in a response time series as a linear combination of three things: its own past values, past errors, and current and past values of other time series. The model builds on the random walk model, whose equation is given by

$$\hat{Y}(t) - Y(t-1) = \mu \qquad (1)$$

where $\mu$ is the mean of the first difference. Rearranging equation (1), we have

$$\hat{Y}(t) = Y(t-1) + \mu \qquad (2)$$

The prediction is thus obtained by adding a constant to the last period's value, so the constant estimates the average change per time interval. ARIMA(p, d, q) models are used to model the various changes in a series. Here p is the number of autoregressive terms, d is the number of non-seasonal differences and q is the number of moving-average terms. In particular, the ARIMA(0,1,0) model with a constant, i.e. equation (2), consists of one non-seasonal difference and a constant term. In this paper the ARIMA(1,1,0) model is used, since it predicts the weather from the autocorrelation of the previous data at lag 1. The general equation used for fitting the model is

$$\hat{Y}(t) = \mu + Y(t-1) + \phi_1 \bigl( Y(t-1) - Y(t-2) \bigr) \qquad (3)$$

where $\mu$ is a constant term and $\phi_1$ is the lag-1 autoregressive coefficient of the first-differenced series.

4. METHODOLOGY
To demonstrate the proposed model, a database was built from data of the meteorological department of India for Visakhapatnam district. This weather data set includes several attributes, such as minimum temperature, maximum temperature, wind pressure, humidity, precipitation, sunshine, evaporation and category. The category attribute describes the intensity of the weather and takes the values normal, cloudy, depression, severe depression and cyclonic storm. This categorization is based on wind pressure, one of the attributes that cause changes in the weather.

The relationships between the various attributes of the weather data are considered, and their associations are generated. The relationships among these associations help in effective analysis of the weather. If weather changes are not understood, they can have several impacts, such as coastal erosion and damage to infrastructure. Agriculture, land and human health are also put at stake. Therefore, in this article the hidden associations among the attributes are considered on the basis of the time series model. The initial estimates are identified using autocorrelation. The intermediate weather changes are estimated using moving averages together with partial autocorrelation. The data set used for the analysis of the model is available from the Indian Meteorology Department at http://imdtvm.gov.in.

The brief procedure is presented below:
1. Preprocess the data to remove missing values.
2. Calculate the regression and autoregression values using the ARIMA model.
3. Consider different time lags to model the data.
4. Correlate the data using correlation analysis and rank it by highest correlation.
5. The data with the highest correlation is considered to indicate the most likely weather change, and it is assumed to produce destructive effects.

5. RESULTS AND CONCLUSIONS
In this paper, weather data for Visakhapatnam city over a period of 97 days is considered. The attributes include wind pressure, humidity, minimum and maximum temperature, forecast and type. The forecasting experiment evaluates the weather condition for the next 15 days using the predictions of the ARIMA model. The ARIMA(1,1,0) model is used to predict wind pressure and humidity for the next 15 days. The comparison between the predicted results and the real data is shown in Figure 1 and Figure 2.
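The following Python sketch shows one way to fit equation (3) and iterate it forward. It assumes conditional least squares: regressing the first difference d(t) = Y(t) − Y(t−1) on d(t−1) gives μ as the intercept and φ1 as the slope. This is algebraically the same relationship as equation (3). Statistical packages usually estimate ARIMA coefficients by maximum likelihood, so their values may differ somewhat. Setting φ1 = 0 reduces the forecast to the random walk with drift of equation (2). The function names and the toy series are hypothetical.

```python
from statistics import mean


def fit_arima_110(series):
    """Estimate mu and phi1 of equation (3) by conditional least squares.

    Equation (3) rearranges to d(t) = mu + phi1 * d(t-1), where
    d(t) = Y(t) - Y(t-1), so an ordinary regression of each difference
    on the previous difference yields both coefficients.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    if len(diffs) < 3:
        raise ValueError("need at least four observations")
    prev, curr = diffs[:-1], diffs[1:]
    mx, my = mean(prev), mean(curr)
    sxx = sum((x - mx) ** 2 for x in prev)
    if sxx == 0:
        raise ValueError("differences are constant; phi1 is undefined")
    phi1 = sum((x - mx) * (y - my) for x, y in zip(prev, curr)) / sxx
    mu = my - phi1 * mx
    return mu, phi1


def forecast_arima_110(series, mu, phi1, steps):
    """Apply equation (3) recursively, feeding each forecast back in."""
    if len(series) < 2:
        raise ValueError("need the last two observations to start")
    history = list(series)
    forecasts = []
    for _ in range(steps):
        y_hat = mu + history[-1] + phi1 * (history[-1] - history[-2])
        history.append(y_hat)
        forecasts.append(y_hat)
    return forecasts


# Toy series built from equation (3) with mu = 1 and phi1 = 0.5.
demo = [0.0, 4.0]
for _ in range(8):
    demo.append(1 + demo[-1] + 0.5 * (demo[-1] - demo[-2]))

mu, phi1 = fit_arima_110(demo)
print(mu, phi1, forecast_arima_110(demo, mu, phi1, steps=3))
```

Applied to 97 daily observations of wind pressure or humidity, `forecast_arima_110(series, mu, phi1, steps=15)` would correspond to the 15-day horizon described above.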

Table 1. Sample Input Data with Weather Parameters

Day  Tmin  Tmax  W  F   Wind   H   T  C
1    14    31    3  28  10     33  1  td
2    15    31    1  28  12     38  1  td
3    16    30    1  28  11     43  1  td
4    15    31    1  28  11     42  1  td
5    15    31    1  28  10     40  1  td
6    16    31    1  28  8      44  1  td
7    17    31    1  28  8      48  1  td
8    22    31    1  34  8      53  1  td
9    22    32    1  33  4      47  1  td
10   22    32    1  34  9      51  1  td
11   21    32    1  33  8      45  1  td
12   21    31    1  33  10     51  1  td
13   21    32    1  32  13     40  1  td
14   21    32    1  32  6      42  1  td
15   21    33    1  34  7      43  1  td
16   1     10    2  5   240    5   2  five
17   1     10    2  9   50     15  2  zero
18   1     10    2  9   55     15  2  zero
19   1     10    2  8   60     10  2  one
20   1     10    2  8   80     10  3  one
21   1     10    2  8   90     9   3  two
22   1     10    2  8   90     9   3  two
23   1     10    2  8   100    8   4  three
24   1     10    2  8   110    8   4  three
25   1     10    2  6   125    8   5  four
26   1     10    2  6   135    7   5  four
27   1     10    2  6   145    7   5  four
28   1     10    2  6   155    7   5  four
29   31    70    1  62  12.7   33  1  td
30   32    70    1  46  13.8   38  1  td
31   33    70    1  72  10.8   43  1  td
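Steps 1 and 4 of the procedure in Section 4 are dropping records with missing values and then ranking by correlation. The following Python sketch shows one way to do both. The paper does not specify these details, so several choices here are assumptions:

- Pearson correlation is used as the measure.
- Attributes are ranked by absolute correlation with a single target attribute (wind, here).
- Rows are dictionaries keyed by Table 1's column names.

The helper names are hypothetical.

```python
from math import sqrt


def drop_missing(rows):
    """Step 1: keep only the rows in which every attribute is present."""
    return [row for row in rows if all(v is not None for v in row.values())]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two rows")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no linear association
    return sxy / sqrt(sxx * syy)


def rank_by_correlation(rows, target, attributes):
    """Step 4: rank attributes by |r| with the target, strongest first."""
    clean = drop_missing(rows)
    ys = [row[target] for row in clean]
    scores = {a: pearson([row[a] for row in clean], ys) for a in attributes}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Days 1-7 of Table 1, plus one hypothetical row with a missing H value.
rows = [
    {"Tmin": 14, "Tmax": 31, "Wind": 10, "H": 33},
    {"Tmin": 15, "Tmax": 31, "Wind": 12, "H": 38},
    {"Tmin": 16, "Tmax": 30, "Wind": 11, "H": 43},
    {"Tmin": 15, "Tmax": 31, "Wind": 11, "H": 42},
    {"Tmin": 15, "Tmax": 31, "Wind": 10, "H": 40},
    {"Tmin": 16, "Tmax": 31, "Wind": 8, "H": 44},
    {"Tmin": 17, "Tmax": 31, "Wind": 8, "H": 48},
    {"Tmin": 17, "Tmax": 31, "Wind": 9, "H": None},  # hypothetical row
]
print(rank_by_correlation(rows, "Wind", ["Tmin", "Tmax", "H"]))
```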

Figure 1. Time series data of the predicted weather

Intuitive inspection shows that, as the prediction step increases, the prediction accuracy for the two sequences deteriorates. The mean absolute error (MAE) and the mean absolute percentage error (MAPE) are used for the analysis, and the results are presented in Table 3.

Table 2. Partial Autocorrelations for Series

Lag  Partial Autocorrelation  Std. Error
1    .610                     .102
2    .264                     .102
3    .130                     .102
4    .067                     .102
5    .016                     .102
6    -.027                    .102
7    -.026                    .102
8    -.023                    .102
9    -.009                    .102
10   -.010                    .102
11   -.002                    .102
12   -.008                    .102
13   -.373                    .102
14   -.049                    .102
15   .026                     .102
16   .090                     .102
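Partial autocorrelations like those in Table 2 can be computed from the sample autocorrelations with the Durbin-Levinson recursion, as in the Python sketch below. This generic estimator is assumed for illustration, since the paper does not state which software produced Table 2. Table 2's constant standard error of .102 is consistent with the usual large-sample approximation 1/√n for about 96 or 97 observations (1/√97 ≈ 0.1015). The `pacf_std_error` helper returns this approximation. All helper names are hypothetical.

```python
from math import sqrt


def acf(series, max_lag):
    """Sample autocorrelations r(0..max_lag) using the biased (1/n) estimator."""
    n = len(series)
    m = sum(series) / n
    dev = [v - m for v in series]
    c0 = sum(d * d for d in dev) / n
    if c0 == 0:
        raise ValueError("series is constant")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 1..max_lag via Durbin-Levinson."""
    if max_lag >= len(series):
        raise ValueError("max_lag must be smaller than the series length")
    r = acf(series, max_lag)
    result = []
    prev = []  # prev[j] holds phi(k-1, j+1) from the previous order
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the order-k coefficients from the order-(k-1) ones.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


def pacf_std_error(n):
    """Large-sample standard error 1/sqrt(n) under a white-noise assumption."""
    return 1 / sqrt(n)


# Humidity (H) for days 1-15 of Table 1, used only as a demonstration series.
humidity = [33, 38, 43, 42, 40, 44, 48, 53, 47, 51, 45, 51, 40, 42, 43]
print(pacf(humidity, 3), pacf_std_error(len(humidity)))
```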

Figure 2. Autoregressive lag
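Table 3 reports MAE and MAPE. These measures have the standard definitions MAE = (1/n) Σ|y − ŷ| and MAPE = (1/n) Σ|(y − ŷ)/y|, and a minimal Python sketch of both follows.

- Returning MAPE as a fraction rather than a percentage is an assumption. It was chosen because the MAPE values in Table 3 (for example 0.062) appear to be fractions.
- The paper does not say how errors were aggregated for each prediction step.
- The sample values are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (multiply by 100 for %)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and forecast wind-pressure values.
observed = [10.0, 20.0, 40.0]
forecast = [12.0, 18.0, 40.0]
print(mae(observed, forecast), mape(observed, forecast))
```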


Table 3. Error Analysis of Wind Pressure

Step  MAE   MAPE
1     0.21  0.062
2     0.39  0.067
3     0.65  0.107
4     0.87  0.119
5     0.88  0.130
6     1.00  0.145
7     1.09  0.176
8     1.03  0.177
9     1.12  0.181
10    1.29  0.173
11    1.16  0.165
12    1.21  0.208
13    1.30  0.213
14    1.23  0.209
15    1.21  0.212
16    1.20  0.021

As the MAE and MAPE columns show, the prediction errors for both humidity and wind generally increase as the prediction step increases.

In this paper, a methodology for weather forecasting is presented using ARIMA as the data mining prediction algorithm. The proposed approach is capable of analyzing and forecasting the weather.

6. REFERENCES
[1] Y. W. Dou, L. Lu, X. Liu and Daiping Zhang, “Meteorological Data Storage and Management System”, Computer Systems & Applications, vol. 20, no. 7, (2011) July, pp. 116-120.
[2] C. Zhang, W.-B. Chen, X. Chen, R. Tiwari, L. Yang and G. Warner, “A Multimodal Data Mining Framework for Revealing Common Sources of Spam Images”, Journal of Multimedia, vol. 4, no. 5, (2009) October, pp. 313-320.
[3] C. Li, M. Zhang, C. Xing and J. Hu, “Survey and Review on Key Technologies of Column Oriented Database Systems”, Computer Science, vol. 37, no. 12, (2011) February, pp. 1-8.
[4] M. Zhang, “Application of Data Mining Technology in Digital Library”, Journal of Computers, vol. 6, no. 4, (2011) April, pp. 761-768.
[5] C.-W. Shen, H.-C. Lee, C.-C. Chou and C.-C. Cheng, “Data Mining the Data Processing Technologies for Inventory Management”, Journal of Computers, vol. 6, no. 4, (2011) April, pp. 784-791.
[6] Z. Danping and D. Jin, “The Data Mining of the Human Resources Data Warehouse in University Based on Association Rule”, Journal of Computers, vol. 6, no. 1, (2011) January, pp. 139-146.
[7] J. Jiang, B. Guo, W. Mo and K. Fan, “Block-Based Parallel Intra Prediction Scheme for HEVC”, Journal of Multimedia, vol. 7, no. 4, (2012) August, pp. 289-294.
[8] S.-Y. Yang, C.-M. Chao, P.-Z. Chen and C.-Hao Sun, “Incremental Mining of Closed Sequential Patterns in Multiple Data Streams”, Journal of Networks, vol. 6, no. 5, (2011) May, pp. 728-735.
[9] Z. Fu, J. Bai and Q. Wang, “A Novel Dynamic Bandwidth Allocation Algorithm with Correction-based the Multiple Traffic Prediction in EPON”, Journal of Networks, vol. 7, no. 10, (2012) October, pp. 1554-1560.
[10] Z. Qiu, Z.-W. Lin and Y. Ma, “Research of Hadoop-based data flow management system”, The Journal of China Universities of Posts and Telecommunications, vol. 18, (2011) February, pp. 164-168.
[11] J. Cui, T. S. Li and H. X. Lan, “Design and Development of the Mass Data Storage Platform Based on Hadoop”, Journal of Computer Research and Development, vol. 49, no. 12, (2012) May, pp. 12-18.
[12] P. Sethia and K. Karlapalem, “A multi-agent simulation framework on small Hadoop cluster”, Engineering Applications of Artificial Intelligence, vol. 24, no. 7, (2011) May, pp. 1120-1127.
[13] H. Yu, J. Wen, H. Wang and L. Jun, “An Improved Apriori Algorithm Based on the Boolean Matrix and Hadoop”, Procedia Engineering, vol. 15, (2011) July, pp. 1827-1831.
[14] B. Dong, Q. Zheng and F. Tian, “Optimized approach for storing and accessing small files on cloud storage”, Journal of Network and Computer Applications, vol. 35, no. 6, (2012) May, pp. 1847-1862.
[15] G. Mao, “Theory and Algorithm of Data Mining”, Beijing: Tsinghua University Press, (2007), pp. 121-142.
[16] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh and D. A. Wallach, “Bigtable: A distributed storage system for structured data”, Proc. of the 7th USENIX Symp. on Operating Systems Design and Implementation, (2006), pp. 205-218.
[17] S. Ghemawat, H. Gobioff and S.-T. Leung, “The Google File System”, Proc. of the 19th ACM Symp. on Operating Systems Principles, (2003), pp. 29-43.

IJCATM : www.ijcaonline.org 29
