A Comparative Analysis of Deep Neural Networks For Hourly Temperature Forecasting
ABSTRACT High-resolution temperature forecasting can often prove to be challenging for conventional
machine learning models as temperature is highly seasonal and varies with the time of the year as well as
with passing hours of the day. In most cases, only the daily extremes or mean temperatures are provided
by temperature forecasting methods. However, with the growing availability of data and the development
of deep neural networks (DNNs) capable of detecting complex relationships, high-resolution temperature
forecasting is becoming easier. Typically, historical temperature data along with data from multiple meteorological
sensors are used for temperature forecasting, which increases the complexity of the system, making it
harder and costlier to implement physically. In this paper, high-resolution hourly temperature forecasting
is performed using only historical temperature data. The paper presents a comparative analysis among
four popular DNNs, the simple recurrent neural network (SRN), gated recurrent unit (GRU), long short-term
memory (LSTM), and convolutional neural network (CNN), and two hybrid models, the CNN-LSTM parallel
network and the GRU-LSTM parallel network, trained on the Beijing temperature dataset. Experimental results
showed that the GRU-LSTM parallel network obtained the lowest RMSE (1.691°C), whereas the CNN has the best
computational efficiency while obtaining a slightly worse RMSE (1.759°C). Additionally, a robustness analysis is
performed on temperature data from four additional geographically diverse locations (Toronto, Las Vegas,
Seattle, and Dallas) which reveals GRU to be the most consistent algorithm. Finally, the paper establishes
a correlation between the model performance and the dataset based on their variance and mean absolute
deviation with reference to the training dataset.
INDEX TERMS Deep neural network, CNN, LSTM, CNN-LSTM parallel, temperature forecasting, GRU,
RNN, GRU-LSTM parallel, robustness.
the change of load and enhance the operational safety of the electric network.

Zhao and Liu [3] proposed a hybrid PLS-SVM model that takes into account meteorological parameters and historical data to perform up to 3h-ahead and 24h-ahead load forecasting to optimize HVAC operations. The authors showed that the accuracy of the hourly temperature forecast directly affects the proposed model. Higher-resolution forecasts, such as 1h-ahead, 2h-ahead and 3h-ahead temperature forecasts, yield higher accuracy for the load forecasting model compared to using daily extreme temperature forecasts. Hourly temperature data is also required to analyze test reference years (TRYs) and design summer years (DSYs) for energy use, to calculate plant sizing, and to simulate building performance during hot summers [4].

Shao and Lister [5] proposed a model which predicts the hourly road surface temperature and state (wet/ice/dry) using meteorological data from seven countries. This short-term model predicts up to 3h ahead and integrates an hourly temperature forecasting scheme as a prerequisite feature for the next stage of the proposed forecasting model. A similar study by Bogren and Gustavsson [6] used hourly air temperature forecasts to predict the road surface temperature. In agriculture, Kim et al. [7] used hourly air temperature forecasts to estimate the duration of leaf hydration retainability. Hourly temperature can even affect biological parameters, such as the mortality burden of hourly temperature variability, which was studied extensively in [8]. Another significant application of hourly temperature forecasting is in photovoltaic (PV) generation. For seamless grid integration, predicting hourly fluctuations in PV generation is crucial. Since the output of a PV system is a function of temperature, hourly temperature forecasts are a prerequisite in the solar industry.

So, there is a plethora of applications for hourly temperature forecasting. Having addressed the necessity of high-resolution hourly forecasts, the discussion proceeds to assess the hourly forecast techniques that have been used so far, as well as the state of the art regarding this topic.

II. TEMPERATURE FORECASTING METHODS
Weather forecasting mainly takes one of three routes: traditional physics-based, statistical, and NN or DNN models. This section briefly explores the different techniques, their advantages and drawbacks.

A. PHYSICS BASED MODELS
Physics-based weather forecasting is the traditional method and is still used by a number of public weather forecast providers. These methods mainly take into account physical parameters like solar irradiance, wind speed, humidity, precipitation, cloud cover, etc., and use theoretical formulae to calculate the future temperature. Zhao and Liu [3] presented a purely physics-based temperature forecasting model to determine the temperature which is a prerequisite for the load forecasting part of their study. The study used a heat conduction equation that assessed parameters such as heat capacity, conductivity, current temperature, surface albedo, solar irradiance, net longwave irradiance, ground conductive heat flux density, and sensible and latent heat flux densities to derive the road surface temperature. Physics-based models require sensor measurements from multiple sources to compute the temperature; moreover, these values vary significantly across different locations. These models tend to work better for daily temperature forecasting than for short-horizon predictions.

B. STATISTICAL MODELS
Mathematical models started gaining momentum around the 1990s. Since the temperature forecasts at that time only provided maximum and minimum temperatures without specifying at what time of the day they would occur, the hourly electric load curve had to be generated through interpolation of the two extremes. Data-driven weather forecasting models are built using different statistical and machine learning algorithms. Such models can significantly decrease the setup cost by trading off more historical data for additional sensor data. However, these models may require extensive historical data to yield good accuracy. Recently, with the increased availability of precise data, data-driven models for weather forecasting have gained popularity and are actively being studied. Statistical models such as the autoregressive integrated moving average (ARIMA) use time-series analysis to predict long-term change in data, like daily and monthly time horizons [9]. ARIMA is one of the most common linear statistical techniques and a form of regression analysis used in time series forecasting. The auto-regressive component of ARIMA regresses some of the lagged data, then integration (differencing) is performed to make the data stationary, and the moving-average component incorporates preceding error terms from a moving average model applied to lagged observations. One of the biggest drawbacks of ARIMA is that it is negatively affected by seasonality, and temperature is a highly seasonal dataset. If stationarity is not confirmed in a trend, computation throughout the whole process might not be accurate [10]. So ARIMAs are not the best choice for temperature forecasting.
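To make the ARIMA workflow concrete, a minimal sketch with statsmodels is shown below; the order (p, d, q) = (2, 1, 2) is an arbitrary assumption rather than a tuned choice, and the series here is synthetic:

```python
# Minimal ARIMA sketch with statsmodels; the order (p, d, q) = (2, 1, 2) is an
# arbitrary assumption, not a tuned choice, and the series below is synthetic.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# synthetic hourly temperature series with a daily cycle, as a stand-in
idx = pd.date_range("2013-03-01", periods=24 * 60, freq="h")
temps = 10 + 8 * np.sin(2 * np.pi * idx.hour / 24) + np.random.normal(0, 1, len(idx))
series = pd.Series(temps, index=idx)

fitted = ARIMA(series, order=(2, 1, 2)).fit()   # AR lags, differencing, MA lags
print(fitted.forecast(steps=6))                 # 6h-ahead forecast
```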
C. NEURAL NETWORK MODELS
In recent times, NN models have become increasingly popular, especially for short-term predictions such as hourly and daily time horizons, compared to the long-term predictions achieved through statistical models. Existing research mostly focuses on temperature forecasting using consistent time-unit data, where both the input and target data are of the same time unit, for example, using daily input data to forecast day-ahead temperature. However, with the increased availability of high-resolution data and the continued development of processing units, it is now possible to predict a time frame of a different duration compared to the input. Both hourly and daily patterns can be employed to forecast daily temperatures, but as the data is abundant and detailed, it is essential to process them efficiently and accurately. With the correct models, hourly temperature data can even be used to predict the hourly temperature of the next day, to a limit, before the errors become too significant. In this context, NN models have powerful versatility to process large amounts of more detailed data, which this paper aims to present.

Existing research on temperature forecasting using statistical models and NNs is tabulated in Table 1. It can be observed that earlier versions of temperature forecasting used different statistical models such as MLP, ARIMA or modified ARIMAs. Some of these papers include hourly temperature forecasting as the prerequisite of a load forecasting model [12]. More recent works started adopting NNs and DNNs that yield higher accuracy compared to statistical models, which is discussed in [18]. However, most of these papers use NNs to predict daily extremes and averages [19]. To the best of the authors' knowledge, only one forecasting model predicts an hourly horizon using a DNN, achieving an hourly average RMSE value of 2.10 using their proposed convLSTM model tested on a temperature dataset of Germany. However, it uses five meteorological parameters as input. This not only increases computation cost, but requires expensive sensor data as well [18]. Univariate regression using NNs can mitigate this drawback. In addition, temperature patterns differ significantly based on geographical location, so it will be interesting to observe how DNNs trained on a local temperature pattern perform on a different region. It is apparent that a study comparing the performance of the most recent DNNs for hourly temperature forecasting, taking into account spatial diversity (local and geographically diverse) and robustness, is yet to be explored.

TABLE 1. Literature on temperature forecasting using statistical and neural network models.

This study intends to address the existing research gap and make the following significant contributions:
where $h_t$ is the hidden neuron at time $t$, $o_t$ is the output vector, and $b$ is the bias value. Figure 1 illustrates a basic SRN unit. The main drawback of SRN is that it sometimes fails to converge to the optimum minima due to the vanishing gradient problem that might arise during back propagation [24].

where the input variable at time step $t$ is denoted by $x_t$; $c_t$ and $h_t$ are the cell state and hidden state, respectively. $\tilde{c}_t$ is referred to as the candidate cell calculated in Eq. (4), whose output through the tanh function has a value between $-1$ and $1$. $W_f$, $W_i$, $W_c$, and $W_o$ denote different weight matrices for the input vectors. $\sigma$ represents the sigmoid activation function, and the $*$ symbol denotes element-wise multiplication.
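For reference, the variables above follow the standard LSTM update [25], of which the candidate cell in Eq. (4) is one member; written out in full:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{c}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right)$$
$$c_t = f_t * c_{t-1} + i_t * \tilde{c}_t$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t * \tanh(c_t)$$

Here $f_t$, $i_t$, and $o_t$ are the forget, input, and output gate vectors, and $b_f$, $b_i$, $b_c$, $b_o$ are the corresponding bias terms.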
neuron at layer $l-1$. $x_k^l$ and $b_k^l$ are the input and the bias of the $k$-th neuron at layer $l$, respectively. In order to perform 1D convolution without zero padding, the conv1D(., .) function was used. This implies that the dimension of $s_i^{l-1}$ (output arrays) is higher than the dimension of $x_k^l$ (input arrays). The intermediate output $y_k^l$ is obtained by applying an activation function $f(\cdot)$ to the input $x_k^l$ using the following equation:

$$y_k^l = f\left(x_k^l\right) \quad \text{and} \quad s_k^l = y_k^l \downarrow ss \tag{13}$$

where $\downarrow ss$ denotes a down-sampling operation with a scalar factor $ss$ [30]. Down-sampling of the feature map is performed in this layer, which reduces several values into one value while keeping the integrity of the input data unchanged [19]. The last layer is the dense layer, which receives the flattened data of the pooling stage and turns it into a 1D output sequence. An attractive feature of 1D CNNs is that low-cost hardware implementation is possible, as 1D CNNs only perform 1D convolutions, which are basically additions and scalar multiplications. A basic internal structure of CNN is shown in Figure 4.
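As a concrete illustration of this convolution, pooling, and dense pipeline, a minimal 1D CNN in Keras might look as follows; the filter count, kernel size, and window length are illustrative assumptions rather than the tuned hyperparameters of Table 2:

```python
# Minimal 1D CNN sketch in Keras; filter count, kernel size, and window length
# are illustrative assumptions, not the tuned hyperparameters of Table 2.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential([
    # 1D convolution over a 24-step univariate input window (no zero padding)
    Conv1D(filters=32, kernel_size=3, activation="relu",
           padding="valid", input_shape=(24, 1)),
    # down-sampling of the feature map (the down-sampling operation in Eq. (13))
    MaxPooling1D(pool_size=2),
    # flatten the pooled features and map them to a 6h-ahead output sequence
    Flatten(),
    Dense(6),
])
model.summary()
```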
The structure of the CNN-LSTM parallel network considered in this study is shown in Figure 5.

F. GRU-LSTM PARALLEL NETWORK
GRU-LSTM hybrid models have previously been proposed in a series configuration. To the best of our knowledge, we are the first to implement a GRU-LSTM parallel network for time series prediction. The series configuration was also trained, but the parallel GRU-LSTM yielded better results, which is why it is considered for this study. The concept is similar to that of CNN-LSTM: in order to avoid the output of one network adding any bias to the output of the other, the series configuration was replaced with a parallel network where each DNN has a separate path for training on the data. GRU and LSTM have a similar working mechanism, with GRU being a little faster than LSTM as it has two gates where LSTM has three. Combining the two models has shown promising results.
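A minimal sketch of such a parallel topology in the Keras functional API is given below; the 24-hour window, the 64-unit branches, and the concatenation-based merge are illustrative assumptions, since the tuned layer hyperparameters are those listed in Table 2:

```python
# Sketch of a GRU-LSTM parallel network in the Keras functional API. The
# 24-hour window, 64-unit branches, and concatenation merge are illustrative
# assumptions; the tuned layer hyperparameters are those listed in Table 2.
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import GRU, LSTM, Dense, Concatenate

inp = Input(shape=(24, 1))             # univariate hourly input window

gru_branch = GRU(64)(inp)              # GRU path, with its own weights...
lstm_branch = LSTM(64)(inp)            # ...separate from the LSTM path

# merge the two paths so neither branch biases the other's features
merged = Concatenate()([gru_branch, lstm_branch])
out = Dense(6)(merged)                 # 6h-ahead temperature forecast

model = Model(inputs=inp, outputs=out)
model.summary()
```

Because each branch keeps its own path from input to merge, the two recurrent feature extractors are trained side by side rather than feeding one into the other, which is the design motivation stated above.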
A. DATA COLLECTION
The temperature data is collected from a dataset uploaded by Zhang S. et al. titled ''Cautionary Tales on Air-Quality Improvement in Beijing'' [32]. The original data contained various air quality readings from twelve nationally controlled monitoring sites. From the whole dataset, the Aoti Zhongxin area is taken for its relatively low number of missing values. Aoti Zhongxin is considered in this study to represent the overall Beijing temperature because of the low variation in readings among the other centers.

The dataset consisted of hourly temperature data from 2013-03-01 00:00:00 to 2017-02-28 23:00:00, giving a total of 35064 hourly readings. The dataset is first sorted according to datetime. There were 20 missing temperature values, and because of the relatively small size of the missing data, they are filled using the forward fill method instead of other, more complex imputation methods. Then, maintaining the order, the first 90% of the data is selected for training, from 2013-03-01 00:00:00 to 2016-10-04 18:00:00, and the remainder is taken for testing, from 2016-10-04 19:00:00 to 2017-02-28 23:00:00. The train-test split can be visualized in Figure 7.

The dataset for the robustness analysis, titled ''Historical Hourly Weather Data 2012-2017'' [33], contains 5 years of high-resolution (hourly) temporal data of various weather attributes from January 2012, 12:00:00 to December 2017, 00:00:00, out of which the temperature data is extracted. This data is available for 30 US and Canadian cities. Toronto, Seattle, Dallas and Las Vegas were chosen for the robustness analysis because of their considerably scattered geographical locations, so that the temporal data vary as much as possible.
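A short sketch of this preparation with pandas is shown below; the file and column names are hypothetical, while the forward fill and the chronological 90/10 split follow the procedure just described:

```python
# Data-preparation sketch with pandas; the file and column names are
# hypothetical stand-ins for the Aoti Zhongxin series from [32].
import pandas as pd

df = pd.read_csv("beijing_aoti_zhongxin.csv", parse_dates=["datetime"])
df = df.sort_values("datetime")        # sort by datetime first

# forward-fill the small number of missing temperature readings
df["TEMP"] = df["TEMP"].ffill()

# chronological 90/10 train-test split, order preserved
split = int(len(df) * 0.9)
train, test = df.iloc[:split], df.iloc[split:]
print(len(train), len(test))
```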
B. MODEL CONSTRUCTION AND HYPERPARAMETER TUNING
Hyperparameter tuning is an important part of NN construction, which is usually done through extensive trial and error. A common practice is to use rule-of-thumb parameters or combinations that have previously performed well in other papers. However, we have carefully chosen all the hyperparameters after manually testing a wide range of values. A validation run is conducted for each model to decide the hyperparameters for best performance and fitting before training the final models. The train set is split 90-10 for the validation run. The layer-based hyperparameters determined from this run are provided in Table 2.

General parameters such as the optimizer, the learning rate and the number of epochs are also important to improve the overall performance and speed of the models. Commonly used optimizers include root mean square propagation (RMSprop), stochastic gradient descent (SGD), the adaptive gradient algorithm (AdaGrad), and adaptive moment estimation (Adam). In this paper, after the validation run, the Adam optimizer is chosen, which is computationally efficient and showed slightly better results during testing. The batch size of all the models is taken as 64, and the loss functions considered are mean square error (MSE) and cosine similarity (for the full-time single run), and MSE for hour-by-hour prediction.
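As an illustration, the general training configuration described above translates to something like the following sketch; the tiny placeholder model, the random arrays, and the epoch count are assumptions rather than the paper's exact setup:

```python
# Training-configuration sketch: Adam optimizer, MSE loss, batch size 64, and
# a 90-10 validation split, as described above. The tiny LSTM model, the random
# arrays, and the epoch count are placeholders, not the paper's exact setup.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

X_train = np.random.rand(1000, 24, 1)   # placeholder windowed inputs
y_train = np.random.rand(1000, 6)       # placeholder 6h-ahead targets

model = Sequential([LSTM(64, input_shape=(24, 1)), Dense(6)])
model.compile(optimizer="adam", loss="mse")   # Adam + MSE, as in the paper

model.fit(X_train, y_train,
          batch_size=64,           # batch size used for all models
          epochs=50,               # assumed value; tuned via the validation run
          validation_split=0.1)    # the 90-10 validation split
```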
V. RESULT ANALYSIS
A. FORECASTING OUTCOMES
The trained models were used to predict hourly temperatures up to 6h ahead. The prediction is carried out on an hour-by-hour basis as well as over the whole time horizon in a single run. The training and testing periods are mentioned in section IV-A. It is observed that the models trained on unnormalized data perform better than models trained on normalized data, and so only the prediction graphs of models trained on unnormalized data are presented.

B. EVALUATION METRICS
The performance of the DNNs is evaluated in terms of three error metrics: the conventional root mean squared error (RMSE) and mean absolute error (MAE), and additionally the coefficient of determination $R^2$. The mathematical expressions of these error metrics are given as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(F_i - A_i\right)^2} \tag{14}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|F_i - A_i\right| \tag{15}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(F_i - A_i\right)^2}{\sum_{i=1}^{n}\left(A_i - \bar{A}\right)^2} \tag{16}$$

where $F_i$ and $A_i$ are the forecast and actual temperatures at the $i$-th hour, and $\bar{A}$ is the mean of the actual values.
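A direct NumPy transcription of Eqs. (14)-(16), with F the forecasts and A the actual temperatures:

```python
# NumPy transcription of Eqs. (14)-(16); F = forecast values, A = actual values.
import numpy as np

def rmse(F, A):
    """Root mean squared error, Eq. (14)."""
    return np.sqrt(np.mean((F - A) ** 2))

def mae(F, A):
    """Mean absolute error, Eq. (15)."""
    return np.mean(np.abs(F - A))

def r2(F, A):
    """Coefficient of determination, Eq. (16)."""
    return 1.0 - np.sum((F - A) ** 2) / np.sum((A - np.mean(A)) ** 2)
```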
FIGURE 9. Curve of actual temperature and predicted results for hour-by-hour prediction using SRN.
FIGURE 11. Curve of actual temperature and predicted results for hour-by-hour prediction using LSTM.
FIGURE 12. Curve of actual temperature and predicted results for single run prediction using LSTM.
FIGURE 14. Curve of actual temperature and predicted results for single run prediction using GRU.
FIGURE 15. Curve of actual temperature and predicted results for hour-by-hour prediction using CNN.
FIGURE 18. Curve of actual temperature and predicted results for single run prediction using CNN-LSTM parallel network.
FIGURE 20. Curve of actual temperature and predicted results for single run prediction using GRU-LSTM parallel network.
It can be observed from Table 3 and Table 4 that they perform similarly on univariate time series predictions. A deciding argument in this regard can be the computation time. CNNs have a huge advantage of being very fast compared to RNNs. In our study, the CNN model ran 5 times faster than LSTM, 4 times faster than GRU, and twice as fast as SRN.

In the case of Figure 22, almost every model, including SRN, LSTM, GRU-LSTM and CNN-LSTM, performed inconsistently. Although normalized data are expected to yield good results in time series forecasting using DNNs, they performed poorly on temperature data.
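For clarity, the normalized experiments correspond to a pipeline of the following shape; this sketch assumes scikit-learn's MinMaxScaler, as the exact scaling method is not specified here:

```python
# Sketch of the normalized pipeline; scikit-learn's MinMaxScaler is an
# assumption, as the exact scaling method is not specified here.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# dummy hourly temperatures standing in for the train and test series
train_temps = np.random.uniform(-10.0, 35.0, size=(1000, 1))
test_temps = np.random.uniform(-10.0, 35.0, size=(200, 1))

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_temps)  # fit on the training set only
test_scaled = scaler.transform(test_temps)        # avoids test-set leakage

# a model would be trained and would predict in the scaled space;
# predictions are inverted back to °C before computing RMSE/MAE
preds_scaled = test_scaled                        # placeholder for model output
preds = scaler.inverse_transform(preds_scaled)
```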
TABLE 3. Evaluation metrics of the considered DNNs trained on Beijing data without normalization.
TABLE 4. Evaluation metrics of the considered DNNs trained on Beijing data with normalization.
FIGURE 21. RMSE statistics for considered models trained on Beijing dataset without normalization.
FIGURE 22. RMSE statistics for considered models trained on Beijing dataset with normalization.
4) In terms of computational cost, CNN is much faster than any other model while sustaining good performance.

VI. ROBUSTNESS ANALYSIS
In section V, a conclusion is drawn from the performance of the models by testing them on the same dataset as they were trained on. In this section, the robustness of the models is analyzed by testing the models on new datasets from different geographical locations having uncorrelated climatic characteristics. Robustness is a model's ability to generalize trends and output satisfactory performance on different or altered datasets. The previous three error metrics are compared among the different DNNs to assess their robustness in each location.

Four cities from different geographical locations were chosen for the robustness analysis, as discussed in section IV. The time period considered for the prediction is from 1 March 2013, 00:00:00 to 28 February 2017, 23:00:00 (the same as the Beijing dataset). The results obtained from the predictions are summarized in Table 5. The models were run both with and without normalization, and with both hour-by-hour and single-run approaches. Similar to the previous case, models without normalization in a single run yielded better results, so the discussion will be limited to this. To grasp the changes more easily, the comparative RMSE of the DNN models is illustrated in Figure 23.

Figure 23 depicts that all the models performed satisfactorily on untrained, unrelated datasets from different locations. The RMSE of all the models did increase, but the increase is comparatively low, indicating the models' robustness and reliability. From Table 5, it can be observed that GRU has achieved the lowest average RMSE (2.0042°C), which indicates that GRU is the most robust DNN.
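The robustness protocol above amounts to a simple evaluation loop, sketched below; `model` stands for a network already trained on the Beijing data, and the loader is a hypothetical stand-in for windowing each city's hourly series from the Kaggle dataset [33]:

```python
# Robustness-evaluation sketch: a Beijing-trained network (`model`, assumed to
# exist) is tested, unchanged, on four other cities. The loader below is a
# hypothetical stand-in for windowing each city's hourly series from [33].
import numpy as np

def rmse(F, A):
    return np.sqrt(np.mean((F - A) ** 2))

def load_city_windows(city):
    """Hypothetical stand-in: returns windowed (X, y) arrays for one city."""
    X = np.random.rand(500, 24, 1)
    y = np.random.rand(500, 6)
    return X, y

for city in ["Toronto", "Las Vegas", "Seattle", "Dallas"]:
    X, y = load_city_windows(city)
    preds = model.predict(X)           # no retraining or fine-tuning
    print(city, rmse(preds, y))
```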
TABLE 5. Robustness analysis of considered DNNs evaluated on four different geographical locations.
To draw a correlation between a model's performance and different types of temperature datasets from different regions, the RMSE of each model is compared against the variance and the mean absolute deviation (MAD) of each dataset with reference to the training dataset. This correlation held for all the models (except SRN, which produced an outlier). Another important point to note is that, for Seattle, all the models yielded a lower RMSE value compared to Beijing, as it has a lower variance. This implies that the models are able to achieve a degree of generality. On the other hand, the specificity of the models can be understood from the positive correlation of RMSE to the MAD value. This opens the scope of using transfer learning for datasets that have little correlation to the dataset the models were trained on.

VII. CONCLUSION
This study has carried out a comparative analysis of six DNN models to observe which performs the best for high-resolution hourly temperature forecasting on Beijing temperature data. The study has also presented an in-depth robustness analysis to see the change in the performance parameters of these DNNs when tested on geographically diverse datasets. The comparative analysis has revealed the GRU-LSTM parallel network to provide the best performance when tested on the Beijing data, at 1.691°C RMSE. CNN, on the other hand, performs slightly worse at 1.759°C RMSE, ranking 3rd in terms of accuracy, but has by far the best computational time. The study has also found that single-run models are better and more consistent for prediction than single-point regression models. The comparative analysis further revealed that the models perform poorly on normalized temperature data, which is unusual, as neural network models generally tend to perform better on normalized data. In short, this study aimed to act as a benchmark for high-resolution temperature forecasting with only historical temperature data using neural nets that yield sufficient accuracy and are computationally inexpensive.

From the robustness analysis, the study was able to map a correlation between model performance and the product of the MAD and variance of the dataset. It was further found that the GRU-based model was able to generalize the most over various geographical locations, although it performed poorly on the Beijing data. This was explained by the high variability of temperature data across the globe. To perform well on the temperature data of a particular location, the models had to trade off robustness for a certain level of specificity. This has indicated a future scope of work where transfer learning can be adopted so that models trained on one dataset can perform well on new data with little correlation to the previous dataset. Moreover, this study can be incorporated with research on embedded systems equipped with artificial intelligence processing capabilities to implement portable, compact devices for on-spot temperature forecasting in the future.

REFERENCES
[1] S. S. Sharif and J. H. Taylor, ''Real-time load forecasting by artificial neural networks,'' in Proc. Power Eng. Soc. Summer Meeting, Jul. 2000, pp. 496–501.
[2] J. Verhelst, G. Van Ham, D. Saelens, and L. Helsen, ''Model selection for continuous commissioning of HVAC-systems in office buildings: A review,'' Renew. Sustain. Energy Rev., vol. 76, pp. 673–686, Sep. 2017.
[3] J. Zhao and X. Liu, ''A hybrid method of dynamic cooling and heating load forecasting for office buildings based on artificial intelligence and regression analysis,'' Energy Buildings, vol. 174, pp. 293–308, Sep. 2018.
[4] D. H. C. Chow and G. J. Levermore, ''New algorithm for generating hourly temperature values using daily maximum, minimum and average values from climate models,'' Building Services Eng. Res. Technol., vol. 28, no. 3, pp. 237–248, Aug. 2007.
[5] J. Shao and P. J. Lister, ''An automated nowcasting model of road surface temperature and state for winter road maintenance,'' J. Appl. Meteorol., vol. 35, no. 8, pp. 1352–1361, Aug. 1996.
[6] J. Bogren and T. Gustavsson, ''Site specific road surface temperature forecast improvements by use of radiation measurements,'' in Proc. 11th SIRWEC Conf., 2002, pp. 1–5.
[7] K. S. Kim, S. E. Taylor, M. L. Gleason, and K. J. Koehler, ''Model to enhance site-specific estimation of leaf wetness duration,'' Plant Disease, vol. 86, no. 2, pp. 179–185, Feb. 2002.
[8] J. Cheng, Z. Xu, H. Bambrick, H. Su, S. Tong, and W. Hu, ''The mortality burden of hourly temperature variability in five capital cities, Australia: Time-series and meta-regression analysis,'' Environ. Int., vol. 109, pp. 10–19, Dec. 2017.
[9] G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, ''Predictability of monthly temperature and precipitation using automatic time series forecasting methods,'' Acta Geophys., vol. 66, no. 4, pp. 807–831, Aug. 2018.
[10] J. Kihoro, R. Otieno, and C. Wafula, ''Seasonal time series forecasting: A comparative study of ARIMA and ANN models,'' Meru Univ., Nairobi, Kenya, Tech. Rep., 2004.
[11] A. Khotanzad, R. Afkhami-Rohani, T.-L. Lu, A. Abaye, M. Davis, and D. J. Maratukulam, ''ANNSTLF—A neural-network-based electric load forecasting system,'' IEEE Trans. Neural Netw., vol. 8, no. 4, pp. 835–846, Jul. 1997.
[12] H. Shah, R. Ghazali, and N. M. Nawi, ''Using artificial bee colony algorithm for MLP training on earthquake time series data prediction,'' 2011, arXiv:1112.4628.
[13] H. S. Hippert, C. E. Pedreira, and R. C. Souza, ''Combining neural networks and ARIMA models for hourly temperature forecast,'' in Proc. IEEE-INNS-ENNS Int. Joint Conf. Neural Netw. (IJCNN), Neural Comput.: New Challenges Perspect. New Millennium, Jul. 2000, pp. 414–419.
[14] K. Methaprayoon, W. J. Lee, S. Rasmiddatta, J. R. Liao, and R. J. Ross, ''Multistage artificial neural network short-term load forecasting engine with front-end weather forecast,'' IEEE Trans. Ind. Appl., vol. 43, no. 6, pp. 1410–1416, Nov. 2007.
[15] V. Vamitha, M. Jeyanthi, S. Rajaram, and T. Revathi, ''Temperature prediction using fuzzy time series and multivariate Markov chain,'' Int. J. Fuzzy Math. Syst., vol. 2, no. 3, pp. 217–230, 2012.
[16] T. T. K. Tran, T. Lee, J.-Y. Shin, J.-S. Kim, and M. Kamruzzaman, ''Deep learning-based maximum temperature forecasting assisted with meta-learning for hyperparameter optimization,'' Atmosphere, vol. 11, no. 5, p. 487, May 2020.
[17] Z. Zhang and Y. Dong, ''Temperature forecasting via convolutional recurrent neural networks based on time-series data,'' Complexity, vol. 2020, pp. 1–8, Mar. 2020.
[18] D. Kreuzer, M. Munz, and S. Schlüter, ''Short-term temperature forecasts using a convolutional neural network—An application to different weather stations in Germany,'' Mach. Learn. With Appl., vol. 2, Dec. 2020, Art. no. 100007.
[19] S. Lee, Y.-S. Lee, and Y. Son, ''Forecasting daily temperatures with different time interval data using deep neural networks,'' Appl. Sci., vol. 10, no. 5, p. 1609, Feb. 2020.
[20] T. Toharudin, R. S. Pontoh, R. E. Caraka, S. Zahroh, Y. Lee, and R. C. Chen, ''Employing long short-term memory and Facebook prophet model in air temperature forecasting,'' Commun. Statist. Simul. Comput., pp. 1–24, Jan. 2021.
[21] Z. C. Lipton, J. Berkowitz, and C. Elkan, ''A critical review of recurrent neural networks for sequence learning,'' 2015, arXiv:1506.00019.
[22] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, ''Gradient flow in recurrent nets: The difficulty of learning long-term dependencies,'' Université de Montréal, Montréal, QC, Canada, Tech. Rep., 2001.
[23] J. L. Elman, ''Finding structure in time,'' Cognit. Sci., vol. 14, no. 2, pp. 179–211, Mar. 1990.
[24] R. K. Agrawal, F. Muchahary, and M. M. Tripathi, ''Long term load forecasting with hourly predictions based on long-short-term-memory networks,'' in Proc. IEEE Texas Power Energy Conf. (TPEC), Feb. 2018, pp. 1–6.
[25] S. Hochreiter and J. Schmidhuber, ''Long short-term memory,'' Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[26] P. Liu, X. Qiu, X. Chen, S. Wu, and X. Huang, ''Multi-timescale long short-term memory neural network for modelling sentences and documents,'' in Proc. Conf. Empirical Methods Natural Lang. Process., 2015, pp. 2326–2335.
[27] I. Sutskever, O. Vinyals, and Q. V. Le, ''Sequence to sequence learning with neural networks,'' 2014, arXiv:1409.3215.
[28] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, ''On the properties of neural machine translation: Encoder-decoder approaches,'' 2014, arXiv:1409.1259.
[29] C. Choy, J. Gwak, and S. Savarese, ''4D spatio-temporal ConvNets: Minkowski convolutional neural networks,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 3075–3084.
[30] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. J. Inman, ''1D convolutional neural networks and applications: A survey,'' Mech. Syst. Signal Process., vol. 151, Apr. 2021, Art. no. 107398.
[31] B. Farsi, M. Amayri, N. Bouguila, and U. Eicker, ''On short-term load forecasting using machine learning techniques and a novel parallel deep LSTM-CNN approach,'' IEEE Access, vol. 9, pp. 31191–31212, 2021.
[32] S. Zhang, B. Guo, A. Dong, J. He, Z. Xu, and S. X. Chen, ''Cautionary tales on air-quality improvement in Beijing,'' Proc. Roy. Soc. A, Math., Phys. Eng. Sci., vol. 473, no. 2205, Sep. 2017, Art. no. 20170457.
[33] R. Tatman. (Nov. 2017). R vs. Python: The Kitchen Gadget Test, Version 1. Accessed: Dec. 20, 2017. [Online]. Available: https://www.kaggle.com/rtatman/r-vs-python-the-kitchen-gadget-test

EHTASHAMUL HAQUE was born in Dhaka, Bangladesh. He is currently pursuing the B.Sc. degree in electrical and electronic engineering with the Islamic University of Technology, Gazipur, Bangladesh. His main research interests include smart grid and machine learning.

SANZANA TABASSUM (Student Member, IEEE) is pursuing the B.Sc. degree in electrical and electronic engineering with the Islamic University of Technology, Gazipur, Bangladesh. Her main research interests include renewable energy, smart grid, and machine learning.

EKLAS HOSSAIN (Senior Member, IEEE) received the B.S. degree in electrical and electronic engineering from the Khulna University of Engineering and Technology, Bangladesh, in 2006, the M.S. degree in mechatronics and robotics engineering from the International Islamic University of Malaysia, Malaysia, in 2010, and the Ph.D. degree from the College of Engineering and Applied Science, University of Wisconsin–Milwaukee (UWM). He has been working in the area of distributed power systems and renewable energy integration for the last ten years and has published a number of research papers and posters in this field. Since 2015, he has been involved with several research projects on renewable energy and grid-tied microgrid systems at the Department of Electrical Engineering and Renewable Energy, Oregon Tech, as an Assistant Professor. He is currently working as an Associate Researcher at the Oregon Renewable Energy Center (OREC). His research interests include modeling, analysis, design, and control of power electronic devices; energy storage systems; renewable energy sources; integration of distributed generation systems; microgrid and smart grid applications; robotics; and advanced control systems. He is a Senior Member of the Association of Energy Engineers (AEE). He is a Registered Professional Engineer (PE) in OR, USA. He is also a Certified Energy Manager (CEM) and a Renewable Energy Professional (REP). He is the winner of the Rising Faculty Scholar Award from the Oregon Institute of Technology for his outstanding contribution in teaching, in 2019. He is serving as an Associate Editor for IEEE ACCESS.