A Comparative Analysis of Deep Neural Networks For Hourly Temperature Forecasting
ABSTRACT High-resolution temperature forecasting can often prove to be challenging for conventional
machine learning models as temperature is highly seasonal and varies with the time of the year as well as
with passing hours of the day. In most cases, only the daily extremes or mean temperatures are provided
by temperature forecasting methods. However, with the growing availability of data and the development
of deep neural networks (DNNs) capable of detecting complex relationships, high-resolution temperature
forecasting is becoming easier. Typically, historical temperature data along with data from multiple meteorological
sensors are used for temperature forecasting, which increases the complexity of the system, making it
harder and costlier to implement physically. In this paper, high-resolution hourly temperature forecasting
is performed using only historical temperature data. The paper presents a comparative analysis among
four popular DNNs, the simple recurrent neural network (SRN), gated recurrent unit (GRU), long short-term
memory (LSTM), and convolutional neural network (CNN), and two hybrid models, the CNN-LSTM parallel
network and the GRU-LSTM parallel network, trained on the Beijing temperature dataset. Experimental results
showed that the GRU-LSTM parallel network obtained the lowest RMSE (1.691°C), whereas the CNN has the best
computational efficiency while obtaining a slightly worse RMSE (1.759°C). Additionally, a robustness analysis is
performed on temperature data from four additional geographically diverse locations (Toronto, Las Vegas,
Seattle, and Dallas) which reveals GRU to be the most consistent algorithm. Finally, the paper establishes
a correlation between the model performance and the dataset based on their variance and mean absolute
deviation with reference to the training dataset.
INDEX TERMS Deep neural network, CNN, LSTM, CNN-LSTM parallel, temperature forecasting, GRU,
RNN, GRU-LSTM parallel, robustness.
the change of load and enhance the operational safety of the electric network.

Zhao and Liu [3] proposed a hybrid PLS-SVM model that takes into account meteorological parameters and historical data to perform up to 3h-ahead and 24h-ahead load forecasting to optimize HVAC operations. The authors showed that the accuracy of the hourly temperature forecast directly affects the proposed model. Higher-resolution forecasts, such as 1h-ahead, 2h-ahead and 3h-ahead temperature forecasts, yield higher accuracy for the load forecasting model compared to using daily extreme temperature forecasts. Hourly temperature data is also required to analyze test reference years (TRYs) and design summer years (DSYs) for energy use, to calculate plant sizing, and to simulate building performance during hot summers [4].

Shao and Lister [5] proposed a model which predicts the hourly road surface temperature and state (wet/ice/dry) using meteorological data from seven countries. This short-term model predicts up to 3h ahead and integrates an hourly temperature forecasting scheme as a prerequisite feature for the next stage of the proposed forecasting model. A similar study by Bogren and Gustavsson [6] used hourly air temperature forecasts to predict the road surface temperature. In agriculture, Kim et al. [7] used hourly air temperature forecasts to estimate the duration of leaf hydration retainability. Hourly temperature can even affect biological parameters, such as the mortality burden of hourly temperature variability, which was studied extensively in [8]. Another significant application of hourly temperature forecasting is in photovoltaic (PV) generation. For seamless grid integration, predicting hourly fluctuations in PV generation is crucial. Since the output of a PV system is a function of temperature, hourly temperature forecasts are a prerequisite in the solar industry.

So, there is a plethora of applications for hourly temperature forecasting. Having addressed the necessity of high-resolution hourly forecasts, the discussion proceeds to assess the hourly forecast techniques that have been used so far, as well as the state of the art regarding this topic.

II. TEMPERATURE FORECASTING METHODS
Weather forecasting mainly takes one of three routes: traditional physics-based, statistical, and NN or DNN models. This section briefly explores the different techniques, their advantages and drawbacks.

A. PHYSICS BASED MODELS
Physics-based weather forecasting is the traditional method and is still used by a number of public weather forecast providers. These methods mainly take into account physical parameters like solar irradiance, wind speed, humidity, precipitation, cloud cover, etc., and use theoretical formulae to calculate the future temperature. Zhao and Liu [3] presented a purely physics-based temperature forecasting model to determine the temperature which is a prerequisite for the load forecasting part of their study. The study used a heat conduction equation that assessed parameters such as heat capacity, conductivity, current temperature, surface albedo, solar irradiance, net longwave irradiance, ground conductive heat flux density, and sensible and latent heat flux densities to derive the road surface temperature. Physics-based models require sensor measurements from multiple sources to compute the temperature; moreover, these values vary significantly across different locations. These models tend to work better for daily temperature forecasting than for short-horizon predictions.

B. STATISTICAL MODELS
Mathematical models started gaining momentum around the 1990s. Since the temperature forecasts at that time only provided maximum and minimum temperatures without specifying at what time of the day they would occur, the hourly electric load curve had to be generated through interpolation of the two extremes. Data-driven weather forecasting models are built using different statistical and machine learning algorithms. Such models can significantly decrease the setup cost by trading off more historical data for additional sensor data. However, these models may require extensive historical data to yield good accuracy. Recently, with the increased availability of precise data, data-driven models for weather forecasting have gained popularity and are actively being studied. Statistical models such as the autoregressive integrated moving average (ARIMA) use time-series analysis to predict long-term change in data, like daily and monthly time horizons [9]. ARIMA is one of the most common linear statistical techniques and a form of regression analysis used in time series forecasting. The auto-regressive component of ARIMA regresses some of the lagged data, then integration (differencing) is performed to make the data stationary, and the moving-average component incorporates preceding error terms from a moving average model applied to lagged observations. One of the biggest drawbacks of ARIMA is that it is negatively affected by seasonality, and temperature is a highly seasonal dataset. If stationarity is not confirmed in a trend, computation throughout the whole process might not be accurate [10]. So ARIMAs are not the best choice for temperature forecasting.
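To make the ARIMA workflow concrete, a minimal sketch with statsmodels is shown below; the order (p, d, q) = (2, 1, 2) is an arbitrary assumption rather than a tuned choice, and the series here is synthetic:

```python
# Minimal ARIMA sketch with statsmodels; the order (p, d, q) = (2, 1, 2) is an
# arbitrary assumption, not a tuned choice, and the series below is synthetic.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# synthetic hourly temperature series with a daily cycle, as a stand-in
idx = pd.date_range("2013-03-01", periods=24 * 60, freq="h")
temps = 10 + 8 * np.sin(2 * np.pi * idx.hour / 24) + np.random.normal(0, 1, len(idx))
series = pd.Series(temps, index=idx)

fitted = ARIMA(series, order=(2, 1, 2)).fit()   # AR lags, differencing, MA lags
print(fitted.forecast(steps=6))                 # 6h-ahead forecast
```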
C. NEURAL NETWORK MODELS
In recent times, NN models have become increasingly popular, especially for short-term predictions such as hourly and daily time horizons, compared to the long-term predictions achieved through statistical models. Existing research mostly focuses on temperature forecasting using consistent time-unit data, where both the input and target data are of the same time unit, for example, using daily input data to forecast day-ahead temperature. However, with the increased availability of high-resolution data and the continued development of processing units, it is now possible to predict a time frame of a different duration compared to the input. Both hourly and daily patterns can be employed to forecast daily temperatures, but as the data is abundant and detailed, it is essential to process them efficiently and accurately. With the correct models, hourly temperature data can even be used to predict the hourly temperature of the next day, to a limit, before the errors become too significant. In this context, NN models have powerful versatility to process large amounts of more detailed data, which this paper aims to present.

Existing research on temperature forecasting using statistical models and NNs is tabulated in Table 1. It can be observed that earlier versions of temperature forecasting used different statistical models such as MLP, ARIMA or modified ARIMAs. Some of these papers include hourly temperature forecasting as the prerequisite of a load forecasting model [12]. More recent works started adopting NNs and DNNs that yield higher accuracy compared to statistical models, which is discussed in [18]. However, most of these papers use NNs to predict daily extremes and averages [19]. To the best of the authors' knowledge, only one forecasting model predicts an hourly horizon using a DNN, achieving an hourly average RMSE value of 2.10 using their proposed convLSTM model tested on a temperature dataset of Germany. However, it uses five meteorological parameters as input. This not only increases computation cost, but requires expensive sensor data as well [18]. Univariate regression using NNs can mitigate this drawback. In addition, temperature patterns differ significantly based on geographical location, so it will be interesting to observe how DNNs trained on a local temperature pattern perform on a different region. It is apparent that a study comparing the performance of the most recent DNNs for hourly temperature forecasting, taking into account spatial diversity (local and geographically diverse) and robustness, is yet to be explored.

TABLE 1. Literature on temperature forecasting using statistical and neural network models.

This study intends to address the existing research gap and make the following significant contributions:
where $h_t$ is the hidden neuron at time $t$, $o_t$ is the output vector, and $b$ is the bias value. Figure 1 illustrates a basic SRN unit. The main drawback of SRN is that it sometimes fails to converge to the optimum minima due to the vanishing gradient problem that might arise during back propagation [24].

where the input variable at time step $t$ is denoted by $x_t$; $c_t$ and $h_t$ are the cell state and hidden state, respectively. $\tilde{c}_t$ is referred to as the candidate cell calculated in Eq. (4), whose output through the tanh function has a value between $-1$ and $1$. $W_f$, $W_i$, $W_c$, and $W_o$ denote different weight matrices for the input vectors. $\sigma$ represents the sigmoid activation function, and the $*$ symbol denotes element-wise multiplication.
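For reference, the variables above follow the standard LSTM update [25], of which the candidate cell in Eq. (4) is one member; written out in full:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{c}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right)$$
$$c_t = f_t * c_{t-1} + i_t * \tilde{c}_t$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t * \tanh(c_t)$$

Here $f_t$, $i_t$, and $o_t$ are the forget, input, and output gate vectors, and $b_f$, $b_i$, $b_c$, $b_o$ are the corresponding bias terms.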
neuron at layer $l-1$. $x_k^l$ and $b_k^l$ are the input and the bias of the $k$-th neuron at layer $l$, respectively. In order to perform 1D convolution without zero padding, the conv1D(., .) function was used. This implies that the dimension of $s_i^{l-1}$ (output arrays) is higher than the dimension of $x_k^l$ (input arrays). The intermediate output $y_k^l$ is obtained by applying an activation function $f(\cdot)$ to the input $x_k^l$ using the following equation:

$$y_k^l = f\left(x_k^l\right) \quad \text{and} \quad s_k^l = y_k^l \downarrow ss \tag{13}$$

where $\downarrow ss$ denotes a down-sampling operation with a scalar factor $ss$ [30]. Down-sampling of the feature map is performed in this layer, which reduces several values into one value while keeping the integrity of the input data unchanged [19]. The last layer is the dense layer, which receives the flattened data of the pooling stage and turns it into a 1D output sequence. An attractive feature of 1D CNNs is that low-cost hardware implementation is possible, as 1D CNNs only perform 1D convolutions, which are basically additions and scalar multiplications. A basic internal structure of CNN is shown in Figure 4.
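As a concrete illustration of this convolution, pooling, and dense pipeline, a minimal 1D CNN in Keras might look as follows; the filter count, kernel size, and window length are illustrative assumptions rather than the tuned hyperparameters of Table 2:

```python
# Minimal 1D CNN sketch in Keras; filter count, kernel size, and window length
# are illustrative assumptions, not the tuned hyperparameters of Table 2.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential([
    # 1D convolution over a 24-step univariate input window (no zero padding)
    Conv1D(filters=32, kernel_size=3, activation="relu",
           padding="valid", input_shape=(24, 1)),
    # down-sampling of the feature map (the down-sampling operation in Eq. (13))
    MaxPooling1D(pool_size=2),
    # flatten the pooled features and map them to a 6h-ahead output sequence
    Flatten(),
    Dense(6),
])
model.summary()
```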
The structure of the CNN-LSTM parallel network considered in this study is shown in Figure 5.

F. GRU-LSTM PARALLEL NETWORK
GRU-LSTM hybrid models have previously been proposed in a series configuration. To the best of our knowledge, we are the first to implement a GRU-LSTM parallel network for time series prediction. The series configuration was also trained, but the parallel GRU-LSTM yielded better results, which is why it is considered for this study. The concept is similar to that of CNN-LSTM: in order to avoid the output of one network adding any bias to the output of the other, the series configuration was replaced with a parallel network where each DNN has a separate path for training on the data. GRU and LSTM have a similar working mechanism, with GRU being a little faster than LSTM as it has two gates where LSTM has three. Combining the two models has shown promising results.
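A minimal sketch of such a parallel topology in the Keras functional API is given below; the 24-hour window, the 64-unit branches, and the concatenation-based merge are illustrative assumptions, since the tuned layer hyperparameters are those listed in Table 2:

```python
# Sketch of a GRU-LSTM parallel network in the Keras functional API. The
# 24-hour window, 64-unit branches, and concatenation merge are illustrative
# assumptions; the tuned layer hyperparameters are those listed in Table 2.
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import GRU, LSTM, Dense, Concatenate

inp = Input(shape=(24, 1))             # univariate hourly input window

gru_branch = GRU(64)(inp)              # GRU path, with its own weights...
lstm_branch = LSTM(64)(inp)            # ...separate from the LSTM path

# merge the two paths so neither branch biases the other's features
merged = Concatenate()([gru_branch, lstm_branch])
out = Dense(6)(merged)                 # 6h-ahead temperature forecast

model = Model(inputs=inp, outputs=out)
model.summary()
```

Because each branch keeps its own path from input to merge, the two recurrent feature extractors are trained side by side rather than feeding one into the other, which is the design motivation stated above.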
A. DATA COLLECTION
The temperature data is collected from a dataset uploaded by Zhang S. et al. titled ''Cautionary Tales on Air-Quality Improvement in Beijing'' [32]. The original data contained various air quality readings from twelve nationally controlled monitoring sites. From the whole dataset, the Aoti Zhongxin area is taken for its relatively low number of missing values. Aoti Zhongxin is considered in this study to represent the overall Beijing temperature because of the low variation in readings among the other centers.

The dataset consisted of hourly temperature data from 2013-03-01 00:00:00 to 2017-02-28 23:00:00, giving a total of 35064 hourly readings. The dataset is first sorted according to datetime. There were 20 missing temperature values, and because of the relatively small size of the missing data, they are filled using the forward fill method instead of other, more complex imputation methods. Then, maintaining the order, the first 90% of the data is selected for training, from 2013-03-01 00:00:00 to 2016-10-04 18:00:00, and the remainder is taken for testing, from 2016-10-04 19:00:00 to 2017-02-28 23:00:00. The train-test split can be visualized in Figure 7.

The dataset for the robustness analysis, titled ''Historical Hourly Weather Data 2012-2017'' [33], contains 5 years of high-resolution (hourly) temporal data of various weather attributes from January 2012, 12:00:00 to December 2017, 00:00:00, out of which the temperature data is extracted. This data is available for 30 US and Canadian cities. Toronto, Seattle, Dallas and Las Vegas were chosen for the robustness analysis because of their considerably scattered geographical locations, so that the temporal data vary as much as possible.
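A short sketch of this preparation with pandas is shown below; the file and column names are hypothetical, while the forward fill and the chronological 90/10 split follow the procedure just described:

```python
# Data-preparation sketch with pandas; the file and column names are
# hypothetical stand-ins for the Aoti Zhongxin series from [32].
import pandas as pd

df = pd.read_csv("beijing_aoti_zhongxin.csv", parse_dates=["datetime"])
df = df.sort_values("datetime")        # sort by datetime first

# forward-fill the small number of missing temperature readings
df["TEMP"] = df["TEMP"].ffill()

# chronological 90/10 train-test split, order preserved
split = int(len(df) * 0.9)
train, test = df.iloc[:split], df.iloc[split:]
print(len(train), len(test))
```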
B. MODEL CONSTRUCTION AND HYPERPARAMETER TUNING
Hyperparameter tuning is an important part of NN construction, which is usually done through extensive trial and error. A common practice is to use rule-of-thumb parameters or combinations that have previously performed well in other papers. However, we have carefully chosen all the hyperparameters after manually testing a wide range of values. A validation run is conducted for each model to decide the hyperparameters for best performance and fitting before training the final models. The train set is split 90-10 for the validation run. The layer-based hyperparameters determined from this run are provided in Table 2.

General parameters such as the optimizer, the learning rate and the number of epochs are also important to improve the overall performance and speed of the models. Commonly used optimizers include root mean square propagation (RMSprop), stochastic gradient descent (SGD), the adaptive gradient algorithm (AdaGrad), and adaptive moment estimation (Adam). In this paper, after the validation run, the Adam optimizer is chosen, which is computationally efficient and showed slightly better results during testing. The batch size of all the models is taken as 64, and the loss functions considered are mean square error (MSE) and cosine similarity (for the full-time single run), and MSE for hour-by-hour prediction.
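As an illustration, the general training configuration described above translates to something like the following sketch; the tiny placeholder model, the random arrays, and the epoch count are assumptions rather than the paper's exact setup:

```python
# Training-configuration sketch: Adam optimizer, MSE loss, batch size 64, and
# a 90-10 validation split, as described above. The tiny LSTM model, the random
# arrays, and the epoch count are placeholders, not the paper's exact setup.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

X_train = np.random.rand(1000, 24, 1)   # placeholder windowed inputs
y_train = np.random.rand(1000, 6)       # placeholder 6h-ahead targets

model = Sequential([LSTM(64, input_shape=(24, 1)), Dense(6)])
model.compile(optimizer="adam", loss="mse")   # Adam + MSE, as in the paper

model.fit(X_train, y_train,
          batch_size=64,           # batch size used for all models
          epochs=50,               # assumed value; tuned via the validation run
          validation_split=0.1)    # the 90-10 validation split
```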
V. RESULT ANALYSIS
A. FORECASTING OUTCOMES
The trained models were used to predict hourly temperatures up to 6h ahead. The prediction is carried out on an hour-by-hour basis as well as over the whole time horizon in a single run. The training and testing periods are mentioned in section IV-A. It is observed that the models trained on unnormalized data perform better than models trained on normalized data, and so only the prediction graphs of models trained on unnormalized data are presented.

B. EVALUATION METRICS
The performance of the DNNs is evaluated in terms of three error metrics: the conventional root mean squared error (RMSE) and mean absolute error (MAE), and additionally the coefficient of determination $R^2$. The mathematical expressions of these error metrics are given as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(F_i - A_i\right)^2} \tag{14}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|F_i - A_i\right| \tag{15}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(F_i - A_i\right)^2}{\sum_{i=1}^{n}\left(A_i - \bar{A}\right)^2} \tag{16}$$

where $F_i$ and $A_i$ are the forecast and actual temperatures at the $i$-th hour, and $\bar{A}$ is the mean of the actual values.
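A direct NumPy transcription of Eqs. (14)-(16), with F the forecasts and A the actual temperatures:

```python
# NumPy transcription of Eqs. (14)-(16); F = forecast values, A = actual values.
import numpy as np

def rmse(F, A):
    """Root mean squared error, Eq. (14)."""
    return np.sqrt(np.mean((F - A) ** 2))

def mae(F, A):
    """Mean absolute error, Eq. (15)."""
    return np.mean(np.abs(F - A))

def r2(F, A):
    """Coefficient of determination, Eq. (16)."""
    return 1.0 - np.sum((F - A) ** 2) / np.sum((A - np.mean(A)) ** 2)
```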
FIGURE 9. Curve of actual temperature and predicted results for hour-by-hour prediction using SRN.
FIGURE 11. Curve of actual temperature and predicted results for hour-by-hour prediction using LSTM.
FIGURE 12. Curve of actual temperature and predicted results for single run prediction using LSTM.
FIGURE 14. Curve of actual temperature and predicted results for single run prediction using GRU.
FIGURE 15. Curve of actual temperature and predicted results for hour-by-hour prediction using CNN.
FIGURE 18. Curve of actual temperature and predicted results for single run prediction using CNN-LSTM parallel network.
FIGURE 20. Curve of actual temperature and predicted results for single run prediction using GRU-LSTM parallel network.
It can be observed from Table 3 and Table 4 that they perform similarly on univariate time series predictions. A deciding argument in this regard can be the computation time. CNNs have a huge advantage of being very fast compared to RNNs. In our study, the CNN model ran 5 times faster than LSTM, 4 times faster than GRU, and twice as fast as SRN.

In the case of Figure 22, almost every model, including SRN, LSTM, GRU-LSTM and CNN-LSTM, performed inconsistently. Although normalized data are expected to yield good results in time series forecasting using DNNs, they performed poorly on temperature data.
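For clarity, the normalized experiments correspond to a pipeline of the following shape; this sketch assumes scikit-learn's MinMaxScaler, as the exact scaling method is not specified here:

```python
# Sketch of the normalized pipeline; scikit-learn's MinMaxScaler is an
# assumption, as the exact scaling method is not specified here.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# dummy hourly temperatures standing in for the train and test series
train_temps = np.random.uniform(-10.0, 35.0, size=(1000, 1))
test_temps = np.random.uniform(-10.0, 35.0, size=(200, 1))

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_temps)  # fit on the training set only
test_scaled = scaler.transform(test_temps)        # avoids test-set leakage

# a model would be trained and would predict in the scaled space;
# predictions are inverted back to °C before computing RMSE/MAE
preds_scaled = test_scaled                        # placeholder for model output
preds = scaler.inverse_transform(preds_scaled)
```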
TABLE 3. Evaluation metrics of the considered DNNs trained on Beijing data without normalization.
TABLE 4. Evaluation metrics of the considered DNNs trained on Beijing data with normalization.
FIGURE 21. RMSE statistics for considered models trained on Beijing dataset without normalization.
FIGURE 22. RMSE statistics for considered models trained on Beijing dataset with normalization.
4) In terms of computational cost, CNN is much faster than any other model while sustaining good performance.

VI. ROBUSTNESS ANALYSIS
In section V, a conclusion is drawn from the performance of the models by testing them on the same dataset as they were trained on. In this section, the robustness of the models is analyzed by testing the models on new datasets from different geographical locations having uncorrelated climatic characteristics. Robustness is a model's ability to generalize trends and output satisfactory performance on different or altered datasets. The previous three error metrics are compared among the different DNNs to assess their robustness in each location.

Four cities from different geographical locations were chosen for the robustness analysis, as discussed in section IV. The time period considered for the prediction is from 1 March 2013, 00:00:00 to 28 February 2017, 23:00:00 (the same as the Beijing dataset). The results obtained from the predictions are summarized in Table 5. The models were run both with and without normalization, and with both hour-by-hour and single-run approaches. Similar to the previous case, models without normalization in a single run yielded better results, so the discussion will be limited to this. To grasp the changes more easily, the comparative RMSE of the DNN models is illustrated in Figure 23.

Figure 23 depicts that all the models performed satisfactorily on untrained, unrelated datasets from different locations. The RMSE of all the models did increase, but the increase is comparatively low, indicating the models' robustness and reliability. From Table 5, it can be observed that GRU has achieved the lowest average RMSE (2.0042°C), which indicates that GRU is the most robust DNN.
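The robustness protocol above amounts to a simple evaluation loop, sketched below; `model` stands for a network already trained on the Beijing data, and the loader is a hypothetical stand-in for windowing each city's hourly series from the Kaggle dataset [33]:

```python
# Robustness-evaluation sketch: a Beijing-trained network (`model`, assumed to
# exist) is tested, unchanged, on four other cities. The loader below is a
# hypothetical stand-in for windowing each city's hourly series from [33].
import numpy as np

def rmse(F, A):
    return np.sqrt(np.mean((F - A) ** 2))

def load_city_windows(city):
    """Hypothetical stand-in: returns windowed (X, y) arrays for one city."""
    X = np.random.rand(500, 24, 1)
    y = np.random.rand(500, 6)
    return X, y

for city in ["Toronto", "Las Vegas", "Seattle", "Dallas"]:
    X, y = load_city_windows(city)
    preds = model.predict(X)           # no retraining or fine-tuning
    print(city, rmse(preds, y))
```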
TABLE 5. Robustness analysis of considered DNNs evaluated on four different geographical locations.
To draw a correlation between a model's performance and different types of temperature datasets from different regions, the RMSE of each model is compared against the variance and the mean absolute deviation (MAD) of each dataset with reference to the training dataset. This correlation held for all the models (except SRN, which produced an outlier). Another important point to note is that, for Seattle, all the models yielded a lower RMSE value compared to Beijing, as it has a lower variance. This implies that the models are able to achieve a degree of generality. On the other hand, the specificity of the models can be understood from the positive correlation of RMSE to the MAD value. This opens the scope of using transfer learning for datasets that have little correlation to the dataset the models were trained on.

VII. CONCLUSION
This study has carried out a comparative analysis of six DNN models to observe which performs the best for high-resolution hourly temperature forecasting on Beijing temperature data. The study has also presented an in-depth robustness analysis to see the change in the performance parameters of these DNNs when tested on geographically diverse datasets. The comparative analysis has revealed the GRU-LSTM parallel network to provide the best performance when tested on the Beijing data, at 1.691°C RMSE. CNN, on the other hand, performs slightly worse at 1.759°C RMSE, ranking 3rd in terms of accuracy, but has by far the best computational time. The study has also found that single-run models are better and more consistent for prediction than single-point regression models. The comparative analysis further revealed that the models perform poorly on normalized temperature data, which is unusual, as neural network models generally tend to perform better on normalized data. In short, this study aimed to act as a benchmark for high-resolution temperature forecasting with only historical temperature data using neural nets that yield sufficient accuracy and are computationally inexpensive.

From the robustness analysis, the study was able to map a correlation between model performance and the product of the MAD and variance of the dataset. It was further found that the GRU-based model was able to generalize the most over various geographical locations, although it performed poorly on the Beijing data. This was explained by the high variability of temperature data across the globe. To perform well on the temperature data of a particular location, the models had to trade off robustness for a certain level of specificity. This has indicated a future scope of work where transfer learning can be adopted so that models trained on one dataset can perform well on new data with little correlation to the previous dataset. Moreover, this study can be incorporated with research on embedded systems equipped with artificial intelligence processing capabilities to implement portable, compact devices for on-spot temperature forecasting in the future.

REFERENCES
[1] S. S. Sharif and J. H. Taylor, ''Real-time load forecasting by artificial neural networks,'' in Proc. Power Eng. Soc. Summer Meeting, Jul. 2000, pp. 496–501.
[2] J. Verhelst, G. Van Ham, D. Saelens, and L. Helsen, ''Model selection for continuous commissioning of HVAC-systems in office buildings: A review,'' Renew. Sustain. Energy Rev., vol. 76, pp. 673–686, Sep. 2017.
[3] J. Zhao and X. Liu, ''A hybrid method of dynamic cooling and heating load forecasting for office buildings based on artificial intelligence and regression analysis,'' Energy Buildings, vol. 174, pp. 293–308, Sep. 2018.
[4] D. H. C. Chow and G. J. Levermore, ''New algorithm for generating hourly temperature values using daily maximum, minimum and average values from climate models,'' Building Services Eng. Res. Technol., vol. 28, no. 3, pp. 237–248, Aug. 2007.
[5] J. Shao and P. J. Lister, ''An automated nowcasting model of road surface temperature and state for winter road maintenance,'' J. Appl. Meteorol., vol. 35, no. 8, pp. 1352–1361, Aug. 1996.
[6] J. Bogren and T. Gustavsson, ''Site specific road surface temperature forecast improvements by use of radiation measurements,'' in Proc. 11th SIRWEC Conf., 2002, pp. 1–5.
[7] K. S. Kim, S. E. Taylor, M. L. Gleason, and K. J. Koehler, ''Model to enhance site-specific estimation of leaf wetness duration,'' Plant Disease, vol. 86, no. 2, pp. 179–185, Feb. 2002.
[8] J. Cheng, Z. Xu, H. Bambrick, H. Su, S. Tong, and W. Hu, ''The mortality burden of hourly temperature variability in five capital cities, Australia: Time-series and meta-regression analysis,'' Environ. Int., vol. 109, pp. 10–19, Dec. 2017.
[9] G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, ''Predictability of monthly temperature and precipitation using automatic time series forecasting methods,'' Acta Geophys., vol. 66, no. 4, pp. 807–831, Aug. 2018.
[10] J. Kihoro, R. Otieno, and C. Wafula, ''Seasonal time series forecasting: A comparative study of ARIMA and ANN models,'' Meru Univ., Nairobi, Kenya, Tech. Rep., 2004.
[11] A. Khotanzad, R. Afkhami-Rohani, T.-L. Lu, A. Abaye, M. Davis, and D. J. Maratukulam, ''ANNSTLF—A neural-network-based electric load forecasting system,'' IEEE Trans. Neural Netw., vol. 8, no. 4, pp. 835–846, Jul. 1997.
[12] H. Shah, R. Ghazali, and N. M. Nawi, ''Using artificial bee colony algorithm for MLP training on earthquake time series data prediction,'' 2011, arXiv:1112.4628.
[13] H. S. Hippert, C. E. Pedreira, and R. C. Souza, ''Combining neural networks and ARIMA models for hourly temperature forecast,'' in Proc. IEEE-INNS-ENNS Int. Joint Conf. Neural Netw. (IJCNN), Neural Comput.: New Challenges Perspect. New Millennium, Jul. 2000, pp. 414–419.
[14] K. Methaprayoon, W. J. Lee, S. Rasmiddatta, J. R. Liao, and R. J. Ross, ''Multistage artificial neural network short-term load forecasting engine with front-end weather forecast,'' IEEE Trans. Ind. Appl., vol. 43, no. 6, pp. 1410–1416, Nov. 2007.
[15] V. Vamitha, M. Jeyanthi, S. Rajaram, and T. Revathi, ''Temperature prediction using fuzzy time series and multivariate Markov chain,'' Int. J. Fuzzy Math. Syst., vol. 2, no. 3, pp. 217–230, 2012.
[16] T. T. K. Tran, T. Lee, J.-Y. Shin, J.-S. Kim, and M. Kamruzzaman, ''Deep learning-based maximum temperature forecasting assisted with meta-learning for hyperparameter optimization,'' Atmosphere, vol. 11, no. 5, p. 487, May 2020.
[17] Z. Zhang and Y. Dong, ''Temperature forecasting via convolutional recurrent neural networks based on time-series data,'' Complexity, vol. 2020, pp. 1–8, Mar. 2020.
[18] D. Kreuzer, M. Munz, and S. Schlüter, ''Short-term temperature forecasts using a convolutional neural network—An application to different weather stations in Germany,'' Mach. Learn. With Appl., vol. 2, Dec. 2020, Art. no. 100007.
[19] S. Lee, Y.-S. Lee, and Y. Son, ''Forecasting daily temperatures with different time interval data using deep neural networks,'' Appl. Sci., vol. 10, no. 5, p. 1609, Feb. 2020.
[20] T. Toharudin, R. S. Pontoh, R. E. Caraka, S. Zahroh, Y. Lee, and R. C. Chen, ''Employing long short-term memory and Facebook prophet model in air temperature forecasting,'' Commun. Statist. Simul. Comput., pp. 1–24, Jan. 2021.
[21] Z. C. Lipton, J. Berkowitz, and C. Elkan, ''A critical review of recurrent neural networks for sequence learning,'' 2015, arXiv:1506.00019.
[22] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, ''Gradient flow in recurrent nets: The difficulty of learning long-term dependencies,'' Université de Montréal, Montréal, QC, Canada, Tech. Rep., 2001.
[23] J. L. Elman, ''Finding structure in time,'' Cognit. Sci., vol. 14, no. 2, pp. 179–211, Mar. 1990.
[24] R. K. Agrawal, F. Muchahary, and M. M. Tripathi, ''Long term load forecasting with hourly predictions based on long-short-term-memory networks,'' in Proc. IEEE Texas Power Energy Conf. (TPEC), Feb. 2018, pp. 1–6.
[25] S. Hochreiter and J. Schmidhuber, ''Long short-term memory,'' Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[26] P. Liu, X. Qiu, X. Chen, S. Wu, and X. Huang, ''Multi-timescale long short-term memory neural network for modelling sentences and documents,'' in Proc. Conf. Empirical Methods Natural Lang. Process., 2015, pp. 2326–2335.
[27] I. Sutskever, O. Vinyals, and Q. V. Le, ''Sequence to sequence learning with neural networks,'' 2014, arXiv:1409.3215.
[28] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, ''On the properties of neural machine translation: Encoder-decoder approaches,'' 2014, arXiv:1409.1259.
[29] C. Choy, J. Gwak, and S. Savarese, ''4D spatio-temporal ConvNets: Minkowski convolutional neural networks,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 3075–3084.
[30] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. J. Inman, ''1D convolutional neural networks and applications: A survey,'' Mech. Syst. Signal Process., vol. 151, Apr. 2021, Art. no. 107398.
[31] B. Farsi, M. Amayri, N. Bouguila, and U. Eicker, ''On short-term load forecasting using machine learning techniques and a novel parallel deep LSTM-CNN approach,'' IEEE Access, vol. 9, pp. 31191–31212, 2021.
[32] S. Zhang, B. Guo, A. Dong, J. He, Z. Xu, and S. X. Chen, ''Cautionary tales on air-quality improvement in Beijing,'' Proc. Roy. Soc. A, Math., Phys. Eng. Sci., vol. 473, no. 2205, Sep. 2017, Art. no. 20170457.
[33] R. Tatman. (Nov. 2017). R vs. Python: The Kitchen Gadget Test, Version 1. Accessed: Dec. 20, 2017. [Online]. Available: https://www.kaggle.com/rtatman/r-vs-python-the-kitchen-gadget-test

EHTASHAMUL HAQUE was born in Dhaka, Bangladesh. He is currently pursuing the B.Sc. degree in electrical and electronic engineering with the Islamic University of Technology, Gazipur, Bangladesh. His main research interests include smart grid and machine learning.

SANZANA TABASSUM (Student Member, IEEE) is pursuing the B.Sc. degree in electrical and electronic engineering with the Islamic University of Technology, Gazipur, Bangladesh. Her main research interests include renewable energy, smart grid, and machine learning.

EKLAS HOSSAIN (Senior Member, IEEE) received the B.S. degree in electrical and electronic engineering from the Khulna University of Engineering and Technology, Bangladesh, in 2006, the M.S. degree in mechatronics and robotics engineering from the International Islamic University of Malaysia, Malaysia, in 2010, and the Ph.D. degree from the College of Engineering and Applied Science, University of Wisconsin–Milwaukee (UWM). He has been working in the area of distributed power systems and renewable energy integration for the last ten years and has published a number of research papers and posters in this field. Since 2015, he has been involved with several research projects on renewable energy and grid-tied microgrid systems at the Department of Electrical Engineering and Renewable Energy, Oregon Tech, as an Assistant Professor. He is currently working as an Associate Researcher at the Oregon Renewable Energy Center (OREC). His research interests include modeling, analysis, design, and control of power electronic devices; energy storage systems; renewable energy sources; integration of distributed generation systems; microgrid and smart grid applications; robotics; and advanced control systems. He is a Senior Member of the Association of Energy Engineers (AEE). He is a Registered Professional Engineer (PE) in OR, USA. He is also a Certified Energy Manager (CEM) and a Renewable Energy Professional (REP). He is the winner of the Rising Faculty Scholar Award from the Oregon Institute of Technology for his outstanding contribution in teaching, in 2019. He is serving as an Associate Editor for IEEE ACCESS.