
Applied Intelligence (2023) 53:24991–25002
https://doi.org/10.1007/s10489-023-04824-w

A flexible and lightweight deep learning weather forecasting model


Gabriel Zenkner¹ · Salvador Navarro-Martinez¹*

¹ Department of Mechanical Engineering, Imperial College London, Exhibition Road, London SW7 2AZ, UK
* Corresponding author: Salvador Navarro-Martinez

Accepted: 20 June 2023 / Published online: 1 August 2023

© The Author(s) 2023

Abstract
Numerical weather prediction is an established weather forecasting technique in which equations describing wind, temperature, pressure and humidity are solved using the current atmospheric state as input. This study examines deep learning as a means to forecast the weather given historical data from two London-based locations. Two distinct Bi-LSTM recurrent neural network models were developed in the TensorFlow deep learning framework and trained to make predictions for the next 24 and 72 h, given the past 120 h. The first trained neural network predicted temperature at Kew Gardens with a forecast accuracy of ±2 °C in 73% of instances over a whole unseen year and a root mean squared error of 1.45 °C. The second network predicted 72-h air temperature and relative humidity at Heathrow with root mean squared errors of 2.26 °C and 14% respectively; 80% of the temperature predictions were within ±3 °C, while 80% of the relative humidity predictions were within ±20%. Both networks were trained with five years of historical data, with cloud training times of just over one minute (24-h network) and three minutes (72-h network).

Keywords Recurrent Neural Network · Bi-LSTM · Weather Forecast

1 Introduction

Numerous sectors are heavily reliant on accurate weather forecasting, including renewable energy production, energy consumption, agriculture and emergency services. Numerical weather prediction is an established weather forecasting technique in which the transport fluid equations (momentum, energy and scalar transport) are solved using the current atmospheric state as an input. The output is the temperature, humidity, pressure, etc. over a desired forecast length. Modelling large-scale weather is notoriously difficult due to uncertain boundary conditions and the chaotic nature of the underlying fluid mechanics equations. The accuracy of numerical forecast predictions has improved steadily since the 1960s, carried mostly by the increase of computational power and turbulence modelling techniques [1]. To reduce the uncertainty of the predictions, expensive ensemble modelling is used, where simulations are run many times with small differences in initial conditions. Beyond five days, chaotic effects become dominant and the simulations demand large computational resources and are exceedingly expensive [2]. Ensemble modelling is computationally demanding, requiring numerous runs of each model with different initial conditions. To make meaningful seasonal predictions, the number of runs should be between 100 and 200 [3], increasing the cost 100-fold over deterministic approaches. Moreover, the multi-scale nature of the fluid equations and the associated physical processes forces simplifications, and the initial state approximation may be inaccurate [4]. Similarly, the acquisition of representative initial conditions is one of the biggest hurdles in numerical weather prediction [5]. This characterisation process becomes increasingly challenging in cities, where the landscape drastically affects wind and temperature behaviour. Machine learning approaches can complement existing numerical weather prediction, or in some cases even substitute it, thereby reducing the enormous computational demands associated with numerical weather prediction.

The present work proposes to use historical data from weather stations to produce short-term local forecasts. The locality of the data and forecast simplifies the complexity of the spatial correlations that exist in turbulent fluid dynamics and reduces the size and training cost of the network. Moreover, local data is attractive for Deep Learning, which can account for the "unpredictability" of the local conditions.

The novelty of the method resides in the use of large historical datasets from nearby locations to create simple input–output network models independent of the date.
The approach is purely data-driven, without any kind of data assimilation or hybridisation. The model is tested using historical data from two London-based locations to train a Bi-LSTM recurrent neural network to predict temperature and relative humidity. The main contributions of this article are:

• The creation of a Deep Neural Network framework that uses historical weather data to create forecasts of selected weather features over a desired length.
• The development of two models to predict the hourly evolution of temperature and humidity over 24 and 72 h at two locations in London.
• The study of forecasting errors, investigating seasonal variations and forecast length.

The rest of the paper is structured as follows. In Section 2, the relevant literature related to the use of Machine Learning in weather forecasting is discussed, while in Section 3, the architecture and the dataset used for testing are described. In Section 4, the results with the two models developed are presented, while Section 5 concludes the paper and outlines future research directions.

2 Related work

Machine Learning (ML) is showing large potential in fluid mechanics [6, 7], where it can be used to model sub-grid stress [8, 9] or extract turbulent structures [10]. One of the first ML applications in weather forecasting was Schizas et al. [11] in 1991, where Artificial Neural Networks (ANN) were used to predict minimum temperatures. Similarly, Ochiai et al. [12] used ANNs in 1995 to predict rainfall and snowfall. These models were able to improve the forecasting accuracy compared to statistical models [13]. However, the limited forecast horizon of 30–180 min and difficulties in obtaining solution convergence made practical application impossible. Traditional machine learning examples include support vector machines and linear regression, which are typically far less computationally demanding than neural networks and which have been investigated as forecasting candidates.

Fig. 1  Joint probability density functions of two features (off-diagonal) and single-feature probability density functions (diagonal) for the two locations


For example, Ma et al. [14] deployed a traditional machine learning model known as XGBoost, which is comprised of gradient-boosted decision trees, to predict air temperature and humidity over a 3-h period with a resulting root mean square error (RMSE) of temperature of 1.77 °C. Despite the relatively good results of traditional machine learning approaches, there are several reasons why a deep learning approach is preferred for weather prediction. Traditional algorithms are unable to model non-linearity, which is essential in predicting the evolution of the weather [15, 16]. Similarly, Shao et al. [17] reported that statistical and traditional ML techniques are not well suited for complex wind forecasting and attribute this to the turbulent and chaotic behaviour of wind. Recent efforts have focused on using Support Vector Machines and variations for short-term series forecasting and the classification of non-linear data and time series [18–20]. Deep Learning (DL) leverages the growing volume and accessibility of data. While traditional machine learning models reach a point beyond which additional training data no longer improves model performance, deep learning models have been observed to benefit from the increase in data [21]. DL networks have been increasingly used in time series forecasting in several applications; examples include finance [22], sugarcane yield prediction [23] and power load forecasting [24], among others. DL has the potential to significantly improve the accuracy of weather forecasting, and its applications have increased exponentially. Bauer et al. [4] showed that their Convolutional Neural Network (CNN) ensemble forecasting model can predict anomalies such as Hurricane Irma. Weyn et al. [25] increased the accuracy of weather prediction by applying ensemble modelling of separate CNN models, each with different starting conditions and sets of weights. Roy et al. [26] evaluated a multilayer perceptron, a long short-term memory (LSTM) model and a hybrid CNN/LSTM model and concluded that models with more complex architectures in general improve performance, while Ravuri et al. [27] demonstrated that their neural network model can predict precipitation more accurately in 89% of instances compared to existing weather prediction techniques. Hewage et al. [13] report that their ML models predict weather conditions 12 h into the future with higher accuracy than conventional weather forecasting.

Neural networks have been identified as being particularly promising in precipitation forecasting. A MetNet model developed at Google [28] was shown to predict precipitation accurately over the course of eight hours. In this hybrid approach, several models were used at different stages, including LSTMs and CNNs. Despite its good performance, the model requires large volumes of data. An improvement was obtained by MetNet-2 [29], outperforming state-of-the-art weather models up to 12 h ahead over the Continental United States. Fu et al. [30], upon evaluating many neural network architectures, settled on a combined Bidirectional LSTM (Bi-LSTM) and a one-dimensional CNN to predict ground air temperature, relative humidity and wind speed over seven days. They used data from ten weather stations in Beijing, and the final model contained more than a million nodes. Despite its size and complexity, the quantitative performance relative to the local weather observations was uncertain. The latest trends include, among others, the use of a hybrid LSTM/GAN [31] to predict cloud movement and LSTM/CNN for drought forecasting [32]. Wind forecasting is of great importance in wind power and load estimations, and DL has been recently applied there [33–36]. Most of the applications focused on short-term predictions of up to 24 h ahead.

The recent literature shows that DL applications in weather forecasting are accelerating, with CNN-variant architectures dominating large-scale forecasts and LSTMs dominating point forecasts. However, there are clearly several research bottlenecks associated with short-term forecasting. Most applications have been at wind-farm sites with "simple" weather patterns, while urban environments are more complex to predict as the turbulence content of the signal is larger. Moreover, there is a deterioration of predictions after several hours, and there is no optimal forecast length, which seems to depend on the application.

Table 1  Architecture of the Bi-LSTM used in Model A, including the number and type of layers and the number of nodes in each layer

Layer   Type     Value                     Shape       Parameters
Input   -        -                         (120 × 6)   0
Hidden  Bi-LSTM  Tanh activation function  (32 × 512)  538,624
Hidden  Dropout  0.25                      (32 × 512)  0
Output  Linear   -                         (32 × 6)    3,078
Total                                                  541,702

Table 2  Parameters used in Model A, including the number of epochs and optimiser settings

Parameter                     Value
Context Length                120 h
Gradient Optimisation         Adaptive moment estimation (ADAM)
Learning rate                 0.001
Model optimised metric        Mean squared error
Performance metric            Root mean squared error
Epochs                        2
Batch size                    32
Runtime                       78 s
Train, validate, test ratios  0.7, 0.15 and 0.15


Fig. 2  Comparison between predicted and measured temperature at Kew Gardens using the forecast length of one hour and a context length of
120 h. Scatter plot (left), one-year predictions (right)

3 Methodology and data processing

LSTMs are applied frequently to sequential problems as they address the issue of loss of long-term memory [37]. The Bi-LSTM recurrent neural network builds upon the LSTM structure. In a Bi-LSTM model a duplicate layer is produced: sequential information flows in chronological order through the first layer, while the duplicate layer receives the same sequential information in reversed order. This provides the model with far more context, as key information at both the start and end of the sequence is available.

The training data is openly available from the Met Office for two London weather observation stations: Kew Gardens (51.482, -0.294) and Heathrow (51.479, -0.451). The data was extracted from the Centre for Environmental Data Analysis [38] and contains weather information from 2015–2021 with dozens of hourly weather parameters, hereinafter referred to as features for consistency. However, not all features are available for all weather stations, so the selection was limited to six unique features (three per weather station). The features of particular interest are air temperature, relative humidity and wind speed at both Heathrow and Kew Gardens; see Fig. 1.

With the features selected, the dataset is normalised using the mean and standard deviation of each feature. The mean and standard deviation are calculated from the training dataset only, as including data from the validation and test sets may result in overfitting [39].

The training, validation and test datasets are split in fractions of 0.7, 0.15 and 0.15 respectively, with the chronological sequence of the data maintained. This corresponds to sample sizes of 36,825, 7,891 and 7,892 observations respectively.

Two networks were created: Model A, to forecast 24 h ahead, and Model B, to predict 72 h ahead.
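The preprocessing described above is straightforward to reproduce. The following is a minimal sketch, assuming the hourly features are held in a pandas DataFrame in chronological order; the function and variable names are illustrative, not taken from the authors' code.

```python
import numpy as np
import pandas as pd

def prepare_windows(df: pd.DataFrame, context: int = 120, horizon: int = 1):
    """Chronological 0.7/0.15/0.15 split, train-only normalisation, windowing.

    `df` holds the six hourly features in chronological order. For the
    single-step models the horizon is one hour; longer forecasts are built
    by feeding predictions back into the context (see Section 4).
    """
    n = len(df)
    n_train, n_val = int(0.70 * n), int(0.15 * n)

    # Normalisation statistics come from the training rows only, so that no
    # information from the validation/test periods leaks into the model [39].
    mean = df.iloc[:n_train].mean()
    std = df.iloc[:n_train].std()
    data = ((df - mean) / std).to_numpy(dtype="float32")

    def windows(block):
        m = len(block) - context - horizon + 1
        X = np.stack([block[i:i + context] for i in range(m)])
        y = np.stack([block[i + context:i + context + horizon] for i in range(m)])
        if horizon == 1:
            y = y.squeeze(1)        # single-hour target: shape (m, n_features)
        return X, y

    return (windows(data[:n_train]),
            windows(data[n_train:n_train + n_val]),
            windows(data[n_train + n_val:]),
            (mean, std))
```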

Fig. 3  Comparison between predicted and measured temperature at Kew Gardens using the 24-h and 1-h temperature predictions


The same dataset, with the same split ratio for training, validation and testing, was used in both models. However, Model B is deeper, with a denser Bi-LSTM with more cells and an additional feed-forward neural network (FNN) as a second hidden layer.

The architecture of Model A is characterised in Table 1 and determines the number of calculations performed. The input layer shape is defined by the length of the context and the number of features. The hidden layer shape is defined by the batch size and the number of Bi-LSTM units: 256 forward and 256 backward units. A batch size of 32 results in 1,151 batches from a total of 36,825 training observations, with any remainder subtracted from the final batch. Finally, the output layer shape is defined by the number of features and the batch size. The total number of parameters to be trained in the model is the sum of those in the hidden layer and output layer, totalling 541,702.

A dropout layer is included to minimise the impact of overfitting by randomly setting the weights of 25% of the units in the hidden layer to zero. Dropout is a well-established technique in neural network modelling to overcome overfitting and is considered a more practical approach than regularisation, which is a common approach to reduce overfitting in traditional machine learning problems (Table 2) [40].

The training process was performed using a Jupyter Notebook within a Google Colaboratory environment. The complete runtime was 78 s, after which predictions could be made within 10 s. The maximum memory usage during training was less than 16 GB. The entire test dataset corresponds to roughly one year of data in 2020 (while training covers 2015–2019). The model uses 120 h of measured hourly data as input, and the output is the desired number of forecast hours. A benefit of having a context length greater than the forecast length is that some measured data will always be used in making the prediction. However, the returns diminish as the temporal gap between the measured data and the forecast increases. A model with a larger context of 240 h captured the data trend but failed to express the peaks and troughs accurately. The approach was first tested by doing a single-hour forecast (see Fig. 2). This process is repeated across the entire test dataset and 7,772 single-hour predictions are generated. The root mean squared, mean absolute and maximum errors were 0.89 °C, 0.62 °C and 12.81 °C respectively.

Table 3  Root mean squared error (RMSE), mean average error (MAE) and maximum error between hourly and 24-h temperature predictions in Fig. 3

                 RMSE [°C]  MAE [°C]  Max. Error [°C]
Single timestep  0.86       0.63      2.19
Multi-timestep   1.74       1.33      4.76
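The layer shapes and parameter counts in Tables 1 and 2 are enough to reconstruct the network. The sketch below, written against the TensorFlow Keras API the paper states it used, reproduces the 541,702 trainable parameters of Table 1; anything not stated in the tables (exact optimiser arguments, ordering details) is an assumption.

```python
import tensorflow as tf

model_a = tf.keras.Sequential([
    tf.keras.Input(shape=(120, 6)),                    # 120-h context, 6 features
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(256, activation="tanh")   # 256 forward + 256 backward units
    ),                                                 # -> 538,624 parameters
    tf.keras.layers.Dropout(0.25),                     # 25% dropout (Table 1)
    tf.keras.layers.Dense(6),                          # linear output: one hour of all 6 features
])                                                     # -> 3,078 parameters

model_a.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # ADAM, lr 0.001 (Table 2)
    loss="mse",                                               # optimised metric (Table 2)
    metrics=[tf.keras.metrics.RootMeanSquaredError()],        # performance metric (Table 2)
)
model_a.summary()  # 541,702 trainable parameters, matching the total in Table 1
```

Training for two epochs with a batch size of 32, as listed in Table 2, would reproduce the reported setup.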

Fig. 4  24-h forecast of the air temperature at Kew Gardens during four days in different seasons


4 Results

4.1 24-h temperature forecast

To predict 24 h ahead, a comparison was initially made between the single-step (predicting the 24 h in one step) and multi-step prediction models to assess the impact of error propagation; see Fig. 3. Table 3 shows that the multi-step model prediction error according to all three metrics is approximately twice as large as the single-step error.
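One plausible implementation of the multi-timestep strategy is to roll an hourly model forward, appending each predicted hour to the context window; this is a sketch under that assumption, not the authors' exact procedure. It makes explicit why errors compound with each step.

```python
import numpy as np

def multi_step_forecast(model, context: np.ndarray, steps: int = 24) -> np.ndarray:
    """Iterate an hourly model: each predicted hour re-enters the context.

    `context` has shape (120, 6) in normalised units. Because every step
    consumes earlier predictions, errors compound, consistent with the
    roughly doubled multi-timestep errors in Table 3.
    """
    window = context.copy()
    forecast = []
    for _ in range(steps):
        next_hour = model.predict(window[np.newaxis], verbose=0)[0]  # shape (6,)
        forecast.append(next_hour)
        window = np.vstack([window[1:], next_hour])  # slide the context forward 1 h
    return np.asarray(forecast)  # shape (steps, 6)
```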
Table 4  Root mean squared error (RMSE), mean average error (MAE) and maximum errors for the 24-h temperature prediction (Fig. 4), values in parentheses are normalised RMSE

          RMSE [°C]    RMSE Naïve [°C]  MAE [°C]     Max. Error [°C]
Summer    1.33 (0.91)  3.30 (2.27)      1.74 (1.20)  4.76 (3.27)
Autumn    1.12 (0.77)  10.4 (7.15)      1.36 (0.93)  2.39 (1.64)
Winter    1.64 (1.13)  7.30 (5.02)      2.03 (1.40)  4.01 (2.76)
Spring    1.73 (1.19)  3.00 (2.06)      2.25 (1.55)  4.24 (2.91)
Mean      1.45         6.00             1.84         3.85
Std. dev  0.244        3.06             0.333        0.886

To quantify how well the 24-h model generalises to different time periods and seasons, four prediction windows spaced 90 days apart are illustrated in Fig. 4. A benchmark model, the naive model, is used for comparison. The naive model uses the last measured temperature for the entire 24-h forecast; it makes no assumptions about the future state and is completely uninformed. The root mean squared errors confirm the neural network performs significantly better than the naive model in all instances (Table 4), with average errors of 1.45 °C and 6.00 °C for the neural network and naive forecast respectively.
To contextualise the performance, the neural network was compared to performance metrics from the Met Office. The 24-h predictions produced by the neural network were accurate to ±2 °C in 72.9% of all instances. By comparison, the Met Office states that 92.5% of its 24-h temperature predictions are accurate to ±2 °C, while 92% of 24-h wind speed predictions are within 5 knots [41]. Note that the measurements used in the weather stations were acquired with a resolution of ±0.1 °C (Fig. 5).

A better statistical comparison is made by looking at the probability density functions of the predicted and measured data. The 96 individual forecasts are derived from the four windows in Fig. 4. These points were used to compute a distribution function and are compared to the measured temperature distribution for the same period, while the entire yearly dataset was used to create a benchmark. The 96-sample measured temperature peak is wider than the predicted peak, indicating that the predictions are conservative, with both curves demonstrating bimodal behaviour. Nonetheless, the predicted and measured distributions agree very well, except at the tails on very hot days. Outlier temperatures above 40 °C were measured that are not predicted.

Fig. 5  Temperature probability density functions at Kew Gardens. Full temperature dataset of 52,608 samples and two predicted and measured distributions from 96 samples


Fig. 6  RMSE of Model A predictions against forecast length. The error bars correspond to the min/max RMSE in the windows

Using the same network (Model A), the length of the forecast was varied next to understand the deterioration of the predictions without adapting the model and parameters. Ten different forecast lengths were tested, ranging from one to 168 h (seven days). The RMSE mean and standard deviation are plotted against forecast length in Fig. 6 to indicate the uncertainty for increasing forecast lengths. For consistency, each prediction was run with a single epoch rather than attempting to optimise performance by identifying the most suitable number of epochs for each forecast length. The single-hour prediction has the smallest mean and standard deviation, both of which increase with the forecast length but become more stable after 24 h. Predictions of 1–24 h have a mean error of less than 3 °C. Beyond 24 h, the prediction uncertainty continues to increase before rapidly converging around 4 °C. While there are many caveats to this information, the results suggest that, without further optimisation, the model should not be used for predictions exceeding one day.
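A sweep of this kind can be expressed compactly. The sketch below reuses the multi_step_forecast helper sketched earlier; the specific set of ten horizons and the (context, actual) pairing of the four seasonal windows are assumptions, as the paper does not list them.

```python
import numpy as np

def rmse_vs_horizon(model, windows,
                    horizons=(1, 6, 12, 24, 48, 72, 96, 120, 144, 168),
                    temp_col=0):
    """Mean and min/max RMSE per forecast length, the quantities in Fig. 6.

    `windows` is a list of (context, actual) pairs, e.g. the four seasonal
    windows, with `actual` at least 168 h long; `temp_col` is the index of
    the temperature feature.
    """
    results = {}
    for h in horizons:
        per_window = []
        for context, actual in windows:
            pred = multi_step_forecast(model, context, steps=h)[:, temp_col]
            per_window.append(np.sqrt(np.mean((pred - actual[:h]) ** 2)))
        results[h] = (np.mean(per_window), np.min(per_window), np.max(per_window))
    return results
```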
4.2 72-h temperature, relative humidity and wind velocity forecasts

The Model B setup is shown in Table 5. The main differences with Model A are the addition of a linear layer within the hidden layers and a reduction in the dropout percentage to 10%. The hyperparameters used in the optimised model are recorded in Table 6.

As with the first model, an increase in the number of epochs resulted in a reduction in the error and an increase in the r-squared value. However, there was no direct correlation between the optimisation of these two parameters and how the 72-h forecast performed over different time periods. Therefore, once a capable architecture was identified, a similar trial-and-error approach was used to optimise the hyperparameters and context length based on the RMSE from the four windows. Initially, 120 h were used for the context length, but this was later changed to 168 h as it gave optimal performance. After upwards of twenty iterations with different conditions, the hyperparameters listed in Table 6 resulted in the best performance.
Table 5  Architecture of the Bi-LSTM model, Model B, including the number and type of layers and nodes in each layer

Layer   Type     Value                     Shape       Parameters
Input   -        -                         (168 × 12)  0
Hidden  Bi-LSTM  Tanh activation function  (4 × 640)   852,480
Hidden  Linear   ReLU activation function  (4 × 256)   164,096
Hidden  Dropout  0.10                      (4 × 256)   0
Output  Linear   -                         (4 × 12)    3,084
Total                                                  1,019,660

Table 6  The finalised hyperparameters used to train Model B, including the number of epochs and optimiser settings

Parameter                       Value
Context Length                  168 h
Gradient Optimisation           Adaptive Moment Estimation (ADAM)
Learning rate                   0.001
Loss: model training            Mean squared error
Metrics: test data evaluation   Root mean squared error
Epochs                          1
Batch size                      4
Run time                        187 s
Train, validate and test split  0.7, 0.15 and 0.15
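As with Model A, the shapes and parameter counts in Table 5 pin down the architecture: 320 LSTM units per direction reproduce the 852,480 Bi-LSTM parameters, and the remaining layers account for the rest of the 1,019,660 total. The following is an assumption-laden sketch, not the authors' code.

```python
import tensorflow as tf

model_b = tf.keras.Sequential([
    tf.keras.Input(shape=(168, 12)),                   # 168-h context, 12 features
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(320, activation="tanh")   # 320 units per direction
    ),                                                 # -> 852,480 parameters
    tf.keras.layers.Dense(256, activation="relu"),     # linear (FNN) hidden layer -> 164,096
    tf.keras.layers.Dropout(0.10),                     # reduced dropout (Table 5)
    tf.keras.layers.Dense(12),                         # linear output -> 3,084 parameters
])

model_b.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # Table 6 settings
    loss="mse",
    metrics=[tf.keras.metrics.RootMeanSquaredError()],
)
# Per Table 6: trained for 1 epoch with a batch size of 4 (run time 187 s).
```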


Fig. 7  72-h forecast of the air temperature at Heathrow during four days in different seasons. Symbols same as Fig. 4

Once the model was trained, it was possible to make new predictions rapidly, within 15 s. The single-step hourly prediction RMSE was 0.94 °C, the MAE 0.68 °C and the maximum error 14.94 °C when calculated over the entire test dataset. While these numbers are comparable to the single-hour predictions generated by Model A, the model did not perform quite as well over three days as over one day. This is to be expected, as the forecast window is three times longer and the likelihood of error propagation is much higher.

The four windows in Fig. 7 illustrate how the combined Bi-LSTM and linear model is highly capable of making predictions with excellent generalisability across different periods and seasons. The three-day forecast resulted in an RMSE mean and standard deviation of 2.26 °C and 0.316 °C respectively, with 79.5% of the temperature forecasts within ±3 °C when making a 72-h forecast (compared to 1.45 °C and 0.244 °C for the single-day prediction) (Table 7).

Figure 8 shows the predicted distribution for the 72-h forecasts. Despite the qualitatively good agreement, the modelled distribution has a narrower peak, with extreme high temperatures underestimated (similarly to Model A), showcasing the difficulty of representing the tails of the distribution.

The model takes in all features from both locations, resulting in six unique features and 12 features in total. As before, it is possible to generate a prediction for any one of the features introduced to the model in training.

Table 7  Root mean squared error (RMSE), mean average error (MAE) and maximum errors for the 72-h temperature prediction (Fig. 7), values in parentheses are normalised RMSE

          RMSE [°C]                 MAE [°C]                  Max. Error [°C]
          Kew G        Heathrow     Kew G        Heathrow     Kew G        Heathrow
Winter    2.22 (0.78)  1.80 (0.63)  1.79 (0.63)  1.44 (0.51)  6.25 (2.20)  4.11 (1.45)
Autumn    3.02 (1.06)  2.27 (0.80)  2.51 (0.88)  1.89 (0.67)  7.64 (2.70)  5.24 (1.85)
Summer    3.41 (1.20)  2.69 (0.95)  2.80 (0.99)  2.03 (0.72)  7.10 (2.50)  6.66 (2.35)
Spring    2.70 (0.95)  2.31 (0.82)  2.12 (0.75)  1.87 (0.66)  5.25 (1.85)  5.31 (1.87)
Mean      2.83         2.26         2.31         1.81         6.56         5.33
Std. dev  0.436        0.316        0.383        0.221        0.910        0.904


Fig. 8  Temperature probability density functions (left) and scatter plot (right) at Heathrow

While the model does take all inputs into consideration during training and seeks to minimise the loss function with respect to all features, the performance arising from this approach does not necessarily translate into good generalisability across all timescales. When training the model, a weighted sum over all 12 features is used when minimising the loss, assigning a different level of importance to each feature. During the training of Model B, the objective was to optimise the 72-h temperature predictions; there was no guarantee that this performance would translate into comparable performance for another feature, in this case relative humidity.
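A weighted multi-feature objective of the kind described can be written as a custom Keras loss. The weights below are placeholders only; the paper does not report the values used.

```python
import tensorflow as tf

# Placeholder weights: these merely illustrate emphasising some features
# (e.g. temperature) over others in the combined objective.
feature_weights = tf.constant([3.0, 1.0, 0.5, 3.0, 1.0, 0.5,
                               3.0, 1.0, 0.5, 3.0, 1.0, 0.5])

def weighted_mse(y_true, y_pred):
    # Squared error per feature, combined as a weighted sum across the
    # 12 outputs so that favoured features dominate the optimisation.
    sq_err = tf.square(y_true - y_pred)                    # shape (batch, 12)
    return tf.reduce_mean(tf.reduce_sum(feature_weights * sq_err, axis=-1))

# model_b.compile(optimizer="adam", loss=weighted_mse)
```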

Fig. 9  72-h forecast of the relative humidity at Heathrow during four days in different seasons


Table 8  Root mean squared error (RMSE), mean average error (MAE) and maximum errors for the 72-h relative humidity prediction (Fig. 9)

          RMSE [%]          MAE [%]            Max. Error [%]
          Kew G  Heathrow   Kew G   Heathrow   Kew G   Heathrow
Winter    8.78   7.46       6.33    5.4        22.9    20.6
Autumn    9.48   8.25       7.30    6.21       21.9    19.6
Summer    28.0   29.1       22.2    23.43      58.6    61.6
Spring    11.9   11.5       8.43    8.51       36.2    33.4
Total     58.1   56.3       44.3    43.6       139.6   135.2
Average   14.5   14.0       11.1    10.9       34.9    33.8

The accuracy of the results in Fig. 9 is a byproduct of the process of optimising the air temperature. If the relative humidity were the focus of the optimisation, the forecast predictions would probably show considerable improvement (Table 8).

5 Conclusions and future work

This paper presented a novel, flexible, deep learning local weather forecasting model. The approach is capable of rapidly predicting weather features and generating cheap, reliable short-duration forecasts. The model is purely data-driven, in contrast with earlier approaches that required varying degrees of data assimilation or hybrid modelling. A total of two models were trained and used to predict air temperature and relative humidity. The dataset used to train the models contained six years of historical weather observations from the Kew Gardens and Heathrow weather observation stations in London. The objective of having multiple locations is to infer a topographical representation for the model to learn from. As the two weather observation stations are positioned 11 km apart, it is expected that they would share similar weather characteristics. Discrepancies in wind speed and humidity between the locations could be explained by local land features and artificial structures. Kew Gardens is positioned near the river Thames in a built-up area, while the nearest body of water to Heathrow is several kilometres away. The Heathrow observation station is situated within the airport boundaries with few obstructions.

Model A is a 24-h prediction network designed to predict air temperature. This model was intended to demonstrate proof of concept and was trained with wet bulb, air and dew point temperatures. Model A achieved its objective of establishing a baseline for further predictions. It showed that air temperature could be predicted with reasonable accuracy compared to the Met Office, predicting the air temperature within a range of 2 °C in 72.9% of instances, with a maximum error of 3.85 °C occurring mostly on very hot days. Model B is a 72-h prediction network that attempted to predict air temperature, relative humidity and wind speed. Despite a three-fold increase in the forecast length, the model was able to predict air temperature accurately, with an RMSE of 2.26 °C at Heathrow, and was able to predict the temperature to within ±3 °C in 79.5% of instances. It was able to predict the relative humidity at the same location with an RMSE of 14%. However, Model B was optimised with respect to air temperature, which impacted the accuracy of the other features.

The flexibility and speed of the model make it attractive for short-term local forecasts in locations where weather stations are present but where it may be difficult to obtain accurate weather predictions (due to topography, local effects, etc.). The results show that predictions up to three days ahead have accuracy comparable to expensive numerical weather predictions. However, feature-based optimisation may be required to improve the accuracy of features such as wind speed or humidity. Future lines of research will be in this direction.

Authors' contribution G.Z and S.NM contributed to the conception of the presented idea. G.Z. wrote the main text, performed the simulations and prepared all figures. S.NM revised the manuscript. All authors reviewed the manuscript.

Data availability The hourly weather data that support the findings of this study are publicly available in the NCAS British Atmospheric Data Centre, https://catalogue.ceda.ac.uk/

Declarations

Ethical and informed consent for data used Not Applicable.

Competing Interests The authors have no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


References

1. Lynch P (2008) The origins of computer weather prediction and climate modeling. J Comput Phys 227(7):3431–3444
2. Scher S, Messori G (2018) Predicting weather forecast uncertainty with machine learning. Q J R Meteorol Soc 144(717):2830–2841
3. Rasp S, Dueben PD, Scher S, Weyn JA, Mouatadid S, Thuerey N (2020) WeatherBench: a benchmark data set for data-driven weather forecasting. J Adv Model Earth Syst 12:e2020MS002203. https://doi.org/10.1029/2020MS002203
4. Bauer P, Thorpe A, Brunet G (2015) The quiet revolution of numerical weather prediction. Nature 525(7567):47–55
5. Rihan FA, Collier CG, Roulstone I (2005) Four-dimensional variational data assimilation for Doppler radar wind data. J Comput Appl Math 176(1):15–34. https://doi.org/10.1016/j.cam.2004.07.003
6. Brunton SL, Noack BR, Koumoutsakos P (2020) Machine learning for fluid mechanics. Annu Rev Fluid Mech 52(1):477–508
7. Vinuesa R, Brunton SL (2022) Enhancing computational fluid dynamics with machine learning. Nat Comput Sci 2(6):358–366
8. Sarghini F, de Felice G, Santini S (2003) Neural networks based subgrid scale modeling in large eddy simulations. Comput Fluids 32(1):97–108
9. Prat A, Sautory T, Navarro-Martinez S (2020) A priori sub-grid modelling using artificial neural networks. Int J Comput Fluid Dyn 34(6):397–417. https://doi.org/10.1080/10618562.2020.1789116
10. Milano M, Koumoutsakos P (2002) Neural network modeling for near wall turbulent flow. J Comput Phys 182(1):1–26
11. Schizas C, Michaelides S, Pattichis C, Livesay R (1991) In: 1991 Second International Conference on Artificial Neural Networks, pp 112–114
12. Ochiai K, Suzuki H, Shinozawa K, Fujii M, Sonehara N (1995) In: Proceedings of ICNN'95 - International Conference on Neural Networks, vol 2, pp 1182–1187
13. Hewage P, Trovati M, Pereira E, Behera A (2021) Deep learning-based effective fine-grained weather forecasting model. Pattern Anal Appl 24(1):343–366
14. Ma X, Fang C, Ji J (2020) Prediction of outdoor air temperature and humidity using XGBoost. IOP Conf Ser Earth Environ Sci 427(1):012013
15. Slingo J, Palmer T (2011) Uncertainty in weather and climate prediction. Phil Trans R Soc A Math Phys Eng Sci 369(1956):4751–4767
16. Frnda J, Durica M, Rozhon J et al (2022) ECMWF short-term prediction accuracy improvement by deep learning. Sci Rep 12:7898. https://doi.org/10.1038/s41598-022-11936-9
17. Shao B, Song D, Bian G, Zhao Y (2021) Wind speed forecast based on the LSTM neural network optimized by the firework algorithm. Adv Mater Sci Eng 2021:1–13
18. Dong X, Deng S, Wang D (2022) A short-term power load forecasting method based on k-means and SVM. J Ambient Intell Humaniz Comput 13(11):5253–5267. https://doi.org/10.1007/s12652-021-03444-x
19. Fasil OK, Rajesh R (2022) Epileptic seizure classification using shifting sample difference of EEG signals. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-022-03737-9
20. Gupta V (2023) Wavelet transform and vector machines as emerging tools for computational medicine. J Ambient Intell Humaniz Comput 14(4):4595–4605. https://doi.org/10.1007/s12652-023-04582-0
21. Wang P, Fan E, Wang P (2021) Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recogn Lett 141:61–67
22. Yang M, Wang J (2022) Adaptability of financial time series prediction based on BiLSTM. Procedia Comput Sci 199:18–25
23. Murali P, Revathy R, Balamurali S, Tayade AS (2020) Integration of RNN with GARCH refined by whale optimization algorithm for yield forecasting: a hybrid machine learning approach. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-020-01922-2
24. Nayak JR, Shaw B, Sahu BK (2022) A fuzzy adaptive symbiotic organism search based hybrid wavelet transform-extreme learning machine model for load forecasting of power system: a case study. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-022-04355-1
25. Weyn JA, Durran DR, Caruana R (2019) Can machines learn to predict weather? Using deep learning to predict gridded 500-hPa geopotential height from historical weather data. J Adv Model Earth Syst 11(8):2680–2693
26. Roy DS (2020) Forecasting the air temperature at a weather station using deep neural networks. Procedia Comput Sci 178:38–46
27. Ravuri S, Lenc K, Willson M, Kangin D, Lam R, Mirowski P, Fitzsimons M, Athanassiadou M, Kashem S, Madge S, Prudden R, Mandhane A, Clark A, Brock A, Simonyan K, Hadsell R, Robinson N, Clancy E, Arribas A, Mohamed S (2021) Skilful precipitation nowcasting using deep generative models of radar. Nature 597(7878):672–677
28. Sønderby CK, Espeholt L, Heek J, Dehghani M, Oliver A, Salimans T, Agrawal S, Hickey J, Kalchbrenner N (2020) MetNet: a neural weather model for precipitation forecasting. arXiv preprint. https://arxiv.org/abs/2003.12140. Accessed Mar 2023
29. Espeholt L, Agrawal S, Sønderby C, Kumar M, Heek J, Bromberg C, Gazen C, Carver R, Andrychowicz M, Hickey J, Bell A, Kalchbrenner N (2022) Deep learning for twelve hour precipitation forecasts. Nat Commun 13(1):5145. https://doi.org/10.1038/s41467-022-32483-x
30. Fu Q, Niu D, Zang Z, Huang J, Diao L (2019) In: 2019 Chinese Control Conference (CCC), pp 3771–3775
31. Son Y, Zhang X, Yoon Y, Cho J, Choi S (2022) LSTM-GAN based cloud movement prediction in satellite images for PV forecast. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-022-04333-7
32. Danandeh Mehr A, Rikhtehgar Ghiasi A, Yaseen ZM et al (2023) A novel intelligent deep learning predictive model for meteorological drought forecasting. J Ambient Intell Humaniz Comput 14:10441–10455. https://doi.org/10.1007/s12652-022-03701-7
33. Sengar S, Liu X (2020) Ensemble approach for short term load forecasting in wind energy system using hybrid algorithm. J Ambient Intell Humaniz Comput 11(11):5297–5314. https://doi.org/10.1007/s12652-020-01866-7
34. Singh U, Rizwan M (2022) SCADA system dataset exploration and machine learning based forecast for wind turbines. Results Eng 16:100640
35. Mujeeb S, Alghamdi TA, Ullah S, Fatima A, Javaid N, Saba T (2019) Exploiting deep learning for wind power forecasting based on big data analytics. Appl Sci 9:4417. https://doi.org/10.3390/app9204417
36. Torres JM, Aguilar RM, Zuñiga-Meneses KV (2018) Deep learning to predict the generation of a wind farm. J Renew Sustain Energy 10(1):013305
37. Gers FA, Schmidhuber J, Cummins F (2000) Learning to forget: continual prediction with LSTM. Neural Comput 12(10):2451–2471
38. Met Office (2006) UK daily temperature data, part of the Met Office Integrated Data Archive System (MIDAS). NCAS British Atmospheric Data Centre. https://catalogue.ceda.ac.uk/uuid/1bb479d3b1e38c339adb9c82c15579d8. Accessed Mar 2023

39. Wang JQ, Du Y, Wang J (2020) LSTM based long-term energy consumption prediction with periodicity. Energy 197:117197
40. Kreuzer D, Munz M, Schlüter S (2020) Short-term temperature forecasts using a convolutional neural network - an application to different weather stations in Germany. Mach Learn Appl 2:100007
41. Met Office (2022) How accurate are our public forecasts? https://www.metoffice.gov.uk/about-us/what/accuracy-and-trust/how-accurate-are-our-public-forecasts. Accessed 30/08/2022

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Gabriel Zenkner received his MSc at Imperial College London in Advanced Mechanical Engineering in 2022. His current research interests include data science, data engineering and prediction of physical and social phenomena using machine learning techniques.

Salvador Navarro-Martinez is a reader at Imperial College London in the Department of Mechanical Engineering. His research interests lie in the modelling of turbulent flows using stochastic and machine learning techniques.
