Parameter | Availability | Unit | Source | Latitude (°N) | Longitude (°E) | Period
Temperature | 100% | °C | Weather | 8.5 | 76.5 | 2019-2021
Surface Pressure | 100% | mb | Weather | 8.5 | 76.5 | 2019-2021
Precipitation | 100% | mm | Weather | 8.5 | 76.5 | 2019-2021
U & V Components of Wind Velocity | 100% | m/s | Weather | 8.5 | 76.5 | 2019-2021
Relative Humidity | 100% | % | Weather | 8.5 | 76.5 | 2019-2021
ZTD | 100% | m | GNSS | 8.5 | 76.5 | 2019-2021
Reflectivity | Low | dBZ | Radar | 8.5 | 76.5 | 2019-2021
Fig. 2: Seasonal Variation Over a Three-Year Period: Trends and Patterns
3. THEORETICAL BACKGROUND: LSTM

3.1. Background Theory

LSTM networks are a sophisticated form of RNNs, specifically designed to address the limitations of standard RNNs, particularly in retaining and modelling long-term dependencies in sequential data. Conventional RNNs frequently encounter difficulties in learning such dependencies due to problems like vanishing or exploding gradients. These problems arise during backpropagation, where gradients either diminish to near zero or grow exponentially, leading to unstable training and difficulty in retaining information over extended sequences [7].

LSTM networks address these challenges by incorporating a sophisticated mechanism of gates within their architecture. Specifically, LSTMs utilize three primary gates: the input gate, the forget gate, and the output gate. These gates work in concert to carefully manage the flow of information within the network, deciding which information to keep, update, or discard as the sequence is processed. This gating mechanism allows LSTM networks to maintain a stable gradient, effectively capturing dependencies across long sequences and enabling the model to remember information over much longer periods, making them particularly well suited for tasks involving time-series data or sequences where long-term context is crucial [7].

Mathematically, an LSTM unit is composed of several components:

1. Forget Gate: The forget gate determines which information in the cell state should be discarded or preserved. It analyses the previous cell state and determines which pieces of information are no longer relevant to the current context and should be "forgotten." By doing so, the forget gate helps prevent the cell from becoming cluttered with outdated or unnecessary data, ensuring that only useful information is carried forward in the sequence [11]. It is computed as:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)        (1)

where f_t is the output of the forget gate at time step t, W_f is the weight matrix, h_{t-1} is the hidden state of the previous time step, x_t is the current input, and b_f is the bias term. The sigmoid function \sigma ensures that the gate output lies between 0 and 1.

2. Input Gate: The input gate controls the amount of new information from the current input that is incorporated into the cell state. It decides what portion of the incoming data is relevant and should be considered for updating the memory. This gate plays a crucial role in determining which parts of the new input should influence the cell state, allowing the network to selectively update its knowledge based on the most important aspects of the new information [11]. It is computed as:

i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)        (2)
The candidate cell state \tilde{C}_t is generated using a tanh activation function:

\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)        (3)

The cell state is then updated as:

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t        (4)

3. Output Gate: The output gate controls what part of the cell state should be exposed as the output at each time step. This gate determines how much of the current cell state should be passed on to the next layer or as the final output of the LSTM unit. It ensures that the output reflects the most relevant information while keeping other parts of the cell state intact for future time steps. This selective process enables the LSTM to focus on producing meaningful outputs that are informed by both the current input and the long-term context stored in the cell state [11]. It is calculated as:

o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)        (5)

The hidden state of the current time step, h_t, is then computed as:

h_t = o_t \odot \tanh(C_t)        (6)

These equations allow LSTM networks to maintain long-term memory over time, making them highly effective for tasks involving sequential data, such as weather nowcasting.
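As a concrete illustration of how equations (1) to (6) interact, the following minimal NumPy sketch performs a single LSTM cell step; the function and variable names are illustrative only, and the weight matrices and biases are placeholders rather than values learned in this study.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM cell step following Eqs. (1)-(6).
    Each W_* has shape (hidden, hidden + input); each b_* has shape (hidden,)."""
    z = np.concatenate([h_prev, x_t])      # concatenated [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)           # Eq. (1): forget gate
    i_t = sigmoid(W_i @ z + b_i)           # Eq. (2): input gate
    c_tilde = np.tanh(W_c @ z + b_c)       # Eq. (3): candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde     # Eq. (4): cell state update
    o_t = sigmoid(W_o @ z + b_o)           # Eq. (5): output gate
    h_t = o_t * np.tanh(c_t)               # Eq. (6): hidden state
    return h_t, c_t

Iterating this step over an input sequence, while carrying h_t and C_t forward, is what allows the network to retain long-term context.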
Fig. 3: LSTM Architecture

3.2. Design of Experiment

3.2.1. Model Preparation
The foundation of the predictive model is an LSTM network, specifically designed to capture the long-term dependencies present in weather-related data. In this study, we utilize LSTM autoencoders, neural network structures employed to learn and extract efficient representations of data in an unsupervised manner. An autoencoder is typically divided into two primary components: the encoder and the decoder. The encoder reduces or transforms the input data into a lower-dimensional latent space, capturing the essential characteristics of the data. In contrast, the decoder reconstructs the original input from this compressed latent representation. This architecture is particularly advantageous when dealing with complex datasets, such as weather data, as it allows for significant dimensionality reduction while maintaining the integrity of crucial data features [4]. Autoencoders are commonly used for anomaly detection, noise reduction, and feature extraction: they learn to compress input data and subsequently reconstruct it, enabling the identification of latent patterns within the data that may improve model performance when integrated with an LSTM for predictive tasks. For example, [12] used autoencoders in combination with LSTM to enhance weather forecasting by reducing the noise in satellite and ground-based measurements, resulting in more accurate predictive models.

The model architecture includes an input layer, two LSTM layers, one repeat vector layer, and two dense output layers. The input layer accepts time-series data from various meteorological sources, while the LSTM layers process this data to learn temporal patterns. The dense output layers generate predictions for the target variable, precipitation.
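For readers who want to reproduce the layer stack described above, a possible Keras realisation is sketched below. The 128-unit layer width and the 12-step input window follow the tuned configuration reported in Section 3.2.3, while the number of input features, the two-step output horizon, and the exact placement of the dense layers are assumptions made for illustration, not details confirmed by the study.

import tensorflow as tf
from tensorflow.keras import layers, models

N_STEPS_IN = 12    # input sequence length (see Section 3.2.3)
N_STEPS_OUT = 2    # output horizon; one- and two-step outputs were evaluated
N_FEATURES = 7     # placeholder: number of meteorological input variables

model = models.Sequential([
    layers.Input(shape=(N_STEPS_IN, N_FEATURES)),
    layers.LSTM(128),                          # encoder LSTM
    layers.RepeatVector(N_STEPS_OUT),          # repeat latent vector once per output step
    layers.LSTM(128, return_sequences=True),   # decoder LSTM
    layers.TimeDistributed(layers.Dense(128, activation="relu")),  # first dense layer
    layers.TimeDistributed(layers.Dense(1)),   # precipitation output per step
])
model.summary()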
3.2.2. Model Post Processing
To refine the model's predictions, several post-processing techniques were applied. These included:
a) Normalization of the outputs to align them with the range of observed values.
b) Application of correction models to adjust for systematic biases observed during testing. Various regression models, such as Polynomial and Ridge Regression, were evaluated for this purpose, and the Extra Trees Regression model was chosen (a minimal corrector sketch follows this list).
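A minimal sketch of the bias-correction step, assuming a scikit-learn Extra Trees regressor trained on pairs of raw nowcasts and observed rainfall from a calibration split; the arrays below are synthetic placeholders, not the study's data.

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
raw_nowcast = rng.random((500, 1)) * 10.0     # placeholder raw LSTM nowcasts (mm)
observed = raw_nowcast[:, 0] * 1.1 + 0.3      # synthetic observations with a systematic bias

# Fit the corrector on the calibration split, then apply it to new nowcasts.
corrector = ExtraTreesRegressor(n_estimators=200, random_state=0)
corrector.fit(raw_nowcast, observed)
corrected_nowcast = corrector.predict(raw_nowcast)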
3.2.3. Model Training and Evaluation
The LSTM model was trained using a design of experiments approach, which involved extensive hyperparameter tuning. The hyperparameters adjusted included the number of epochs, the number of LSTM units, the batch size, and the input sequence length. The optimal configuration was determined to be 70 epochs, a batch size of 32, 128 dense neurons, and 128 LSTM units, with an input sequence length of 12 time steps.

The model's loss function, which guides the optimization process during training, was also carefully selected. After comparing several options, including mean squared error and mean squared logarithmic error, the Huber loss function was chosen due to its robustness to outliers. The final model achieved a test loss of 3.31×10⁻⁵, indicating its ability to generalize well to unseen data.
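To make the rationale for the Huber loss tangible, the short comparison below evaluates mean squared error, mean squared logarithmic error, and Huber loss on the same residuals, where one prediction misses a heavy-rain spike; the numbers are invented solely for illustration.

import tensorflow as tf

y_true = tf.constant([[0.0], [0.2], [5.0]])   # placeholder rainfall, one intense event
y_pred = tf.constant([[0.1], [0.3], [1.0]])   # placeholder nowcast that misses the spike

mse = tf.keras.losses.MeanSquaredError()(y_true, y_pred)
msle = tf.keras.losses.MeanSquaredLogarithmicError()(y_true, y_pred)
huber = tf.keras.losses.Huber()(y_true, y_pred)
print(float(mse), float(msle), float(huber))  # the Huber value grows only linearly with the large error

Because the Huber penalty grows linearly, rather than quadratically, beyond its delta threshold, occasional large rainfall errors do not dominate the gradient during training.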
3.2.4. Nowcasting Performance Assessment
Model nowcasting performance was assessed using the following metrics (a short computation sketch follows the list):
a) Probability of Detection (POD): POD evaluates the ability of the model to correctly predict the occurrence of extreme weather events.
b) False Alarm Ratio (FAR): FAR assesses the proportion of incorrect predictions relative to all positive predictions.
c) Critical Success Index (CSI): The CSI is a statistical measure used to evaluate the accuracy of forecasts, particularly in meteorology. It is calculated by comparing the number of correct predictions of an event (hits) with the total number of instances where the event was either predicted or actually occurred. Specifically, CSI is the ratio of the number of true positive forecasts (hits) to the sum of hits, false alarms (events predicted but not occurred), and misses (events occurred but not predicted). This metric provides a balanced assessment of forecast performance, taking into account both the successes and the errors, making it particularly useful in scenarios where both false alarms and missed events need to be minimized [9].
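The three categorical scores can be computed from a standard 2×2 contingency table of hits, false alarms, and misses, as in the sketch below; the rain/no-rain threshold used here is a placeholder, since the event threshold is not stated in this section.

import numpy as np

def contingency_scores(obs, pred, threshold=0.1):
    """POD, FAR and CSI from binary rain/no-rain events (threshold in mm, placeholder)."""
    obs_event = np.asarray(obs) >= threshold
    pred_event = np.asarray(pred) >= threshold
    hits = np.sum(obs_event & pred_event)            # event predicted and observed
    false_alarms = np.sum(~obs_event & pred_event)   # predicted but not observed
    misses = np.sum(obs_event & ~pred_event)         # observed but not predicted
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + false_alarms + misses)
    return pod, far, csi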
The model demonstrated a high POD and CSI, with a FAR of 9.19%, underscoring its effectiveness in nowcasting severe weather.

4. RESULTS AND DISCUSSION

The results of the study indicate that the LSTM-based nowcasting model is highly effective in predicting rainfall events. The model's performance varied with lead time, with higher accuracy observed at shorter lead times. The integration of weather data (WS), GNSS, and radar data was particularly beneficial, enhancing the model's ability to detect the onset and intensity of convective storms.

Initially, the RMSE observed for the nowcast results is greater than one; these results are subsequently corrected using the Extra Trees Regressor corrector model [13]. It is evident from Figure 4 that the nowcast follows a trend similar to the observations, with a bias related to rainfall intensity. At later output time steps, the nowcast ceases to follow the observed trend, leading to higher RMSE.

Incorporating multi-source data, as well as fine-tuning the hyperparameters, was pivotal to the model's success. The RMSE corresponding to 12 input steps is the lowest and was therefore adopted for this study. The configuration that included weather data, GNSS-derived ZWD, and radar achieved the highest correlation (0.9988) post-correction, the lowest Root Mean Square Error (RMSE) of 0.4337, and a Mean Absolute Error (MAE) of 0.0343 for the first time step, i.e. one hour. These results highlight the importance of data diversity in improving nowcasting accuracy. For both one and two output time steps, the results indicate a POD exceeding 90%, a FAR better than 9%, and a CSI of 90%.
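For completeness, the continuous scores reported above and in Table 2 (RMSE, MAE, and Pearson correlation) can be obtained from paired series of observed and nowcast rainfall as sketched here, using placeholder arrays rather than the study's data.

import numpy as np

def summary_scores(obs, pred):
    """RMSE, MAE and Pearson correlation between observed and nowcast rainfall."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    rmse = np.sqrt(np.mean((pred - obs) ** 2))
    mae = np.mean(np.abs(pred - obs))
    corr = np.corrcoef(obs, pred)[0, 1]
    return rmse, mae, corr

obs = np.array([0.0, 0.4, 2.1, 0.0, 5.3])    # placeholder observed rainfall (mm)
pred = np.array([0.1, 0.5, 1.8, 0.0, 4.9])   # placeholder corrected nowcast (mm)
print(summary_scores(obs, pred))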
Fig. 4: Comparison of actual precipitation and predicted precipitation (before and after correction)
Table 2: Result metrics before and after correction for different output lengths

Version | I/P, O/P Sequence | Loss | Correlation (Before) | Correlation (After) | RMSE (Before) | RMSE (After)
WS+GNSS+RADAR | 12, 2 | 0.00010469 | 0.9901 | 0.998 | 1.293 | 0.465
WS+GNSS+RADAR | 12, 1 | 0.0000334 | 0.991 | 0.998 | 1.135 | 0.437
Fig. 5: Loss function curve