A Framework for Automatic Validation and Application of Lossy Data Compression in Ensemble Data Assimilation

Kai Keller1, Hisashi Yashiro2, Mohamed Wahib3, Balazs Gerofi4, Adrian Cristal Kestelman1, Leonardo Bautista-Gomez1
1Barcelona Supercomputing Center (BSC-CNS), Spain
{kai.keller, leonardo.bautista, adrian.cristal}@bsc.es
2National Institute for Environmental Studies, Japan
[email protected]
3National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
[email protected]
4RIKEN, Japan
[email protected]
Abstract

Ensemble data assimilation techniques form an indispensable part of numerical weather prediction. As the ensemble size grows and model resolution increases, the amount of required storage becomes a major issue. Data compression schemes may come to the rescue not only for operational weather prediction, but also for weather history archives. In this paper, we present the design and implementation of an easy-to-use framework for evaluating the impact of lossy data compression in large scale ensemble data assimilation. The framework leverages robust statistical qualifiers to determine which compression parameters can be safely applied to the climate variables. Furthermore, our proposal can be used to apply the best parameters during operation, while monitoring data integrity. We perform an exemplary study on the Lorenz96 model to identify viable compression parameters and achieve a 1/3 saving in storage space and an effective speedup of 6% per assimilation cycle, while monitoring the state integrity.

I Introduction

Ensemble methods have become increasingly important for numerical weather and climate prediction. One of the main reasons for this is the statistics encoded in the ensemble. Whereas the widely used four-dimensional variational data assimilation (4D-var) often leads to more accurate predictions, it does not provide a simple way to make assumptions about the prediction uncertainty [29]. Therefore, it is frequently combined with an ensemble method. On the other hand, ensemble methods are often preferred from the beginning, for instance because they typically require less implementation effort or because they are the better fit in certain cases [14, 25, 26]. Ensemble data assimilation with large ensembles and large models requires high performance I/O [19, 36, 11]. This is due to the large amount of data that needs to be circulated between the different constituents of the assimilation system.

A widely used workflow in ensemble data assimilation is to perform the climate simulation and the data assimilation in separate executables [41]. The ensemble members (i.e., the climate simulations) store the climate states to the file system, and after all simulations have finished, the data assimilation system reads the files, assimilates the observations and writes the improved states back to storage. The ensemble members then reread them to perform the next assimilation cycle. The amount of data transferred between the two steps often leads to an I/O bottleneck, where the storage subsystem cannot deliver the throughput that is needed to keep up with the computing power. In some cases, I/O can be overlapped with computation and be performed in the background, which alleviates the I/O overhead itself. The states can also be transferred through the network, bypassing the I/O layer. However, this approach is limited by the available memory and raises fault tolerance issues, since it typically requires large monolithic MPI allocations.

Even so, reducing the data volume remains beneficial, since it lowers the storage requirements and leaves more storage space available to other users. A recent example of storage based state circulation between simulation and data assimilation system has been published by Yashiro et al. [40]. The article presents the execution of the NICAM-LETKF system on Fugaku [30], using 82% of the entire system. During each cycle, the system circulates more than 400 TB of data through the parallel file system. Yashiro et al. also compare double to mixed precision executions, where the computing times with mixed precision show a 1.6x speedup compared to double precision. This demonstrates the prospect of departing from the double precision doctrine in climate science. A number of works explicitly study the impact and advantage of mixed precision in data assimilation [16, 15, 28].

Besides using mixed precision, data can be reduced by using compression schemes. Data can be compressed in a lossless fashion, i.e., without any loss in information, or through lossy methods, i.e., with a certain loss in accuracy. Since climate systems are highly non-linear and typically chaotic, it is essential not to introduce significant perturbations when applying compression to the states. On the other hand, the models and the observations are both imperfect and inevitably introduce errors to the states. This means that the intrinsic precision of the states can be lower than the precision of the data type used in the application. Indeed, multiple previous studies have shown that climate simulations can tolerate a certain loss in data precision [4, 7, 6, 31].

In this work, we propose a framework that: (1) explores the impact of lossy data compression on the climate states and on the ensemble consistency through time (i.e., the error propagation), and (2) dynamically selects the best compression parameters during operational mode. The former constitutes the validation mode of the framework and the latter is the dynamic mode. In the validation mode, we generate and store validation data each cycle. The collected data provides a means to evaluate the impact of inconsistency in the states and ensemble through time. We provide a JSON configuration file to conveniently set the desired compression parameters. During the dynamic mode, the compression parameters are applied to the states to reduce their size before writing them to the file system. Optionally, the states can be tested by a validation function before writing them, to ensure their consistency. The framework allows easy interfacing and is built on top of the ensemble data assimilation architecture of Melissa-DA. Thus, after exposing the simulation variables to Melissa-DA and a few more steps, it can be operated in the two modes described above, in combination with the wide range of ensemble data assimilation methods that are contained in Melissa-DA.

The rest of the paper is organized as follows. Section II provides background information on ensemble data assimilation techniques. Section III presents the design and implementation of the proposed framework and explains its usage, and Section IV provides experimental evaluation. Related work is surveyed in Section V. Finally, Section VI concludes the paper.

II Background

In this section, we outline the basic concepts behind our work and clarify the terminology that we use.

II-A Ensemble Data Assimilation

Data assimilation is based on Bayes’ theorem, allowing us to combine the information from both real world observations and numerical model states. Combining the information from both sources leads to an improved accuracy of the state [39]. Ensemble data assimilation follows a Monte Carlo approach, approximating the state mean and covariance by the moments of a statistically significant sample of states. The most common ensemble methods for data assimilation are the ensemble Kalman filter (EnKF) [12] and the particle filter (PF) [38]. To address the issues that arise from relatively small ensemble sizes given the high dimensionality of the climate states, modified flavors of the original versions are used, e.g., LETKF [17] or LAPF [32].

One important difference between particle filtering and ensemble Kalman filtering is that the particle states do not change during the filter update. The particles get assigned a weight and particles with small weights are discarded. Particles with high weights will be selected for propagation during the next cycle. The particle filter implementation in Melissa-DA uses sequential importance resampling (SIR), where particles with high weights are multiplied to avoid ensemble shrinking. This technique relies on a randomization of the model, reaching different output states when starting from identical particles. This is achieved either by randomization during the model evolution, or by introducing small perturbations to the input particle states before the propagation [21, 38].
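The resampling step described above can be illustrated with a minimal sketch. It is not the Melissa-DA implementation; the function names `sir_resample` and `perturb`, the toy states, and the noise scale are assumptions chosen for illustration only.

```python
import random


def sir_resample(weights, n_draw, rng):
    """Draw n_draw particle ids with probability proportional to weight."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    # Sampling with replacement: particles with high weights may be drawn
    # several times, particles with small weights may not be drawn at all.
    return rng.choices(range(len(weights)), weights=normalized, k=n_draw)


def perturb(state, scale, rng):
    """Add small Gaussian noise so that copies of one particle diverge."""
    return [x + rng.gauss(0.0, scale) for x in state]


rng = random.Random(42)
states = [[0.0, 1.0], [0.5, 1.5], [1.0, 2.0], [1.5, 2.5]]  # toy particle states
weights = [0.05, 0.6, 0.3, 0.05]                            # toy particle weights

ids = sir_resample(weights, n_draw=len(states), rng=rng)
new_states = [perturb(states[i], scale=1e-3, rng=rng) for i in ids]
distinct_parents = len(set(ids))  # usually smaller than the ensemble size
```

The ensemble size stays constant, while the number of distinct parent particles usually shrinks. The perturbation is what keeps the copies of one parent from evolving identically.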

II-B Terminology

Although both are Monte Carlo methods and thus share the same origin, the terminology used for ensemble Kalman filters and particle filters is quite different. To avoid confusion, we introduce the terminology that we use in this work. We implemented our validation framework into the particle filter of Melissa-DA (the validation methods that we use, however, are perfectly suitable for other ensemble methods as well). For this reason, we will use the particle filter terminology. We refer to a simulation state as particle state, or sometimes just particle or state. The workflow of ensemble data assimilation is divided into assimilation cycles. Each cycle comprises the propagation step and the update step. During the propagation step, the climate states are advanced by the numerical model (i.e., propagated). During the update step, the model states are improved by the filter update, assimilating the observations. In particle filtering, the update step is called sampling, or resampling.

III Design and Implementation

In this section, we first introduce the particle filter implementation of Melissa-DA, which serves as testbed for our work, and then explain implementation and operation of our proposed framework in detail.

III-A Melissa-DA Particle Filter

Melissa-DA is developed to perform ensemble data assimilation at large scale. The framework comprises three modules: a launcher, a server, and multiple runners. In contrast to the common practice in ensemble data assimilation of driving the workflow with a bash script, the launcher replaces the script and orchestrates the submission of the ensemble simulations and the data assimilation system. The launcher has plugins for the most common cluster schedulers to submit and monitor the jobs for the server and runners. In this architecture, the runners constitute a worker pool for propagating the ensemble states. The server requests available runners to perform the propagation of unscheduled states and performs the data assimilation step. To ensure optimal fault tolerance, the server and each runner are allocated in separate jobs and are restarted by the launcher upon failure. Melissa-DA provides data assimilation with various flavors of ensemble Kalman filters and allows the creation of custom filter plugins. Besides the Kalman filter, the framework implements a particle filter that uses a fast distributed cache on the runner nodes, where particles can be asynchronously prefetched and cached for future propagations. The cache is maintained asynchronously by dedicated MPI processes, one per runner node. Generally, for fault tolerance, the states need to be stored on global storage. Since this is typically slower than using the local storage, the simulation processes only write to and load from local storage. The cache controller is responsible for providing the states locally and sending the states to global storage after they have been propagated.

In most particle filters, in contrast to Kalman filters, the states are not changed during the filter step. Instead, states that carry high weights (i.e., that are consistent with the observation data), are selected for the next assimilation cycle and states with low weights are discarded. This is precisely why the local cache helps to overcome the I/O bottleneck; states that have been selected for the next assimilation are still locally available on runners that have propagated them. Since the states remain unchanged during filtering, they are immediately ready for propagation. The local cache leverages FTI to store and load the states. FTI is a multi-level checkpoint/restart library that is aware of the node local storage. We implemented several compression techniques into FTI to enable the state compression while storing the state and decompression while loading it.

III-B High-Level View on the Validation Framework

Our proposed framework comprises two modes of operation. The first mode allows us to explore the impact of data compression on the integrity of the ensemble and of the states in particular. The second mode allows us to apply the best compression parameters during operation to increase performance, while respecting data consistency. We refer to the first mode as validation mode and to the second as dynamic mode. The respective modes and corresponding parameters are selected by providing a JSON configuration file (see Listing 1).

{
  "variables" : [ "state1", "state2" ],
  "compression" : {
    "method" : "validation",
    "validation" : [
      {
        "mode" : "fpzip",
        "parameter" : [16,24,32]
      },
      {
        "mode" : "zfp",
        "type" : "precision",
        "parameter" : [32,40]
      }
    ],
    "dynamic" : [
      {
        "name" : "state1",
        "sigma" : 10e-7,
        "mode" : "zfp",
        "type" : "accuracy",
        "parameter" : [0,6,8,10]
      },
      {
        "name" : "state2",
        "sigma" : 10e-5,
        "mode" : "fpzip",
        "parameter" : [24,28,32,36,40]
      }
    ]
  }
}
Listing 1: Example of a configuration file. We set the compression parameters for two variables named state1 and state2. The framework will operate in validation mode, since the method key is set to validation. To apply the dynamic mode, the method must be set to dynamic.

Our framework operates in conjunction with the particle filter of Melissa-DA. During the particle filter update, the particle weights are normalized and $P$ particles are drawn from the resulting distribution function. $P$ remains constant during all cycles and can be set in the Melissa-DA configuration. Hence, during each cycle we propagate the same number of particles. However, the SIR algorithm leads to a sample of only $M$ distinct particles, with typically $M < P$. Therefore, some particles are multiplied. To account for this, the model needs to provide some randomness to ensure that two propagations of the same particle lead to distinct output states.

During the validation mode of our framework, we propagate $(C+1) \cdot P$ particles, with $C$ being the number of compression parameters that are specified in the configuration file. The configuration shown in Listing 1 leads to the propagation of 6 ensembles with $P$ particles each: one ensemble for the original states and 5 ensembles that use data compression with the specified parameters. This allows us to compare the ensembles with compressed data to the original ensemble. Furthermore, we can observe the evolution of each particle ensemble over time for the number of cycles specified in the Melissa-DA configuration.
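The count of ensembles follows directly from the configuration file. The sketch below reproduces the validation part of Listing 1 and derives $C$ and the number of propagated particles; the helper `count_ensembles` and the choice of 10 particles per ensemble are illustrative assumptions, not part of the framework.

```python
import json

# Validation part of the configuration from Listing 1.
CONFIG = """
{
  "variables": ["state1", "state2"],
  "compression": {
    "method": "validation",
    "validation": [
      {"mode": "fpzip", "parameter": [16, 24, 32]},
      {"mode": "zfp", "type": "precision", "parameter": [32, 40]}
    ]
  }
}
"""


def count_ensembles(config, particles_per_ensemble):
    """Return (C, number of ensembles, particles propagated per cycle)."""
    entries = config["compression"]["validation"]
    c = sum(len(entry["parameter"]) for entry in entries)
    ensembles = c + 1  # one extra ensemble holds the uncompressed states
    return c, ensembles, ensembles * particles_per_ensemble


config = json.loads(CONFIG)
c, ensembles, total = count_ensembles(config, particles_per_ensemble=10)
print(c, ensembles, total)  # 5 6 60
```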

In the dynamic mode, we aim to improve the performance of the data assimilation system. Thus, only one ensemble with $P$ particles is propagated. During the first assimilation cycle, the compression parameters are tested using a validation function. By default, we check the point-wise maximum error between the compressed and uncompressed state; however, a custom function can be provided by the user, or the validation can be deactivated entirely. If the default validation is used, the compression parameters that lead to a value larger than sigma (see Listing 1) are discarded. The remaining parameters are then stored and ordered by compression rate. In subsequent cycles, the best compression parameters are successively checked by the validation function, and the first parameter that passes is used to compress the state before writing it to the file system. We allow setting the compression parameters for each variable independently. The meta-data that is required to recover the variable with the correct compression settings is maintained by FTI. A speedup is achieved if the time spent on compression, decompression and validation is less than the time saved by writing and reading less (i.e., compressed) data.
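The speedup condition can be written as an inequality. The symbols below are introduced only for this illustration and do not appear elsewhere in the paper: $t_{\text{comp}}$, $t_{\text{decomp}}$ and $t_{\text{val}}$ are the times for compression, decompression and validation, $t_{\text{w}}(S)$ and $t_{\text{r}}(S)$ are the times to write and read $S$ bytes, and $S_0$ and $S_c$ are the uncompressed and compressed state sizes.

$$t_{\text{comp}} + t_{\text{decomp}} + t_{\text{val}} < \left[ t_{\text{w}}(S_0) - t_{\text{w}}(S_c) \right] + \left[ t_{\text{r}}(S_0) - t_{\text{r}}(S_c) \right]$$

Under the simplifying assumption of constant write and read bandwidths $B_{\text{w}}$ and $B_{\text{r}}$, and with $S_c = S_0 / \text{CR}_c$ for the compression rate $\text{CR}_c$ of Eq. (9), this becomes:

$$t_{\text{comp}} + t_{\text{decomp}} + t_{\text{val}} < S_0 \left( 1 - \frac{1}{\text{CR}_c} \right) \left( \frac{1}{B_{\text{w}}} + \frac{1}{B_{\text{r}}} \right)$$

The simplified form suggests that compression pays off most for large states and slow storage, and that the benefit saturates as $\text{CR}_c$ grows.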

Figure 1: Workflow in validation mode.

III-C Validation Mode

For the validation mode, we leverage the Melissa-DA launcher to submit several validator instances, in addition to the Melissa-DA server and runners. As for the runners, the number of validators can be specified in the Melissa-DA configuration. Each validator is executed on one node and is parallelized, leveraging all available cores on the node. Figure 1 shows the workflow in validation mode. The diagram indicates that the propagation of particles by the runners is overlapped with the validation performed on the validators. The workflow is as follows. During the propagation phase, the server schedules particles to the runners and waits for the particle weights to be returned. As soon as all particles have been propagated, the server performs the update and communicates particle ids and weights to the validators. The validators can then start with the calculation of the statistical qualifiers, where the validation always processes the states from the previous cycle.

The statistical qualifiers that we calculate are the same as in the work of Baker et al. [6], where the impact of data compression on climate states is discussed in detail. We implemented the root mean squared Z-value, for which we calculate both the value $Z_{x_{c,i}}^{p,0}$, which encodes information on the ensemble spread, and $Z_{x_{c,i}}^{p,+}$, which can detect a bias on the ensemble spread introduced by the compression. The two values are calculated as follows:

$$Z_{x_{c,i}}^{p,0} = \frac{x_{c,i}^{p} - \bar{x}_{0,i}^{P/p}}{\sigma_{0,i}^{P/p}} \tag{1}$$
$$Z_{x_{c,i}}^{p,+} = \frac{x_{c,i}^{p} - \bar{x}_{c,i}^{P/p}}{\sigma_{c,i}^{P/p}} \tag{2}$$
$$\text{RMSZ}_{X_c}^{p} = \sqrt{\frac{1}{N} \sum_{i}^{N} \left( Z_{x_{c,i}}^{p} \right)^{2}} \tag{3}$$

with:

$$\bar{x}_{c,i}^{P/p} = \frac{\sum_{k \neq p}^{P} w_k \, x_{c,i}^{k}}{\sum_{k \neq p}^{P} w_k} \tag{4}$$
$$\sigma_{c,i}^{P/p} = \frac{\sum_{k \neq p}^{P} w_k \left( x_{c,i}^{k} - \bar{x}_{c,i}^{P/p} \right)^{2}}{\sum_{k \neq p}^{P} w_k} \tag{5}$$
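Equations (1) to (5) can be sketched in plain Python as follows. This is an illustration rather than the validator code of the framework: the function names and the toy ensemble are assumptions, rounding to two decimals stands in for a lossy compressor, and the spread follows Eq. (5) as printed, i.e., without a square root. The weighted mean and spread are taken over all particles except particle `p` itself.

```python
import math


def loo_mean_and_sigma(values, weights, p):
    """Weighted mean (Eq. 4) and spread (Eq. 5) over all particles k != p."""
    pairs = [(w, x) for k, (w, x) in enumerate(zip(weights, values)) if k != p]
    wsum = sum(w for w, _ in pairs)
    mean = sum(w * x for w, x in pairs) / wsum
    # Eq. (5) as printed in the text (no square root is applied).
    sigma = sum(w * (x - mean) ** 2 for w, x in pairs) / wsum
    return mean, sigma


def rmsz(test_ensemble, ref_ensemble, weights, p):
    """RMSZ (Eq. 3) of particle p taken from test_ensemble.

    Mean and spread come from ref_ensemble: pass the uncompressed ensemble
    for Eq. (1) or the compressed ensemble for Eq. (2).
    Ensembles are indexed as ensemble[particle][component].
    """
    n = len(test_ensemble[p])
    total = 0.0
    for i in range(n):
        column = [state[i] for state in ref_ensemble]
        mean, sigma = loo_mean_and_sigma(column, weights, p)
        z = (test_ensemble[p][i] - mean) / sigma
        total += z * z
    return math.sqrt(total / n)


# Toy ensemble: 4 particles with 3 components each (hypothetical values).
original = [
    [1.013, 2.047, 3.021],
    [1.204, 2.118, 2.793],
    [0.887, 1.832, 3.316],
    [1.126, 2.209, 3.094],
]
compressed = [[round(x, 2) for x in state] for state in original]
weights = [0.25, 0.25, 0.25, 0.25]

rmsz_0 = rmsz(compressed, original, weights, p=0)       # based on Eq. (1)
rmsz_plus = rmsz(compressed, compressed, weights, p=0)  # based on Eq. (2)
```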

$\text{RMSZ}_{X_c}^{p}$ is the root mean squared Z-value and constitutes the quantity that we actually evaluate. $P$ is the number of particles, and $X_c$ specifies the state variable, with $c$ being the compression parameter-id and $c=0$ indicating the uncompressed state. The index $p$ denotes the particle-id; thus, we have one $\text{RMSZ}_{X_c}^{p}$ value for each particle, state variable and compression parameter. Finally, $w_p$ denotes the weight of particle $p$. We further compute the peak signal-to-noise ratio (PSNR):

$$\text{PSNR}_{X_c} = 20 \log_{10} \left( \frac{\max \left( \left| x_{0,i} \right| \right)}{\text{RMSE}_{X_c}} \right) \tag{6}$$

Here, $x_{0,i}$ denotes the components of the uncompressed state variable $X_0$, and $\text{RMSE}_{X_c}$ is the root mean squared error of the compressed state variable with respect to the uncompressed one. Further, we compute the Pearson correlation coefficient:

$$\rho_{X_c} = \frac{\sum_{i}^{N} \left( x_{0,i} - \bar{x}_{0,i} \right) \left( x_{c,i} - \bar{x}_{c,i} \right)}{\sqrt{\sum_{i}^{N} \left( x_{0,i} - \bar{x}_{0,i} \right)^{2} \sum_{i}^{N} \left( x_{c,i} - \bar{x}_{c,i} \right)^{2}}} \tag{7}$$

The pointwise maximum error:

$$\Delta X_{c}^{\max} = \max \left( \left| x_{0,i} - x_{c,i} \right| \right) \tag{8}$$

And further the mean, standard deviation, minimum and maximum values for all state variables $X_c$, and the compression rate:

$$\text{CR}_{c} = \frac{\text{original size}}{\text{compressed size}} \tag{9}$$
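The qualifiers of Eqs. (6) to (9) are simple to compute for a single state variable. The following is a minimal sketch and not the validator code of the framework; the function names and the toy data are assumptions, and the means in Eq. (7) are interpreted here as the mean over the components of each state.

```python
import math


def rmse(x0, xc):
    """Root mean squared error between uncompressed and compressed data."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x0, xc)) / len(x0))


def psnr(x0, xc):
    """Eq. (6): peak signal-to-noise ratio in dB."""
    peak = max(abs(a) for a in x0)
    return 20.0 * math.log10(peak / rmse(x0, xc))


def pearson(x0, xc):
    """Eq. (7), with the mean taken over the components of each state."""
    m0 = sum(x0) / len(x0)
    mc = sum(xc) / len(xc)
    cov = sum((a - m0) * (b - mc) for a, b in zip(x0, xc))
    var0 = sum((a - m0) ** 2 for a in x0)
    varc = sum((b - mc) ** 2 for b in xc)
    return cov / math.sqrt(var0 * varc)


def max_pointwise_error(x0, xc):
    """Eq. (8): largest absolute difference between any two components."""
    return max(abs(a - b) for a, b in zip(x0, xc))


def compression_rate(original_bytes, compressed_bytes):
    """Eq. (9): ratio of original to compressed size."""
    return original_bytes / compressed_bytes


x0 = [1.0, -2.0, 3.0, -4.0]  # toy uncompressed state
xc = [1.5, -2.0, 3.0, -4.0]  # toy state after a lossy round trip

err = max_pointwise_error(x0, xc)  # 0.5
quality = psnr(x0, xc)             # 20 * log10(4 / 0.25), about 24.08 dB
rho = pearson(x0, xc)              # close to, but below, 1
rate = compression_rate(8000, 2000)  # 4.0
```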

The validator is implemented in Python and can be further customized by passing custom validation functions to the validator class. The path to the custom validator script is set in the Melissa-DA configuration. If no path is set in the configuration, the framework uses a default validator that calculates the qualifiers introduced above. Listing 2 shows an example of a custom validator script.

Figure 2: Z-value deviation, $\Delta\text{RMSZ}_{X_c}^{p}$, for (a) FPZIP, (b) ZFP in precision mode, (c) ZFP in accuracy mode, and (d) half and single precision. The colors indicate values at different cycles.
from validator import *


def custom_write(mean, variance, cycle, nranks, ndims):
    '''
    mean     - { "variable1" : [mean_rank1, ...],
                 "variable2" : [mean_rank1, ...], ... }
    variance - { "variable1" : [variance_rank1, ...],
                 "variable2" : [variance_rank1, ...], ... }
    cycle    - Assimilation cycle
    nranks   - Number of simulation processes
    ndims    - { "variable1" : [ndim_rank1, ...],
                 "variable2" : [ndim_rank1, ...], ... }
    '''
    ...


def custom_evaluate(data, rank, name):
    '''
    data - variable data on application rank
    rank - application rank
    name - variable name
    '''
    ...


def custom_compare(data, rank, name):
    '''
    data[0] - uncompressed variable data on application rank
    data[1] - compressed variable data on application rank
    rank    - application rank
    name    - variable name
    '''
    ...


validator = Validator(
    evaluate_function=custom_evaluate,
    compare_function=custom_compare,
    write_function=custom_write,
)
validator.run()
Listing 2: Example of a custom validator script. The Validator class accepts callback functions as arguments. The functions have to comply with the function interface. We can pass the functions as scalar variables or arrays. The evaluate and compare functions must return a scalar value. The result will be included in the data output. A function to write the mean particle state into files can also be provided.

We provide two kinds of custom functions. The evaluation function is called with the state data compressed with parameter $c$ and can be used to compute scalar quantities for the state variables $X_c$, for instance, the total energy, energy budgets, etc. The compare function is called with both the data of the uncompressed state and the data of the corresponding compressed state, and provides a means to customize the validation by calculating additional qualifiers for testing the consistency of the compressed states. In addition to the functions shown in the listing, it is also mandatory to pass appropriate reduction functions, since the functions are called with the rank-local parts of the data, which allows the quantities to be computed in parallel. All values that are calculated are recorded and written into a comma-separated values (CSV) file.
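The role of the reduction functions can be illustrated with a small stand-alone sketch. It does not use the Validator class, whose reduction interface is not shown here; all names (`local_sum_of_squares`, `local_max_error`, `reduce_results`) are hypothetical, and two application ranks are emulated by slicing a list.

```python
def local_sum_of_squares(data):
    """Rank-local part of a global sum of squares (evaluate-style function)."""
    return sum(x * x for x in data)


def local_max_error(data):
    """Rank-local maximum pointwise error (compare-style function)."""
    original, compressed = data
    return max(abs(a - b) for a, b in zip(original, compressed))


def reduce_results(partials, reduction):
    """Combine one partial result per rank into a single global value."""
    return reduction(partials)


# Emulate two application ranks, each holding half of the state.
state = [1.0, 2.0, 3.0, 4.0]
lossy = [1.0, 2.5, 3.0, 3.75]
ranks = [slice(0, 2), slice(2, 4)]

# A sum of squares is reduced with a sum, a maximum error with a maximum.
energy = reduce_results([local_sum_of_squares(state[r]) for r in ranks], sum)
max_err = reduce_results(
    [local_max_error((state[r], lossy[r])) for r in ranks], max
)
print(energy, max_err)  # 30.0 0.5
```

The reduction has to match the quantity: applying a sum to rank-local maxima, for example, would not yield the global maximum error.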

III-D Dynamic Mode

After identifying the parameters that can safely be applied for data compression using the validation mode, the framework can be operated in dynamic mode. This mode aims at the best performance for the data assimilation using the Melissa-DA particle filter. The same configuration file as for the validation mode provides the compression parameters (see Listing 1). The dynamic mode operates in two phases. During the initial phase, we check all compression parameters using a validation function and discard those parameters that fail the validation. The remaining parameters are sorted by compression rate and kept in a variable. After the initialization, before writing the particle state to disk, we compress the state with the best parameter and check its integrity with the validation function. If the validation passes, the state is stored to the file system. If the test fails, we check the next parameter in the list, and we repeat this until we find a parameter that passes the validation. This ensures that we never use a compression parameter that introduces inconsistencies to the particle state.
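The two phases can be sketched as follows. This is a simplified illustration and not the FTI or Melissa-DA implementation: rounding to a fixed number of decimals stands in for a lossy compressor, the compression rates are made-up values, and the fallback for the case that no parameter passes is an assumption, since the text does not describe it.

```python
def max_pointwise_error(original, restored):
    """Default validation quantity: largest absolute component difference."""
    return max(abs(a - b) for a, b in zip(original, restored))


def make_rounding_codec(digits):
    """Stand-in for a lossy compressor: keep `digits` decimal places."""
    def roundtrip(state):
        return [round(x, digits) for x in state]
    return roundtrip


def initialize(state, candidates, sigma):
    """Initial phase: drop failing parameters, sort the rest by rate."""
    passing = [
        c for c in candidates
        if max_pointwise_error(state, c["roundtrip"](state)) <= sigma
    ]
    return sorted(passing, key=lambda c: c["rate"], reverse=True)


def select(state, ordered, sigma):
    """Per-write phase: return the first parameter that still validates."""
    for c in ordered:
        if max_pointwise_error(state, c["roundtrip"](state)) <= sigma:
            return c
    return None  # assumption: the caller then writes the state uncompressed


# Hypothetical candidates; the rates are illustrative, not measured.
candidates = [
    {"digits": 6, "rate": 1.3, "roundtrip": make_rounding_codec(6)},
    {"digits": 1, "rate": 4.0, "roundtrip": make_rounding_codec(1)},
    {"digits": 3, "rate": 2.0, "roundtrip": make_rounding_codec(3)},
]
sigma = 4.8e-4

first_state = [0.123456, 1.654321, -2.000049]
ordered = initialize(first_state, candidates, sigma)  # digits 3, then 6

later_state = [0.75049]
chosen = select(later_state, ordered, sigma)  # digits 3 fails, 6 passes
```

Re-validating on every write means that a parameter accepted in the first cycle can still be skipped later if the state evolves into a regime where it no longer meets the error bound.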

By default, the validators are inactive during the dynamic mode. However, they can be activated to compute certain quantities using custom evaluation functions, or to write out the mean particle state with a custom function. This has the advantage that those tasks are performed asynchronously to the particle propagation. The validators are implemented in Python; hence, we can leverage Python bindings for common I/O libraries (e.g., netCDF4 [34], ADIOS [1] or HDF5 [2]) to write the state data.

IV Evaluation

Figure 3: Normalized maximum pointwise error and normalized root mean square error for (a) FPZIP, half precision (HP) and single precision (SP), and (b) ZFP in accuracy and precision modes. The colors indicate values at different cycles.
| Cycle | FPZIP 32 | FPZIP 40 | FPZIP 48 | ZFP 32 | ZFP 40 | ZFP 48 | ZFP 1e-6 | ZFP 1e-8 | ZFP 1e-10 | HP | SP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| NRMSE | | | | | | | | | | | |
| 1 | 2.12e-07 | 8.32e-10 | 3.25e-12 | 1.11e-09 | 4.34e-12 | 1.69e-14 | 3.35e-08 | 2.59e-10 | 2.03e-12 | 1.16e-04 | 1.42e-08 |
| 3 | 4.88e-06 | 1.87e-08 | 6.90e-11 | 2.34e-08 | 1.02e-10 | 3.91e-13 | 8.02e-07 | 5.82e-09 | 4.39e-11 | 2.47e-03 | 3.31e-07 |
| 5 | 1.25e-04 | 5.29e-07 | 2.23e-09 | 6.47e-07 | 3.16e-09 | 1.35e-11 | 2.19e-05 | 1.55e-07 | 1.21e-09 | 2.10e-02 | 8.88e-06 |
| 7 | 1.88e-03 | 1.17e-05 | 5.36e-08 | 1.31e-05 | 6.32e-08 | 2.21e-10 | 4.58e-04 | 3.39e-06 | 2.57e-08 | 6.58e-02 | 2.09e-04 |
| 9 | 1.04e-02 | 1.99e-04 | 8.44e-07 | 2.39e-04 | 1.10e-06 | 5.75e-09 | 3.37e-03 | 6.34e-05 | 5.36e-07 | 1.20e-01 | 1.90e-03 |
| 11 | 3.32e-02 | 1.68e-03 | 2.20e-05 | 1.97e-03 | 4.64e-05 | 3.38e-07 | 1.37e-02 | 7.48e-04 | 1.04e-05 | 1.60e-01 | 8.83e-03 |
| 13 | 7.27e-02 | 7.45e-03 | 2.09e-04 | 8.19e-03 | 2.26e-04 | 1.42e-06 | 3.82e-02 | 3.93e-03 | 1.42e-04 | 1.84e-01 | 2.73e-02 |
| 15 | 1.18e-01 | 2.25e-02 | 1.36e-03 | 2.44e-02 | 1.55e-03 | 2.00e-05 | 7.71e-02 | 1.34e-02 | 9.56e-04 | 1.94e-01 | 6.02e-02 |
| NPME | | | | | | | | | | | |
| 1 | 7.68e-06 | 2.34e-08 | 1.33e-10 | 4.74e-08 | 3.44e-10 | 1.05e-12 | 1.54e-06 | 7.20e-09 | 7.62e-11 | 6.56e-03 | 5.26e-07 |
| 3 | 1.38e-03 | 4.38e-06 | 1.52e-08 | 3.81e-06 | 2.66e-08 | 9.60e-11 | 2.52e-04 | 1.88e-06 | 8.73e-09 | 4.11e-01 | 1.02e-04 |
| 5 | 4.86e-02 | 2.34e-04 | 1.11e-06 | 3.01e-04 | 1.68e-06 | 8.27e-09 | 1.06e-02 | 6.61e-05 | 5.37e-07 | 7.35e-01 | 3.80e-03 |
| 7 | 4.77e-01 | 6.16e-03 | 3.15e-05 | 5.90e-03 | 3.42e-05 | 1.10e-07 | 2.25e-01 | 1.77e-03 | 1.24e-05 | 7.72e-01 | 1.17e-01 |
| 9 | 6.76e-01 | 1.04e-01 | 5.05e-04 | 1.31e-01 | 6.60e-04 | 4.09e-06 | 5.64e-01 | 3.68e-02 | 3.10e-04 | 8.00e-01 | 4.97e-01 |
| 11 | 7.47e-01 | 5.05e-01 | 1.73e-02 | 5.19e-01 | 3.01e-02 | 2.35e-04 | 6.70e-01 | 3.59e-01 | 7.01e-03 | 8.24e-01 | 6.47e-01 |
| 13 | 8.00e-01 | 6.35e-01 | 1.27e-01 | 6.59e-01 | 1.31e-01 | 1.04e-03 | 7.53e-01 | 5.80e-01 | 9.36e-02 | 8.08e-01 | 7.38e-01 |
| 15 | 7.96e-01 | 7.24e-01 | 4.59e-01 | 7.15e-01 | 4.57e-01 | 1.29e-02 | 7.80e-01 | 6.83e-01 | 3.73e-01 | 8.05e-01 | 7.79e-01 |
| Pearson Correlation Coefficient | | | | | | | | | | | |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.99986 | 1 |
| 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.98953 | 1 |
| 7 | 0.99992 | 1 | 1 | 1 | 1 | 1 | 0.99999 | 1 | 1 | 0.89889 | 1 |
| 9 | 0.99750 | 1 | 1 | 1 | 1 | 1 | 0.99974 | 1 | 1 | 0.66111 | 0.99991 |
| 11 | 0.97405 | 0.99993 | 1 | 0.99991 | 1 | 1 | 0.99550 | 0.99999 | 1 | 0.38879 | 0.99815 |
| 13 | 0.87646 | 0.99870 | 1 | 0.99839 | 1 | 1 | 0.96534 | 0.99963 | 1 | 0.20570 | 0.98261 |
| 15 | 0.67014 | 0.98790 | 0.99995 | 0.98595 | 0.99994 | 1 | 0.85887 | 0.99574 | 0.99998 | 0.11110 | 0.91457 |

Table I: Average values of the statistical qualifiers NRMSE, NPME and $\rho_{X_c}$ for selected compression parameters. The rows show the evolution of the qualifiers over the assimilation cycles.
| State Size [MB] | FPZIP 32 | FPZIP 40 | FPZIP 48 | ZFP 32 | ZFP 40 | ZFP 48 | ZFP 1e-6 | ZFP 1e-8 | ZFP 1e-10 | HP | SP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | 2.332125 | 1.805610 | 1.473040 | 1.897811 | 1.533923 | 1.287127 | 2.240469 | 1.799499 | 1.503566 | 4 | 2 |
| 32 | 2.332534 | 1.805876 | 1.473223 | 1.897829 | 1.533934 | 1.287135 | 2.240486 | 1.799508 | 1.503571 | 4 | 2 |
| 64 | 2.332749 | 1.806008 | 1.473311 | 1.897833 | 1.533936 | 1.287137 | 2.240469 | 1.799498 | 1.503564 | 4 | 2 |
| 128 | 2.332955 | 1.806137 | 1.473397 | 1.897827 | 1.533933 | 1.287135 | 2.240469 | 1.799497 | 1.503564 | 4 | 2 |

Table II: Compression rates, $\text{CR}_c$, for selected compression parameters, ordered by the state size.
| State Size [MB] | Uncompressed | FPZIP 32 | FPZIP 40 | FPZIP 48 | ZFP 32 | ZFP 40 | ZFP 48 | ZFP 1e-6 | ZFP 1e-8 | ZFP 1e-10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Load State from PFS (median) [ms] | | | | | | | | | | |
| 16 | 877.1 | 776.8 | 769.6 | 779.0 | 805.3 | 820.3 | 824.3 | - | - | - |
| 32 | 933.8 | 793.7 | 848.2 | 860.3 | 872.2 | 907.5 | 926.7 | 810.1 | 854.6 | 871.3 |
| 64 | 989.6 | 957.5 | 1011.9 | 1052.4 | 845.0 | 892.4 | 897.7 | 877.6 | 945.7 | 948.2 |
| 128 | 1185.0 | 954.5 | 1030.5 | 1119.2 | 957.9 | 995.7 | 1011.9 | 977.3 | 1045.6 | 1044.4 |
| Speedup Load [%] | | | | | | | | | | |
| 16 | - | 11.4 | 12.3 | 11.2 | 8.2 | 6.5 | 6.0 | - | - | - |
| 32 | - | 15.0 | 9.2 | 7.9 | 6.6 | 2.8 | 0.8 | 13.2 | 8.5 | 6.7 |
| 64 | - | 3.2 | 2.3 | 6.3 | 14.6 | 9.8 | 9.3 | 11.3 | 4.4 | 4.2 |
| 128 | - | 19.4 | 13.0 | 5.6 | 19.2 | 16.0 | 14.6 | 17.5 | 11.8 | 11.9 |
| Store State to PFS (median) [ms] | | | | | | | | | | |
| 16 | 524.9 | 479.6 | 481.2 | 491.2 | 500.0 | 501.3 | 516.3 | - | - | - |
| 32 | 651.0 | 513.7 | 527.4 | 567.9 | 524.7 | 540.0 | 593.2 | 514.3 | 531.6 | 588.2 |
| 64 | 761.0 | 585.4 | 599.2 | 659.0 | 461.1 | 498.4 | 534.9 | 582.6 | 590.0 | 653.3 |
| 128 | 1215.5 | 744.8 | 966.7 | 1132.2 | 698.2 | 869.9 | 956.4 | 676.9 | 833.3 | 943.8 |
| Speedup Store [%] | | | | | | | | | | |
| 16 | - | 8.6 | 8.3 | 6.4 | 4.7 | 4.5 | 1.6 | - | - | - |
| 32 | - | 21.1 | 19.0 | 12.8 | 19.4 | 17.1 | 8.9 | 21.0 | 18.3 | 9.6 |
| 64 | - | 23.1 | 21.3 | 13.4 | 39.4 | 34.5 | 29.7 | 23.4 | 22.5 | 14.2 |
| 128 | - | 38.7 | 20.5 | 6.9 | 42.6 | 28.4 | 21.3 | 44.3 | 31.4 | 22.4 |

Table III: Speedup for the various compression parameters while storing and loading the states from the PFS.
| Qualifier | FPZIP 32 | FPZIP 40 | FPZIP 48 | ZFP 32 | ZFP 40 | ZFP 48 | ZFP 1e-6 | ZFP 1e-8 | ZFP 1e-10 | SP |
|---|---|---|---|---|---|---|---|---|---|---|
| $\rho_{X_c}$ | 0.67014 | 0.98790 | 0.99995 | 0.98595 | 0.99994 | 1 | 0.85887 | 0.99574 | 0.99998 | 0.91457 |
| $\Delta\text{RMSZ}_{X_c}^{p}$ | 0.22 | 0.04 | 0.0009 | 0.03 | 0.001 | 3.6e-6 | 0.21 | 0.03 | 0.0005 | 0.14 |
| $A(\text{RMSE})$ | 1.41(05) | 1.0007(02) | 1.0002(01) | 0.9958(22) | 0.9964(23) | 0.9963(23) | 1.0855(90) | 0.9965(20) | 0.9973(20) | 1.0232(29) |
| OK | | | ✓ | | ✓ | ✓ | | | ✓ | |

Table IV: Summary of the best compression parameters and the exclusion criteria.

To evaluate our framework, we perform an exemplary workflow that a user would follow when applying it to the climate model at hand. First, we use the validation mode to identify viable compression parameters. Afterwards, we apply the parameters to perform the data assimilation in dynamic mode. We instrumented the validators to acquire information about the performance of the individual validation tasks. For measuring the performance during the dynamic mode, we leverage the internal profiler of Melissa-DA. We present the analysis of the statistical qualifiers that are computed during the validation mode in Section IV-C. The performance of the validators is evaluated in Section IV-D1, and the performance during the dynamic mode in Section IV-E1.

IV-A Experimental Setup

All of our experiments were performed on Fugaku [35], a 488 (double-precision) PFlops supercomputer hosted by RIKEN R-CCS in Japan. Fugaku consists of 158,976 compute nodes that are each equipped with a Fujitsu A64FX CPU. A64FX provides 48 application CPU cores and is integrated with 32 GiB of HBM2 memory. The compute nodes are interconnected through the TofuD network.

Each group of 16 compute nodes in Fugaku shares a 1.6 TB SSD storage, and all the nodes can access the global 150 PB Lustre file system. We utilize the SSDs in the so-called local mode, where each SSD is divided proportionally among the compute nodes in the given group and is exposed as a dedicated per-node file system.

IV-B Methodology

We performed experiments with different state sizes, $N$, and different ensemble sizes, $P$, to examine the scaling behavior of the validation mode. We selected values of $N$ that result in state sizes of 16, 32, 64 and 128 MB. We further set $P$ to 25, 50 and 100 particles, using 36, 72 and 132 compute nodes, respectively. All experiments are performed using the Lorenz96 [27] model. The model equation reads:

$$\frac{dx_i}{dt} = (x_{i+1} - x_{i-2})\,x_{i-1} - x_i + F, \quad i = 1, \dots, N \tag{10}$$

At a forcing of 6 or larger (i.e., $F \geq 6$), the model exhibits chaotic behavior. Hence, we set the forcing to 6 in all our experiments. The model has been initialized with small perturbations at $t_0$. Before starting the data assimilation, we run the model for a sufficiently long time ($DT = 10$), so that it reaches a chaotic state. We further introduced a generic perturbation to the states at the beginning of each particle propagation. To ensure that we track only the errors that are introduced by the compression method, we used the same static seed for all particles $p_c$ with $c = 0, \dots, C$, where $C$ is the number of compression parameters. Thus, at the beginning of each particle propagation, we reset the seed to a value $sd = f(\text{pid}, \text{rid}, \text{rank}, \text{t})$, with pid the particle ID, rid the runner ID, rank the MPI rank of the runner and t the assimilation cycle.
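For reference, the model tendency of Equation 10 can be written in a few lines of NumPy. The sketch below is an illustration under stated assumptions: the cyclic boundary treatment is the usual convention for Lorenz96, while the fourth-order Runge-Kutta integrator, the time step, the state size of 40 and the seeding helper `propagation_rng` are choices made for this example and are not taken from the experiments described here.

```python
import numpy as np


def lorenz96_rhs(x, forcing=6.0):
    """Tendency dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F with cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing


def rk4_step(x, dt, forcing=6.0):
    """One classical Runge-Kutta step (integrator choice is an assumption)."""
    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_rhs(x + dt * k3, forcing)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def propagation_rng(pid, rid, rank, cycle):
    """Hypothetical realization of sd = f(pid, rid, rank, t): a generator that is
    identical for identical (pid, rid, rank, cycle) tuples."""
    return np.random.default_rng(np.random.SeedSequence([pid, rid, rank, cycle]))


# Spin up from a slightly perturbed steady state x_i = F.
x = np.full(40, 6.0)
x += 1e-3 * propagation_rng(0, 0, 0, 0).standard_normal(x.size)
for _ in range(1000):  # 1000 steps of dt = 0.01, i.e. a model time of 10
    x = rk4_step(x, 0.01)
```

Because the generator depends only on the identifier tuple, two propagations of the same particle that differ only in the compression parameter receive the same perturbation, so any difference between them stems from the compression.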

IV-C Statistical Evaluation

In this section, we present the results of our experiments in validation mode. The aim of this analysis is to determine viable compression parameters for production runs. For this, we evaluate the statistical qualifiers presented in Section III-C. The qualifiers have to fulfill certain requirements, which we state in the following subsections. If all requirements are met, the parameter can safely be applied for compression in the climate model.

IV-C1 Z-Value Deviation

To test the ensemble consistency, we apply the Z-test from Baker et al. [7]. For this, we plot the Z-value deviation:

$$\Delta\text{RMSZ}_{X_c}^{p} = \left| \text{RMSZ}_{X_c}^{p} - \text{RMSZ}_{X_0}^{p} \right| \tag{11}$$

Figure 2 shows the plots resolved by compression method. The plots show the values for cycles 1 to 18 of the ensemble data assimilation, indicated by the different colors. The variation of the Z-value deviation can be read from the error bars of the boxes. Baker et al. [7] require the deviation to be smaller than 0.1:

$$0 \leq \Delta\text{RMSZ}_{X_c}^{p} < 0.1 \tag{12}$$

Thus, as long as all values stay within this interval, the test is considered passed.
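The test of Equations 11 and 12 can be sketched as below. The sketch assumes a leave-one-out form of the RMSZ score in the spirit of Baker et al. [7], in which particle $p$ is standardized pointwise against the mean and standard deviation of the remaining ensemble members; the exact definition used by the framework is the one given in Section III-C, so this form and all function names are assumptions. The ensemble is synthetic and a float32 round trip stands in for a lossy compressor.

```python
import numpy as np


def rmsz(member, others):
    """Root mean square Z-score of one member against the remaining ensemble."""
    mean = others.mean(axis=0)
    std = others.std(axis=0, ddof=1)
    return float(np.sqrt(np.mean(((member - mean) / std) ** 2)))


def delta_rmsz(ensemble, compressed_member, p):
    """|RMSZ of the compressed particle p - RMSZ of the uncompressed particle p|."""
    others = np.delete(ensemble, p, axis=0)
    return abs(rmsz(compressed_member, others) - rmsz(ensemble[p], others))


def passes_z_test(delta, limit=0.1):
    """Requirement of Equation 12: 0 <= delta < limit."""
    return 0.0 <= delta < limit


# Synthetic ensemble: 25 particles with 1000 state entries each.
rng = np.random.default_rng(0)
ensemble = rng.standard_normal((25, 1000))
compressed = ensemble[3].astype(np.float32).astype(np.float64)
delta = delta_rmsz(ensemble, compressed, p=3)
```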

IV-C2 Normalized Error Statistic

To get information on the error propagation, we plot the normalized pointwise maximum error (NPME) and normalized root mean squared error (NRMSE):

$$\text{NRMSE}_{X_c} = \frac{\text{RMSE}_{X_c}}{\max\left(x_{c,i}\right) - \min\left(x_{c,i}\right)}, \quad i = 1, 2, \dots, N \tag{13}$$
$$\text{NPME}_{X_c} = \frac{\max\left(\left|x_{c,i} - x_{0,i}\right|\right)}{\max\left(x_{c,i}\right) - \min\left(x_{c,i}\right)}, \quad i = 1, 2, \dots, N \tag{14}$$
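Both qualifiers are straightforward to compute. The sketch below assumes that $\text{RMSE}_{X_c}$ is the root mean squared difference between the compressed state $x_c$ and the uncompressed state $x_0$, and it normalizes by the range of the compressed state as written in Equations 13 and 14; the function names are chosen for this example.

```python
import numpy as np


def nrmse(x0, xc):
    """Root mean squared error between xc and x0, normalized by the range of xc."""
    rmse = np.sqrt(np.mean((xc - x0) ** 2))
    return float(rmse / (xc.max() - xc.min()))


def npme(x0, xc):
    """Maximum pointwise error between xc and x0, normalized by the range of xc."""
    return float(np.max(np.abs(xc - x0)) / (xc.max() - xc.min()))


# Tiny worked example: one of four entries deviates by 1, the range of xc is 4.
x0 = np.array([0.0, 1.0, 2.0, 3.0])
xc = np.array([0.0, 1.0, 2.0, 4.0])
print(nrmse(x0, xc))  # sqrt(1/4) / 4 = 0.125
print(npme(x0, xc))   # 1 / 4 = 0.25
```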
Figure 4: Linear correlation between the temporal evolution of the NRMSE of lossy compressed states and that of losslessly compressed states, for (a) FPZIP, HP and SP, and (b) ZFP. We plot the NRMSE of the lossy compressed states for each cycle against the respective values for the losslessly compressed state.

The plots are presented in Figures 3a and 3b. We can see that the errors increase very quickly in the beginning and much more slowly towards the end. As described in Section IV-B, we use the same static seed for the propagation of identical particles. Therefore, the errors that we see in the plots are introduced only by the compression method. However, the randomization of the model introduces an error to the states as well, which limits the deterioration introduced by the compression. To make the actual impact of the compression method apparent, we repeated the experiments for the 25-particle ensemble, adding a small additional perturbation of $O(10^{-8})$ to the states using a random seed. Plotting the NRMSE of the various compression methods versus that of the losslessly compressed state gives us a graphical means to decide whether the compression method adds additional error or not. Since the perturbation is indeed random, even two identical uncompressed particles differ from each other. However, if the evolution of the NRMSE of a compressed particle is perfectly linear in that of the losslessly compressed states, we can infer that no additional error is imposed by the compression method.

Figure 4 shows the correlations for all compression methods. We can see that ZFP 32, ZFP 40, ZFP 48, ZFP 1e-8, ZFP 1e-10, FPZIP 40, FPZIP 48 and SP show an almost perfectly linear relationship to the uncompressed state. To quantify the correlation and to develop an exclusion criterion, we performed a linear regression of the NRMSE evolution for all $P(P-1)/2$ combinations of the 25 particles (i.e., $P = 25$) that have been losslessly compressed with FPZIP. In that way, we can determine the variation of the linear correlation between identical states. This results in an average correlation of $A = 1.00(01)$, with $A$ being the slope of the linear regression model $y \sim x$. Thus, we consider the NRMSE test passed when the value for the linear correlation lies within the error interval of $A$.
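The slope criterion can be sketched as follows. The example reads $A = 1.00(01)$ as the interval $1.00 \pm 0.01$ and uses purely synthetic NRMSE series, since the per-cycle regression inputs are not listed here; the helper names are hypothetical, and the fit is an ordinary least-squares line obtained with `numpy.polyfit`.

```python
import numpy as np


def nrmse_slope(nrmse_reference, nrmse_candidate):
    """Slope A of the least-squares line y ~ x through the per-cycle NRMSE values."""
    slope, _intercept = np.polyfit(nrmse_reference, nrmse_candidate, 1)
    return float(slope)


def passes_slope_test(slope, center=1.00, half_width=0.01):
    """NRMSE test: the slope has to lie within center +/- half_width."""
    return abs(slope - center) <= half_width


# Synthetic per-cycle NRMSE of a losslessly compressed reference particle.
reference = np.linspace(0.0, 0.1, 16)
nearly_identical = 1.0005 * reference  # evolves like the reference
amplified = 1.41 * reference           # adds error on top of the perturbation

slope_ok = nrmse_slope(reference, nearly_identical)
slope_bad = nrmse_slope(reference, amplified)
```

With these inputs, the first series passes the test and the second one is excluded, which mirrors how a slope well above 1 signals error added by the compression method.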

IV-C3 Pearson Correlation Coefficient

The last value that we look at is the Pearson correlation coefficient $\rho_{X_c}$ (Equation 7), which encodes the linear correlation between the compressed and the uncompressed state. Baker et al. [7] require this value to be at least 0.99999.
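A minimal sketch of this check is given below. It uses `numpy.corrcoef` for the standard Pearson coefficient and takes the threshold as a parameter; the function names and the synthetic state are assumptions for illustration, and a float32 round trip again stands in for a lossy compressor.

```python
import numpy as np


def pearson(x0, xc):
    """Pearson correlation coefficient between uncompressed and compressed state."""
    return float(np.corrcoef(x0, xc)[0, 1])


def passes_pearson_test(rho, threshold=0.99999):
    """Requirement of Baker et al.: rho must be at least the threshold."""
    return rho >= threshold


x0 = np.sin(np.linspace(0.0, 20.0, 2000))     # synthetic state
xc = x0.astype(np.float32).astype(np.float64)  # stand-in for a lossy round trip
rho = pearson(x0, xc)
```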

IV-C4 Summary of the Validation Study

A detailed list of the results is provided in Table I, resolved by compression method and assimilation cycle. We list the compression rates in Table II. The results show that the rate practically does not change with increasing data size. By matching the results with the requirements, we can extract a subset of viable compression parameters from the initial set. We take a closer look at the parameters that appear most promising in Table I; Table IV summarizes the results of the tests for the best parameters at cycle 15. Parameters that pass the tests, indicated by the check mark, can be safely used for data compression in the Lorenz96 model within 15 assimilation cycles.

IV-D Performance

Now that we have outlined how to use the framework and have performed an exemplary evaluation, we will present a performance profile of the framework and look into the savings that we achieve during the dynamic mode.

IV-D1 Validation Mode

The validator parallelization is twofold. First, we parallelize among the validator instances by distributing the particle IDs among all available validators. Thus, each validator computes the statistical qualifiers for $P/V$ particles, where $P$ is the total number of particles in the ensemble and $V$ the number of validators. The number of validators can be set in the Melissa-DA configuration. Each validator is executed on one node, and each instance is independent of the others (i.e., different MPI executions). Second, we parallelize the validation tasks on the validators using the Python multiprocessing module. Hence, we compute the statistical quantities leveraging all cores available on the node. Furthermore, the validators are connected to each other via TCP, leveraging ZMQ [3], to gather all the qualifiers on a single validator and collect them in a single file.
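The two levels of parallelism can be sketched as follows. The round-robin assignment and both function names are assumptions made for this example; the text only states that the particle IDs are distributed among the validators and that the tasks on a node are spread over its cores with Python's multiprocessing facilities.

```python
from multiprocessing import Pool


def assign_particles(num_particles, num_validators, validator_id):
    """First level: particle IDs handled by one validator (round-robin, assumed)."""
    return list(range(validator_id, num_particles, num_validators))


def run_validation(particle_ids, task, processes=None):
    """Second level: spread the per-particle task over the cores of one node.

    `task` must be a picklable, module-level function taking a particle ID.
    """
    with Pool(processes=processes) as pool:
        return pool.map(task, particle_ids)


# 50 particles on 4 validators: each validator handles about P/V particles.
assignments = [assign_particles(50, 4, v) for v in range(4)]
```

Each validator would then call `run_validation` on its own list of particle IDs, independently of the other validators.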

Figure 5 shows the profile of one validation cycle on a randomly selected validator instance for the experiment with 50 particles and a state size of 16 MB. We can see that most of the time is spent loading the particle states (second row in the figure). The calculation of the ensemble mean and standard deviation requires all ensemble particle states to be available on the validators. Thus, we need to load $(C+1) \times P$ particles in total on each validator. In a first implementation, we parallelized the computation of those quantities differently: we computed the ensemble mean and standard deviation partially on each validator, including only the terms for the particles that had been assigned to that validator. However, the terms have dimension $N$, as we can see in Equation 1, and the reduce operation involves the transfer of those states among the validators. Even worse, we need to perform an additional allreduce operation to compute the ensemble standard deviation, since the previously calculated ensemble mean must be available on all validators for this. The current implementation shows a considerable speedup compared to this, despite the overhead of the additional I/O.

Figure 5: Trace of a randomly selected validator for one validation cycle.
Figure 6: Comparison of the time for one assimilation cycle leveraging the dynamic mode of our proposed framework.

IV-E Discussion

IV-E1 Dynamic mode

In Section IV-C, we identified viable compression parameters by studying the results from the validation mode. Table III lists the I/O performance that we measured on the runners while executing the climate model. The times include the compression and the I/O operation. We save most while storing the states; loading the states shows a significantly smaller speedup. The speedup is defined by $\Delta T_{\text{SU}} = \frac{T_0 - T_c}{T_0}$, where $T_0$ and $T_c$ are the times without compression and with compression parameter $c$, respectively, so that positive values correspond to time saved, as reported in Table III. The maximum speedup is achieved with FPZIP and ZFP at 32 bit precision and with ZFP at $10^{-6}$ accuracy. The corresponding speedups at a state size of 128 MB are 39%, 43% and 44%, respectively, for the store, and 19%, 19% and 18% for the load operation. The parameters that have passed all tests also show considerable speedup for storing and loading the states. Even the most accurate ZFP modes (ZFP 48 and ZFP 1e-10) show a reasonable speedup of 21% and 22% for storing and 15% and 12% for loading the states. We further observe that the speedup increases with larger state sizes. Moreover, we can see in Table II that the compression rate does not change with increasing state size. This means that we can expect even better speedups and equal reduction rates for larger state sizes. This is supported by experiments that we performed in dynamic mode with a state size of 2 GB, where we measured a speedup of 19% for loading and 57% for storing the state. Furthermore, we have been using the validation function, computing the maximum pointwise error of the compressed state, while still reaching an effective speedup of 6% for the assimilation cycles. This can be seen in Figure 6. The values for the speedups are always given as the median.
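As a worked check of the numbers, the short sketch below recomputes three store speedups for the 128 MB state from the median times in Table III, using the convention that a positive value means time saved relative to the uncompressed run. It also converts a compression rate into the fraction of storage saved. The function names are chosen for this example.

```python
def speedup_percent(t_compressed, t_uncompressed):
    """Relative time saved with compression, in percent of the uncompressed time."""
    return 100.0 * (t_uncompressed - t_compressed) / t_uncompressed


def storage_saving(compression_rate):
    """Fraction of storage saved for a compression rate CR = original / compressed."""
    return 1.0 - 1.0 / compression_rate


# Median store times [ms] for the 128 MB state, taken from Table III.
t_uncompressed = 1215.5
store_times = {"FPZIP 32": 744.8, "ZFP 32": 698.2, "ZFP 1e-6": 676.9}

store_speedups = {
    name: round(speedup_percent(t, t_uncompressed), 1)
    for name, t in store_times.items()
}
print(store_speedups)  # {'FPZIP 32': 38.7, 'ZFP 32': 42.6, 'ZFP 1e-6': 44.3}

# A compression rate of 1.5 corresponds to one third less storage.
print(round(storage_saving(1.5), 3))  # 0.333
```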

Our analysis is performed on the Lorenz96 model, which has only one state variable. Baker et al. [7] evaluate lossy compression for four different variables of the CESM. Their evaluation shows different behavior for all variables in compression rate and variable consistency. For instance, compression with FPZIP-24 leads to compression rates between 2.56 and 5.26 and NRMSE values between 1.8e-5 and 6.5e-7. Further, the ensemble consistency varies significantly among the variables for different compression methods. This demonstrates that we need to allow for different compression parameters per variable and that each variable achieves a different compression rate. That is to say, our evaluation does not give a general statement on the performance of the compression parameters. Moreover, it underlines the importance of studying the impact of lossy compression on consistency for each model and its variables separately.

V Related Work

Compression of scientific datasets is not only interesting for numerical climate science. Every field in HPC that deals with large datasets benefits from reduced data sizes. Data compression can be applied, for instance, to datasets before visualization and to generate checkpoints. A variety of compression algorithms are used in HPC: ZFP [23], FPZIP [24], ISABELA [20], SZ [10], MGARD [5] and MGARD+ [22], to name a few. I/O libraries such as ADIOS [13], HDF5 [2], and NetCDF4 [9] offer high-level interfaces for compression of datasets in self-descriptive hierarchical files.

The impact and applicability of data compression to scientific datasets has been studied in several works [19, 31, 9, 7, 6, 33]. The Community Earth System Model (CESM) [18, 8] includes a port-validation tool, originally used to determine the consistency of the results after porting to a different architecture. According to Baker et al. [7], the tool can also be used to validate data compression for the CESM module states. Z-Checker [37] is a framework that can be used to analyze the impact of compression on any scientific dataset. The framework offers offline and online analysis of the datasets. The online mode can be used inside the application, after instrumenting the code with the Z-Checker API functions, to observe the dynamic behavior of data compression.

Our proposed framework is similar to the port-validation tool in the CESM, as the tool checks ensemble consistency and can detect issues in the climate model after porting it to a new machine. However, our framework (1) is not constrained to a certain climate modelling system, (2) is more flexible as it enables the definition of custom validation functions and (3) provides automatic and direct comparison between ensembles that use different compression methods. Z-Checker provides several features to evaluate the impact of the compression method on the data, and the online mode can potentially be used to perform an analysis that is similar to ours. However, this would be associated with considerable implementation effort, and the tool does not provide measures to detect ensemble inconsistencies.

VI Conclusion

In this work, we present a novel framework, built on top of the Melissa-DA architecture, that provides validation and application of lossy compression in climate models for ensemble data assimilation. We conducted and presented an exemplary study based on the Lorenz96 model, where we evaluated the applicability of the FPZIP and ZFP 16, 24, 32, 40 and 48 bit precision modes, the ZFP 1e-4, 1e-6, 1e-8 and 1e-10 accuracy modes, and single and half precision floating point representations for data compression. Our validation follows the suggestion of Baker et al. [7], requiring the deviation of the Z-values of compressed and uncompressed states to be less than 0.1 (Equation 12), the Pearson correlation coefficient (Equation 7) of the compressed and the uncompressed state to be at least 0.9999 (Baker et al. are more restrictive, requiring at least 0.99999), and the impact on the normalized root mean squared error of compressed and uncompressed state to be negligible (compare Section IV-C2). After matching our results with these criteria, we are left with FPZIP 48, ZFP 40, ZFP 48 and ZFP 1e-10. Accordingly, those parameters can safely be applied for the compression of the states for 15 assimilation cycles without affecting the data assimilation result. The compression rates for FPZIP 48, ZFP 40 and ZFP 1e-10 are 1.47, 1.53 and 1.50, respectively, which translates to a saving of about 1/3 in storage space; ZFP 48 reaches a rate of 1.29 (Table II). Our measurements in dynamic mode with a state size of 2 GB show speedups of 19% while loading and 57% while storing the states. We also check the integrity of the states by applying the default validation function. This results in an effective speedup of 6% for the full assimilation cycle.

VII Acknowledgements

Part of the research presented here has received funding from the Horizon 2020 (H2020) funding framework under grant/award number: 824158; Energy oriented Centre of Excellence II (EoCoE-II). The present publication reflects only the authors’ views. The European Commission is not liable for any use that might be made of the information contained therein.

References

  • [1] ADIOS: The Adaptable I/O System | Computer Science and Mathematics.
  • [2] The HDF5® Library & File Format.
  • [3] ZeroMQ - https://zeromq.org/.
  • [4] Sameh Abdulah, Qinglei Cao, Yu Pei, George Bosilca, Jack Dongarra, Marc G. Genton, David E. Keyes, Hatem Ltaief, and Ying Sun. Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC. IEEE Transactions on Parallel and Distributed Systems, 33(4):964–976, April 2022.
  • [5] Mark Ainsworth, Ozan Tugluk, Ben Whitney, and Scott Klasky. Multilevel techniques for compression and reduction of scientific data—the univariate case. Computing and Visualization in Science, 19(5):65–76, 2018.
  • [6] Allison H. Baker, Dorit M. Hammerling, Sheri A. Mickelson, Haiying Xu, Martin B. Stolpe, Phillipe Naveau, Ben Sanderson, Imme Ebert-Uphoff, Savini Samarasinghe, Francesco De Simone, Francesco Carbone, Christian N. Gencarelli, John M. Dennis, Jennifer E. Kay, and Peter Lindstrom. Evaluating lossy data compression on climate simulation data within a large ensemble. Geoscientific Model Development, 9(12):4381–4403, December 2016.
  • [7] Allison H. Baker, Haiying Xu, John M. Dennis, Michael N. Levy, Doug Nychka, Sheri A. Mickelson, Jim Edwards, Mariana Vertenstein, and Al Wegener. A methodology for evaluating the impact of data compression on climate simulation data. In Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, HPDC ’14, pages 203–214, New York, NY, USA, June 2014. Association for Computing Machinery.
  • [8] G. Danabasoglu, J.-F. Lamarque, J. Bacmeister, D. A. Bailey, A. K. DuVivier, J. Edwards, L. K. Emmons, J. Fasullo, R. Garcia, A. Gettelman, C. Hannay, M. M. Holland, W. G. Large, P. H. Lauritzen, D. M. Lawrence, J. T. M. Lenaerts, K. Lindsay, W. H. Lipscomb, M. J. Mills, R. Neale, K. W. Oleson, B. Otto-Bliesner, A. S. Phillips, W. Sacks, S. Tilmes, L. van Kampenhout, M. Vertenstein, A. Bertini, J. Dennis, C. Deser, C. Fischer, B. Fox-Kemper, J. E. Kay, D. Kinnison, P. J. Kushner, V. E. Larson, M. C. Long, S. Mickelson, J. K. Moore, E. Nienhouse, L. Polvani, P. J. Rasch, and W. G. Strand. The Community Earth System Model Version 2 (CESM2). Journal of Advances in Modeling Earth Systems, 12(2):e2019MS001916, 2020.
  • [9] Xavier Delaunay, Aurélie Courtois, and Flavien Gouillon. Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files. Geoscientific Model Development, 12(9):4099–4113, September 2019.
  • [10] Sheng Di and Franck Cappello. Fast Error-Bounded Lossy HPC Data Compression with SZ. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 730–739, May 2016.
  • [11] Francesca Eggleton and Kate Winfield. Open Data Challenges in Climate Science. Data Science Journal, 19(1):52, December 2020.
  • [12] Geir Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans, 99(C5):10143–10162, 1994.
  • [13] William F. Godoy, Norbert Podhorszki, Ruonan Wang, Chuck Atkins, Greg Eisenhauer, Junmin Gu, Philip Davis, Jong Choi, Kai Germaschewski, Kevin Huck, Axel Huebl, Mark Kim, James Kress, Tahsin Kurc, Qing Liu, Jeremy Logan, Kshitij Mehta, George Ostrouchov, Manish Parashar, Franz Poeschel, David Pugmire, Eric Suchyta, Keichi Takahashi, Nick Thompson, Seiji Tsutsumi, Lipeng Wan, Matthew Wolf, Kesheng Wu, and Scott Klasky. ADIOS 2: The Adaptable Input Output System. A framework for high-performance data management. SoftwareX, 12:100561, July 2020.
  • [14] Ganesh Gopalakrishnan, Ibrahim Hoteit, Bruce D. Cornuelle, and Daniel L. Rudnick. Comparison of 4DVAR and EnKF state estimates and forecasts in the Gulf of Mexico. Quarterly Journal of the Royal Meteorological Society, 145(721):1354–1376, 2019.
  • [15] Sam Hatfield, Peter Düben, Matthew Chantry, Keiichi Kondo, Takemasa Miyoshi, and Tim Palmer. Choosing the Optimal Numerical Precision for Data Assimilation in the Presence of Model Error. Journal of Advances in Modeling Earth Systems, 10(9):2177–2191, 2018.
  • [16] Sam Hatfield, Aneesh Subramanian, Tim Palmer, and Peter Düben. Improving Weather Forecast Skill through Reduced-Precision Data Assimilation. Monthly Weather Review, 146(1):49–62, January 2018.
  • [17] Brian R. Hunt, Eric J. Kostelich, and Istvan Szunyogh. Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D: Nonlinear Phenomena, 230(1):112–126, 2007.
  • [18] James W. Hurrell, M. M. Holland, P. R. Gent, S. Ghan, Jennifer E. Kay, P. J. Kushner, J.-F. Lamarque, W. G. Large, D. Lawrence, K. Lindsay, W. H. Lipscomb, M. C. Long, N. Mahowald, D. R. Marsh, R. B. Neale, P. Rasch, S. Vavrus, M. Vertenstein, D. Bader, W. D. Collins, J. J. Hack, J. Kiehl, and S. Marshall. The Community Earth System Model: A Framework for Collaborative Research. Bulletin of the American Meteorological Society, 94(9):1339–1360, September 2013.
  • [19] Milan Klöwer, Miha Razinger, Juan J. Dominguez, Peter D. Düben, and Tim N. Palmer. Compressing atmospheric data into its real information content. Nature Computational Science, 1(11):713–724, November 2021.
  • [20] Sriram Lakshminarasimhan, Neil Shah, Stephane Ethier, Scott Klasky, Rob Latham, Rob Ross, and Nagiza F. Samatova. Compressing the incompressible with ISABELA: in-situ reduction of spatio-temporal data. In Proceedings of the 17th international conference on Parallel processing - Volume Part I, Euro-Par’11, pages 366–379, Berlin, Heidelberg, August 2011. Springer-Verlag.
  • [21] Peter Jan van Leeuwen. Particle Filtering in Geophysical Systems. Monthly Weather Review, 137(12):4089–4114, December 2009.
  • [22] Xin Liang, Ben Whitney, Jieyang Chen, Lipeng Wan, Qing Liu, Dingwen Tao, James Kress, Dave Pugmire, Matthew Wolf, Norbert Podhorszki, and Scott Klasky. MGARD+: Optimizing Multilevel Methods for Error-bounded Scientific Data Reduction. arXiv:2010.05872 [cs], November 2020.
  • [23] Peter Lindstrom. Fixed-Rate Compressed Floating-Point Arrays. IEEE Transactions on Visualization and Computer Graphics, 20(12):2674–2683, December 2014.
  • [24] Peter Lindstrom. FPZIP, 2017.
  • [25] Andrew Lorenc. Relative merits of 4D-Var and ensemble Kalman filter. Technical report, NWP Internal Report, 2003.
  • [26] Andrew C. Lorenc. The potential of the ensemble Kalman filter for NWP—a comparison with 4D-Var. Quarterly Journal of the Royal Meteorological Society, 129(595):3183–3203, 2003. https://rmets.onlinelibrary.wiley.com/doi/pdf/10.1256/qj.02.132.
  • [27] Edward N. Lorenz. Predictability: A problem partly solved. In Proc. Seminar on Predictability, volume 1, 1996.
  • [28] Masuo Nakano, Hisashi Yashiro, Chihiro Kodama, and Hirofumi Tomita. Single Precision in the Dynamical Core of a Nonhydrostatic Global Atmospheric Model: Evaluation Using a Baroclinic Wave Test Case. Monthly Weather Review, 146(2):409–416, February 2018.
  • [29] H. Ngodock, I. Souopgui, M. Carrier, S. Smith, J. Osborne, and J. D’Addezio. An ensemble of perturbed analyses to approximate the analysis error covariance in 4dvar. Tellus A: Dynamic Meteorology and Oceanography, 72(1):1–12, January 2020. https://doi.org/10.1080/16000870.2020.1771069.
  • [30] Ryohei Okazaki, Takekazu Tabata, Sota Sakashita, Kenichi Kitamura, Noriko Takagi, Hideki Sakata, Takeshi Ishibashi, Takeo Nakamura, and Yuichiro Ajima. Supercomputer Fugaku CPU A64FX Realizing High Performance, High-Density Packaging, and Low Power Consumption. Fujitsu Technical Review, Fujitsu Limited, March 2020.
  • [31] Andrew Poppick, Joseph Nardi, Noah Feldman, Allison H. Baker, Alexander Pinard, and Dorit M. Hammerling. A statistical analysis of lossily compressed climate model data. Computers & Geosciences, 145:104599, December 2020.
  • [32] Roland Potthast, Anne Walter, and Andreas Rhodin. A Localized Adaptive Particle Filter within an Operational NWP Framework. Monthly Weather Review, 147(1):345–362, January 2019.
  • [33] Oriol Tintó Prims, Mario C. Acosta, Miguel Castrillo, Stella Valentina Paronuzzi Ticco, Kim Serradell, Ana Cortés, and Francisco J. Doblas-Reyes. Discriminating accurate results in nonlinear models. In 2019 International Conference on High Performance Computing & Simulation (HPCS), pages 1028–1031, July 2019.
  • [34] Russ Rew, Glenn Davis, Steve Emmerson, Cathy Cormack, John Caron, Robert Pincus, Ed Hartnett, Dennis Heimbigner, Lynton Appel, and Ward Fisher. Unidata NetCDF, 1989.
  • [35] RIKEN Center for Computational Science. Fugaku Supercomputer. https://www.r-ccs.riken.jp/en/fugaku/, 2021. [1 April 2021].
  • [36] John L. Schnase, Tsengdar J. Lee, Chris A. Mattmann, Christopher S. Lynnes, Luca Cinquini, Paul M. Ramirez, Andre F. Hart, Dean N. Williams, Duane Waliser, Pamela Rinsland, W. Philip Webster, Daniel Q. Duffy, Mark A. McInerney, Glenn S. Tamkin, Gerald L. Potter, and Laura Carrier. Big Data Challenges in Climate Science. IEEE Geoscience and Remote Sensing Magazine, 4(3):10–22, September 2016.
  • [37] Dingwen Tao, Sheng Di, Hanqi Guo, Zizhong Chen, and Franck Cappello. Z-checker: A framework for assessing lossy compression of scientific data. The International Journal of High Performance Computing Applications, 33(2):285–303, March 2019.
  • [38] Peter Jan van Leeuwen, Hans R. Künsch, Lars Nerger, Roland Potthast, and Sebastian Reich. Particle filters for high-dimensional geoscience applications: A review. Quarterly Journal of the Royal Meteorological Society, 145(723):2335–2365, 2019. https://rmets.onlinelibrary.wiley.com/doi/pdf/10.1002/qj.3551.
  • [39] Christopher K. Wikle and L. Mark Berliner. A Bayesian tutorial for data assimilation. Physica D: Nonlinear Phenomena, 230(1):1–16, June 2007.
  • [40] H. Yashiro, K. Terasaki, Y. Kawai, S. Kudo, T. Miyoshi, T. Imamura, K. Minami, H. Inoue, T. Nishiki, T. Saji, M. Satoh, and H. Tomita. A 1024-member ensemble data assimilation with 3.5-km mesh global weather simulations. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pages 1–10, Los Alamitos, CA, USA, November 2020. IEEE Computer Society.
  • [41] Yongjun Zheng, Clément Albergel, Simon Munier, Bertrand Bonan, and Jean-Christophe Calvet. An offline framework for high-dimensional ensemble Kalman filters to reduce the time to solution. Geoscientific Model Development, 13(8):3607–3625, August 2020.