

Efficient Lossy Compression for Scientific Data Based on Pointwise Relative Error Bound

Sheng Di, Senior Member, IEEE, Dingwen Tao, Xin Liang, Franck Cappello, Fellow, IEEE

Sheng Di and Franck Cappello are with the Mathematics and Computer Science (MCS) Division at Argonne National Laboratory, USA. Dingwen Tao and Xin Liang are with the Computer Science Department of the University of California, Riverside.

Abstract—An effective data compressor is becoming increasingly critical to today's scientific research, and many lossy compressors have been developed in the context of absolute error bounds. Based on the physical/chemical definitions of simulation fields or on multiresolution demand, however, many scientific applications need to compress their data with a pointwise relative error bound (i.e., the smaller the data value, the smaller the compression error to tolerate). To this end, we propose two optimized lossy compression strategies under a state-of-the-art three-stage compression framework (prediction + quantization + entropy encoding). The first strategy (called block-based strategy) splits the data set into many small blocks and computes an absolute error bound for each block, so it is particularly suitable for data with relatively high consecutiveness in space. The second strategy (called multi-threshold-based strategy) splits the whole value range into multiple groups with exponentially increasing thresholds and performs the compression in each group separately, which is particularly suitable for data with a relatively large value range and spiky value changes. We implement the two strategies rigorously and evaluate them comprehensively using two scientific applications that both require lossy compression with a pointwise relative error bound. Experiments show that the two strategies exhibit the best compression quality on different types of data sets. The compression ratio of our lossy compressor is higher than that of other state-of-the-art compressors by 17.2%–618% on climate simulation data and by 30%–210% on N-body simulation data, with the same relative error bound and without degradation of the overall visualization of the entire data set.

Index Terms—Lossy compression, scientific data, high-performance computing, relative error bound

1 INTRODUCTION

An effective data compressor is becoming more and more critical for today's scientific research because of the extremely large volume of data produced by scientific simulations. The large volume of data is generally stored in a parallel file system, with a limited capacity of storage space and limited I/O bandwidth to access it. Climate scientists, for example, need to run large ensembles of high-fidelity 1 km × 1 km simulations, with each instance simulating 15 years of climate in 24 hours of computing time. Estimating even one ensemble member per simulated day may generate 260 TB of data every 16 seconds across the ensemble. In the Hardware/Hybrid Accelerated Cosmology Code (HACC) [1] (a well-known cosmology simulation code), the number of particles to simulate could reach up to 3.5 trillion, which may produce 60 petabytes of data to store. One straightforward data reduction method is decimating the data in either the time dimension or space; however, this may lose important information for the user's analysis.

Error-controlled lossy compression techniques have been considered the best trade-off solution compared with lossless compression, because not only can such techniques significantly reduce the data size, but they can also keep the data valid after decompression based on the error controls. The existing compressors (such as [2], [3], [4]) are basically designed in the context of absolute compression errors (such as the absolute error bound for each data point or the root mean squared error).

In addition to the absolute error bound, another type of error bound (called pointwise relative error bound, or relative error bound for short) has received increasing attention because it is more reasonable for some applications, based on the users' demand for multiresolution or on the physical definition of simulation fields. With a pointwise relative error bound, the maximum compression error for each data point is equal to a percentage of the data point's value, such that the larger the value is, the larger the compression error. Hence, such an error bound leads to multiple precisions/resolutions on different data points. As one Advanced Photon Source (APS) researcher pointed out, for instance, he needs to use different levels of precision to study various regions of an APS-generated X-ray image. Moreover, in some N-body applications such as cosmology simulations, billions of particles move in space with different speeds. According to the corresponding researchers, the faster the particles move, the larger the errors their analysis can accept, which means that the compression of the velocity field of particles needs to use a pointwise relative error bound.

In this paper, we focus on the optimization of compression quality based on a pointwise relative error bound for scientific data with various dimensions. Our exploration is based on a generic compression model - SZ [2], [3] - which involves three critical steps: (1) value prediction for each data point, (2) quantization of prediction errors, and (3) entropy encoding of the quantization output. Several challenging issues have to be faced.


SZ constructs a set of adjacent quantization bins to transform each original floating-point data value into an integer based on its prediction error. Guaranteeing an absolute error bound in this framework is relatively easy, in that the sizes of the quantization bins can be fixed because the absolute error bound is constant. Problems arise, however, when the error bounds vary across data points because a relative error bound is requested.

We propose two effective lossy compression strategies that can adapt to different types of scientific data sets with various data features. The first strategy is particularly suitable for data sets with relatively consecutive/smooth value changes in space (e.g., 2D or 3D data sets). This strategy (a.k.a. block-based strategy) splits the whole data set into multiple small blocks in the multidimensional space and adopts a uniform error bound in each block. There are several critical questions to answer: how to determine the error bound in one block? how to minimize the overhead of storing the metadata in each block? how to determine the block size, considering the tradeoff between the compression gains of possibly more relaxed error bounds and the unavoidable metadata overhead? The other strategy (called multi-threshold-based strategy) is designed particularly for data sets with spiky value changes on adjacent data points (such as N-body simulation data). In this method, the whole value range is split into multiple groups with exponentially increasing thresholds, and the data are then compressed group by group with a common absolute error bound within each group. For this strategy, how to determine the absolute error bound for each group and how to encode the group index for each data point are two serious questions to answer.

Our key contribution is twofold. On the one hand, we answer the above-listed technical questions and propose complete design/implementation methods for the two strategies based on the requirement of a relative error bound. On the other hand, we validate the effectiveness of the new solution by using real-world scientific data sets. Experiments show that our block-based strategy exhibits 17.2–618% higher compression ratios on climate simulation data than other state-of-the-art compressors; our multi-threshold-based strategy outperforms other compressors by 31–210% on cosmology simulation data with the user-required pointwise error bound and comparable compression/decompression time.

The rest of the paper is organized as follows. In Section 2, we discuss related work. We formulate the scientific data compression problem with the requirement of a relative error bound in Section 3. In Section 4, we describe the SZ compression framework in detail. In Section 5, we propose the block-based lossy compression strategy and the multi-threshold-based lossy compression strategy, respectively. In Section 6, we describe the experimental setting and present the evaluation results. We conclude in Section 7 with a vision of future work.

2 RELATED WORK

Many data compressors have been developed to significantly reduce data size. In general, compressors can be split into two categories: lossless compressors and lossy compressors. Since scientific data are mainly generated/stored in the form of floating-point values, each with rather random ending mantissa bits, generic lossless binary-stream compressors such as Gzip [5] and BlosC [6] cannot work effectively. Although there are also some existing lossless compressors [7], [8], [9], [10] designed for floating-point data sets, they still suffer from very limited compression ratios [2], [3].

Lossy compressors have been developed for years, and they can be categorized as either error-controlled or not. Many of the existing lossy compressors (such as [11], [12]) are not error controlled, such that the reconstructed data may not be valid from the perspective of data researchers or users. Error-controlled lossy compression techniques have also been developed in the past decade. SZ [2], [3], [13], for example, provides multiple types of error controls, yet all of them are based on a constant/uniform bound (a.k.a. absolute error bound) for all the data points. Another outstanding floating-point lossy compressor, ZFP [4], provides three types of error controls: fixed absolute error bound, fixed rate, and fixed precision (the number of uncompressed bits per value). According to the developer of ZFP, its most efficient mode is the fixed absolute error bound, but it still overpreserves the compression errors for a large majority of the data points. Moreover, some specific compression techniques are customized for particular data sets based on their features. R-index sorting [14], [15], for instance, is adopted to improve the compression ratio for the position fields of N-body simulations; Lee et al. [16] proposed to leverage statistical features to improve the compression ratio for time-series data sets.

Relative-error-bound-based compressors have also been developed; however, they suffer from limited compression quality. ISABELA [17], for example, allows compression with pointwise relative error bounds, but its compression ratio (the ratio of the original data size to the compressed data size) is usually around 4:1 [2], [3], [17], which is far lower than the level demanded by extreme-scale application users. Moreover, its compression rate is about 5–10X lower than that of SZ [2], [3] and ZFP [4]. Sasaki et al. proposed a wavelet-transform-based lossy compressor [11], which splits data into a high-frequency part and a low-frequency part by the wavelet transform and then uses vector quantization and Gzip [5]. Similar to ISABELA, the wavelet-transform-based method suffers from a lower compression ratio than SZ and ZFP do [2]. Moreover, it cannot deal with data sets having an odd number of elements in some dimension. FPZIP [18] allows users to set the number of bits (called precision) to maintain for each data point during the compression. For single-precision floating-point data sets, for example, precision = 32 indicates lossless compression in FPZIP, and settings with lower precisions correspond to different lossy levels of compression, approximately in the sense of a pointwise relative error bound. FPZIP has two drawbacks. On the one hand, it is not easy for users to control the compression errors based on a specific pointwise error bound; in practice, they have to try compressing the data sets multiple times to estimate the precision corresponding to the expected error bound. On the other hand, its compression ratio is lower than that of our proposed solution with the same relative error bound, as will be shown in Section 6.


All in all, compared with the existing error-controlled lossy compressors, our proposed relative-error-bound-based compression strategies can improve the compression ratio by 17.2–618% on CESM-ATM climate simulation data [19] and by 31–210% on HACC particle simulation data [1].

3 PROBLEM FORMULATION

The research objective is to optimize the compression quality for the lossy compression of scientific data under the constraint of a pointwise relative error bound (a.k.a. relative error bound for short). Specifically, given a data set S = {d_1, d_2, ..., d_N} with N data points (where d_i refers to data point i in the data set), the reconstructed data set S' = {d'_1, d'_2, ..., d'_N} must satisfy the following inequality:

$$\max_{d_i \in S,\; d_i' \in S'} \left| \frac{d_i - d_i'}{d_i} \right| \le \epsilon \qquad (1)$$

where ε is a small constant value (i.e., the relative error bound) specified by the user.

We present an example to further illustrate the definition of the pointwise relative error bound. Suppose we are given a set of data {1.23, 1.65, 2.34, 3.56, 10.0, 20.0} to compress with a relative error bound of 0.01 (or 1%). Then the absolute error bounds for the six data points will be 0.0123, 0.0165, 0.0234, 0.0356, 0.1, and 0.2, respectively. That is, as long as the difference between the reconstructed value d'_i and its original value d_i is no greater than 1% of d_i, the lossy compression is considered to satisfy the error bound.

A typical use case that requires relative-error-bound-based compression is the compression of velocity fields in a cosmology simulation such as HACC [1]. In HACC, for example, each particle has two parts of information, position {x,y,z} and velocity {vx,vy,vz}, each involving three dimensions. So there are six 1D arrays - x, y, z, vx, vy, and vz - used to store the position and velocity information for all particles, and the index of each array indicates the particle ID. According to the cosmology researchers and HACC developers, the larger velocity a particle has, the larger the error it can generally tolerate. In this case, a pointwise relative error bound is required to compress the velocity data.

Our objective is to maximize the compression ratio, subject to inequality (1), where the compression ratio (denoted by γ) is defined in Equation (2).

$$\gamma = \frac{\text{original data size}}{\text{compressed data size}} \qquad (2)$$

In addition to the pointwise relative error bound (i.e., Inequality (1)), another important evaluation metric is the peak signal-to-noise ratio (PSNR), as defined in Formula (3).

$$psnr = 20 \cdot \log_{10} \left( \frac{d_{max} - d_{min}}{rmse} \right) \qquad (3)$$

where $rmse = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(d_i - d_i')^2}$, and d_max and d_min refer to the maximum and minimum value, respectively.

PSNR is commonly used to assess the overall distortion between the original data and the reconstructed data, especially in visualization. Logically, the higher the PSNR, the lower the overall compression error, indicating better compression quality. In our experiments, we evaluate both the maximum pointwise relative error and the PSNR, in order to assess the distortion of the data comprehensively.
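To make these two metrics concrete, the short Python sketch below computes the maximum pointwise relative error of Inequality (1) and the PSNR of Formula (3) for a pair of original/reconstructed arrays. It is an illustrative check rather than part of the SZ code base, and the array names are our own.

```python
import numpy as np

def max_pointwise_relative_error(original, reconstructed):
    # |d_i - d'_i| / |d_i|, evaluated where d_i != 0 (Inequality (1))
    orig = np.asarray(original, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    nonzero = orig != 0
    return np.max(np.abs(orig[nonzero] - rec[nonzero]) / np.abs(orig[nonzero]))

def psnr(original, reconstructed):
    # PSNR = 20 * log10((d_max - d_min) / rmse), Formula (3)
    orig = np.asarray(original, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    rmse = np.sqrt(np.mean((orig - rec) ** 2))
    return 20.0 * np.log10((orig.max() - orig.min()) / rmse)

if __name__ == "__main__":
    data = np.array([1.23, 1.65, 2.34, 3.56, 10.0, 20.0])
    # a decompressed result that keeps every point within 1% of its value
    decompressed = data * 1.008
    print(max_pointwise_relative_error(data, decompressed))  # <= 0.01
    print(psnr(data, decompressed))
```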


4 SZ COMPRESSION FRAMEWORK

We designed a novel, effective lossy compression model, namely the SZ compression framework [3], which includes three fundamental steps: (I) value prediction on each data point for the sake of decorrelation, (II) linear-scaling quantization surrounding the predicted value with equal-sized bins, and (III) variable-length encoding used to encode the integer indices of the bins. We describe the three steps in the following text.

4.1 Step I: data prediction with reconstructed values

In our design, we scan the whole data set in increasing order of dimensions, and the prediction value for each data point is generated based on its preceding processed neighbors. The prediction formulas are derived by constructing a surface over the neighboring data points on one or multiple layers, as presented in Table 1, where (i0, j0) refers to the current data point to be predicted, V(x, y) refers to the value of the data point (x, y), and x = 0, 1, 2, ..., y = 0, 1, 2, .... The detailed derivation of the prediction formulas is described in our prior conference publication [3]. In fact, more advanced prediction methods can be customized based on the characteristics of specific scientific data, which will be studied in our future work.

TABLE 1
Formulas of 1-, 2-, and 3-layer prediction for two-dimensional data sets

1-Layer: f(i0, j0) = V(i0, j0-1) + V(i0-1, j0) - V(i0-1, j0-1)

2-Layer: f(i0, j0) = 2V(i0-1, j0) + 2V(i0, j0-1) - 4V(i0-1, j0-1) - V(i0-2, j0) - V(i0, j0-2) + 2V(i0-2, j0-1) + 2V(i0-1, j0-2) - V(i0-2, j0-2)

3-Layer: f(i0, j0) = 3V(i0-1, j0) + 3V(i0, j0-1) - 9V(i0-1, j0-1) - 3V(i0-2, j0) - 3V(i0, j0-2) + 9V(i0-2, j0-1) + 9V(i0-1, j0-2) - 9V(i0-2, j0-2) + V(i0-3, j0) + V(i0, j0-3) - 3V(i0-3, j0-1) - 3V(i0-1, j0-3) + 3V(i0-3, j0-2) + 3V(i0-2, j0-3) - V(i0-3, j0-3)

Now, the key question is how many layers we are supposed to use for the data prediction. Before answering this question, we must note an important constraint: in the compression phase, the data prediction must be conducted based on the previously decompressed values instead of the original data values; otherwise, the decompression cannot respect the error bound strictly. In our prior work [3], we showed that the one-layer prediction is the best choice, considering the impact of the inevitable distortion of the reconstructed data points on the prediction accuracy of the current data point. We omit the details here because of the space limitation.
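For reference, the sketch below evaluates the one-layer predictor of Table 1 over a small 2D array, reading from the buffer of already-processed values as the constraint above requires. The border handling and the exact reconstruction (no quantization yet) are simplifications of this example, not details taken from the paper.

```python
import numpy as np

def one_layer_prediction(recon, i, j):
    # f(i, j) = V(i, j-1) + V(i-1, j) - V(i-1, j-1), computed on processed values.
    # Border points fall back to the nearest processed neighbor (an assumption
    # made for this sketch only).
    if i == 0 and j == 0:
        return 0.0
    if i == 0:
        return recon[i, j - 1]
    if j == 0:
        return recon[i - 1, j]
    return recon[i, j - 1] + recon[i - 1, j] - recon[i - 1, j - 1]

data = np.array([[1.0, 1.1, 1.3],
                 [1.2, 1.4, 1.7],
                 [1.5, 1.8, 2.3]])
recon = np.zeros_like(data)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        pred = one_layer_prediction(recon, i, j)
        # In the real compressor the prediction error would now be quantized and
        # the bin center stored back; here we keep the exact value so the loop
        # stays self-contained.
        recon[i, j] = data[i, j]
        print(i, j, round(pred, 3))
```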
4.2 Step II: error-bounded linear-scaling quantization

Step II is critical for controlling the compression errors. As illustrated in Figure 1, a set of consecutive bins is constructed around each predicted data value, which serves as the center. The size of each bin is twice the error bound, and the bins are tagged with the indices ..., -3, -2, -1, 0, 1, 2, 3, .... As such, the original raw value of each data point can be transformed into an integer value (i.e., the index of the bin containing the corresponding true value).

Fig. 1. Illustration of the linear-scaling quantization step (bin size = 2e, where e is the error bound; each real value is mapped to the bin index closest to its predicted value).

In absolute terms, the quantization bin index can be calculated by Formula (4), and the center of each bin serves as the reconstructed data value of the corresponding data point during the decompression.

$$\text{bin index} = \begin{cases} \left\lfloor \frac{d_i - d_i'}{2e} + \frac{1}{2} \right\rfloor, & d_i \ge d_i' \\ -\left\lfloor \frac{d_i' - d_i}{2e} + \frac{1}{2} \right\rfloor, & d_i < d_i' \end{cases} \qquad (4)$$

where ⌊·⌋ refers to the floor function. Although Step II may introduce distortion of the data, the compression error (i.e., the distance between the center of the located bin and the real value of the data point) must be within the required error bound e, since the bin's size is twice as large as the error bound.
As discussed above, the error-bounded linear-scaling quantization transforms all of the original floating-point data values into another set of integer values, and a large majority of these integer values are expected to be very close to 0, in that most of the raw data values are consecutive in space. Figure 2 shows an example of the distribution of quantization codes produced by our quantization encoder, which uses 255 quantization bins to represent predictable data. From this figure, we can see that the distribution of quantization codes exhibits a fairly sharp shape and that the shape depends on the accuracy of the data prediction. The higher the prediction accuracy, the sharper the distribution, and the higher the compression ratio we can get by using a variable-length encoding, in which more frequent symbols are coded with fewer bits. In our implementation, all the bin-index codes are actually mapped to positive integers by adding a constant offset, because we reserve the integer 0 to represent the unpredictable data that cannot be captured by the prediction + quantization method because of the limited number of quantization bins under possibly spiky data changes. The unpredictable data are labeled and compressed by keeping only the valuable bits of their IEEE 754 binary representation, respecting the specified error bound.

Fig. 2. Distribution produced by the error-controlled quantization encoder on ATM data sets with (a) value-range-based relative error bound = 10^-3 and (b) value-range-based relative error bound = 10^-4, using 255 quantization intervals (m = 8).

The last critical issue regarding the linear-scaling quantization step is how to set an appropriate number of quantization bins for a specific data set. To this end, we design an efficient algorithm that can determine the appropriate number of quantization bins. Specifically, we allow users to specify a lower bound on the prediction hitting rate, based on which our algorithm can estimate the required number of quantization bins accurately. The design idea is first sampling a very small portion (generally only 1%) of the data points in the whole data set by systematic sampling, then performing data prediction and quantization on each of them, and finally determining the appropriate number of quantization bins accordingly. We present example pseudo-code in Algorithm 1. Without loss of generality, we assume the data set to compress is a 3D array with three dimension sizes equal to M, N, and K, respectively, in this example.

Algorithm 1 CALCULATING AN APPROPRIATE NUMBER OF QUANTIZATION BINS
Input: user-specified error bound e; lower bound of prediction hitting rate (denoted by η); maximum number of quantization bins (denoted by m); sampling distance s
Output: the appropriate number of quantization bins (denoted by P), based on which the real prediction hitting rate will be greater than η
1: Allocate a buffer called intervals, with m elements; /*for keeping the counts of different bin-index codes*/
2: for (i = 1 → M − 1) do
3:   for (j = 1 → N − 1) do
4:     for (k = 1 → K − 1) do
5:       if ((i + j + k) % s == 0) then
6:         Perform the one-layer prediction to get d'_{i,j,k}.
7:         bin_index = ⌊|d_{i,j,k} − d'_{i,j,k}|/(2e) + 1/2⌋. /*Formula (4)*/
8:         intervals[bin_index]++.
9:       end if
10:     end for
11:   end for
12: end for
13: target_hit_count = η(M−1)(N−1)(K−1)/s.
14: hitting_count = 0.
15: for (r = 0 → m) do
16:   hitting_count += intervals[r].
17:   if (hitting_count ≥ target_hit_count) then
18:     break.
19:   end if
20: end for
21: p = 2×(r+1). /*r+1 is half of the expected number of quantization bins.*/
22: P = 2^h, where 2^(h−1) < p ≤ 2^h, h = 0, 1, 2, .... /*e.g., if p = 28, then P = 32*/

We describe Algorithm 1 as follows. In the beginning, we allocate a counting buffer (namely intervals) in order to count the number of hits for different bin-index codes. The data points are selected by a systematic sampling (lines 2-5) with a sampling distance s, which means that only 1/s of the data will be selected. Then, we compute the prediction value for each sampled data point based on the one-layer prediction and quantize its prediction error (lines 6-7). Note that each bin here involves both cases in Formula (4) (i.e., d_i ≥ d'_i and d_i < d'_i). Thereafter, we count the total number of hits with an increasing number of bins based on the counting buffer, against the target hitting count calculated from the lower bound of the prediction hitting rate (lines 13-20). In the end, the appropriate number of quantization bins is finalized in the form of a power of two (line 22), in order to guarantee enough quantization bins on demand.
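A compact Python rendering of this bin-selection idea is sketched below for a 3D array; it mirrors the sampling, counting, and power-of-two rounding of Algorithm 1 but uses our own helper names and predicts from the original (not reconstructed) values, so it should be read as an illustration rather than the production code.

```python
import math
import numpy as np

def choose_num_bins(data3d, e, eta=0.99, m=65536, s=100):
    """Estimate the number of quantization bins needed to reach hitting rate eta."""
    M, N, K = data3d.shape
    intervals = np.zeros(m + 1, dtype=np.int64)  # counts per |bin index|
    sampled = 0
    for i in range(1, M):
        for j in range(1, N):
            for k in range(1, K):
                if (i + j + k) % s != 0:
                    continue
                # one-layer prediction from the three processed neighbors
                pred = (data3d[i, j, k - 1] + data3d[i, j - 1, k]
                        - data3d[i, j - 1, k - 1])
                idx = math.floor(abs(data3d[i, j, k] - pred) / (2 * e) + 0.5)
                if idx <= m:
                    intervals[idx] += 1
                sampled += 1
    target = eta * sampled
    hits = 0
    for r in range(m + 1):
        hits += intervals[r]
        if hits >= target:
            break
    p = 2 * (r + 1)                        # both signs of each bin index
    return 2 ** math.ceil(math.log2(p))    # round up to a power of two

rng = np.random.default_rng(0)
field = np.cumsum(rng.normal(size=(20, 20, 20)), axis=2)  # smooth-ish toy data
print(choose_num_bins(field, e=0.1, s=7))
```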


Table 2 presents the effectiveness of reaching the demanded hitting rate by Algorithm 1 based on two data sets (CESM-ATM and HACC), with a sampling rate of 1% (i.e., sampling distance = 100; select one sample point every 100 data points) and two different error bounds (1E-4 and 1E-6 compared with the value range). The column named 'sampled' in the table refers to the percentage of predictable data covered by p quantization bins during the calculation of these bins (i.e., Algorithm 1), and the column tagged 'real' means the percentage of the data points covered by the final P quantization bins during the data compression. It is observed that when the lower bound of the prediction hitting rate is set to 80%~99%, the minimum numbers of bins required (i.e., the value p calculated at line 21 of the algorithm) are in the range of [2, 6515] and [24, 98294] for the two data sets, respectively. The appropriate number of quantization bins output by our algorithm is always no smaller than the minimum requirement, because of line 22 (that is, the final number of quantization bins is rounded to 2^h, where 2^(h-1) < p ≤ 2^h and p = 2(r+1)). In practice, we recommend setting the lower bound of the prediction hitting rate to 99%, which is also the default setting of SZ.

TABLE 2
Effectiveness of reaching the demanded hitting rate under Algorithm 1, based on the CESM-ATM and HACC data sets with different error bounds

CESM-ATM data set (err = 1E-4)
Demanded       CLDLOW                            FLDSC
hitting rate   p      P     sampled   real       p      P      sampled   real
80%            14     32    82.77%    96.42%     2      32     85.7%     99.7%
90%            20     32    90.8%     96.42%     4      32     94.3%     99.7%
99%            68     128   99.04%    99.7%      16     32     99.04%    99.7%

HACC data set (err = 1E-4)
Demanded       position x                        velocity vx
hitting rate   p      P     sampled   real       p      P      sampled   real
80%            24     32    80.1%     86.96%     202    256    80.2%     84.8%
90%            40     64    90.7%     95.3%      350    512    90.1%     94.9%
99%            122    128   99.1%     99.3%      984    1,024  99%       99.1%

CESM-ATM data set (err = 1E-6)
Demanded       CLDLOW                            FLDSC
hitting rate   p      P      sampled  real       p      P      sampled   real
80%            1,178  2,048  80%      91.6%      70     128    80%       88.3%
90%            1,816  2,048  90%      91.6%      156    256    90%       93.4%
99%            6,518  8,192  99%      99%        1,462  2,048  99%       99.4%

HACC data set (err = 1E-6)
Demanded       position x                         velocity vx
hitting rate   p       P       sampled  real      p       P        sampled  real
80%            2,292   4,096   80%      90.5%     19,904  32,768   80%      89.1%
90%            3,708   4,096   90%      90.5%     34,706  65,536   90%      97%
99%            11,970  16,384  99%      99.15%    98,294  131,072  99%      99.6%

4.3 Step III: a customized variable-length encoding

We adopt the Huffman encoding method in Step III, because Huffman encoding has been proved to be an optimal variable-length encoding approach in most cases (especially when the distribution of codes follows a normal distribution). In order to optimize the compression quality, we improved the Huffman encoding algorithm based on the characteristics of the error-bounded linear-scaling quantization bins. As discussed previously, the appropriate number of quantization bins calculated by Algorithm 1 could be very large. Experiments based on HACC data sets (velocity field), for instance, show that the number of quantization bins could reach up to 200k if the required error bound is 10^-5 while the expected prediction hitting rate is set to 99%. However, the traditional Huffman encoding algorithm implemented in existing lossless compression packages such as Zlib [20] always treats the stream of data in units of bytes, which will definitely cause a sub-optimal compression effect. Moreover, it fixes the number of integer codes to 256, which cannot fit our error-bounded linear-scaling quantization method that may output a large number of integer codes.

Specifically, the key difference between our improved Huffman encoding algorithm and the traditional algorithm is that our algorithm treats the sequence of quantization codes (i.e., the bin indices generated by Algorithm 1) based on their integer values, which may lead to a large number of nodes in the tree, such that how to store the tree as effectively as possible needs to be handled carefully. We designed a recursive algorithm to convert the Huffman tree to a byte stream, as shown in Algorithm 2. The initial node is the root of the Huffman tree, and its id is 0. The left node array L and the right node array R are used to record the left children and right children, respectively. The tag array is used to record whether the corresponding node is an internal node or a leaf. The algorithm adopts a pre-order traversal to scan all the nodes and pad the information into four buffer arrays.

Algorithm 2 PADTREE
Input: left node array L, right node array R, tag array tag, code array C, node's id i, node's object (denoted by node)
Output: left node array L, right node array R, tag array tag, code array C
1: C[i] = node.c; /*record the code*/
2: tag[i] = node.tag; /*record whether the node is internal or a leaf*/
3: if (node.left_tree ≠ null) then
4:   j++; /*increment the global node ID j*/
5:   L[i] = j; /*append node_j as node_i's left child*/
6:   PADTREE(L, R, tag, C, j, node.left_tree); /*recursive call*/
7: end if
8: if (node.right_tree ≠ null) then
9:   j++; /*increment the global node ID j*/
10:  R[i] = j; /*append node_j as node_i's right child*/
11:  PADTREE(L, R, tag, C, j, node.right_tree); /*recursive call*/
12: end if

The time complexity of the algorithm is O(N), where N is the number of nodes, which is twice the number of quantization bins. The storage overhead of saving the Huffman tree is a constant with respect to the entire compressed size, because this overhead is determined by the number of quantization bins, which is supposed to be a very small constant (such as 65,536 quantization bins) compared with the number of data points (such as 2^30 particles in one snapshot of the HACC simulation), without loss of generality.
of HACC simulation) without loss of generality.

5 OPTIMIZED LOSSY COMPRESSION BASED ON POINT-WISE RELATIVE ERROR BOUND

The SZ compression framework is particularly suitable for lossy compression with an absolute error bound, because the sizes of all bins in the linear-scaling quantization step can always be the same throughout the data set, such that the bin size is a constant piece of metainformation for the whole data set. The relative error bound demand, however, leads to different absolute error bounds across data points, because the actual absolute error bounds are proportional to the individual data values. Because of the significant overhead, we cannot store the absolute error bound for each data point. In this section, we describe how we design and implement pointwise-relative-error-based lossy compression under the SZ compression framework. In our solution, we explore two strategies to address this issue, called the block-based strategy and the multi-threshold-based strategy, respectively.


5.1 Block-based Strategy

The block-based strategy first splits the whole data set into many small blocks and then computes an absolute error bound for each block based on the relative error bound. The data points inside each block share the same absolute error bound during the compression, as shown in Figure 3. In this sense, we just need to store one absolute error bound per block. In fact, to minimize the overhead, we keep only the first two bytes of the IEEE 754 floating-point representation of the absolute error bound, because doing so already provides a fairly accurate estimate of the absolute error bound for each block. On the other hand, as confirmed by our previous work [2], [3], the data in local regions are generally smooth, such that a common absolute error bound in a small block can be used to approximate the real absolute error bounds calculated based on the individual values.

Fig. 3. Illustration of the block-based strategy: the data set is split into blocks B#; global metadata (e.g., Seg_size, Relative_bound) are stored as they are, the per-block metadata M# are truncated to two bytes per value, and each block is compressed into bytes C# by prediction + quantization + Huffman encoding.

In our design, users have three options to generate the absolute error bound for each block: calculating it based on the minimum value, the average value, or the maximum value in each block. If the user chooses the minimum value option, the final compression error strictly follows the relative error bound for each data point. However, this mode may overreserve the accuracy, especially when the data exhibit spiky changes in local areas, such that the absolute compression errors may be much smaller than the de facto error bound, causing a poor compression factor. To address this issue, the user can use the average value option to approximate the absolute error bound for each block, which may effectively avoid the issue of over-reserving accuracy. By comparison, the maximum value option provides the most relaxed error controls, with multiple resolutions yet the highest possible compression ratio.
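As a concrete illustration of the block-based idea, the sketch below derives a per-block absolute error bound from a pointwise relative error bound (using the block minimum, average, or maximum as the reference value) and truncates it to the two leading bytes of its IEEE 754 single-precision form. The function names and the truncation helper are our own; the real SZ code organizes this differently.

```python
import struct
import numpy as np

def truncate_two_bytes(x):
    # Keep only the first two bytes of the IEEE 754 single-precision representation.
    # Truncation rounds the stored bound down, so the bound is never overstated.
    raw = struct.pack('>f', np.float32(x))
    return struct.unpack('>f', raw[:2] + b'\x00\x00')[0]

def block_abs_error_bounds(data2d, rel_bound, block=5, mode='min'):
    """Compute one absolute error bound per block of a 2D field."""
    pick = {'min': np.min, 'avg': np.mean, 'max': np.max}[mode]
    rows, cols = data2d.shape
    bounds = {}
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            ref = pick(np.abs(data2d[r:r + block, c:c + block]))
            bounds[(r // block, c // block)] = truncate_two_bytes(rel_bound * ref)
    return bounds

field = np.linspace(1.0, 4.0, 100).reshape(10, 10)
print(block_abs_error_bounds(field, rel_bound=1e-2, mode='min'))
```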
5.2 Multi-threshold-based Strategy

Note that the block-based strategy works effectively especially when the data exhibit relatively high coherence in space. However, this assumption may not always hold. The elements of the velocity array in HACC, for example, represent the velocity values of different particles; thus adjacent data in the array can be largely different, as presented in Figure 4.

Fig. 4. Illustration of the vx data array in a snapshot of the HACC simulation (data value versus array index, i.e., particle ID).

To address this issue, we propose a multi-threshold-based strategy. The basic idea is splitting the whole value range into multiple groups with exponentially increasing thresholds, as presented in Figure 5, and then performing the SZ compression (including prediction and quantization) in each group separately. This idea is largely different from the existing multi-level-thresholding-based image compression method [21], [22], which splits the data set into multiple non-overlapping 4×4 blocks and uses two values (called mean-of-high-values and mean-of-low-values) to substitute for the pixel values in each block; the threshold determining the high values and low values is calculated based on Shannon entropy. Unfortunately, that method is unable to respect the pointwise relative error bound and also suffers from a low compression rate (as confirmed in [22]). Our strategy also has nothing to do with the wavelet-based thresholding compression method [23] or the recursive thresholding technology [24]. The former aims to reduce the noise of the wavelet-transformed coefficients in the detail subbands (or high-resolution subbands) based on a thresholding function such that the mean squared error (MSE) can be minimized. The latter optimizes the image segmentation effect by recursively segmenting the objects in the image data from the lowest intensity until there is only one object left.

In what follows, we describe our multi-threshold-based strategy. We mainly answer three critical questions: (1) how to construct the multiple groups, (2) how to perform the compression for the data points in each group, and (3) how to encode the group IDs (a.k.a. group indices).

Fig. 5. Illustration of the multi-threshold-based strategy: the value range is split by exponentially increasing thresholds (±2^1, ±2^2, ±2^3, ±2^4, ...); A and B mark example segments, and i, j, k mark example data points.

Each data point in the whole data set belongs to only one group, depending on its value. Specifically, the group ID of a data point d_i can be calculated by the following equation:

$$GID(d_i) = \begin{cases} Expo_2(d_i), & d_i \ge 0 \ \&\ Expo_2(d_i) \ge \lambda \\ 0, & d_i \ge 0 \ \&\ Expo_2(d_i) < \lambda \\ -GID(-d_i), & d_i < 0 \end{cases} \qquad (5)$$


where Expo_2(d_i) refers to the base-2 exponent of the value d_i, which can be obtained quickly by reading the exponent part of the IEEE 754 representation. λ is the lower bound of the exponent threshold, under which the corresponding values will be compressed with an absolute error bound, in order to avoid having too many groups to deal with and over-reserving precision during the compression (to be discussed in more detail later). For example, if λ is set to 0, then we get GID(0.123) = 0, GID(1.23) = 0, GID(12.3) = 3, and GID(123) = 6. In our implementation, λ is set to 0 by default.
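The group ID of Equation (5), together with the per-group absolute error bound used below (ε·2^k for a point in group k), can be sketched as follows. Using math.frexp to read the base-2 exponent is an implementation choice of this example, not of the paper.

```python
import math

def expo2(x):
    # base-2 exponent of |x|, e.g. expo2(1.23) = 0, expo2(12.3) = 3, expo2(123) = 6
    mantissa, exp = math.frexp(abs(x))   # |x| = mantissa * 2**exp, mantissa in [0.5, 1)
    return exp - 1

def gid(x, lam=0):
    # Equation (5)
    if x < 0:
        return -gid(-x, lam)
    if x == 0 or expo2(x) < lam:
        return 0
    return expo2(x)

def group_abs_bound(x, rel_bound, lam=0):
    # absolute error bound attached to the group of x: rel_bound * 2**|GID(x)|
    return rel_bound * (2 ** abs(gid(x, lam)))

for v in (0.123, 1.23, 12.3, 123.0, -40.0):
    print(v, gid(v), group_abs_bound(v, 0.01))
```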
The pseudo-code of the multi-threshold-based compression is presented in Algorithm 3. The compression in the multi-threshold-based strategy includes three steps (data prediction, quantization of prediction errors, and entropy encoding), based on the SZ compression framework. Unlike the original design of the SZ compressor, we need to declare a buffer over all groups, and each element of the buffer keeps a preceding value in the corresponding group. Specifically, there are two types of such buffers, called pos_group_buf and neg_group_buf, to deal with positive values and negative values, respectively (see lines 1-2). In the main loop, we first select the corresponding last-preceding-value buffer g_buf[|g|] (lines 5-10). In the prediction step, the value of a data point is predicted by the preceding data value in its group (line 15). In particular, the size of the quantization bins used in the quantization step is determined based on the group ID (line 17). For instance, if the user expects to do the compression strictly with a relative error bound on each data point, the bin size should always be set to twice the absolute error bound calculated in terms of the bottom threshold line of the data point's group. Specifically, for a data point d_i located in the group [2^k, 2^(k+1)], the absolute error bound should be set to ε·2^k, where k = GID(d_i) according to Formula (5) and ε is the relative error bound. For instance, if λ is set to 0 in Formula (5), 0.123 and 1.23 will be compressed with the absolute error bound ε because 2^GID(0.123)·ε = 2^GID(1.23)·ε = ε; 12.3 and 123 will be compressed with the absolute error bounds 2^3·ε and 2^6·ε, respectively. The setting of λ determines the value range in which the data will be compressed with an absolute error bound, to avoid the over-reservation of precision. Table 3 presents the calculated values of the real absolute error bounds for different groups. We can clearly see that such a design realizes a multi-resolution effect to control the compression errors of the data points in various ranges: i.e., the larger the values, the larger the absolute error bounds.

Algorithm 3 MULTI-THRESHOLD-BASED DESIGN FOR THE COMPRESSION WITH POINT-WISE RELATIVE ERROR BOUND
Input: relative error bound ε, lower bound of threshold λ (= 0 by default), maximum group ID G (2G is the maximum number of groups), n data points
Output: byte stream of compressed data
1: Construct buffer pos_group_buf for positive data points;
2: Construct buffer neg_group_buf for negative data points;
3: Allocate memory for the arrays dataGrpCode and dataQuntCode;
4: for (i = 1 → n − 1) do
5:   if (d_i ≥ 0) then
6:     g_buf = pos_group_buf;
7:   else
8:     g_buf = neg_group_buf;
9:   end if
10:  Compute group number g = GID(d_i) based on Formula (5);
11:  if (|g| > G or g_buf[|g|] = φ) then
12:    Treat d_i as an unpredictable data point; /*binary analysis*/
13:    g_buf[|g|] = d'_i; /*d'_i here is the decompressed value of d_i*/
14:  else
15:    pred = g_buf[|g|]; /*|g| is the absolute value of g*/
16:    perr = |pred − d_i|;
17:    Compute the real error bound (denoted ǫ) in the group g;
18:    dataQuntCode[i] = ⌊perr/(2ǫ) + 1/2⌋; /*Formula (4)*/
19:    dataGrpCode[i] = g;
20:    g_buf[|g|] = d'_i; /*Put the decompressed value in the buffer.*/
21:  end if
22: end for
23: Compress dataQuntCode by our customized Huffman tree;
24: Compress dataGrpCode by our customized Huffman tree;

TABLE 3
An example of absolute error bounds calculated by exponentially increasing value ranges (λ = 0)

ε       ···   (-64,-32]  (-32,-16]  (-16,-8]  (-8,-4]  (-4,-2]  (-2,-1]  (-1,0)
0.1     ···   3.2        1.6        0.8       0.4      0.2      0.1      0.1
0.01    ···   0.32       0.16       0.08      0.04     0.02     0.01     0.01
0.001   ···   0.032      0.016      0.008     0.004    0.002    0.001    0.001

ε       [0,1)   [1,2)   [2,4)   [4,8)   [8,16)  [16,32)  [32,64)  ···
0.1     0.1     0.1     0.2     0.4     0.8     1.6      3.2      ···
0.01    0.01    0.01    0.02    0.04    0.08    0.16     0.32     ···
0.001   0.001   0.001   0.002   0.004   0.008   0.016    0.032    ···
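The simplified sketch below captures the core loop of this strategy: each value is assigned to a group, predicted from the last reconstructed value seen in that group, and quantized with the group's own absolute error bound. It deliberately uses signed quantization codes, stores the first point of each group verbatim, and omits unpredictable-data handling and entropy coding, so it is an illustration of the idea rather than a faithful transcription of Algorithm 3.

```python
import math

def expo2(x):
    return math.frexp(abs(x))[1] - 1          # base-2 exponent of |x|

def gid(x, lam=0):
    if x < 0:
        return -gid(-x, lam)
    return 0 if (x == 0 or expo2(x) < lam) else expo2(x)

def compress_multi_threshold(data, rel_bound, lam=0):
    group_codes, quant_codes, recon = [], [], []
    last_in_group = {}                         # group id -> last reconstructed value
    for d in data:
        g = gid(d, lam)
        abs_bound = rel_bound * (2 ** abs(g))  # bound from the group's bottom threshold
        pred = last_in_group.get(g)
        if pred is None:
            code, r = None, d                  # first point of the group: kept verbatim here
        else:
            code = math.floor((d - pred) / (2 * abs_bound) + 0.5)
            r = pred + 2 * abs_bound * code    # bin center = reconstructed value
        last_in_group[g] = r
        group_codes.append(g)
        quant_codes.append(code)
        recon.append(r)
    return group_codes, quant_codes, recon

vx = [3.1, 3.3, -40.2, 3.2, -41.0, 950.0, 3.25, 949.0]
groups, codes, recon = compress_multi_threshold(vx, rel_bound=0.01)
print(groups)
print(codes)
print(max(abs(a - b) / abs(a) for a, b in zip(vx, recon)))  # stays below 0.01
```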
There are two significant advantages in the multi-threshold-based design. On the one hand, the data prediction step is not subject to the adjacent neighbor data points in the data set but to the preceding data points in the same group, such that the prediction accuracy can be improved significantly, especially when the data are not smooth in the array. As shown in Figure 5, for instance, segment A includes three data points i, j, and k, whose values are largely different from their previous neighbor values yet are very close to each other in the same group. We compare the distribution of prediction errors (PDF) for the multi-threshold-based design and the traditional neighbor-prediction method in Figure 6 (a), using the HACC data set (vx field). We can clearly observe that the former leads to a sharper distribution, which means higher prediction accuracy. On the other hand, the absolute error bound of each data point depends on its value, so the size of each quantization bin gets larger than the fixed bin size used in the traditional compression model with a global absolute error bound. The diverse bin sizes lead to a more intense clustering of the data points in the quantization bins; i.e., more data points will be represented by fewer integer bin indices after the quantization step. We use Figure 6 (b) to illustrate the distribution of data points located in different quantization bins (−500∼500) based on a relative error bound of 0.001 and an absolute error bound of 0.1, respectively. In the figure, the two distributions exhibit very similar shapes. This means that, in order to reach similar quantization effectiveness (or compression ratio), the absolute error bound has to be increased by up to two orders of magnitude relative to the relative error bound, leading to a significant loss of precision on the close-to-zero data points.

Fig. 6. Distribution analysis of the multi-threshold-based design: (a) distribution of prediction errors for the multi-threshold-based prediction versus the pre-neighbor data prediction; (b) distribution of the number of data points in the quantization bins for relative error bound = 1E-3 versus absolute error bound = 0.1.
threshold-based design. On the one hand, the data predic- of adjacent group IDs is likely to be around 0 in most of
tion step is not subject to the adjacent neighbor data points cases, as confirmed in Figure 7. This figure demonstrates


We adopt Huffman encoding to encode the difference of adjacent group IDs. The reason is that the data may still have a certain coherence, even for the particles' velocity arrays, as we observed in Figure 4. That is, the difference of adjacent group IDs is likely to be around 0 in most cases, as confirmed in Figure 7. This figure demonstrates the distribution of the differences of adjacent group IDs when the multi-threshold-based strategy is adopted on the vx array of a snapshot in the HACC cosmology simulation. In particular, 50% of the data points belong to group 0, and the three most intensive groups (ID = −1, 0, 1) occupy up to 80% of the data points. In this situation, Huffman encoding can significantly reduce the storage size for keeping the group IDs, in that the most frequent group IDs are represented by very short codes.

Fig. 7. Distribution of adjacent group ID codes (HACC: vx).

Remark:
• The multi-threshold-based strategy is particularly suitable for scientific data that exhibit rather low spatial coherence (i.e., spiky changes in local areas). The reason is that, unlike the block-based strategy, the multi-threshold-based strategy does not closely rely on the smoothness/coherence of the data.
• If the value range of the data set is very small, e.g., if it is in the range [−2,2], there would be only a few groups based on the base-2 thresholds. In this case, the user can amplify the value range by introducing a multiplier θ on each data point, such that all the data are mapped to another space that is θ times as large as the original value range; we call this transform "θ-mapping". To minimize the transformation overhead, θ is supposed to be set to 2^c, such that we just need to add c to the exponent part of the IEEE 754 representation of each original data value in the θ-mapping (see the sketch after this list). In the course of decompression, the reconstructed data just need to go through the reverse procedure of θ-mapping (i.e., divide every data point by θ) to get the original data values.
• Based on Equation (5), all the data points whose values are in the range [−1,1] belong to group ID 0; hence their absolute error bounds are a constant ε. That is, the reconstructed values for these data points may not follow a strict relative error bound, while such a distortion is tolerable because the user can amplify the value range to an expected space on demand, as discussed above.
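A θ-mapping with θ = 2^c only shifts the binary exponent, which numpy exposes directly through ldexp; the round trip below is a small sanity check of that remark (the helper names are ours).

```python
import numpy as np

def theta_map(data, c):
    # multiply every value by theta = 2**c by adjusting the binary exponent only
    return np.ldexp(data, c)

def theta_unmap(mapped, c):
    # reverse procedure applied to the reconstructed data after decompression
    return np.ldexp(mapped, -c)

data = np.array([-1.75, -0.3, 0.2, 1.9], dtype=np.float32)
mapped = theta_map(data, 10)             # value range grows from [-2,2] to [-2048,2048]
restored = theta_unmap(mapped, 10)
assert np.array_equal(restored, data)    # exact, since only the exponent changed
print(mapped)
```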


6 EVALUATION OF COMPRESSION QUALITY

In this section, we evaluate the compression quality of our pointwise relative-error-based compression technique on two real-world HPC simulation data sets and compare it with other state-of-the-art work.

6.1 Experimental Setting

The state-of-the-art compressors in our evaluation are ZFP [4], FPZIP [18], and ISABELA [17], all of which support pointwise relative-error-based compression to a certain extent. ZFP provides three compression modes: error-bound-based compression, rate-fixed compression, and precision-based compression. The precision-based mode allows users to set the number of bits to keep for the coefficient values in the orthogonally transformed space, so as to achieve an approximate effect of pointwise relative-error-based compression. FPZIP also allows users to set the number of bits to maintain for each data point, leading to pointwise relative error controls. Unlike ZFP and FPZIP, ISABELA allows users to set a pointwise relative error bound explicitly.

In our experiments, the simulation data are from two well-known scientific applications: the CESM Atmosphere model (CESM-ATM) and the Hardware/Hybrid Accelerated Cosmology Code (HACC). They represent typical meteorological simulation and cosmology simulation, respectively. We also evaluated Hurricane simulation data [25], which exhibit compression features similar to those of CESM-ATM; they are not shown in the paper because of the space limitation.

• CESM is a well-known climate simulation code that may produce large volumes of data every day [19], [26]. The CESM-ATM data comprise 60+ snapshots, each containing 100+ fields. We performed compression on each of them and observed that many fields exhibit similar compression results. Therefore, we illustrate the compression results based on four representative fields (CLDLOW, CLDHGH, FLDSC, and PHIS).
• HACC is a large-scale cosmology simulation code developed at Argonne National Laboratory. There are also dozens of snapshots, each of which has 2^30 particles emulated. Each particle involves six fields (x, y, z, vx, vy, vz) stored as single-precision floating-point values, so the total original data size of each snapshot is 24 GB in the evaluation.

For the evaluation metrics regarding data distortion, we adopt the average relative error, the maximum relative error, and the peak signal-to-noise ratio (PSNR). PSNR represents the overall distortion of the data from the perspective of visualization and is calculated as in Formula (3). PSNR is non-negligible even if one chooses the relative error bound to do the compression, because a large distortion of the overall data visualization is undesired. In addition to PSNR, we will also demonstrate the impact of the decompressed data on the visualization of physics-related metrics such as the kinetic energy and the moving direction of particles in the HACC simulation. We also assess the I/O performance gain when using our lossy compressor against other related work by performing a parallel experiment on a supercomputer [27] using 2,048 cores (i.e., 64 nodes, each with two Intel Xeon E5-2695 v4 processors and 128 GB of memory, and each processor with 16 cores). The storage system uses General Parallel File Systems (GPFS). These file systems are located on a RAID array and served by multiple file servers. The I/O and storage systems are typical high-end supercomputer facilities. We use the file-per-process mode with POSIX I/O [28] on each process for reading/writing data in parallel.

6.2 Evaluation based on CESM-ATM data

Table 4 presents the compression results generated by different compressors based on the CLDLOW field in the CESM-ATM simulation. Our solution, SZ, has three optional modes (MIN, AVG, and MAX), meaning that the absolute error bound is calculated based on the minimum, average, and maximum value in each block, respectively. We set the block size in our solution to 5x5 because of its optimality (to be shown later). From the table, we can see that only SZ (MIN mode) and FPZIP can control the pointwise relative error perfectly. We present the percentage of the data points that meet the relative error bound requirement in the third column. Since ZFP has no explicit relative error bound mode, we compute the bounded percentage for its three compression cases (Pc=16, 18, 20) based on the relative error bound of 1E-2. Since FPZIP supports only a precision setting to control the errors, we have to try different precisions to tune the pointwise relative error bound to a target level. Specifically, Pc=16 and Pc=23 correspond to pointwise relative error bounds of 1E-2 and 1E-4, respectively. In addition, ε̄ and max ε denote the average relative error and the maximum relative error, respectively; and ǫ refers to the relative error bound setting for SZ. For the other compressors, we present the compression results based on a similar maximum relative error or PSNR. SZ (MIN mode) exhibits the best results, as it not only leads to the highest compression ratio among the compressors studied but also strictly limits the relative errors. In quantitative terms, when the relative error bound is set to 0.01, SZ (MIN mode) achieves a compression ratio up to 18.6, which is higher than FPZIP, ZFP, and ISABELA by 17.2%, 360%, and 618%, respectively. SZ (MAX mode) achieves higher compression ratios because of its more relaxed relative error bound settings. Because of space limits, we cannot present the compression/decompression times here; basically, we note that SZ, ZFP, and FPZIP lead to similar compression times, while ISABELA is much slower due to its data-sorting step.

TABLE 4
Comparison of compression results among different compressors (CLDLOW field in the CESM-ATM simulation)

Compressor      Setting   bounded    ε̄          max ε      PSNR    CR
SZ (MIN mode)   ǫ=1E-2    100%       0.00466    0.009997   52.64   18.6
                ǫ=1E-4    100%       4.65E-5    1E-4       92.56   4.99
SZ (AVG mode)   ǫ=1E-2    98.6268%   1.03E+7    1.66E+13   52.7    18.67
                ǫ=1E-4    98.646%    1.03E+5    1.34E+11   92.7    5.12
SZ (MAX mode)   ǫ=1E-2    95.171%    1.5E+8     6.2E+14    52.37   19.63
                ǫ=1E-4    95.199%    4.4E+5     1.03E+12   92.4    5.28
ZFP [4]         Pc=16     99.96%     4.2E+5     1.26E+12   85.94   4.04
                Pc=18     99.977%    1.7E+5     4.15E+11   97.86   3.61
                Pc=20     99.984%    2E+5       3.47E+10   109.9   2.98
FPZIP [18]      Pc=15     87.2515%   0.00566    0.0154     51.32   20.3
                Pc=16     100%       0.00284    0.0078     57.32   15.87
                Pc=22     97.2724%   4.4E-5     1.22E-4    93.44   4.49
                Pc=23     100%       2.2E-5     6.1E-5     87.5    3.95
ISABELA [17]    ǫ=1E-2    99.9997%   0.00226    1          58.8    2.59
                ǫ=1E-4    99.9952%   5.15E-05   1          92.8    1.39

We explore the optimal setting of the block size for the relative-error-bound-based compression under the SZ model, as presented in Table 5. We perform the compression evaluation using different block sizes (from 4x4 to 8x8) and

TABLE 5
Compression results of SZ (MIN) with different block sizes

Compression Ratio
Setting   4x4     5x5       6x6      7x7       8x8
1E-2      17.04   18.6      17.13    18.21     17.48
1E-4      4.89    4.99      5.01     5.02      5.02

Maximum Relative Error (i.e., max ε)
Setting   4x4     5x5       6x6      7x7       8x8
1E-2      0.01    0.009997  0.00998  0.009998  0.009998
1E-4      1E-4    1E-4      1E-4     1E-4      1E-4

PSNR
Setting   4x4     5x5       6x6      7x7       8x8
1E-2      52.27   52.64     52.92    52.99     53.1
1E-4      92.33   92.56     92.86    92.68     92.83

each block changes little with different block sizes as long as the block size stays small. We can also observe that the maximum relative errors are always close to or exactly equal to the specified bounds (1E-2 or 1E-4).

We present in Figure 8 the overall compression ratios based on the four representative fields, with various relative error bounds, under the four different compressors. We set the precisions of FPZIP to 13, 16, 19, and 23, respectively, because such settings respect the pointwise relative error bounds of 1E-1, 1E-2, 1E-3, and 1E-4, respectively. ZFP adopts precisions of 14, 16, 18, and 20, respectively. We adopt the block-based strategy on all fields except for PHIS, which adopts the threshold-based strategy because pretty small values are scattered throughout this data set (leading to severely over-preserved precision unexpectedly). Based on Figure 8, it is observed that SZ leads to about 50% higher compression ratios than the second-best compressor, FPZIP, based on the pointwise relative error bound.

Fig. 8. Overall compression ratios of SZ, FPZIP, ZFP, and ISABELA on the four representative fields under various relative error bounds.
evaluation using different block sizes (from 4x4 to 8x8) and 20


observe that the compression qualities are similar. For in-
15
stance, the compression ratios are in the range of [17.04,18.6]
10
and [4.89,5.02] when the relative error bounds are set to
5
0.01 and 0.0001, respectively. The reason is that the climate
simulation data exhibit relatively high smoothness in the 0
1E-1 1E-2 1E-3 1E-4
space, such that the absolute error bound approximated for Relative error bounds

1 POSIX I/O performance is close to other parallel I/O performance


Fig. 8. Compression Ratio on CESM-ATM data
such as MPI-IO [29] when thousands of files are written/read simulta-
neously on GPFS, as indicated by a recent study [30]. Table 6 presents the compression rate and decompres-
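To make the MIN/AVG/MAX modes concrete, the sketch below shows one way a per-block absolute error bound can be derived from the user's pointwise relative error bound in the block-based strategy; it is an illustrative reconstruction under our own naming, not the actual SZ source code.

#include <math.h>
#include <stddef.h>

typedef enum { MODE_MIN, MODE_AVG, MODE_MAX } block_mode_t;

/* Illustrative sketch: derive the absolute error bound of one block from a
 * pointwise relative error bound, using the minimum, average, or maximum
 * magnitude of the block's values (the MIN/AVG/MAX modes discussed above). */
double block_abs_bound(const double *block, size_t n,
                       double rel_bound, block_mode_t mode)
{
    double min_m = fabs(block[0]), max_m = fabs(block[0]), sum_m = 0;
    for (size_t i = 0; i < n; i++) {
        double m = fabs(block[i]);
        if (m < min_m) min_m = m;
        if (m > max_m) max_m = m;
        sum_m += m;
    }
    switch (mode) {
    case MODE_MIN: return rel_bound * min_m;       /* strictest: bound holds at every point */
    case MODE_AVG: return rel_bound * (sum_m / n);
    default:       return rel_bound * max_m;       /* most relaxed: highest compression ratio */
    }
}

For a 5x5 block of a 2D field, n would be 25; the MIN mode respects the pointwise bound for every point in the block because no value has a smaller magnitude than the block minimum, which is consistent with only SZ (MIN mode) bounding 100% of the points in Table 4.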


Table 6 presents the compression and decompression rates achieved on the CESM-ATM data by the four state-of-the-art compressors. The relative error bound is set to 1E-2 for SZ and ISABELA; the precision bit-count is set to 16 and 20 for FPZIP and ZFP, respectively, such that they exhibit relative errors comparable to those of SZ and ISABELA according to our above analysis. The four compressors are all executed on the Argonne Bebop cluster [27], and Table 6 reports the single-core processing rates. It is observed that FPZIP has the fastest compression speed, while SZ is the fastest compressor on decompression. Specifically, SZ is slower than FPZIP by 35-56% on compression, while it is faster than FPZIP by 48-66% on decompression. ISABELA exhibits a much lower compression rate than the others because of its costly sorting operation during compression. In addition, in our evaluation, we also observe that SZ exhibits similar compression/decompression rates when using the absolute error bound setting and the relative error bound setting, respectively. The relative-error-bound-based compression is sometimes faster because of the denser distribution of quantization bins, which leads to a more efficient Huffman encoding operation, while the absolute-error-bound-based compression can be faster because it avoids the cost of calculating the statistical error bound for each block in the block-based strategy.

TABLE 6
Compression/Decompression Rate (MB/s): CESM-ATM data

Compressor    CLDLOW       CLDHGH       FLDSC          PHIS
SZ            53.7/130     54.9/137     65.1/190       49.3/99
FPZIP         95.1/88      85.2/82.4    145.4/117.7    79.7/75
ZFP           103/118      91.6/95      103/145        80/103
ISABELA       4.05/12.5    4.34/13.4    4.86/13.5      1.01/12.6

We present the visualization images of the absolute-error-bound-based compression and the pointwise-relative-error-based compression in Figure 9. The absolute error bound and the relative error bound are set to 1.1E-3 and 1E-2, respectively, such that they have the same compression ratio. We observe that the two figures both look the same as the original image at full resolution to the naked eye, indicating that the overall visualization effect under the relative error bound is not degraded at all.

Fig. 9. Visualization of Decompressed Data (CLDLOW) with Different Error Bounds: (a) absolute-error-bounded compression, (b) relative-error-bounded compression.

Since the smaller the value, the lower the error bound, the relative-error-based compression is supposed to have a greater ability to retain details than the absolute-error-bound-based compression in areas with a lot of small data values. This is confirmed in Figure 10, which evaluates the PSNR of the two compression modes at different precisions using four representative fields in CESM-ATM. We set the relative error bound to 1E-5 and compare the two compression modes under the same compression ratio, with different levels of visual granularity. The visual granularity is a threshold used to select the areas with small values in the data set. For instance, if it is set to 0.01, only the data values that are lower than or equal to 0.01 will be selected to check the PSNR. Through the figure, we can clearly see that the PSNR is retained at a high level with finer visual granularity under the relative-error-based compression, while it degrades significantly under the absolute-error-based compression.

Fig. 10. Evaluation of PSNR with Different Visual Granularity: (a) CLDLOW, (b) CLDHGH, (c) FLDSC, (d) PHIS (PSNR vs. visual granularity for ABS_Err_based_Compression and REL_Err_based_Compression).

In Figure 11, we compare the visualization images of the decompressed data under the two compression modes using the field CLDLOW with the value range [0, 0.001]. Because of the space limitation, we do not show the original image, which looks exactly the same as Figure 11 (b). The MAX mode with ǫ=0.01 is adopted for the relative-error-based compression. We can observe that the image under the relative-error-based compression is exactly the same as the original one, which confirms the high effectiveness of our relative-error-based compression technique in maintaining details. By comparison, the absolute-error-based compression distorts the visualization considerably in the areas with small values. For instance, some areas that were originally blue/purple exhibit other colors (such as yellow and cyan) because of the changed values after the reconstruction.

Fig. 11. Visualization of Decompressed Data (CLDLOW:[0,1E-3]): (a) absolute-error-based compression, (b) relative-error-based compression.
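The visual-granularity evaluation can be reproduced by restricting the PSNR computation to the points whose original values do not exceed the granularity threshold, as in the sketch below (our own illustration, reusing the value-range-based PSNR definition assumed earlier; it is not code from the evaluation framework).

#include <math.h>
#include <stddef.h>

/* Illustrative sketch: PSNR restricted to the points whose original value is
 * <= the visual-granularity threshold, so that the distortion of small-value
 * regions can be inspected separately (cf. Figure 10). */
double psnr_at_granularity(const double *orig, const double *dec,
                           size_t n, double granularity)
{
    double sum_sq = 0, min_v = 0, max_v = 0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (orig[i] > granularity)
            continue;                       /* keep only the small-value area */
        double diff = orig[i] - dec[i];
        sum_sq += diff * diff;
        if (count == 0 || orig[i] < min_v) min_v = orig[i];
        if (count == 0 || orig[i] > max_v) max_v = orig[i];
        count++;
    }
    if (count == 0 || max_v == min_v)
        return INFINITY;                    /* nothing (or a constant field) selected */
    double rmse = sqrt(sum_sq / count);
    return 20.0 * log10((max_v - min_v) / rmse);
}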


6.3 Evaluation based on HACC data

In this subsection, we present the evaluation results of different lossy compressors on the HACC data set. As discussed previously, each particle involves six fields, which can be split into two categories: position and velocity. According to the HACC developers, the users require an absolute error bound to compress the position values, while they hope to compress the velocity values using pointwise relative error bounds. The reason is that the different positions of the particles in the simulation are equally significant in space, yet the lower the velocity of a particle, the higher the demand on the compression precision in general. Accordingly, we mainly focus on the evaluation of the velocity data compression with a pointwise relative error bound in our experiment.

In what follows, we first compare the compression quality among the different lossy compressors (including SZ, ZFP, FPZIP, and ISABELA) in terms of compression ratio and compression/decompression rate. Then, we analyze the impact of the decompressed data on the particles' kinetic energy and moving direction, based on the absolute error bound and the relative error bound, respectively. We observe that the compression with relative error bound keeps a very satisfactory overall visualization effect on both the kinetic energy and the moving direction, and it also keeps the details of the low-velocity particles more accurately than the compression with absolute error bound does under the same compression ratio.

Figure 12 presents the pointwise-relative-error-bound-based compression ratios of the four compressors, using the HACC velocity data (vx, vy, and vz). As for FPZIP and ZFP, we ran them using multiple precisions and selected the results whose maximum relative errors are closest to the target error bound. We can observe that SZ leads to a 31%-210% higher compression ratio than the other compressors do. The key reason SZ works more effectively than the others on the HACC dataset is that we adopt the multi-threshold-based strategy, which can effectively reduce the prediction errors (as shown in Figure 6 (a)); moreover, the group IDs can also be compressed effectively because of the sharp distribution of their corresponding codes (as demonstrated in Figure 7).

Fig. 12. Compression Ratio on HACC velocity data (compression ratio vs. relative error bound, 1E-1 to 1E-4, for SZ, FPZIP, ZFP, and ISABELA).
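As a rough illustration of why the multi-threshold grouping is cheap, the sketch below derives a group ID from the binary exponent of a value and shows the θ-mapping trick of multiplying by 2^c through direct manipulation of the IEEE 754 exponent field. It is a simplified reconstruction under our own naming and does not reproduce the exact grouping formula (Equation (5)) used by SZ.

#include <math.h>
#include <stdint.h>
#include <string.h>

/* Simplified illustration of exponent-based grouping: values in [-1,1] fall
 * into group 0; otherwise the group ID is the binary exponent, so the group
 * thresholds (1, 2, 4, 8, ...) grow exponentially. */
int group_id(double value)
{
    double m = fabs(value);
    if (m <= 1.0) return 0;
    int e;
    frexp(m, &e);          /* m = f * 2^e with f in [0.5, 1) */
    return e;              /* larger magnitudes -> larger group IDs */
}

/* Illustration of the theta-mapping trick: multiplying a double by 2^c can be
 * done by adding c to the 11-bit exponent field of its IEEE 754 encoding
 * (zeros, subnormals, and exponent overflow are ignored for brevity). */
double scale_by_pow2(double value, int c)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    if ((bits & 0x7FF0000000000000ULL) == 0) return value;   /* zero/subnormal: leave as is */
    bits += (uint64_t)(int64_t)c << 52;                       /* adjust the exponent field */
    memcpy(&value, &bits, sizeof bits);
    return value;
}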
Table 7 shows the single-core compression/decompression rates of the four compressors running on the Argonne Bebop cluster [27], using the HACC velocity data with a relative error bound of 0.01. It is observed that FPZIP and SZ are the fastest compressors with respect to compression and decompression, respectively. Specifically, FPZIP is faster than SZ by about 50% on compression, while SZ is faster than FPZIP by 30% on decompression. By comparison, ISABELA suffers from very low compression/decompression rates because of its costly sorting operation.

TABLE 7
Compression/Decompression Rate (MB/s): HACC data

Compressor    vx           vy           vz
SZ            48/88        46.8/77      50/90
FPZIP         71.5/68.7    74.5/69.2    74.2/68.3
ZFP           68.7/60.1    68.2/59.2    66.1/60.2
ISABELA       2.95/14.9    2.85/15.1    2.9/15.3

As for the impact of the decompressed data on the analysis, we first study the particles' kinetic energy in the simulation space. A particle's kinetic energy is defined as follows:

E(particle_i) = \frac{1}{2} m_i \cdot |v_i|^2    (6)

where m_i is the mass of the particle and |v_i| is the magnitude of the velocity vector v_i. Since the HACC simulation assumes that each particle has the same mass, we set m_i to 1 in our evaluation. In order to compare the compression quality using the absolute error bound versus the relative error bound, we compress the data with absolute error bound = 6.2 and relative error bound = 0.1, respectively, because they lead to the same compression ratio (about 6.2:1).

Figure 13 plots the slice images of the accumulated kinetic energy in the space. Specifically, we split the space into 1000 slices, each having 1000×1000 blocks, and then aggregate the kinetic energy of the particles in the middle 10 slices, plotting the slice images based on three datasets (the original raw data and the two decompressed datasets with the two types of error bounds), respectively. We also zoom in on the bottom-left square by splitting it further into 1000×1000 small blocks to observe possible differences. We compare the three slice images at full resolution, and they look exactly the same as each other. This means that the pointwise relative error bound of 0.1 can already lead to a satisfactory effect from the perspective of the overall visualization.

We also analyze the impact of the decompressed data on the aggregated moving directions of the particles in the simulation. We analyze only the moving direction along the Z-axis here, because the studies based on the Y-axis and X-axis lead to similar results. The aggregated Z-axis moving direction of the particles in a block is defined as the sum of the moving directions of the particles along the Z-axis in the same block, as shown below:

Amd(Z\text{-}axis) = \sum_{particle_i \in Block_I} \frac{vz_i}{\sqrt{vx_i^2 + vy_i^2 + vz_i^2}}    (7)

where vx_i, vy_i, and vz_i are the velocity values along the three dimensions, respectively. Figure 14 presents the aggregated moving direction of the 10 middle slices for the original data and the decompressed data, respectively. The regions with positive values and negative values indicate that the particles in those regions are, overall, moving closer to or farther away from the observer, respectively. We can see that the two types of decompressed data are both very satisfactory to users from the perspective of the overall visualization effect. Specifically, it is extremely hard to observe even a tiny difference among the three images, even when we zoom in on them at a very high resolution.
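For reference, Equations (6) and (7) can be accumulated over a block of particles in a single pass, as in the sketch below (our own illustration, assuming unit mass m_i = 1 as in the evaluation; the array names are hypothetical).

#include <math.h>
#include <stddef.h>

/* Illustrative single-pass accumulation of Eq. (6) and Eq. (7) over one block
 * of particles, assuming unit mass (m_i = 1) as in the evaluation above. */
typedef struct {
    double kinetic_energy;   /* sum of 0.5 * |v_i|^2            */
    double amd_z;            /* sum of vz_i / |v_i|  (Eq. (7))  */
} block_metrics_t;

block_metrics_t accumulate_block(const double *vx, const double *vy,
                                 const double *vz, size_t n)
{
    block_metrics_t m = {0.0, 0.0};
    for (size_t i = 0; i < n; i++) {
        double speed2 = vx[i]*vx[i] + vy[i]*vy[i] + vz[i]*vz[i];
        m.kinetic_energy += 0.5 * speed2;
        if (speed2 > 0.0)
            m.amd_z += vz[i] / sqrt(speed2);   /* cosine of the angle to the Z-axis */
    }
    return m;
}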


Fig. 13. Visualization of accumulated kinetic energy (HACC): (a) raw data visualization, (b) decompressed data with ABS ERR=6.2, (c) decompressed data with REL ERR=0.1.

Fig. 14. Particle moving direction along Z-axis (HACC), i.e., the accumulated moving direction: (a) original raw data, (b) decompressed data with ABS ERR=6.2, (c) decompressed data with REL ERR=0.1.

However, the compression with pointwise relative error bound can retain the details much more effectively than the compression with absolute error bound, especially for the low-velocity particles, as illustrated by Figure 15. This figure presents the accumulated speed angles between the original particles and their corresponding decompressed particles, based on the particles located in the middle 10 slices and with a velocity magnitude lower than 100. In this figure, the brighter the color a region exhibits, the larger the distortion (i.e., the accumulated speed angle) between the original particles and the decompressed particles in that region. We can clearly observe that the decompressed data with absolute error bound leads to much larger discrepancies of the speed angles from the original particles than the decompressed data with relative error bound does. The reason is that the former suffers from much larger compression errors for the close-to-zero velocity values, as confirmed in Figure 16.

Figure 16 presents the absolute errors and the relative errors for the two types of compression, respectively. We select the first 10,000 particles to demonstrate the compression errors; the other particles exhibit similar results. We also sorted the compression errors for easier observation. Based on Figure 16 (a), it is observed that the overall absolute errors under the two compression types are both negligible compared with the data values for the high-velocity particles with fast speeds. However, the zoomed-in figure shows that the compression with relative error bound has little absolute error, while the compression with absolute error bound suffers from errors too large to tolerate. In other words, the decompressed velocities of the low-speed particles are invalid under the compression with absolute error bound. This can be explained clearly using Figure 16 (b), which shows the relative errors of the low-speed particles under the two types of compression. Specifically, the maximum relative error can reach up to 45 times the data value under the absolute-error-bound-based compression, while it is only about 50% in the worst case for our designed relative-error-based compression.

Fig. 15. Accumulated speed angles (in degrees) between original particles and decompressed particles (HACC): (a) REL ERR=0.1, (b) ABS ERR=6.2.

Fig. 16. Analysis of compression errors based on HACC velocity: (a) decompressed data vs. raw data, (b) relative errors (sorted index; curves for raw data, REL_ERR=0.1, and ABS_ERR=6.2).
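The speed-angle distortion shown in Figure 15 can be obtained per particle as the angle between the original and the decompressed velocity vectors and then accumulated per region; the sketch below is our own illustration of that computation, not code from the evaluation scripts.

#include <math.h>

/* Illustrative computation of the angle (in degrees) between a particle's
 * original velocity vector and its decompressed counterpart; accumulating
 * these angles per region yields the distortion maps of Figure 15. */
double speed_angle_deg(double vx, double vy, double vz,
                       double dvx, double dvy, double dvz)
{
    const double RAD2DEG = 180.0 / 3.14159265358979323846;
    double dot = vx*dvx + vy*dvy + vz*dvz;
    double n1  = sqrt(vx*vx + vy*vy + vz*vz);
    double n2  = sqrt(dvx*dvx + dvy*dvy + dvz*dvz);
    if (n1 == 0.0 || n2 == 0.0)
        return 0.0;                 /* undefined for a zero vector; treat as no distortion */
    double c = dot / (n1 * n2);
    if (c > 1.0)  c = 1.0;          /* clamp against rounding noise */
    if (c < -1.0) c = -1.0;
    return acos(c) * RAD2DEG;
}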


We finally perform a parallel experiment with up to 2,048 cores on a supercomputer [27] to assess the I/O performance gain under our proposed compressor, as shown in Figure 17. Specifically, our experiment is weak-scaling: we launched different numbers of ranks in parallel and let each rank write/load simulation data of the same size, such that the total data size increases with the execution scale. We adopt the pointwise relative error bound of 0.1 for SZ and FPZIP, because such a setting already leads to a good visual quality, as discussed above. For ZFP, we adopt a precision of 20. As a comparison, the total I/O time of writing the original data to the PFS is about 5,600 seconds, and the I/O time of reading the data is about 5,800 seconds, when running the parallel experiment with 2,048 cores. As shown in Figure 17 (a), when using 2,048 cores, the total time of dumping the simulation data (i.e., compression time + writing the compressed data) is only about 895 seconds when equipped with our compressor, which means a throughput improvement of 5.25X compared with the original data writing time and an improvement of 24% over the second-best solution using FPZIP. The data loading throughput can be improved by 560+% compared with reading the original dataset and by 21% compared with the solution employing the second-best compressor, FPZIP, when using 2,048 cores.

Fig. 17. Parallel performance evaluation using HACC (velocity fields): (a) dumping data to PFS, (b) loading data from PFS (elapsed time, split into compression and writing data, for SZ, FPZIP, and ZFP with 512, 1,024, and 2,048 cores).

7 CONCLUSION AND FUTURE WORK

Pointwise relative error bound is significant for respecting the multi-resolution demand during lossy compression. In this paper, we present two novel strategies that can significantly improve the compression ratio based on the relative error bound. Strategy A (called the block-based strategy) splits the whole data set into multiple small blocks and then computes the real absolute error bound for each block, which is particularly effective on the compression of relatively smooth data with multiple dimensions. Strategy B (called the multi-threshold-based strategy) splits the entire data value range into multiple groups with exponential sizes; this method is particularly effective on data that exhibit spiky value ranges (e.g., the 1D data arrays in N-body simulation). We evaluate our proposed compression methods using two well-known datasets from the climate simulation and N-body simulation communities, respectively. The key insights are summarized as follows:

• Our relative-error-bound-based compression strategies exhibit better compression ratios than the other state-of-the-art compressors on both of the two datasets. Specifically, the compression ratio of SZ is higher than that of the second-best compressor (FPZIP) by 17.2–618% on the CESM-ATM climate simulation data and by 31–210% on the HACC particle simulation data.
• Based on both the CESM-ATM data and the HACC simulation data, we demonstrate that the relative-error-bound-based compression can effectively retain more details than the absolute-error-bound-based compression does, without any degradation of the overall data visualization.
• As for compression/decompression rate, FPZIP runs the fastest on compression based on the relative error bound, while SZ runs the fastest on decompression among all the compressors.
• Parallel experiments with up to 2,048 cores show that SZ improves the I/O performance by 520-560% compared with writing/reading the original HACC data, and by 21-24% compared with the second-best compressor, FPZIP.

In the future, we plan to customize more effective strategies to further improve the compression ratio for non-smooth data sets such as N-body simulation data.

ACKNOWLEDGMENTS

This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations – the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation's exascale computing imperative. The material was supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357, and supported by the National Science Foundation under Grant No. 1619253.

REFERENCES

[1] S. Habib, V. Morozov, N. Frontiere, H. Finkel, A. Pope, and K. Heitmann, "HACC: Extreme scaling and performance across diverse architectures," in International Conference for High Performance Computing, Networking, Storage and Analysis (SC13), Denver, CO, pp. 1-10, 2013.
[2] S. Di and F. Cappello, "Fast Error-bounded Lossy HPC Data Compression with SZ," in Proceedings of the IEEE 30th International Parallel and Distributed Processing Symposium (IPDPS16), pp. 730–739, 2016.


[3] D. Tao, S. Di, Z. Chen, and F. Cappello, "Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization," in IEEE International Parallel and Distributed Processing Symposium (IPDPS2017), Orlando, Florida, USA, May 29-June 2, pp. 1129–1139, 2017.
[4] P. Lindstrom, "Fixed-Rate Compressed Floating-Point Arrays," IEEE Trans. on Visualization and Computer Graphics, vol. 20, no. 12, pp. 2674–2683, 2014.
[5] Gzip compression. [Online]. Available at http://www.gzip.org.
[6] BlosC compressor. [Online]. Available at http://blosc.org
[7] J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Trans. on Information Theory, vol. 23, no. 3, pp. 337–343, 1977.
[8] P. Ratanaworabhan, J. Ke, and M. Burtscher, "Fast Lossless Compression of Scientific Floating-Point Data," in Proceedings of the Data Compression Conference (DCC), pp. 133–142, 2006.
[9] B.E. Usevitch, "JPEG2000 compatible lossless coding of floating-point data," in Journal on Image and Video Processing, vol. 2007, no. 1, pp. 22–22, 2007.
[10] M. Burtscher and P. Ratanaworabhan, "High Throughput Compression of Double-Precision Floating-Point Data," in Data Compression Conference (DCC'07), pp. 293–302, 2007.
[11] N. Sasaki, K. Sato, T. Endo, and S. Matsuoka, "Exploration of Lossy Compression for Application-level Checkpoint/Restart," in Proceedings of the IEEE 29th International Parallel and Distributed Processing Symposium (IPDPS15), pp. 914–922, 2015.
[12] Z. Chen, S.W. Son, W. Hendrix, A. Agrawal, W. Liao, and A. Choudhary, "NUMARCK: Machine Learning Algorithm for Resiliency and Checkpointing," in Proceedings of IEEE/ACM Supercomputing (SC14), pp. 733–744, 2014.
[13] S. Di and F. Cappello, "Optimization of Error-Bounded Lossy Compression for Hard-to-Compress HPC Data," in IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 29, no. 1, pp. 129–143, Jan. 1 2018.
[14] A. Omeltchenko, T.J. Campbell, R.K. Kalia, X. Liu, A. Nakano, and P. Vashishta, "Scalable I/O of large-scale molecular dynamics simulations: A data-compression algorithm," in Journal of Computer Physics Communications (CPC), 131(1):78–85, 2000.
[15] D. Tao, S. Di, Z. Chen, and F. Cappello, "In-Depth Exploration of Single-Snapshot Lossy Compression Techniques for N-Body Simulations," in IEEE International Conference on Big Data (BigData17), 2017.
[16] D. Lee, A. Sim, J. Choi, and K. Wu, "Novel Data Reduction Based on Statistical Similarity," in Proceedings of the 28th International Conference on Scientific and Statistical Database Management (SSDBM16), pp. 21:1–21:12, ACM, New York, USA, 2016.
[17] S. Lakshminarasimhan, N. Shah, S. Ethier, S. Klasky, R. Latham, R. Ross, and N.F. Samatova, "Compressing the Incompressible with ISABELA: In-situ Reduction of Spatio-temporal Data," in 17th Euro-Par11, pp. 366–379, 2011.
[18] P. Lindstrom and M. Isenburg, "Fast and Efficient Compression of Floating-Point Data," IEEE Trans. on Visualization and Computer Graphics, vol. 12, no. 5, pp. 1245–1250, 2006.
[19] Community Earth Simulation Model (CESM). [Online]. Available at https://www2.cesm.ucar.edu/.
[20] Gzip. [Online]. Available at https://zlib.net
[21] S. Paul and B. Bandyopadhyay, "A novel approach for image compression based on multi-level image thresholding using Shannon Entropy and Differential Evolution," in IEEE Students' Technology Symposium (TechSym), Kharagpur, pp. 56-61, 2014.
[22] H. Rekha and P. Samundiswary, "Image compression using multilevel thresholding based Absolute Moment Block Truncation Coding for WSN," in International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, pp. 396-400, 2016.
[23] S. Grace Chang, B. Yu, and M. Vetterli, "Adaptive Wavelet Thresholding for Image Denoising and Compression," in IEEE Trans. on Image Processing, vol. 9, no. 9, pp. 1532–1546, 2000.
[24] M. Cheriet, J. N. Said, and C. Y. Suen, "A Recursive Thresholding Technique for Image Segmentation," in IEEE Trans. on Image Processing, vol. 7, no. 7, pp. 918–921, 1998.
[25] Hurricane ISABEL simulation data. [Online]. Available at http://vis.computer.org/vis2004contest/data.html
[26] A.H. Baker, H. Xu, J.M. Dennis, M.N. Levy, D. Nychka, and S.A. Mickelson, "A Methodology for Evaluating the Impact of Data Compression on Climate Simulation Data," in ACM HPDC14, pp. 203-214, 2014.
[27] Bebop cluster. [Online]. Available at https://www.lcrc.anl.gov/systems/resources/bebop
[28] B. Welch, "POSIX IO extensions for HPC," in Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST05), 2005.
[29] R. Thakur, W. Gropp, and E. Lusk, "On implementing MPI-IO portably and with high performance," in Proceedings of the Sixth Workshop on I/O in Parallel and Distributed Systems, pp. 23–32, 1999.
[30] A. Turner, "Parallel I/O Performance." [Online]. Available at: https://www.archer.ac.uk/training/virtual/2017-02-08-Parallel-IO/2017 02 ParallelIO ARCHERWebinar.pdf.

Sheng Di received his master's degree from Huazhong University of Science and Technology in 2007 and his Ph.D. degree from the University of Hong Kong in 2011. He is currently an assistant computer scientist at Argonne National Laboratory. Dr. Di's research interests involve resilience on high-performance computing (such as silent data corruption, optimization of checkpoint models, and in-situ data compression) and broad research topics on cloud computing (including optimization of resource allocation, cloud network topology, and prediction of cloud workload/hostload). He is working on multiple HPC projects, such as detection of silent data corruption, characterization of failures and faults for HPC systems, and optimization of multilevel checkpoint models. Contact him at [email protected].

Dingwen Tao received his bachelor's degree from the University of Science and Technology of China in 2013 and will receive his Ph.D. degree from the University of California, Riverside in 2018. His research interests include parallel and distributed systems, high performance computing, big data analytics, resilience, data compression, and so on. Contact him at [email protected].

Xin Liang received his bachelor's degree from Peking University in 2014 and will receive his Ph.D. degree from the University of California, Riverside in 2019. His research interests include parallel and distributed systems, fault tolerance, high-performance computing, large-scale machine learning, big data analysis, and quantum computing. Contact him at [email protected].

Franck Cappello is a program manager and senior computer scientist at ANL. Before moving to ANL, he held a joint position at Inria and the University of Illinois at Urbana-Champaign, where he initiated and co-directed from 2009 the Inria Illinois-ANL Joint Laboratory on Petascale Computing. Until 2008, he led a team at Inria, where he initiated the XtremWeb (Desktop Grid) and MPICH-V (fault-tolerant MPI) projects. From 2003 to 2008, he initiated and directed the Grid5000 project, a nationwide computer science platform for research in large-scale distributed systems. He has authored papers in the domains of fault tolerance, high-performance computing, and Grids, and has contributed to more than 70 program committees. He is an editorial board member of the IEEE Transactions on Parallel and Distributed Systems, the International Journal on Grid Computing, the Journal of Grid and Utility Computing, and the Journal of Cluster Computing. He is/was Program co-chair for IEEE CCGRID 2017, Award chair for ACM/IEEE SC15, Program co-chair for ACM HPDC2014, Test of Time Award chair for IEEE/ACM SC13, Tutorial co-chair of IEEE/ACM SC12, Technical Papers co-chair at IEEE/ACM SC11, Program chair of HiPC2011, Program co-chair of IEEE CCGRID 2009, Program Area co-chair of IEEE/ACM SC09, and General chair of IEEE HPDC 2006. He is an IEEE Fellow and a member of the ACM. Contact him at [email protected].
