A Comparison of HDF5, Zarr, and netCDF4 in Performing Common I/O Operations

Sriniket Ambatipudi
Dulles High School
Houston, TX, USA
Email: [email protected]

Suren Byna
Lawrence Berkeley National Laboratory
Berkeley, CA, USA
Email: [email protected]

arXiv:2207.09503v2 [cs.DC] 5 Feb 2023

Abstract—Scientific data is often stored in files because of the simplicity they provide in managing, transferring, and sharing data. These files are typically structured in a specific arrangement and contain metadata to understand the structure the data is stored in. There are numerous file formats in use in various scientific domains that provide abstractions for storing and retrieving data. With the abundance of file formats aiming to store large amounts of scientific data quickly and easily, a question that arises is, "Which scientific file format is best suited for a general use case?" In this study, we compiled a set of benchmarks for common file operations, i.e., create, open, read, write, and close, and used the results of these benchmarks to compare three popular formats: HDF5, netCDF4, and Zarr.

Note: This paper is currently a work in progress, and our results are representative of a general-purpose use case in which datasets are small in size and do not use optimizations such as HDF5 or netCDF4 chunking, asynchronous I/O, or subfiling. We welcome any comments or suggestions regarding the benchmark located at https://github.com/asriniket/File-Format-Testing.

GENERAL TERMS / KEYWORDS

Scientific File Formats, HDF5, netCDF4, Zarr

I. INTRODUCTION

With the rapid advancement of experiments, observations, and simulations in recent years, various domains of science are producing enormous amounts of data. For example, the Large Hadron Collider (LHC) experiments at CERN produce 90 petabytes of data per year [1]. NASA's Climate Data Services (CDS) simulate our planet's weather and climate models from hours to millennia and produce datasets up to petabytes in volume [2].

Much of the data produced in scientific experiments, observations, and simulations is stored in files with various formats. Scientific file formats offer a medium to store scientific data for long-term processing, which is of great importance to researchers. Each file format has specific characteristics that make it suited for a particular use case, and this makes choosing the appropriate file format an important task. The typical content of scientific files includes data with a structure and metadata describing the structure and the data within. Data in these files are often structured as arrays [3]. The metadata that describes the data often contains the origins of the data, the configurations used in generating or collecting the data, and the location of the data in the file for easy access. As such, scientific file formats often provide functionality for storing and retrieving data and metadata in files.

There are numerous file formats in existence, such as HDF5 [4], netCDF4 [5], ROOT [6], Zarr [7], and many more [8]. Each of these file formats exhibits different performance characteristics and was designed to accomplish a specific task. For example, the High Energy Physics (HEP) community developed the ROOT framework to meet the high-performance requirements for multithreaded read and write operations and to support object-oriented programming. As a result of these design specifications, the ROOT framework is used by CERN in its research with the Large Hadron Collider [9]. On the other hand, file formats such as netCDF4 or HDF5 are often used in more general use cases because of their self-describing capabilities, which allow the storing of metadata to describe the data within a file. Such self-describing capabilities allow these file formats to be used in a multitude of applications; HDF5, for example, is used in astronomy, medicine, physics, and many other fields [4].

Because there is a multitude of file formats available to store scientific data, the question arises of which file format is best suited for a general use case. Previous research [10, 11] has mainly focused on testing the performance characteristics of individual file formats; for example, HDF5 was tested for its performance in reading a subset of a large array [10], and netCDF4's performance characteristics were analyzed [11]. The first test revealed that when working with an HDF5 file in Python, the fastest way to read data is to memory-map the file with NumPy, bypassing the HDF5 Python API (h5py). Memory mapping involves mapping a file's contents into memory, which means that data within a file can be accessed if the location of the data in terms of an offset is known. The hierarchical structure of file formats such as HDF5 means that accessing the data within is relatively simple, as the file's metadata stores the location of individual datasets. This test was useful in analyzing the shortcomings of HDF5 in a particular use case, allowing potential users to reconsider whether the HDF5 file format would be best suited for their needs. In the second experiment, conducted by the HDF Group, the netCDF4 file format was tested for its performance characteristics and compared to its predecessor, netCDF3. The results of this experiment showed that netCDF4 generally had slower write speeds than netCDF3, but it had faster read speeds due to netCDF4's use of the HDF5 library internally.
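The memory-mapping approach from [10] can be illustrated with a short sketch; the file name and values here are our own, not taken from that study. Once a value's byte offset within the file is known, it can be read without going through h5py.

```python
import numpy as np

# Write ten 32-bit floats to a raw binary file (illustrative file name).
data = np.arange(10, dtype=np.float32)
data.tofile("raw_values.bin")

# Map the file into memory; elements are paged in only when accessed,
# so a value at a known offset can be read without parsing the file.
view = np.memmap("raw_values.bin", dtype=np.float32, mode="r")
third = float(view[3])  # → 3.0
```

The same idea applies to the contiguous payload of an HDF5 dataset, provided its offset within the file is obtained from the file's metadata first.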
This type of testing is useful for analyzing the performance characteristics of one file format and its shortcomings in specific use cases, but when the performance characteristics of one file format must be compared with those of other file formats, a benchmark offers itself as a viable option because it allows for the objective measurement of the speed at which each file format is able to perform a specific task. Such a benchmark has the potential to be a valuable asset to researchers, as it allows them to choose the file format that is not only suited for their particular use case but also performs the best in comparison to its alternatives. This gives researchers fast, convenient access when storing and modifying data, allowing more time and effort to be put elsewhere in their project. In this work, we developed a benchmark to compare the read and write speeds of three multipurpose scientific file formats (HDF5, netCDF4, and Zarr). This benchmark writes randomized data to a specified number of datasets within a file and measures the time taken to write the data to each dataset and the time taken to read the contents of each dataset, allowing objective comparisons to be drawn between the three file formats' performance in different operations.

In the remainder of the paper, we first provide, in §II, a brief background on the three file formats we used in our evaluation. In §III, we describe the read and write benchmarks we used in the evaluation. In §IV, we provide details of the system we used for comparing and evaluating the performance of the three file formats under different workloads.

II. BACKGROUND

A. HDF5

HDF5, or Hierarchical Data Format 5, is a file format designed to store a large amount of data in an organized manner. Typically characterized by a .hdf5 or .h5 file extension, this file format stores data in a manner very similar to that of a file system. Its primary data models are groups and datasets. Groups are the overarching structure, and they can hold other groups or datasets. Datasets store raw data values of a specified data type and are usually stored within groups [12]. A feature of HDF5 is that it is able to store data consisting of different data types within the same file [13]. As mentioned earlier, this file format is self-describing, meaning that all the groups and datasets within the file contain metadata describing their contents. This allows for the data within the file to be mapped in memory, provided the API supports it. Generally speaking, users use the HDF5 API to issue commands to a lower-level driver, which is in charge of accessing the file and performing the requested operations [14]. Because the file format is open source, there is widespread API support across most modern languages (Python, C++, and Java).

B. netCDF4

netCDF4 is a file format that is designed to store array-oriented data and is characterized by a .netc file extension. It stores data in a manner similar to HDF5, with groups serving as the overarching data structure. Within a group, there can be other groups or variables. Variables are akin to HDF5 datasets. Unlike HDF5 datasets, netCDF4 variables cannot be resized once they are created [15]. To circumvent this, variables can be declared with an unlimited size in a specified dimension. Similar to HDF5, netCDF4 is a self-describing file format, meaning that groups and variables both contain metadata describing their contents. Unlike its predecessor, netCDF3, netCDF4 uses HDF5 as its backend, allowing it to achieve faster read times [11].

C. Zarr

Zarr is a file format that is designed to store large arrays of data and is characterized by a .zarr file extension. Because it is based on NumPy, it is geared mainly towards Python users. Similar to both HDF5 and netCDF4, Zarr is a hierarchical, self-describing file format that has groups as the overarching file structure. Each group contains datasets, which represent multidimensional arrays of a homogeneous data type. Furthermore, the API for this file format was designed to be similar to h5py (HDF5's Python API), and as a result, it includes functions based on h5py's functions, namely the group creation function [16]. One advantage of using Zarr is that it provides multiple options to store data, allowing a user to store a file in memory, in the file system, or in other storage systems with a similar interface to the first two options [16].
III. BENCHMARKS

As a benchmark is being used to compare the performance of the file formats, it must only test features that are supported by all the file formats being tested. To accomplish this task, we programmed our benchmark in Python, which means we rely on the Python APIs for each file format being tested to perform the requested operations. Our benchmark compares the time taken to create a dataset, write data to a dataset, and finally open that dataset at a later time and read its contents. This can be categorized into two main types of operations: the writing operation and the reading operation. Both are very important features to test in a file format, as the end goal of a file format is to store data for long-term processing. Faster write and read times are not only indicative of better performance characteristics but also have a tangible effect on an end-user's workflow, as less time is spent performing operations that are not directly relevant to the task at hand.

To allow for greater flexibility when benchmarking the file formats, we added a configuration system in which the user is able to specify testing parameters, such as the number of datasets to create within the file and the dimensions of the array that will be written to each dataset, by editing a .yaml configuration file. After the benchmark is done, the program stores the times taken across multiple trials in a .csv file and plots the data in the .csv file with matplotlib.pyplot to allow a user to make a definitive comparison between the file formats being tested. Below, the main operations, the write operation and the read operation, are discussed in depth.
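Based on the parameters listed in §IV, such a configuration file might look like the following sketch; the key names here are inferred from the parameter tables and may differ from those in the actual benchmark.

```yaml
# Hypothetical benchmark configuration; key names inferred from §IV.
# (Test Name is generated automatically, so it does not appear here.)
File Name: 2048-Vector
Number Datasets: 2048
Number Elements: [128]
```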
A. Write Benchmark

The write operation is the first operation to be tested in the benchmark. It creates files with the filename as specified in the configuration file and the extensions .hdf5 for HDF5 files, .netc for netCDF4 files, and .zarr for Zarr files. Each file is placed inside a folder named Files/ to help reduce clutter in the working directory. Taking information from the configuration file, a sample data array is generated with the specified dimensions and length. This sample data array consists of randomly generated 32-bit floats. Then, the program creates a dataset within the file and writes the sample data array to the dataset. This process of generating a sample data array, creating a dataset, and populating it with the values from the sample data array is repeated until the benchmark has created the number of datasets specified by the configuration file. After the file is populated with data, the benchmark copies the file to a directory named Files Read/ and renames it to avoid any caching effects that may interfere with the read times. There are numerous ways to mitigate such caching effects, such as waiting for an extended period of time, but simply moving the file to another directory and renaming it is the quickest and easiest way to keep caching from interfering with the times taken to read from the file. The time taken to create all the datasets and the time taken to populate them with data are each divided by the number of datasets to find the average time taken to create and populate one dataset. Both of these times are then returned to the main program, where they are written to the .csv output file.
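The create and write measurements can be sketched as follows for the HDF5 case. This is an illustrative reimplementation using h5py and time.perf_counter, not the paper's actual code; the file and dataset names, the dataset count, and the shape are our own.

```python
import time
import numpy as np
import h5py

num_datasets = 8   # illustrative; the paper uses 2048 to 8192
shape = (128,)     # "Number Elements" from the configuration

create_time = 0.0
write_time = 0.0

with h5py.File("benchmark.hdf5", "w") as f:
    for i in range(num_datasets):
        # Sample data: randomly generated 32-bit floats.
        data = np.random.rand(*shape).astype(np.float32)

        # Time dataset creation separately from the actual write.
        t0 = time.perf_counter()
        dset = f.create_dataset(f"dataset_{i}", shape=shape, dtype=np.float32)
        create_time += time.perf_counter() - t0

        t0 = time.perf_counter()
        dset[...] = data
        write_time += time.perf_counter() - t0

# Average per-dataset times, as reported in the benchmark's .csv output.
avg_create = create_time / num_datasets
avg_write = write_time / num_datasets
```

The netCDF4 and Zarr variants follow the same pattern with their respective APIs.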
B. Read Benchmark

The benchmark now opens the copied file in the Files Read/ directory and begins testing the read operations of the three file formats. This operation consists of opening each dataset within the file and printing its contents to the standard output. The time taken to open all the datasets and the time taken to read from all the datasets are once again divided by the number of datasets within the file to find the average time taken to open and read one dataset. Both of these times are then returned to the main program, where they are also written to the .csv output file.

This process of running the write operation benchmark and the read operation benchmark is then repeated multiple times in order to ensure the consistency of the data gathered. To avoid filling up the disk with generated test files, the Files/ and Files Read/ directories are deleted between trials. Finally, the data from the .csv file are averaged out with pandas and plotted with matplotlib.pyplot to allow for visualizing a comparison between the tested file formats in a given operation.
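The open and read measurements mirror the write phase. Again, this is an illustrative sketch for the HDF5 case rather than the paper's code; the file is populated inline here so the example is self-contained, and all names are our own.

```python
import time
import numpy as np
import h5py

# Illustrative setup: a small file to read back.
with h5py.File("benchmark_read.hdf5", "w") as f:
    for i in range(8):
        f[f"dataset_{i}"] = np.random.rand(128).astype(np.float32)

open_time = 0.0
read_time = 0.0

with h5py.File("benchmark_read.hdf5", "r") as f:
    names = list(f.keys())
    for name in names:
        # Time the dataset lookup ("open") separately from the read.
        t0 = time.perf_counter()
        dset = f[name]
        open_time += time.perf_counter() - t0

        t0 = time.perf_counter()
        values = dset[...]  # materialize the dataset contents in memory
        read_time += time.perf_counter() - t0

avg_open = open_time / len(names)
avg_read = read_time / len(names)
```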
IV. PERFORMANCE EVALUATION

A. Experimental setup

The three file formats were tested on a computer running Ubuntu 18.04.5 with an Intel(R) Xeon(R) Silver 4215R CPU, 196 gigabytes of RAM, and 960 gigabytes of solid-state storage provided by a Micron 5200 Series SSD. The version of h5py used to test the HDF5 file format was 3.6.0. The version of netCDF4 used to test the netCDF4 file format was 1.5.8. The version of zarr used to test the Zarr file format was 2.11.0.

The benchmark parameters used in each run of the test are listed below. Note that the Test Name parameter is automatically generated by the benchmark and is used to create the generated plot's title.

Test Name                                   File Name      Number Datasets   Number Elements
2048 Datasets of [128] Elements             2048-Vector    2048              [128]
2048 Datasets of [128, 128] Elements        2048-Matrix    2048              [128, 128]
2048 Datasets of [128, 128, 128] Elements   2048-Tensor    2048              [128, 128, 128]
2048 Datasets of [256] Elements             2048-Datasets  2048              [256]
4096 Datasets of [256] Elements             4096-Datasets  4096              [256]
8192 Datasets of [256] Elements             8192-Datasets  8192              [256]

B. Data

Fig. 1: 2048 Datasets of [128] Elements. (a) Create / Open Times; (b) Read / Write Times.
Fig. 2: 2048 Datasets of [128, 128] Elements. (a) Create / Open Times; (b) Read / Write Times.
Fig. 3: 2048 Datasets of [128, 128, 128] Elements. (a) Create / Open Times; (b) Read / Write Times.
Fig. 4: 2048 Datasets of [256] Elements. (a) Create / Open Times; (b) Read / Write Times.
Fig. 5: 4096 Datasets of [256] Elements. (a) Create / Open Times; (b) Read / Write Times.
Fig. 6: 8192 Datasets of [256] Elements. (a) Create / Open Times; (b) Read / Write Times.

C. Discussion

Figure 1 shows the results when 2,048 datasets are created and populated with a one-dimensional array containing 128 32-bit floats. This graph shows that the time taken to create and open a dataset in netCDF4 is much lower than in HDF5 or Zarr. In comparison to Zarr, HDF5 takes less time to create a dataset but slightly more time to open one. When it comes to writing to datasets or reading from datasets, HDF5 and Zarr share very similar times in both operations, with netCDF4 trailing by a large margin.

Figure 2 shows the results when 2,048 datasets are created and populated with a two-dimensional array containing 128 elements in each dimension. The results from this test are almost identical to the results from the previous test, both in terms of the trend and the time taken to complete each operation.

Figure 3 shows the results when 2,048 datasets are created and populated with a three-dimensional array containing 128 elements in each dimension. The results from this test follow the same trend as the past two tests, but the times taken to complete each operation are almost double the times taken in the past two tests.

These past three bar graphs show the tests in which the number of datasets is held constant while increasing the number of dimensions in the data array; the next three bar graphs involve increasing the number of datasets while holding the size of the data array constant, in order to measure the effect of increasing the number of datasets on file-format performance.

Figure 4 shows the results when 2,048 datasets are created and populated with a one-dimensional array containing 256 elements. The results from this test mirror those from Figure 1, and this is to be expected, as the number of datasets in both tests is the same, with the size of each dataset varying slightly.

Figure 5 shows the results when 4,096 datasets are created and populated with a one-dimensional array containing 256 elements, and Figure 6 shows the results when 8,192 datasets are created and populated with a one-dimensional array containing 256 elements. Both graphs are almost identical to Figure 4, meaning that the number of datasets most likely has no impact on the average time taken to perform the requested operations.

D. Write Benchmark Discussion

The results of this benchmark show a general trend: when creating a dataset, netCDF4 takes the least time, followed by HDF5, which is followed by Zarr. When actually writing data to a file, HDF5 takes the least time to write data to a dataset and is followed by Zarr, which is followed by netCDF4, the latter taking on average more than double the time of HDF5.

E. Read Benchmark Discussion

The read benchmarks show results similar to those from the write benchmark. netCDF4 takes the least time to open a dataset and is followed by Zarr, which is followed by HDF5. When reading the data by printing the dataset values to the standard output, HDF5 takes the least time to read a dataset and is followed by Zarr, which is followed by netCDF4.

V. CONCLUSIONS

In this paper, we demonstrated a method by which the performance of a file format can be compared to that of another file format through the running of a benchmark that tests performance in operations like create, open, read, write, and close. This paper focused specifically on benchmarking three file formats: HDF5, netCDF4, and Zarr, as these three file formats are considered general-purpose scientific file formats due to their storing of various types of data in a hierarchical manner, similar to a file system.

The benchmark was conducted in Python due to the language's widespread use in numerous scientific applications, and as such, the Python API for each file format was tested. To determine the performance of a file format, the time taken to create a dataset, write data to the dataset, open the dataset once the file is closed, and read data from the dataset was measured and plotted in a bar graph. The results of the benchmark show that HDF5 is fastest in reading from or writing to a dataset, netCDF4 is fastest in creating or opening a dataset, and Zarr generally trails right behind HDF5 in performance.

Future work for this benchmark would include: expanding support to other programming languages, as this would reveal any potential bottlenecks within the language-specific API for a file format; testing more file formats in order to better determine which file format is the fastest; and testing more aspects of a file format, which may include testing performance in specific scenarios (e.g., reading a small subset of a dataset, or overwriting a dataset). The code for the benchmark can be found here: https://github.com/asriniket/File-Format-Testing. We are further evaluating the overheads of the Python APIs on the observed performance, as well as the impact of caching. Considering the small size of the data, the observed results may have been impacted by caching. This caching effect will typically be reduced when the data sizes are in gigabytes (GB). We also note that many applications work with smaller amounts of data than we used in this study. We encourage readers to try out the benchmarks provided in the GitHub repository and contribute any optimizations.

VI. ACKNOWLEDGMENTS

This effort was supported in part by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research (ASCR) under contract number DE-AC02-05CH11231 with LBNL.

REFERENCES

[1] CERN. CERN - Storage. URL: https://home.cern/science/computing/storage.
[2] NASA. NASA's Climate Data Services (CDS). URL: https://www.nccs.nasa.gov/services/climate-data-services.
[3] Arie Shoshani and Doron Rotem. "Scientific data management. Challenges, technology, and development". In: Scientific Data Management: Challenges, Technology, and Deployment (Dec. 2009). DOI: 10.1201/9781420069815.
[4] HDFGroup. The HDF5® Library & File Format - The HDF Group. URL: https://www.hdfgroup.org/solutions/hdf5/.
[5] UCAR/Unidata. Unidata — NetCDF. URL: https://www.unidata.ucar.edu/software/netcdf/.
[6] CERN. ROOT: analyzing petabytes of data, scientifically. URL: https://root.cern/.
[7] Zarr Developers. Zarr. URL: https://zarr.readthedocs.io/en/stable/.
[8] Wikipedia. List of file formats. URL: https://en.wikipedia.org/wiki/List_of_file_formats#Scientific_data_(data_exchange).
[9] Barbara Warmbein. Big data takes ROOT. URL: https://home.cern/news/news/computing/big-data-takes-root.
[10] Cyrille Rossant. Moving away from HDF5. URL: https://cyrille.rossant.net/moving-away-hdf5/.
[11] Choonghwan Lee, MuQun Yang, and Ruth Aydt. NetCDF-4 Performance Report. URL: https://support.hdfgroup.org/pubs/papers/2008-06_netcdf4_perf_report.pdf.
[12] Leah A. Wasser. Hierarchical Data Formats - What is HDF5? URL: https://www.neonscience.org/resources/learning-hub/tutorials/about-hdf5.
[13] HDFGroup. Introduction to HDF5. URL: https://portal.hdfgroup.org/display/HDF5/Introduction+to+HDF5.
[14] HDFGroup. Chapter 3: The HDF5 File. URL: https://support.hdfgroup.org/HDF5/doc/UG/FmSource/08_TheFile_favicon_test.html.
[15] UCAR/Unidata. NetCDF4 API Documentation. URL: https://unidata.github.io/netcdf4-python/.
[16] Zarr Developers. Zarr. URL: https://zarr.readthedocs.io/en/stable/tutorial.html#groups/.