Python Pandas to_hdf() Method



The to_hdf() method in Python's Pandas library allows you to store a DataFrame or Series in an HDF5 file. HDF5, which stands for Hierarchical Data Format version 5, is a high-performance data format that supports large-scale data storage and efficient reading and writing of datasets. Using this method, you can save Series or DataFrames to disk in an organized manner, with optional compression.

HDF5 files are widely used for handling scientific data, where large datasets need to be stored and accessed efficiently. With the to_hdf() method, you can choose the storage format, the compression level and library, the file mode, whether to append, and the key under which each pandas object is saved, so several datasets can be kept in a single file. Note that to_hdf() relies on the PyTables library, so the tables package must be installed.

Syntax

The syntax of the to_hdf() method is as follows −

DataFrame.to_hdf(path_or_buf, *, key, mode='a', complevel=None, complib=None, append=False, format=None, index=True, min_itemsize=None, nan_rep=None, dropna=None, data_columns=None, errors='strict', encoding='UTF-8')

To save a Series instead of a DataFrame, call Series.to_hdf(), which accepts the same parameters.
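As a minimal sketch, a Series can be written and read back as shown below. The file name 'series_data.h5' and the key 'scores' are illustrative choices, and the snippet assumes the tables package is installed.

```python
import pandas as pd

# A Series is written the same way as a DataFrame, under its own key
scores = pd.Series([88, 92, 95], index=['Kiran', 'Priya', 'Naveen'], name='Score')
scores.to_hdf('series_data.h5', key='scores', mode='w')

# Reading the key back returns a Series, not a DataFrame
restored = pd.read_hdf('series_data.h5', key='scores')
print(type(restored).__name__)  # Series
print(restored)
```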

Parameters

The Python Pandas to_hdf() method accepts the below parameters −

  • path_or_buf: File path or HDFStore object where the HDF5 file will be saved.

  • key: Identifier for the group in the HDF5 file.

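  • mode: Mode used to open the file: 'w' (write; a new file is created and an existing file with the same name is deleted), 'a' (append; the file is opened for reading and writing and is created if it does not exist), or 'r+' (like 'a', but the file must already exist). Defaults to 'a'.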

  • format: The storage format. 'fixed' (the default) is fast to write and read but can be neither appended to nor queried; 'table' is slower but supports appending and selecting subsets of the data.

  • index: Boolean indicating whether to write the DataFrame's index to the file. Defaults to True.

  • complevel: Compression level from 0 to 9. A value of 0 or None disables compression; higher values compress more at the cost of slower writes.

  • complib: Compression library to use: 'zlib' (the default), 'lzo', 'bzip2', or 'blosc'.

  • append: For the 'table' format, True appends the data to the existing table stored under the same key instead of overwriting it. Defaults to False.

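  • data_columns: List of columns to create as indexed data columns for on-disk queries, or True to use all columns. Applies only to the 'table' format.

  • min_itemsize: Dictionary mapping column names to a minimum string size for string columns in a table.

  • nan_rep: String used to represent missing values in string columns.

  • dropna: Whether to remove missing values when writing.

  • errors: Specifies how encoding and decoding errors are handled. Defaults to 'strict'.

  • encoding: Character encoding used for string data. Defaults to 'UTF-8'.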
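The difference between the 'a' and 'w' modes described above can be sketched as follows. The file name 'modes_demo.h5' and the keys 'first', 'second' and 'third' are hypothetical names used only for illustration; pd.HDFStore is used here just to list the keys stored in the file.

```python
import pandas as pd

df = pd.DataFrame({'Score': [88, 92, 95]})

# mode='w' creates a new file, discarding anything already stored in it
df.to_hdf('modes_demo.h5', key='first', mode='w')

# mode='a' (the default) keeps existing keys and adds a new one
df.to_hdf('modes_demo.h5', key='second', mode='a')
with pd.HDFStore('modes_demo.h5', mode='r') as store:
    keys_after_append = sorted(store.keys())
print(keys_after_append)  # ['/first', '/second']

# Writing again with mode='w' truncates the file, so only the new key survives
df.to_hdf('modes_demo.h5', key='third', mode='w')
with pd.HDFStore('modes_demo.h5', mode='r') as store:
    keys_after_write = sorted(store.keys())
print(keys_after_write)  # ['/third']
```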

Return Value

The to_hdf() method returns None. It writes the DataFrame or Series to the specified HDF5 file.

Example: Saving a DataFrame to an HDF5 File

This example demonstrates how to save a Pandas DataFrame to an HDF5 file using the to_hdf() method.

import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Save DataFrame to HDF5 file
df.to_hdf('data.h5', key='dataset')

print("DataFrame saved to 'data.h5' successfully...")

Following is the output of the above code −

DataFrame saved to 'data.h5' successfully...

When you run the above code, the DataFrame is saved in the HDF5 file named 'data.h5' under the 'dataset' key.
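To check what was stored, the data can be loaded back with pd.read_hdf() using the same key. The following is a minimal sketch; it rewrites 'data.h5' with mode='w' so that it runs on its own.

```python
import pandas as pd

data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)
df.to_hdf('data.h5', key='dataset', mode='w')

# read_hdf loads the object stored under the given key
restored = pd.read_hdf('data.h5', key='dataset')
print(restored)
```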

Example: Saving DataFrame to Compressed HDF5

This example demonstrates how to save a DataFrame to an HDF5 file with compression. Here we specify the 'zlib' compression library.

import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Score': [88, 92, 95]}
df = pd.DataFrame(data)

# Save with compression
df.to_hdf('compressed_data.h5', key='dataset', mode='w', complevel=5, complib='zlib')

print("DataFrame saved with compression to 'compressed_data.h5'")

This saves the DataFrame to 'compressed_data.h5' with zlib compression at level 5 and prints the following message −

DataFrame saved with compression to 'compressed_data.h5'
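With only three rows, the effect of compression is hard to see. The sketch below uses a larger, highly repetitive DataFrame (illustrative data only) and compares the on-disk file sizes. The file names 'uncompressed.h5' and 'compressed.h5' are hypothetical, and the exact sizes will vary with the pandas and PyTables versions.

```python
import os

import numpy as np
import pandas as pd

# A large, highly repetitive DataFrame compresses well (illustrative data only)
df = pd.DataFrame({'Score': np.zeros(100_000)})

df.to_hdf('uncompressed.h5', key='dataset', mode='w')
df.to_hdf('compressed.h5', key='dataset', mode='w', complevel=5, complib='zlib')

plain_size = os.path.getsize('uncompressed.h5')
packed_size = os.path.getsize('compressed.h5')
print(plain_size, packed_size)

# Compression only changes the on-disk size; the data reads back unchanged
restored = pd.read_hdf('compressed.h5', key='dataset')
print(len(restored), (restored['Score'] == 0).all())
```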

Example: Appending Pandas Data to an Existing HDF5 File

This example shows how to add a new DataFrame to an existing HDF5 file under a new key, 'new_dataset', by opening the file in append mode (mode='a'). The new data is written to the 'compressed_data.h5' file created in the previous example.

import pandas as pd

# Create a new DataFrame
new_data = {'Name': ['Suman', 'Dev'], 'Score': [45, 76]}
new_df = pd.DataFrame(new_data)

# Append to existing HDF5 file
new_df.to_hdf('compressed_data.h5', key='new_dataset', mode='a')

print("DataFrame appended to 'compressed_data.h5'")

The above code adds the new DataFrame to 'compressed_data.h5' under the key 'new_dataset', leaving the existing 'dataset' key unchanged, and prints the following message −

DataFrame appended to 'compressed_data.h5'
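To confirm that both datasets now live in the same file, the keys can be listed with pd.HDFStore. This is a minimal sketch that first recreates 'compressed_data.h5' so it runs on its own.

```python
import pandas as pd

# Recreate the file from the compression example so this snippet is self-contained
df = pd.DataFrame({'Name': ['Kiran', 'Priya', 'Naveen'], 'Score': [88, 92, 95]})
df.to_hdf('compressed_data.h5', key='dataset', mode='w', complevel=5, complib='zlib')

# Add a second DataFrame under a new key; mode='a' keeps the existing key
new_df = pd.DataFrame({'Name': ['Suman', 'Dev'], 'Score': [45, 76]})
new_df.to_hdf('compressed_data.h5', key='new_dataset', mode='a')

# Both keys are now stored side by side in the same file
with pd.HDFStore('compressed_data.h5', mode='r') as store:
    keys = sorted(store.keys())
print(keys)  # ['/dataset', '/new_dataset']
```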

Example: Saving a DataFrame in Table Format to an HDF5 File

The following example demonstrates saving a DataFrame to an HDF5 file in the 'table' format, which supports appending and querying.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'City': ['New Delhi', 'Chennai', 'Hyderabad'], 'Population': [8.4, 9.0, 13.9]})

# Save as a table format
df.to_hdf('table_data.h5', key='table', mode='w', format='table')

print("DataFrame saved in table format.")

Output of the above code is as follows −

DataFrame saved in table format.
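Unlike the 'fixed' format, a table can grow: writing with append=True adds rows under the same key instead of replacing them. In the sketch below, the extra rows and their numbers are placeholder values, not real population figures.

```python
import pandas as pd

df = pd.DataFrame({'City': ['New Delhi', 'Chennai', 'Hyderabad'], 'Population': [8.4, 9.0, 13.9]})
df.to_hdf('table_data.h5', key='table', mode='w', format='table')

# Placeholder rows. New strings should not be longer than the longest string
# already stored in 'City' (9 characters here) unless min_itemsize is set.
more = pd.DataFrame({'City': ['Mumbai', 'Kolkata'], 'Population': [1.0, 2.0]}, index=[3, 4])
more.to_hdf('table_data.h5', key='table', format='table', append=True)

combined = pd.read_hdf('table_data.h5', key='table')
print(combined)
print(len(combined))  # 5
```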

Example: Specifying Columns for Querying while Saving to HDF5 file

This example demonstrates saving a DataFrame in table format with the 'Name' column set as a data column, so that it can be used in queries when reading the file.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['Kiran', 'Priya', 'Naveen'],
                   'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']})

# Save with data columns
df.to_hdf('queryable_data.h5', key='queryable', mode='w', format='table', data_columns=['Name'])

print("DataFrame saved with queryable columns.")

While executing the above code we obtain the following output −

DataFrame saved with queryable columns.
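Because 'Name' was declared as a data column, it can be used in the where argument of pd.read_hdf() so that only matching rows are read. The following minimal sketch recreates the file and selects a single row; columns that were not listed in data_columns (such as 'Age' here) are not available for this kind of query.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Kiran', 'Priya', 'Naveen'],
                   'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']})
df.to_hdf('queryable_data.h5', key='queryable', mode='w', format='table', data_columns=['Name'])

# 'Name' is a data column, so it can appear in a where clause
priya = pd.read_hdf('queryable_data.h5', key='queryable', where="Name == 'Priya'")
print(priya)
```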