Python Pandas to_hdf() Method

The to_hdf() method in Python's Pandas library allows you to store a DataFrame or Series in an HDF5 file. HDF5, which stands for Hierarchical Data Format version 5, is a high-performance data format that supports large-scale data storage and efficient reading and writing of datasets. Using this method, you can save pandas objects to disk in an organized and, optionally, compressed form.

HDF5 files are widely used for handling scientific data, where large datasets need to be stored and accessed efficiently. With the to_hdf() method, you can choose the storage format, compression level, file mode, and key name used for saving pandas objects, ensuring compatibility with various analytical workflows.

Syntax

The syntax of the to_hdf() method is as follows −

DataFrame.to_hdf(path_or_buf, *, key, mode='a', complevel=None, complib=None, append=False, format=None, index=True, min_itemsize=None, nan_rep=None, dropna=None, data_columns=None, errors='strict', encoding='UTF-8')

When using the to_hdf() method on a Series object, you should call it as Series.to_hdf().
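
As a minimal sketch of the Series case, the snippet below writes a Series and reads it back with pd.read_hdf(). The data, the key 'scores', and the file name 'series_data.h5' are hypothetical values chosen for illustration, and the file is written to a temporary directory. It assumes the PyTables package (tables) is installed, which pandas needs for HDF5 support.

```python
import os
import tempfile

import pandas as pd

# Hypothetical Series used only for illustration
scores = pd.Series([88, 92, 95], name="Score")

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a throwaway directory
    path = os.path.join(tmp_dir, "series_data.h5")

    # Series.to_hdf() takes the same arguments as DataFrame.to_hdf()
    scores.to_hdf(path, key="scores", mode="w")

    # Read the stored object back; it is returned as a Series
    loaded = pd.read_hdf(path, key="scores")

print(type(loaded).__name__)
print(loaded.tolist())
```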

Parameters

The Python Pandas to_hdf() method accepts the below parameters −

path_or_buf: File path or HDFStore object where the HDF5 file will be saved.
key: Identifier for the group in the HDF5 file.
mode: Specifies the mode to open the file. Common values are 'w' (write), 'a' (append), and 'r+' (read/write).
format: The storage format, either 'fixed' (default; fast to write and read, but not appendable or searchable) or 'table' (slower, but supports appending and querying).
index: Boolean indicating whether to include the DataFrame's index in the file. Defaults to True.
complevel: Specifies the compression level (0-9). Higher values mean more compression but slower performance. A value of 0 or None disables compression.
complib: Specifies the compression library to use, such as 'zlib' (default), 'bzip2', 'lzo', or 'blosc'.
append: Boolean indicating whether to append the input data to an existing dataset stored under the same key. Applies to the 'table' format. Defaults to False.
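min_itemsize: Integer or dictionary mapping column names to minimum string sizes for the columns.
nan_rep: String used to represent null values.
dropna: Boolean indicating whether to remove missing values. Applies to the 'table' format.
data_columns: List of columns (or True for all columns) to store as indexed data columns that can be used in on-disk queries. Applies to the 'table' format.
errors: Specifies how encoding and decoding errors are handled. Defaults to 'strict'.
encoding: The text encoding to use. Defaults to 'UTF-8'.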

Return Value

The to_hdf() method does not return a value (it returns None). It writes the DataFrame or Series to the specified HDF5 file.

Example: Saving a DataFrame to an HDF5 File

This example demonstrates how to save a Pandas DataFrame to an HDF5 file using the to_hdf() method.

import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Save DataFrame to HDF5 file
df.to_hdf('data.h5', key='dataset')

print("DataFrame saved to 'data.h5' successfully...")

Following is the output of the above code −

DataFrame saved to 'data.h5' successfully...

When you run the above code, the DataFrame is saved in the HDF5 file named 'data.h5' under the 'dataset' key.
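
To check what was written, the stored object can be read back with pd.read_hdf() using the same key. The sketch below is a simplified variant of the example above: it uses only numeric columns and writes to a temporary directory, both of which are choices made for this illustration rather than part of the original example.

```python
import os
import tempfile

import pandas as pd

# Simplified, numeric-only data used for illustration
df = pd.DataFrame({"Age": [25, 30, 35], "Score": [88, 92, 95]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.h5")

    # Write the DataFrame under the key 'dataset'
    df.to_hdf(path, key="dataset", mode="w")

    # Load it again by naming the same key
    loaded = pd.read_hdf(path, key="dataset")

print(loaded)
```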

Example: Saving DataFrame to Compressed HDF5

This example demonstrates how to save a DataFrame to an HDF5 file with compression. Here we will specify the "zlib" compression.

import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Score': [88, 92, 95]}
df = pd.DataFrame(data)

# Save with compression
df.to_hdf('compressed_data.h5', key='dataset', mode='w', complevel=5, complib='zlib')

print("DataFrame saved with compression to 'compressed_data.h5'")

This saves the DataFrame to 'compressed_data.h5' with zlib compression at level 5, and prints the following message −

DataFrame saved with compression to 'compressed_data.h5'
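
The effect of compression is easier to see on a larger dataset. The following sketch writes the same DataFrame twice, once without and once with zlib compression, and compares the file sizes. The column of zeros and the file names are assumptions made for this illustration; real data will usually compress less well than this deliberately repetitive column.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Deliberately repetitive data so that compression has a visible effect
df = pd.DataFrame({"value": np.zeros(200_000)})

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "plain.h5")
    packed_path = os.path.join(tmp_dir, "packed.h5")

    # Same data, written without and with compression
    df.to_hdf(plain_path, key="dataset", mode="w")
    df.to_hdf(packed_path, key="dataset", mode="w", complevel=9, complib="zlib")

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

print("Uncompressed bytes:", plain_size)
print("Compressed bytes:  ", packed_size)
```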

Example: Appending Pandas Data to an Existing HDF5 File

This example shows how to add a new DataFrame to an existing HDF5 file under the 'new_dataset' key. Opening the file with mode='a' keeps the datasets already stored in it. Here we add the new data to the 'compressed_data.h5' file created in the previous example.

import pandas as pd

# Create a new DataFrame
new_data = {'Name': ['Suman', 'Dev'], 'Score': [45, 76]}
new_df = pd.DataFrame(new_data)

# Append to existing HDF5 file
new_df.to_hdf('compressed_data.h5', key='new_dataset', mode='a')

print("DataFrame appended to 'compressed_data.h5'")

The above code stores the new DataFrame in 'compressed_data.h5' under the key 'new_dataset' and prints the following message −

DataFrame appended to 'compressed_data.h5'
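
One way to confirm that both datasets now live in the same file is to list its keys with pd.HDFStore. The sketch below recreates the two writes in a temporary directory using simplified numeric data (an assumption for this illustration), and then contrasts mode='a' with mode='w', which starts a fresh file and discards the keys stored earlier.

```python
import os
import tempfile

import pandas as pd

# Simplified stand-ins for the two DataFrames used in the examples
first = pd.DataFrame({"Score": [88, 92, 95]})
second = pd.DataFrame({"Score": [45, 76]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "compressed_data.h5")

    # mode='w' creates the file, mode='a' adds a second key to it
    first.to_hdf(path, key="dataset", mode="w")
    second.to_hdf(path, key="new_dataset", mode="a")
    with pd.HDFStore(path, mode="r") as store:
        keys_after_append = sorted(store.keys())

    # Writing again with mode='w' replaces the whole file
    second.to_hdf(path, key="new_dataset", mode="w")
    with pd.HDFStore(path, mode="r") as store:
        keys_after_overwrite = sorted(store.keys())

print(keys_after_append)     # ['/dataset', '/new_dataset']
print(keys_after_overwrite)  # ['/new_dataset']
```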

Example: Saving a DataFrame in Table Format to an HDF5 File

The following example demonstrates saving a DataFrame to an HDF5 file in the table format.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'City': ['New Delhi', 'Chennai', 'Hyderabad'], 'Population': [8.4, 9.0, 13.9]})

# Save as a table format
df.to_hdf('table_data.h5', key='table', mode='w', format='table')

print("DataFrame saved in table format.")

Output of the above code is as follows −

DataFrame saved in table format.
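
A practical reason to choose the table format is that rows can later be added to the same key with append=True. The sketch below illustrates this with a simplified, numeric-only version of the data above; the split into two DataFrames and the explicit index of the second one are assumptions made for this illustration.

```python
import os
import tempfile

import pandas as pd

# The first batch of rows, and a second batch to append later
first = pd.DataFrame({"Population": [8.4, 9.0]})
more = pd.DataFrame({"Population": [13.9]}, index=[2])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "table_data.h5")

    # Create the dataset in table format
    first.to_hdf(path, key="table", mode="w", format="table")

    # Append rows to the same key instead of replacing it
    more.to_hdf(path, key="table", mode="a", format="table", append=True)

    combined = pd.read_hdf(path, key="table")

print(combined)
```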

Example: Specifying Columns for Querying while Saving to HDF5 file

This example saves a DataFrame in table format and marks specific columns as data columns, so that those columns can be used in queries when the data is read back from the HDF5 file.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['Kiran', 'Priya', 'Naveen'],
                   'Age': [25, 30, 35],
                   'City': ['New Delhi', 'Hyderabad', 'Chennai']})

# Save with data columns
df.to_hdf('queryable_data.h5', key='queryable', mode='w', format='table', data_columns=['Name'])

print("DataFrame saved with queryable columns.")

Executing the above code produces the following output −

DataFrame saved with queryable columns.
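
Data columns pay off at read time, when pd.read_hdf() can filter rows on disk through its where argument. The following is a minimal sketch of that idea. It declares the numeric 'Age' column as a data column, which differs from the example above where 'Name' was used, and it uses simplified numeric data and a temporary directory as assumptions for this illustration.

```python
import os
import tempfile

import pandas as pd

# Simplified, numeric-only data used for illustration
df = pd.DataFrame({"Age": [25, 30, 35], "Score": [88, 92, 95]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "queryable_data.h5")

    # Only columns listed in data_columns can appear in a where clause
    df.to_hdf(path, key="queryable", mode="w", format="table",
              data_columns=["Age"])

    # Select only the rows that satisfy the condition
    older = pd.read_hdf(path, key="queryable", where="Age > 26")

print(older)
```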
