
Python Pandas to_hdf() Method
The to_hdf() method in Python's Pandas library writes a DataFrame or Series to an HDF5 file. HDF5 (Hierarchical Data Format version 5) is a high-performance file format that supports large-scale data storage and efficient reading and writing of datasets. Using this method, you can save pandas objects to disk in an organized, hierarchical layout, with optional compression.
HDF5 files are widely used for handling scientific data, where large datasets need to be stored and accessed efficiently. With the to_hdf() method, you can choose the storage format, compression level, write mode, and key name used for saving a pandas object, which helps keep the file compatible with various analytical workflows.
Syntax
The syntax of the to_hdf() method is as follows −
DataFrame.to_hdf(path_or_buf, *, key, mode='a', complevel=None, complib=None, append=False, format=None, index=True, min_itemsize=None, nan_rep=None, dropna=None, data_columns=None, errors='strict', encoding='UTF-8')
The same method is available on Series objects, where it is called as Series.to_hdf().
Parameters
The Python Pandas to_hdf() method accepts the following parameters −
path_or_buf: File path or HDFStore object where the HDF5 file will be saved.
key: Identifier for the group in the HDF5 file.
mode: Specifies the mode to open the file. Common values are 'w' (write), 'a' (append), and 'r+' (read/write).
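format: The storage format. 'fixed' is fast to write and read but cannot be appended to or searched. 'table' is slower but supports appending and selecting subsets of the data. If None, the pandas option io.hdf.default_format is checked, with 'fixed' as the fallback.
index: Boolean indicating whether to include the DataFrame's index in the file. Defaults to True.
complevel: Specifies the compression level (0-9). Higher values mean more compression but slower performance. A value of 0 or None disables compression.
complib: Specifies the compression library to use, such as 'zlib' (the default), 'bzip2', 'lzo', or 'blosc'.
append: Boolean. For the table format, indicates whether the input data is appended to the dataset already stored under key. Defaults to False.
min_itemsize: Integer or dictionary mapping column names to the minimum string size reserved for those columns.
nan_rep: String used to represent null values.
dropna: Boolean indicating whether missing values are removed.
data_columns: List of columns to create as indexed data columns for on-disk queries, or True to use all columns. Applies only to format='table'.
errors: Specifies how encoding and decoding errors are handled. Defaults to 'strict'.
encoding: The text encoding used for string data. Defaults to 'UTF-8'.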
Return Value
The to_hdf() method does not return a value (it returns None). It writes the DataFrame or Series to the specified HDF5 file.
Example: Saving a DataFrame to an HDF5 File
This example demonstrates how to save a Pandas DataFrame to an HDF5 file using the to_hdf() method. Writing HDF5 files requires the optional PyTables package (installed as tables).
import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'],
        'Age': [25, 30, 35],
        'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Save DataFrame to HDF5 file
df.to_hdf('data.h5', key='dataset')

print("DataFrame saved to 'data.h5' successfully...")
Following is the output of the above code −
DataFrame saved to 'data.h5' successfully...
When you run the above code, the DataFrame is saved in the HDF5 file named 'data.h5' under the 'dataset' key.
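To confirm that the data was written as intended, the file can be read back with pd.read_hdf(). The following is a minimal sketch, assuming PyTables is installed; the temporary directory and the variable name restored are illustrative choices, not part of the original example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Name': ['Kiran', 'Priya', 'Naveen'],
    'Age': [25, 30, 35],
    'City': ['New Delhi', 'Hyderabad', 'Chennai'],
})

# Write into a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.h5')

    # Save the DataFrame under the 'dataset' key
    df.to_hdf(path, key='dataset', mode='w')

    # Load the same key back into a new DataFrame
    restored = pd.read_hdf(path, key='dataset')

print(restored)
```

The key passed to read_hdf() must match the one used when saving.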
Example: Saving DataFrame to Compressed HDF5
This example demonstrates how to save a DataFrame to an HDF5 file with compression. Here we will specify the "zlib" compression.
import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'],
        'Score': [88, 92, 95]}
df = pd.DataFrame(data)

# Save with compression
df.to_hdf('compressed_data.h5', key='dataset', mode='w', complevel=5, complib='zlib')

print("DataFrame saved with compression to 'compressed_data.h5'")
This saves the DataFrame to 'compressed_data.h5' with zlib compression at level 5 and prints the following message −
DataFrame saved with compression to 'compressed_data.h5'
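The effect of complevel is easier to see on a larger dataset. The sketch below is an illustration under stated assumptions: the repetitive 'value' column, the file names, and the size variables are hypothetical, chosen only so that compression has something to work with. It writes the same DataFrame with and without compression and compares the file sizes on disk.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical, highly repetitive data so that compression has a visible effect
df = pd.DataFrame({'value': np.tile(np.arange(10), 20_000)})

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, 'plain.h5')
    packed_path = os.path.join(tmp_dir, 'packed.h5')

    # Same data, once without compression and once with zlib at level 9
    df.to_hdf(plain_path, key='dataset', mode='w')
    df.to_hdf(packed_path, key='dataset', mode='w', complevel=9, complib='zlib')

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

print(f"uncompressed: {plain_size} bytes")
print(f"compressed:   {packed_size} bytes")
```

The exact sizes depend on the data and on the pandas and PyTables versions, so treat the numbers as indicative only.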
Example: Adding a New Dataset to an Existing HDF5 File
This example shows how to add a new DataFrame to an existing HDF5 file under the 'new_dataset' key. Opening the file with mode='a' keeps the datasets it already contains. Here we add the new data to the 'compressed_data.h5' file created in the previous example.
import pandas as pd

# Create a new DataFrame
new_data = {'Name': ['Suman', 'Dev'],
            'Score': [45, 76]}
new_df = pd.DataFrame(new_data)

# Add the DataFrame to the existing HDF5 file under a new key
new_df.to_hdf('compressed_data.h5', key='new_dataset', mode='a')

print("DataFrame appended to 'compressed_data.h5'")
The above code stores the new DataFrame in 'compressed_data.h5' under the key 'new_dataset' and prints the following message −
DataFrame appended to 'compressed_data.h5'
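The example above stores a second DataFrame under its own key. To add rows to a dataset that already exists under a key, the append parameter can be combined with the table format. The following is a minimal sketch of that variant; the key 'scores' and the frame names first, second, and combined are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

first = pd.DataFrame({'Name': ['Kiran', 'Priya'], 'Score': [88, 92]})
second = pd.DataFrame({'Name': ['Suman', 'Dev'], 'Score': [45, 76]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'scores.h5')

    # Create the dataset in table format, which supports appending
    first.to_hdf(path, key='scores', mode='w', format='table')

    # Append more rows to the same key instead of replacing it.
    # The names here are no longer than those in the first write, so they
    # fit the string width reserved when the table was created.
    second.to_hdf(path, key='scores', mode='a', format='table', append=True)

    combined = pd.read_hdf(path, key='scores')

print(combined)
```

Each frame keeps its own index labels, so the combined result contains repeated labels. If later rows may hold longer strings than the first write, min_itemsize can reserve extra width up front.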
Example: Saving a DataFrame in Table Format to an HDF5 File
The following example saves a DataFrame to an HDF5 file using the table format.
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'City': ['New Delhi', 'Chennai', 'Hyderabad'],
                   'Population': [8.4, 9.0, 13.9]})

# Save in table format
df.to_hdf('table_data.h5', key='table', mode='w', format='table')

print("DataFrame saved in table format.")
Output of the above code is as follows −
DataFrame saved in table format.
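One practical benefit of the table format is that only part of a dataset needs to be read back. The sketch below, which assumes PyTables is installed and uses illustrative variable names, selects rows by position with start and stop, and by index label with a where expression.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'City': ['New Delhi', 'Chennai', 'Hyderabad'],
                   'Population': [8.4, 9.0, 13.9]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'table_data.h5')
    df.to_hdf(path, key='table', mode='w', format='table')

    # Read only the first two rows by position
    first_two = pd.read_hdf(path, key='table', start=0, stop=2)

    # Read only the rows whose index label is at least 1
    from_one = pd.read_hdf(path, key='table', where='index >= 1')

print(first_two)
print(from_one)
```

A where expression is not supported for data stored in the fixed format, so this kind of selective read depends on format='table'.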
Example: Specifying Columns for Querying while Saving to an HDF5 File
This example saves a DataFrame in table format and marks the 'Name' column as a data column, so that it can be used in on-disk queries when the file is read back.
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['Kiran', 'Priya', 'Naveen'],
                   'Age': [25, 30, 35],
                   'City': ['New Delhi', 'Hyderabad', 'Chennai']})

# Save with 'Name' as a queryable data column
df.to_hdf('queryable_data.h5', key='queryable', mode='w', format='table', data_columns=['Name'])

print("DataFrame saved with queryable columns.")
While executing the above code we obtain the following output −
DataFrame saved with queryable columns.
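Once a column has been stored as a data column, it can be referenced in the where argument of pd.read_hdf(). The following sketch shows one way this might look; the temporary path and the variable name matches are illustrative assumptions.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Kiran', 'Priya', 'Naveen'],
                   'Age': [25, 30, 35],
                   'City': ['New Delhi', 'Hyderabad', 'Chennai']})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'queryable_data.h5')

    # 'Name' is stored as a data column, so it can appear in a where clause
    df.to_hdf(path, key='queryable', mode='w', format='table',
              data_columns=['Name'])

    # Only the matching rows are loaded into memory
    matches = pd.read_hdf(path, key='queryable', where="Name == 'Priya'")

print(matches)
```

Columns that were not listed in data_columns, such as 'Age' here, cannot be used in the where expression.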