Closed
Description
In pandas 0.10.0 I cannot use hdf5 compression when storing sparse series for which all values are "sparsified"/censored (i.e. the same as fill_value).
Compression works fine if there's at least one non-sparse value in each series of a sparse dataframe. And dataframes with series that are completely sparse can be stored in hdf5 without compression.
But combining the two, as in case 4 below, breaks:
import pandas as pd
import numpy as np
# make sparse dataframe
df = pd.DataFrame(np.random.binomial(n=1, p=.01, size=(10000, 100))).to_sparse(fill_value=0)
# case 1: store uncompressed (works)
store1 = pd.HDFStore('sparse_uncompressed.h5')
store1['sparse_df'] = df
store1.close()
# case 2: store compressed (works)
store2 = pd.HDFStore('sparse_compressed.h5', complib='zlib', complevel=9)
store2['sparse_df'] = df
store2.close()
# set one series to be completely sparse
df[0] = np.zeros(10000)
# case 3: store df with completely sparse series uncompressed (works)
store3 = pd.HDFStore('sparser_uncompressed.h5')
store3['sparse_df'] = df
store3.close()
# case 4: try storing df with completely sparse series compressed (fails)
store4 = pd.HDFStore('sparser_compressed.h5', complib='zlib', complevel=9)
store4['sparse_df'] = df
store4.close()
The resulting error comes from tables:
ValueError: shape parameter cannot have zero-dimensions.
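One plausible reading of this error, offered as an assumption rather than something confirmed in this report: a series whose values all equal fill_value has no non-fill values left to store, so the array handed to the compressed writer would have length zero. The sketch below uses only NumPy to illustrate that shape difference. The helper stored_values is hypothetical and is not part of pandas or tables.

```python
import numpy as np

def stored_values(column, fill_value=0):
    """Hypothetical helper: return the entries a sparse layout would keep,
    i.e. those that differ from fill_value."""
    column = np.asarray(column)
    return column[column != fill_value]

# A column with at least one non-fill value keeps a non-empty array.
mixed = np.array([0, 0, 1, 0, 1])
mixed_kept = stored_values(mixed)

# A completely sparse column keeps nothing, so one dimension is zero.
all_fill = np.zeros(5, dtype=int)
all_fill_kept = stored_values(all_fill)

print(mixed_kept.shape)     # (2,)
print(all_fill_kept.shape)  # (0,)
```

If that reading is right, cases 1 to 3 succeed because either the stored array is non-empty or no compression filter is applied, and only case 4 combines an empty array with compression.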