hdf5 compression breaks on sparse series which have all values censored #2931

@kforeman

In pandas 0.10.0 I cannot use hdf5 compression when storing sparse series for which all values are "sparsified"/censored (i.e. the same as fill_value).

Compression works fine as long as each series in the sparse dataframe has at least one non-sparse value. Dataframes containing completely sparse series can also be stored in hdf5, but only without compression.

But combining the two, as in case 4 below, breaks:

import pandas as pd
import numpy as np

# make sparse dataframe
df = pd.DataFrame(np.random.binomial(n=1, p=.01, size=(10000, 100))).to_sparse(fill_value=0)

# case 1: store uncompressed (works)
store1 = pd.HDFStore('sparse_uncompressed.h5')
store1['sparse_df'] = df
store1.close()

# case 2: store compressed (works)
store2 = pd.HDFStore('sparse_compressed.h5', complib='zlib', complevel=9)
store2['sparse_df'] = df
store2.close()

# set one series to be completely sparse
df[0] = np.zeros(10000)

# case 3: store df with completely sparse series uncompressed (works)
store3 = pd.HDFStore('sparser_uncompressed.h5')
store3['sparse_df'] = df
store3.close()

# case 4: try storing df with completely sparse series compressed (fails)
store4 = pd.HDFStore('sparser_compressed.h5', complib='zlib', complevel=9)
store4['sparse_df'] = df
store4.close()

The resulting error comes from tables:

ValueError: shape parameter cannot have zero-dimensions.
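
To narrow down which series trigger this, it may help to list the columns where every value equals the fill value before writing to a compressed store. The sketch below is only an illustration: `fully_sparse_columns` is a hypothetical helper, not part of pandas or tables, and it works on a plain dict of lists standing in for the dataframe. It also assumes, without confirmation from the traceback, that the error comes from such a column leaving no values to write.

```python
def fully_sparse_columns(columns, fill_value=0):
    """Return the names of columns in which every value equals fill_value.

    `columns` is a plain mapping of name -> sequence of values. It stands in
    for the sparse dataframe in the report and is not a pandas API.
    """
    return [
        name
        for name, values in columns.items()
        if all(v == fill_value for v in values)
    ]


# Column 0 mirrors `df[0] = np.zeros(...)` above; column 1 keeps one non-fill value.
columns = {0: [0, 0, 0, 0], 1: [0, 1, 0, 0]}
print(fully_sparse_columns(columns))  # prints [0]
```

Any column reported by such a check would be a candidate for the failure in case 4, while a dataframe with an empty result corresponds to case 2.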
