Pandas hdf functions should support the hdf5 ExternalLink functionality when reading/writing. #6019

@jasonbrent

When pandas.read_hdf() is used to read a key that is an h5py.ExternalLink (in a data file created with h5py) pointing to an entry created with HDFStore(), it fails with the following traceback:

/Users/xxx/.pyenv/versions/anaconda/lib/python2.7/site-packages/pandas/io/pytables.pyc in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
618 # create the storer and axes
619 where = _ensure_term(where)
--> 620 s = self._create_storer(group)
621 s.infer_axes()
622

/Users/xxx/.pyenv/versions/anaconda/lib/python2.7/site-packages/pandas/io/pytables.pyc in _create_storer(self, group, format, value, append, **kwargs)
1119 )
1120
-> 1121 pt = _ensure_decoded(getattr(group._v_attrs, 'pandas_type', None))
1122 tt = _ensure_decoded(getattr(group._v_attrs, 'table_type', None))
1123

/Users/xxx/.pyenv/versions/anaconda/lib/python2.7/site-packages/tables/link.pyc in __getattr__(self, name)
77 def __getattr__(self, name):
78 raise KeyError("you cannot get attributes from this "
---> 79 "%s instance" % self.__class__.__name__)
80
81 def __setattr__(self, name, value):

KeyError: 'you cannot get attributes from this NoAttrs instance'

In this example, 'store.h5' was created with HDFStore() and a single Series was stored in it at the location '/banana'. h5py.File was then used to create 'external.h5' with a single entry for /external that pointed to store.h5:/banana using h5py.ExternalLink.
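The setup described above can be sketched roughly as follows. This is a minimal reconstruction, not the exact script that was used: the helper names `build_repro` and `try_read`, the Series contents, and the use of a caller-supplied directory are all assumptions made for illustration, and the sketch is written against current pandas/h5py rather than the 0.13.0-era API.

```python
import os


def build_repro(directory):
    """Create store.h5 (via pandas) and external.h5 (via h5py) in `directory`.

    Hypothetical helper: file names and the '/banana' key follow the issue
    text, the Series values are arbitrary.
    """
    import h5py
    import pandas as pd

    store_path = os.path.join(directory, "store.h5")
    external_path = os.path.join(directory, "external.h5")

    # A native HDFStore file holding a single Series at /banana.
    with pd.HDFStore(store_path, mode="w") as store:
        store["banana"] = pd.Series([1.0, 2.0, 3.0])

    # A second file whose only entry, /external, links to store.h5:/banana.
    with h5py.File(external_path, "w") as f:
        f["external"] = h5py.ExternalLink("store.h5", "/banana")

    return store_path, external_path


def try_read(external_path):
    """Attempt the read from the issue and report what happens."""
    import pandas as pd

    try:
        return pd.read_hdf(external_path, "external")
    except Exception as exc:  # the issue reports a KeyError on pandas 0.13.0
        print(type(exc).__name__, exc)
        return None
```

Calling `try_read(build_repro(some_dir)[1])` is the step that produced the traceback above on pandas 0.13.0; behaviour on other versions may differ.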

In my use case, I had written code to parse and store some complex data and associated metadata in hdf5 using h5py. My intention was then to read that raw data in using pandas and then re-store the cooked data using HDFStore.

Unfortunately, pandas HDFStore() did not like the metadata in my h5py written file.

--snip--
/Users/xxx/.pyenv/versions/anaconda/lib/python2.7/site-packages/tables/attributeset.py:294: DataTypeWarning: Unsupported type for attribute 'is_key' in node 'some_node'. Offending HDF5 class: 8
value = self._g_getattr(self._v_node, name)
--snip--

I expected that I could work around this by simply storing the pandas content in a native HDFStore()-written file and using ExternalLinks in the parent file. Unfortunately, that did not work properly either.
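One possible stopgap, until the pandas functions follow external links themselves, might be to resolve the link with h5py and hand the target file and key to pandas directly. The sketch below is only an assumption about how that could look: `read_through_external_link` is a hypothetical helper, not a pandas or h5py function, and it resolves relative link targets against the parent file's directory only, which is a simplification of HDF5's actual lookup rules.

```python
import os


def read_through_external_link(parent_path, link_name):
    """Resolve an h5py.ExternalLink by hand, then read the target with pandas.

    Hypothetical workaround helper, not part of pandas or h5py.
    """
    import h5py
    import pandas as pd

    # Ask h5py for the link object itself rather than the node it points to.
    with h5py.File(parent_path, "r") as f:
        link = f.get(link_name, getlink=True)
        if not isinstance(link, h5py.ExternalLink):
            raise TypeError("%r is not an external link" % (link_name,))
        target_file, target_key = link.filename, link.path

    # Assumption: relative targets live next to the parent file.
    if not os.path.isabs(target_file):
        parent_dir = os.path.dirname(os.path.abspath(parent_path))
        target_file = os.path.join(parent_dir, target_file)

    # The target is a native HDFStore file, so pandas can read it directly.
    return pd.read_hdf(target_file, target_key)
```

With the files from this report, `read_through_external_link('external.h5', 'external')` would read store.h5:/banana without pandas ever touching the link node.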

This is with pandas '0.13.0'.

Thanks!

-jbl
