
Presence of softlink in HDF5 file breaks HDFStore.keys() #20523

Closed
@dworvos

Code Sample, a copy-pastable example if possible

#! /path/to/python3.6

import pandas as pd

df = pd.DataFrame({ "a": [1], "b": [2] })
print(df.to_string())

hdf = pd.HDFStore("/tmp/test.hdf", mode="w")
hdf.put("/test/key", df)

# Brittle: uses the private PyTables handle to add a soft link pointing at /test/key
hdf._handle.create_soft_link(hdf._handle.root.test, "symlink", "/test/key")
hdf.close()
print("Successful write")

hdf = pd.HDFStore("/tmp/test.hdf", mode="r")
'''
Traceback (most recent call last):
  File "snippet.py", line 31, in <module>
    print(hdf.keys())
  File "python3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 529, in keys
    return [n._v_pathname for n in self.groups()]
  File "python3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 1077, in groups
    g for g in self._handle.walk_nodes()
  File "python3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 1078, in <listcomp>
    if (getattr(g._v_attrs, 'pandas_type', None) or
  File "python3.6.3/lib/python3.6/site-packages/tables/link.py", line 79, in __getattr__
    "`%s` instance" % self.__class__.__name__)
KeyError: 'you cannot get attributes from this `NoAttrs` instance'
'''
print(hdf.keys())  # raises the KeyError shown above
hdf.close()

print("Successful read")

Problem description

I know this is an esoteric problem, but I'm building an HDF5 file using pandas and then using PyTables to create a soft link to the pandas DataFrame. I understand this is unsupported and brittle, but for my use case I haven't been able to come up with a better or simpler solution.

This issue is similar to: #6019

The root cause is that HDFStore.keys() calls HDFStore.groups(), which reads g._v_attrs for every node returned by walk_nodes() on the PyTables file:

https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L1076

But when g is a tables.link.SoftLink, reading an attribute from g._v_attrs raises a KeyError, because of:

https://github.com/PyTables/PyTables/blob/develop/tables/link.py#L76
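The three-argument getattr in groups() does not help here, and this can be reproduced without PyTables: getattr only returns its default when AttributeError is raised, so any other exception propagates. A minimal sketch, where NoAttrsStandIn is a hypothetical stand-in for the class PyTables defines inside the method, not the real implementation:

```python
class NoAttrsStandIn:
    """Hypothetical stand-in: raises KeyError on any missing attribute."""

    def __getattr__(self, name):
        raise KeyError("you cannot get attributes from this "
                       "`%s` instance" % self.__class__.__name__)


class PlainAttrs:
    """Ordinary object that simply has no `pandas_type` attribute."""


def probe(attrs):
    """Return 'default', or the name of the exception that escaped getattr."""
    try:
        value = getattr(attrs, "pandas_type", None)
    except Exception as exc:
        return type(exc).__name__
    return "default" if value is None else value


missing = probe(PlainAttrs())      # AttributeError is swallowed by getattr
escaped = probe(NoAttrsStandIn())  # KeyError is not swallowed
print(missing, escaped)
```

If the attribute lookup raised AttributeError instead, the existing getattr call would already treat links as "not a pandas node".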

There doesn't appear to be a way to guard against an instance of NoAttrs with isinstance, since that class is defined inside the method. One solution may be to check whether g is a Link before touching its attributes:

        return [
            g for g in self._handle.walk_nodes()
            if (not isinstance(g, _table_mod.link.Link) and
                (getattr(g._v_attrs, 'pandas_type', None) or
                 getattr(g, 'table', None) or
                (isinstance(g, _table_mod.table.Table) and
                 g._v_name != u('table'))))
        ]
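The order of the checks matters in the condition above: `and` short-circuits, so the attribute lookup never runs for a link. The sketch below illustrates just that ordering with hypothetical stand-in classes (FakeGroup, FakeLink, and the pandas_keys helper are made up for illustration and are not pandas or PyTables APIs):

```python
class FakeAttrs:
    """Stand-in attribute set; only has `pandas_type` when one is given."""

    def __init__(self, pandas_type=None):
        if pandas_type is not None:
            self.pandas_type = pandas_type


class FakeGroup:
    """Stand-in for a regular node with a normal attribute set."""

    def __init__(self, pathname, pandas_type=None):
        self._v_pathname = pathname
        self._v_attrs = FakeAttrs(pandas_type)


class RaisingAttrs:
    """Stand-in attribute set that raises KeyError on lookup."""

    def __getattr__(self, name):
        raise KeyError("you cannot get attributes from this instance")


class FakeLink:
    """Stand-in for a soft link node."""

    def __init__(self, pathname):
        self._v_pathname = pathname
        self._v_attrs = RaisingAttrs()


def pandas_keys(nodes):
    """Keep nodes tagged with a pandas_type, skipping links first."""
    return [
        n._v_pathname for n in nodes
        if (not isinstance(n, FakeLink) and
            getattr(n._v_attrs, "pandas_type", None))
    ]


nodes = [
    FakeGroup("/test"),
    FakeGroup("/test/key", pandas_type="frame"),
    FakeLink("/test/symlink"),
]
keys = pandas_keys(nodes)
print(keys)
```

Swapping the two operands of `and` would evaluate the attribute lookup first and raise for the link, which is the failure mode described in this issue.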

I'd be happy to write a PR and tests if you find this change acceptable.

Expected Output

   a  b
0  1  2
Successful write
['/test/key']
Successful read

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.21.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.utf-8
LANG: en_US.utf-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.5.1
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
