Description
Code Sample, a copy-pastable example if possible
#! /path/to/python3.6
import pandas as pd
df = pd.DataFrame({ "a": [1], "b": [2] })
print(df.to_string())
hdf = pd.HDFStore("/tmp/test.hdf", mode="w")
hdf.put("/test/key", df)
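# Brittle: uses the private _handle (the underlying PyTables File) to add a soft link /test/symlink -> /test/key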
hdf._handle.create_soft_link(hdf._handle.root.test, "symlink", "/test/key")
hdf.close()
print("Successful write")
hdf = pd.HDFStore("/tmp/test.hdf", mode="r")
'''
Traceback (most recent call last):
  File "snippet.py", line 31, in <module>
    print(hdf.keys())
  File "python3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 529, in keys
    return [n._v_pathname for n in self.groups()]
  File "python3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 1077, in groups
    g for g in self._handle.walk_nodes()
  File "python3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 1078, in <listcomp>
    if (getattr(g._v_attrs, 'pandas_type', None) or
  File "python3.6.3/lib/python3.6/site-packages/tables/link.py", line 79, in __getattr__
    "`%s` instance" % self.__class__.__name__)
KeyError: 'you cannot get attributes from this `NoAttrs` instance'
'''
print(hdf.keys())  # raises the KeyError shown above
hdf.close()
print("Successful read")
Problem description
I know this is an esoteric problem, but I'm building an HDF5 file with pandas and then using PyTables to create a soft link to the pandas DataFrame. I understand this is unsupported and brittle, but for my use case I haven't been able to come up with a better or simpler solution.
This issue is similar to #6019.
The root cause is that HDFStore.keys() calls HDFStore.groups(), which walks every node of the underlying PyTables file and accesses g._v_attrs on each one:
https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L1076
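However, for a tables.link.SoftLink, g._v_attrs is a NoAttrs object whose attribute lookups raise KeyError rather than AttributeError, so the None default in getattr(g._v_attrs, 'pandas_type', None) does not catch it. See: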
https://github.com/PyTables/PyTables/blob/develop/tables/link.py#L76
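The key detail is that getattr's default value only suppresses AttributeError. A minimal stdlib-only sketch of that behavior follows; NoAttrsStandIn and PlainAttrs are hypothetical stand-ins that mimic the linked PyTables code, not the real classes.

```python
# Hypothetical stand-in for the NoAttrs class that PyTables defines inside
# Link._v_attrs; it only mimics the behavior, it is not the real class.
class NoAttrsStandIn:
    def __getattr__(self, name):
        # Called only for attributes not found normally, e.g. "pandas_type".
        raise KeyError("you cannot get attributes from this "
                       "`%s` instance" % self.__class__.__name__)


class PlainAttrs:
    """Hypothetical ordinary attribute set without a pandas_type attribute."""


def lookup_pandas_type(attrs):
    """The same lookup HDFStore.groups() performs on each node."""
    # The None default is used only when AttributeError is raised.
    return getattr(attrs, "pandas_type", None)


print(lookup_pandas_type(PlainAttrs()))  # None

try:
    lookup_pandas_type(NoAttrsStandIn())
except KeyError as exc:
    # The KeyError escapes getattr's default, as in the traceback above.
    print("KeyError:", exc)
```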
There doesn't appear to be a way to guard against a NoAttrs instance directly, since that class is defined inside the method. One solution may be to skip g when it is an instance of Link:
return [
    g for g in self._handle.walk_nodes()
    if (not isinstance(g, _table_mod.link.Link) and
        (getattr(g._v_attrs, 'pandas_type', None) or
         getattr(g, 'table', None) or
         (isinstance(g, _table_mod.table.Table) and
          g._v_name != u('table'))))
]
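To check the idea without an HDF5 file, here is a simplified, stdlib-only model of groups()/keys() with the proposed Link guard. Every class is a hypothetical stand-in for a PyTables node, the pandas_type value "frame" is illustrative, and the guard_links flag is an invented toggle for comparison; the real filter also checks the 'table' attribute and Table nodes, which are omitted here.

```python
# Simplified model of HDFStore.groups()/keys() with the proposed Link guard.
# All classes are hypothetical stand-ins, not PyTables or pandas objects.


class AttrSet:
    """Stand-in for an ordinary attribute set."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)


class NoAttrs:
    """Stand-in for PyTables' NoAttrs: every missing lookup raises KeyError."""
    def __getattr__(self, name):
        raise KeyError("you cannot get attributes from this `NoAttrs` instance")


class Node:
    def __init__(self, pathname, attrs):
        self._v_pathname = pathname
        self._v_attrs = attrs


class Group(Node):
    pass


class Link(Node):
    def __init__(self, pathname):
        super().__init__(pathname, NoAttrs())


def groups(nodes, guard_links=True):
    """Return the nodes that would be treated as stored pandas objects."""
    return [
        g for g in nodes
        # `and` short-circuits, so _v_attrs is never queried on a Link.
        if (not (guard_links and isinstance(g, Link)) and
            getattr(g._v_attrs, "pandas_type", None))
    ]


def keys(nodes, guard_links=True):
    return [g._v_pathname for g in groups(nodes, guard_links)]


# Mirrors the file built in the code sample: /test/key plus a soft link.
nodes = [
    Group("/", AttrSet()),
    Group("/test", AttrSet()),
    Group("/test/key", AttrSet(pandas_type="frame")),
    Link("/test/symlink"),
]

print(keys(nodes))  # ['/test/key']
```

Calling keys(nodes, guard_links=False) reproduces the KeyError from the traceback. Wrapping the lookup in try/except KeyError would be another option, but the isinstance check states the intent (links are never pandas objects) more explicitly.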
I'd be happy to write a PR and tests if you find this change acceptable.
Expected Output
   a  b
0  1  2
Successful write
['/test/key']
Successful read
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.21.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.utf-8
LANG: en_US.utf-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.5.1
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None