Closed
Description
Here's the code:
records.to_hdf(
    args.output, 'records',
    mode='w', format='fixed', append=False,
    complib='zlib', complevel=7, fletcher32=True)
r2 = pd.read_hdf(
    path_or_buf=args.output, key='records',
    encoding='utf-8', start=None, stop=None)
from pandas.util.testing import assert_frame_equal
assert_frame_equal(records, r2, check_exact=True)
And the traceback:
/Users/pball/miniconda3/lib/python3.3/site-packages/pandas/io/pytables.py:2441: PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block1_values] [items->['dataset', 'record_id', 'DOD', 'CC', 'sex', 'name', 'loc', 'manner_of_death', 'eth', 'social_group', 'occ', 'clean_loc', 'month_of_death', 'year_of_death', 'name_sorted']]
  warnings.warn(ws, PerformanceWarning)
Traceback (most recent call last):
  File "src/import.py", line 59, in <module>
    tools.epilog(args, records, logger)
  File "/Users/pball/git/CO/match/import/src/lib/import_tools.py", line 46, in epilog
    assert_frame_equal(records, r2, check_exact=True)
  File "/Users/pball/miniconda3/lib/python3.3/site-packages/pandas/util/testing.py", line 585, in assert_frame_equal
    check_exact=check_exact)
  File "/Users/pball/miniconda3/lib/python3.3/site-packages/pandas/util/testing.py", line 530, in assert_series_equal
    right.values))
AssertionError: [nan nan nan ..., 'c2681113' 'c12266508' 'c2680757'] is not equal to [nan nan nan ..., 'c2681113' 'c12266508' 'c2680757'].
make: *** [output/input-records.h5] Error 1
I've been trying to figure out why upstream fixes didn't seem to appear downstream, and I finally traced it here: apparently to_hdf writes a file that differs from the original frame when it's read back. As I've re-run this over the last hour or so, different fields have come up in the AssertionError.
Here are a few things that do not eliminate the error: turning compression on or off, and switching between format='table' and format='fixed'. However, changing these arguments does change which field assert_frame_equal identifies as unequal.
I have no idea how to reproduce this without my entire dataset, which is unfortunately confidential. I'll fall back to CSV for now, and I hope that I'm just doing something horribly dumb that we can fix.
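Since the two arrays in the AssertionError print identically and both start with nan, one thing that might be worth ruling out is that the comparison is tripping on missing values rather than on genuinely different data. This is only a guess, not something the traceback confirms. Below is a minimal diagnostic sketch on made-up data: differing_cells is a hypothetical helper (not part of pandas), and the tiny frame merely stands in for the confidential records. It marks the cells that differ once NaN is treated as equal to NaN, and it assumes both frames share the same index and columns.

```python
import numpy as np
import pandas as pd


def differing_cells(left, right):
    """Mark cells that differ between two identically labeled frames.

    Hypothetical helper: a cell counts as equal when the values compare
    equal OR when both sides are missing (NaN/None).
    """
    both_null = left.isnull() & right.isnull()
    return ~((left == right) | both_null)


# Made-up stand-in for the real data: object columns mixing NaN and strings.
records = pd.DataFrame(
    {
        "record_id": [np.nan, "c2681113", "c12266508"],
        "year_of_death": [np.nan, "1999", "2001"],
    },
    dtype=object,
)

# Stand-in for the frame read back from disk.
r2 = records.copy()

diff = differing_cells(records, r2)
nan_aware_equal = not diff.values.any()
print("NaN-aware equal:", nan_aware_equal)

# Columns that still contain a real difference (empty if none do).
print("columns with real differences:", list(diff.columns[diff.any()]))
```

If running this against the real records and r2 reports no differing cells, the file contents are probably fine and the failure would lie in how the missing values are compared. If it does report cells, those columns are the place to start looking.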