HDFStore loses data because it silently loses microseconds in datetime index conversion #513

@bshanks

Description

from datetime import datetime
from pandas import DataFrame, HDFStore

store = HDFStore('test.h5')
store.put('test', DataFrame([0, 1, 2], [datetime.utcnow(), datetime.utcnow(), datetime.utcnow()], ['col1']), table=True)
store['test']
Duplicate entries in table, taking most recently appended
Out[50]: 
                     col1
2011-12-20 19:20:19  2   

This happens because HDFStore serializes the index with .timetuple(), which discards microseconds. Two datetimes can therefore be distinct yet have the same .timetuple(), so they are treated as duplicate index entries and only the most recently appended row survives. (Slightly related discussion: http://bugs.python.org/issue2736; I suppose whatever they decided to do to convert datetime to datetime64 might work here too?)
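The collapse can be reproduced with the standard library alone. The snippet below is a minimal sketch, not HDFStore's actual serialization code: `to_epoch_microseconds` and `from_epoch_microseconds` are hypothetical helper names, and storing integer microseconds since the epoch is just one possible lossless encoding, assuming naive UTC datetimes.

```python
from datetime import datetime, timedelta

# Two distinct timestamps that differ only in their microsecond field.
first = datetime(2011, 12, 20, 19, 20, 19, 100)
second = first + timedelta(microseconds=1)

# struct_time has no microsecond field, so both values collapse together.
same_timetuple = first.timetuple() == second.timetuple()

# Hypothetical lossless encoding (an assumption, not what HDFStore does):
# integer microseconds since the Unix epoch for a naive UTC datetime.
EPOCH = datetime(1970, 1, 1)


def to_epoch_microseconds(dt):
    """Encode a naive UTC datetime as an integer microsecond count."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def from_epoch_microseconds(value):
    """Decode an integer microsecond count back into a naive UTC datetime."""
    return EPOCH + timedelta(microseconds=value)


first_key = to_epoch_microseconds(first)
second_key = to_epoch_microseconds(second)

print(same_timetuple)           # the two timetuples compare equal
print(first_key != second_key)  # the integer keys stay distinct
```

With an encoding like this, both keys remain unique and decode back to the original datetimes, so neither row would be dropped as a duplicate.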
