
HDFStore: unable to create index, no error message #28156

Closed
@adamjstewart


I was trying to follow the documentation at https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#indexing but ran into an unintuitive bug with HDFStore index creation. I thought I would report it in case someone else runs across this problem.

First, I create two DataFrames and an HDFStore:

>>> import pandas as pd
>>> import numpy as np
>>> df_1 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
>>> df_2 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
>>> st = pd.HDFStore('appends.h5', mode='w')

Now, if I pass data_columns when appending:

>>> st.append('df', df_1, data_columns=['B'], index=False)
>>> st.append('df', df_2, data_columns=['B'], index=False)

I can successfully create an index:

>>> st.create_table_index('df', columns=['B'], optlevel=9, kind='full')
>>> st.get_storer('df').table
/df/table (Table(20,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
  "B": Float64Col(shape=(), dflt=0.0, pos=2)}
  byteorder := 'little'
  chunkshape := (2730,)
  autoindex := True
  colindexes := {
    "B": Index(9, full, shuffle, zlib(1)).is_csi=True}

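Rather than reading the repr, one can check programmatically which columns ended up indexed. A minimal sketch, assuming the key was written in table format (as `append` does) and relying on PyTables' `Table.colinstances` mapping and `Column.is_indexed` attribute; `indexed_columns` is a hypothetical helper, not part of pandas:

```python
def indexed_columns(store, key):
    """Return the names of the columns of table `key` that carry an index.

    Hypothetical helper. `store` is an open pd.HDFStore and `key` must be
    a table-format node, so that get_storer(key).table is a PyTables Table.
    """
    table = store.get_storer(key).table
    # colinstances maps each top-level column name to its Column object;
    # Column.is_indexed reports whether PyTables holds an index for it.
    return {name for name, col in table.colinstances.items() if col.is_indexed}
```

Given the repr above, `indexed_columns(st, 'df')` should return `{'B'}` here; for the store built without `data_columns` it should return an empty set.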
But if I instead leave out the data_columns:

>>> st.append('df', df_1, index=False)
>>> st.append('df', df_2, index=False)

no index is created, and no error is raised:

>>> st.create_table_index('df', columns=['B'], optlevel=9, kind='full')
>>> st.get_storer('df').table
/df/table (Table(20,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)
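The second description shows why nothing happens: without `data_columns`, pandas packs `A` and `B` into the single `values_block_0` column (note its `shape=(2,)`), so the table has no column named `B` for PyTables to index. A minimal sketch of a pre-check, assuming PyTables' `Table.colnames` list of top-level column names; `is_data_column` is a hypothetical helper:

```python
def is_data_column(store, key, column):
    """Return True if `column` is stored under its own name in table `key`.

    Hypothetical helper. PyTables can only index columns that exist by
    name; columns not listed in data_columns= at append time are packed
    into values_block_N columns and do not appear in Table.colnames.
    """
    table = store.get_storer(key).table
    return column in table.colnames
```

Against the two stores above, `is_data_column(st, 'df', 'B')` should be `True` in the first case and `False` in the second.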

This is unintuitive for two reasons:

  1. Why does HDFStore need to know the indexable columns both during append and during create_table_index?
  2. Why doesn't create_table_index raise an error when it isn't able to create an index?

I think fixing either 1 or 2 would make things much more intuitive.
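To get the behaviour asked for in point 2 without changing pandas, a thin wrapper can fail loudly instead. This is a hypothetical helper, `create_table_index_strict`, not a pandas API; it assumes the PyTables attributes `Table.colnames`, `Table.colinstances` and `Column.is_indexed`, and forwards extra keyword arguments such as `optlevel` and `kind` unchanged to `HDFStore.create_table_index`:

```python
def create_table_index_strict(store, key, columns, **kwargs):
    """Create PyTables indexes on `columns` of table `key`, raising if any
    of them cannot be indexed instead of silently skipping it.

    Hypothetical wrapper around pd.HDFStore.create_table_index; extra
    keyword arguments (optlevel, kind, ...) are passed through as-is.
    """
    table = store.get_storer(key).table

    # Columns not passed as data_columns at append time are packed into
    # values_block_N columns and have no name of their own in the table.
    missing = [c for c in columns if c not in table.colnames]
    if missing:
        raise ValueError(
            f"cannot index {missing} in {key!r}: they are not stored as "
            f"separate columns; append the data with data_columns={missing!r}"
        )

    store.create_table_index(key, columns=columns, **kwargs)

    # Verify that PyTables now reports an index on every requested column.
    not_indexed = [c for c in columns if not table.colinstances[c].is_indexed]
    if not_indexed:
        raise RuntimeError(f"no index was created for {not_indexed} in {key!r}")
```

Checking `colnames` before the call catches the case from this report; the check afterwards guards against any other way an index could silently fail to appear.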

Labels: Bug, Error Reporting, IO HDF5

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions