I was following the documentation at https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#indexing but ran into an unintuitive bug with HDFStore index creation. I'm reporting it in case someone else runs into the same problem.
First, I create 2 dataframes and an HDFStore:
>>> import pandas as pd
>>> import numpy as np
>>> df_1 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
>>> df_2 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
>>> st = pd.HDFStore('appends.h5', mode='w')
Now, if I append both frames with data_columns=['B']:
>>> st.append('df', df_1, data_columns=['B'], index=False)
>>> st.append('df', df_2, data_columns=['B'], index=False)
I can successfully create an index:
>>> st.create_table_index('df', columns=['B'], optlevel=9, kind='full')
>>> st.get_storer('df').table
/df/table (Table(20,)) ''
description := {
"index": Int64Col(shape=(), dflt=0, pos=0),
"values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
"B": Float64Col(shape=(), dflt=0.0, pos=2)}
byteorder := 'little'
chunkshape := (2730,)
autoindex := True
colindexes := {
"B": Index(9, full, shuffle, zlib(1)).is_csi=True}
But if I instead leave out data_columns (starting again from a fresh store):
>>> st.append('df', df_1, index=False)
>>> st.append('df', df_2, index=False)
no index is created:
>>> st.create_table_index('df', columns=['B'], optlevel=9, kind='full')
>>> st.get_storer('df').table
/df/table (Table(20,)) ''
description := {
"index": Int64Col(shape=(), dflt=0, pos=0),
"values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1)}
byteorder := 'little'
chunkshape := (2730,)
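Comparing the two table descriptions above: with data_columns=['B'], B gets its own PyTables column next to values_block_0. Without it, A and B are packed together into values_block_0 (shape (2,)), so the table has no column named B for create_table_index to index. The function below is a hypothetical, simplified model of that layout rule, inferred from these two descriptions plus the documented data_columns=True option. It is not a pandas API, and it ignores other options that can affect the layout.

```python
def indexable_columns(frame_columns, data_columns=None):
    """Hypothetical, simplified model of which PyTables columns an HDFStore
    append with format='table' creates, and which create_table_index can
    therefore index.

    The row index is always stored in its own "index" column. Columns listed in
    data_columns (or every column when data_columns=True) get their own column
    too. All other columns are packed into shared values_block_N columns.
    """
    if data_columns is True:
        selected = set(frame_columns)
    else:
        # Names not present in frame_columns are simply ignored in this model.
        selected = set(data_columns or [])
    return ["index"] + [c for c in frame_columns if c in selected]
```

Under this model, indexable_columns(['A', 'B'], ['B']) gives ['index', 'B'] and indexable_columns(['A', 'B']) gives only ['index']. This matches the two descriptions above. The layout is fixed when the table is first written, so, as far as I can tell, a column left out of data_columns cannot be indexed later without rewriting the table.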
This is unintuitive for 2 reasons:
1. Why does HDFStore need to know the indexable columns both during append and during create_table_index?
2. Why doesn't create_table_index raise an error when it isn't able to create an index?
I think fixing either 1 or 2 would make things much more intuitive.
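Until create_table_index reports this itself, a caller can guard against the silent no-op. The sketch below is a hypothetical wrapper: create_table_index_strict is not part of pandas. It assumes the key was written with format='table', so that store.get_storer(key).table is a PyTables Table. It relies on the PyTables attributes Table.colnames and Table.colindexes.

```python
def create_table_index_strict(store, key, columns, **kwargs):
    """Hypothetical wrapper around HDFStore.create_table_index that raises
    instead of silently doing nothing.

    `store` is assumed to behave like pandas.HDFStore; extra keyword
    arguments (e.g. optlevel, kind) are passed through unchanged.
    """
    table = store.get_storer(key).table
    # Only the index and any data_columns appear as separate columns;
    # columns packed into values_block_N are not listed in colnames.
    missing = [c for c in columns if c not in table.colnames]
    if missing:
        raise ValueError(
            f"cannot index {missing} in {key!r}: not stored as separate columns "
            f"(available: {list(table.colnames)}); pass them via data_columns "
            f"when the table is first written"
        )
    store.create_table_index(key, columns=columns, **kwargs)
    # Re-fetch the table and confirm an index was actually recorded.
    table = store.get_storer(key).table
    not_indexed = [c for c in columns if c not in table.colindexes]
    if not_indexed:
        raise RuntimeError(f"no index was created for {not_indexed} in {key!r}")
```

With the second store above, create_table_index_strict(st, 'df', ['B'], optlevel=9, kind='full') would raise ValueError, because the table's only columns are index and values_block_0. With the first store, it would create the index on B as before.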