Description
I have a DataFrame with a DateRange index, and I store it in an .h5 file with HDFStore.
But when I retrieve the DataFrame, something is wrong with the index, because slicing it raises this error:
Exception: Index values are not unique
Here are the commands that produce the exception. I did not test every possible case, so I have kept the intermediate commands I executed:
dffc=pandas.DataFrame(vpp.forecast.data, index=dr_year_900[:-1])
dffc[start_dt:stop_dt]
dffc.ix[start_dt:stop_dt]
dffc.drop(['Counter'], axis=1)
dffc=dffc.drop(['Counter'], axis=1)
dffc.ix[start_dt:stop_dt]
dffc['EFor'] = dffc['Power'] - dffc['Forecast']
dffc.ix[start_dt:stop_dt]
dffc = dffc.rename(columns={'Power':'PRef', 'Forecast':'PFor'})
dffc
dffc.ix[start_dt:stop_dt]
dfs = pandas.HDFStore('test.h5', 'w')
dfs.put('dffc', dffc)
dfs.close()
del(dffc)
dfs = pandas.HDFStore('test.h5', 'r')
dffc
dffc = dfs['dffc']
dffc
dffc.ix[start_dt:stop_dt]
==> here is the output:
In [49]: dffc.ix[start_dt:stop_dt]
Out[49]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 961 entries, 2010-01-01 00:00:00 to 2010-01-11 00:00:00
offset: <900 Seconds>
Data columns:
Counter 961 non-null values
Forecast 961 non-null values
Imbalance 961 non-null values
NegImbPrice 961 non-null values
PosImbPrice 961 non-null values
Power 961 non-null values
Price 961 non-null values
dtypes: float64(6), object(1)
In [50]: dffc.drop(['Counter'], axis=1)
Out[50]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 35040 entries, 2010-01-01 00:00:00 to 2010-12-31 23:45:00
offset: <900 Seconds>
Data columns:
Forecast 35040 non-null values
Imbalance 35040 non-null values
NegImbPrice 35040 non-null values
PosImbPrice 35040 non-null values
Power 35040 non-null values
Price 35040 non-null values
dtypes: float64(6)
In [51]: dffc=dffc.drop(['Counter'], axis=1)
In [52]: dffc.ix[start_dt:stop_dt]
Out[52]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 961 entries, 2010-01-01 00:00:00 to 2010-01-11 00:00:00
offset: <900 Seconds>
Data columns:
Forecast 961 non-null values
Imbalance 961 non-null values
NegImbPrice 961 non-null values
PosImbPrice 961 non-null values
Power 961 non-null values
Price 961 non-null values
dtypes: float64(6)
In [53]: dffc['EFor'] = dffc['Power'] - dffc['Forecast']
In [54]: dffc.ix[start_dt:stop_dt]
Out[54]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 961 entries, 2010-01-01 00:00:00 to 2010-01-11 00:00:00
offset: <900 Seconds>
Data columns:
Forecast 961 non-null values
Imbalance 961 non-null values
NegImbPrice 961 non-null values
PosImbPrice 961 non-null values
Power 961 non-null values
Price 961 non-null values
EFor 961 non-null values
dtypes: float64(7)
In [55]: dffc = dffc.rename(columns={'Power':'PRef', 'Forecast':'PFor'})
In [56]: dffc
Out[56]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 35040 entries, 2010-01-01 00:00:00 to 2010-12-31 23:45:00
offset: <900 Seconds>
Data columns:
PFor 35040 non-null values
Imbalance 35040 non-null values
NegImbPrice 35040 non-null values
PosImbPrice 35040 non-null values
PRef 35040 non-null values
Price 35040 non-null values
EFor 35040 non-null values
dtypes: float64(7)
In [57]: dffc.ix[start_dt:stop_dt]
Out[57]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 961 entries, 2010-01-01 00:00:00 to 2010-01-11 00:00:00
offset: <900 Seconds>
Data columns:
PFor 961 non-null values
Imbalance 961 non-null values
NegImbPrice 961 non-null values
PosImbPrice 961 non-null values
PRef 961 non-null values
Price 961 non-null values
EFor 961 non-null values
dtypes: float64(7)
In [58]: dfs = pandas.HDFStore('test.h5', 'w')
In [59]: dfs.put('dffc', dffc)
In [60]: dfs.close()
In [61]: del(dffc)
In [62]: dfs = pandas.HDFStore('test.h5', 'r')
In [63]: dffc
Traceback (most recent call last):
File "", line 1, in
NameError: name 'dffc' is not defined
In [64]: dffc = dfs['dffc']
In [65]: dffc
Out[65]:
<class 'pandas.core.frame.DataFrame'>
Index: 35040 entries, 2010-01-01 00:00:00 to 2010-12-31 23:45:00
Data columns:
PFor 35040 non-null values
Imbalance 35040 non-null values
NegImbPrice 35040 non-null values
PosImbPrice 35040 non-null values
PRef 35040 non-null values
Price 35040 non-null values
EFor 35040 non-null values
dtypes: float64(7)
In [66]: dffc.ix[start_dt:stop_dt]
Traceback (most recent call last):
File "", line 1, in
File "C:\Python27\lib\site-packages\pandas-0.7.1-py2.7-win32.egg\pandas\core\indexing.py", line 35, in getitem
return self._getitem_axis(key, axis=0)
File "C:\Python27\lib\site-packages\pandas-0.7.1-py2.7-win32.egg\pandas\core\indexing.py", line 167, in _getitem_axis
return self._get_slice_axis(key, axis=axis)
File "C:\Python27\lib\site-packages\pandas-0.7.1-py2.7-win32.egg\pandas\core\indexing.py", line 345, in _get_slice_axis
i, j = labels.slice_locs(start, stop)
File "C:\Python27\lib\site-packages\pandas-0.7.1-py2.7-win32.egg\pandas\core\index.py", line 819, in slice_locs
beg_slice = self.get_loc(start)
File "C:\Python27\lib\site-packages\pandas-0.7.1-py2.7-win32.egg\pandas\core\index.py", line 499, in get_loc
return self._engine.get_loc(key)
File "engines.pyx", line 101, in pandas._engines.DictIndexEngine.get_loc (pandas\src\engines.c:2498)
File "engines.pyx", line 107, in pandas._engines.DictIndexEngine.get_loc (pandas\src\engines.c:2447)
Exception: Index values are not unique
How can I avoid this problem?
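One way to narrow this down might be to compare the index before `dfs.put` and after reading it back, since the reloaded frame reports a plain `Index` (Out[65]) where the original reported a `DateRange` (Out[56]). Below is a minimal sketch, assuming the index can be iterated like a list of hashable labels. `describe_index` is a hypothetical helper written for this illustration, not part of pandas, and the sample labels are stand-in data rather than values from the session above.

```python
from collections import Counter


def describe_index(labels):
    """Summarize an index-like iterable: its type, size, and repeated labels.

    Hypothetical helper for illustration; it only assumes that `labels`
    is iterable and that each label is hashable.
    """
    values = list(labels)
    counts = Counter(values)
    # Keep only the labels that occur more than once.
    duplicates = {label: n for label, n in counts.items() if n > 1}
    return {
        "type": type(labels).__name__,
        "length": len(values),
        "is_unique": not duplicates,
        "duplicates": duplicates,
    }


# Stand-in data; in the session above one would pass dffc.index instead,
# once before dfs.put(...) and once after dffc = dfs['dffc'].
before = describe_index(["2010-01-01 00:00", "2010-01-01 00:15", "2010-01-01 00:30"])
after = describe_index(["2010-01-01 00:00", "2010-01-01 00:15", "2010-01-01 00:15"])

print(before["is_unique"], after["is_unique"], after["duplicates"])
```

If the two summaries differ in `type` or `is_unique`, that would suggest the round trip through the store changed the index rather than the slicing call itself.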