Version 0.11.0 introduced confusing behavior when importing data via read_fwf (and, I'm pretty sure, read_table):
    import pandas as pd
    from cStringIO import StringIO
    from datetime import datetime

    tzlist = [1,10,20,30,60,80,100]
    ntz = len(tzlist)
    tcolspecs = [16]+[8]*ntz
    tcolnames = ['SST'] + ["T%03d" % z for z in tzlist[1:]]

    data = '''   2009164202000  9.5403  9.4105  8.6571  7.8372  6.0612  5.8843  5.5192
       2009164203000  9.5435  9.2010  8.6167  7.8176  6.0804  5.8728  5.4869
       2009164204000  9.5873  9.1326  8.4694  7.5889  6.0422  5.8526  5.4657
       2009164205000  9.5810  9.0896  8.4009  7.4652  6.0322  5.8189  5.4379
       2009164210000  9.6034  9.0897  8.3822  7.4905  6.0908  5.7904  5.4039'''

    dftemp = pd.read_fwf(StringIO(data), index_col=0, header=None, names=tcolnames,
                         widths=tcolspecs, parse_dates=True,
                         date_parser=lambda s: datetime.strptime(s,'%Y%j%H%M%S'))
With version 0.10.1
    In [1]: pd.__version__
    Out[1]: '0.10.1'

    In [2]: dftemp
    Out[2]:
                            SST    T010    T020    T030    T060    T080    T100
    2009-06-13 20:20:00  9.5403  9.4105  8.6571  7.8372  6.0612  5.8843  5.5192
    2009-06-13 20:30:00  9.5435  9.2010  8.6167  7.8176  6.0804  5.8728  5.4869
    2009-06-13 20:40:00  9.5873  9.1326  8.4694  7.5889  6.0422  5.8526  5.4657
    2009-06-13 20:50:00  9.5810  9.0896  8.4009  7.4652  6.0322  5.8189  5.4379
    2009-06-13 21:00:00  9.6034  9.0897  8.3822  7.4905  6.0908  5.7904  5.4039

    In [3]: dftemp.T030
    Out[3]:
    2009-06-13 20:20:00    7.8372
    2009-06-13 20:30:00    7.8176
    2009-06-13 20:40:00    7.5889
    2009-06-13 20:50:00    7.4652
    2009-06-13 21:00:00    7.4905
    Name: T030, dtype: float64

    In [4]: dftemp.T060
    Out[4]:
    2009-06-13 20:20:00    6.0612
    2009-06-13 20:30:00    6.0804
    2009-06-13 20:40:00    6.0422
    2009-06-13 20:50:00    6.0322
    2009-06-13 21:00:00    6.0908
    Name: T060

    In [5]: dftemp.T080
    Out[5]:
    2009-06-13 20:20:00    5.8843
    2009-06-13 20:30:00    5.8728
    2009-06-13 20:40:00    5.8526
    2009-06-13 20:50:00    5.8189
    2009-06-13 21:00:00    5.7904
    Name: T080

    In [6]: dftemp.T100
    Out[6]:
    2009-06-13 20:20:00    5.5192
    2009-06-13 20:30:00    5.4869
    2009-06-13 20:40:00    5.4657
    2009-06-13 20:50:00    5.4379
    2009-06-13 21:00:00    5.4039
    Name: T100, dtype: float64
and, with version 0.11.0
    In [1]: pd.__version__
    Out[1]: '0.11.0'

    In [2]: dftemp
    Out[2]:
                            SST    T010    T020    T030    T060    T080    T100
    2009-06-13 20:20:00  9.5403  9.4105  8.6571  7.8372  6.0612  5.8843  5.5192
    2009-06-13 20:30:00  9.5435  9.2010  8.6167  7.8176  6.0804  5.8728  5.4869
    2009-06-13 20:40:00  9.5873  9.1326  8.4694  7.5889  6.0422  5.8526  5.4657
    2009-06-13 20:50:00  9.5810  9.0896  8.4009  7.4652  6.0322  5.8189  5.4379
    2009-06-13 21:00:00  9.6034  9.0897  8.3822  7.4905  6.0908  5.7904  5.4039

    In [3]: dftemp.T030
    Out[3]:
    2009-06-13 20:20:00    7.8372
    2009-06-13 20:30:00    7.8176
    2009-06-13 20:40:00    7.5889
    2009-06-13 20:50:00    7.4652
    2009-06-13 21:00:00    7.4905
    Name: T030, dtype: float64

    In [4]: dftemp.T060
    Out[4]:
    Empty DataFrame
    Columns: [SST, T010, T020, T030, T060, T080, T100]
    Index: []

    In [5]: dftemp.T080
    Out[5]:
    Empty DataFrame
    Columns: [SST, T010, T020, T030, T060, T080, T100]
    Index: []

    In [6]: dftemp.T100
    Out[6]:
    2009-06-13 20:20:00    5.5192
    2009-06-13 20:30:00    5.4869
    2009-06-13 20:40:00    5.4657
    2009-06-13 20:50:00    5.4379
    2009-06-13 21:00:00    5.4039
    Name: T100, dtype: float64
No matter how many columns I import (up to 32 in some cases), it always seems to be the 5th and 6th columns that are affected: accessing them returns an empty DataFrame instead of the column.
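As a cross-check that does not depend on pandas at all, the fixed-width layout described above (one 16-character index field followed by seven 8-character fields, with the timestamp in `%Y%j%H%M%S` form) can be parsed by hand. This is only a minimal sketch: the helper names `split_fixed_width` and `parse_record` are made up for illustration, and the exact spacing of the sample line is an assumption chosen to match the declared widths.

```python
from datetime import datetime

# Widths taken from the report: a 16-character index field, then seven 8-character fields.
WIDTHS = [16] + [8] * 7
NAMES = ['SST', 'T010', 'T020', 'T030', 'T060', 'T080', 'T100']

# First record of the sample data; the spacing is an assumption that matches WIDTHS.
LINE = '   2009164202000  9.5403  9.4105  8.6571  7.8372  6.0612  5.8843  5.5192'


def split_fixed_width(line, widths):
    """Cut a line into whitespace-stripped fields of the given widths."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


def parse_record(line):
    """Return (timestamp, {column name: value}) for one fixed-width record."""
    fields = split_fixed_width(line, WIDTHS)
    # Year, day of year, hour, minute, second, all run together.
    timestamp = datetime.strptime(fields[0], '%Y%j%H%M%S')
    values = dict(zip(NAMES, (float(f) for f in fields[1:])))
    return timestamp, values


timestamp, values = parse_record(LINE)
print(timestamp, values['T060'], values['T080'])
```

The 5th and 6th data columns (T060 and T080) hold ordinary values in the raw text, which suggests the data itself is fine and the difference lies in how 0.11.0 looks the columns up.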