Possible date parsing bug in read_table #2618

@DaniellaHerman

I recently updated pandas to 0.10.0 under Python 2.7.3 and have run into problems with the read_table function. My program reads the following file:

ftp://ftp.ncdc.noaa.gov/pub/data/anomalies/monthly.land_ocean.90S.90N.df_1901-2000mean.dat

Here's the head of the file:

1880  1   -0.0760
1880  2   -0.2099
1880  3   -0.2170
1880  4   -0.1180
1880  5   -0.1680
1880  6   -0.2055
1880  7   -0.1863
1880  8   -0.1128
1880  9   -0.1192
1880 10   -0.1951

The code to load the data:

import pandas as pd
noaa_file = "monthly.land_ocean.90S.90N.df_1901-2000mean.dat"

noaa = pd.read_table(noaa_file, header=None, sep=r'\s*', parse_dates=[[0,1]], index_col=0, squeeze=True, na_values='-999.0000').to_period(freq='M')

This throws an exception:

AttributeError: 'Series' object has no attribute 'to_period'

I invoke to_period because the dates that were parsed were appearing as YYYY-MM-DD. The data is monthly and there is no need for a day component. I dropped the to_period method and the error disappeared. But I noticed something strange about the index:

             2
0_1           
1880 1 -0.0760
1880 2 -0.2099
1880 3 -0.2170
1880 4 -0.1180
1880 5 -0.1680

The index is a pandas.core.index.Index object. Under the previous version of the library, the index was a pandas.tseries.period.PeriodIndex object. It looks like the dates aren't being parsed at all. If I drop the to_period call and follow up with

noaa = pd.Series(noaa.values, pd.PeriodIndex(noaa.index, freq='M'))

then I get exactly what I need and what the original one-liner at the top produced under the previous version. The only way I can accomplish this in "one" line is to define a parser:

from datetime import datetime
parse = lambda x: datetime.strptime(x, '%Y %m')

noaa = pd.read_table(noaa_file, header=None, delim_whitespace=True, parse_dates=[[0,1]], index_col=0, squeeze=True, na_values='-999.0000', date_parser=parse).to_period(freq='M')
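The parser can be checked on its own, without pandas. This is a minimal sketch, assuming each combined year/month value reaches the parser as a single space-joined string such as "1880 1" (the form shown in the unparsed index above). The function name parse_year_month is only illustrative.

```python
from datetime import datetime

def parse_year_month(text):
    # Same format string as the lambda above: four-digit year, a space, then the month.
    return datetime.strptime(text, "%Y %m")

# Both single-digit and two-digit months are accepted by %m.
for raw in ["1880 1", "1880 10"]:
    print(raw, "->", parse_year_month(raw))
```

Each result falls on the first day of the month, which is why the to_period(freq='M') call is still needed to drop the day component.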

Now everything works. I think this is a bug in the date parser.
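For anyone who wants to reproduce the intended result without relying on read_table's date parsing, here is a rough sketch of an alternative. It assumes a reasonably recent pandas rather than 0.10.0, uses a few rows copied from the head of the file instead of the full download, and the helper name load_monthly and the column names are made up for illustration.

```python
import io

import pandas as pd

# A few rows from the head of the NOAA file quoted above (year, month, anomaly).
SAMPLE = """\
1880  1   -0.0760
1880  2   -0.2099
1880  3   -0.2170
1880 10   -0.1951
"""

def load_monthly(source):
    # Read the three whitespace-separated columns with no date parsing at all.
    frame = pd.read_csv(
        source,
        header=None,
        sep=r"\s+",
        names=["year", "month", "anomaly"],
        na_values=["-999.0000"],
    )
    # Join year and month into "YYYY M" strings and parse them with an explicit format.
    stamps = pd.to_datetime(
        frame["year"].astype(str) + " " + frame["month"].astype(str),
        format="%Y %m",
    )
    # Convert the timestamps to monthly periods and use them as the index.
    index = pd.DatetimeIndex(stamps).to_period("M")
    return pd.Series(frame["anomaly"].to_numpy(), index=index, name="anomaly")

noaa = load_monthly(io.StringIO(SAMPLE))
print(noaa)
```

Passing the file path instead of the StringIO object should work the same way. The point of the design is that the date format is stated explicitly, so nothing depends on how the reader guesses the format of the combined columns.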

Labels: Bug, IO Data
