[Bug] pandas.io.data.DataReader Yahoo dates bug #329

@carljv

Yahoo Finance seems to use zero-based months in its historical data URLs (January is 0, December is 11), and DataReader doesn't seem to account for this.

For example, running:
from datetime import datetime
from pandas.io.data import DataReader

ticker = '^GSPC'
start_dt = datetime(2008, 6, 30)
end_dt = datetime(2010, 12, 31)
data = DataReader(ticker, 'yahoo', start=start_dt, end=end_dt)

Then:

In[27]: data.index[0]
Out[27]: datetime.datetime(2008, 7, 30, 0, 0)

In[28]: data.index[-1]
Out[28]: datetime.datetime(2011, 11, 2, 0, 0)

This shows that the pull started in July, not June as intended, and ended on the latest available day instead of December 31, 2010.

Looking at the source, DataReader looks like it constructs the following URL for the download:
http://ichart.yahoo.com/table.csv?s=^GSPC&a=6&b=30&c=2008&d=12&e=31&f=2010&g=d&ignore=.csv

If I go get this data from Yahoo manually, I get:
http://ichart.finance.yahoo.com/table.csv?s=^GSPC&a=05&b=30&c=2008&d=11&e=31&f=2010&g=d&ignore=.csv

(Whether you use .finance or not doesn't seem to matter).

Working around this isn't straightforward: you have to pass a datetime to DataReader, and the shifted end date would be datetime(2010, 11, 31), which you can't create because November has only 30 days. I'm assuming Yahoo is giving me all data up to today because d=12 is not a valid month in the URL (January is 0, December is 11).
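
A minimal sketch of what the fix might look like, assuming the only problem is the month offset: subtract one from the month when building the query string. The helper names `yahoo_month` and `build_yahoo_url` are hypothetical and are not part of pandas; the base URL and parameter names are taken from the URLs quoted above.

```python
from datetime import datetime


def yahoo_month(dt):
    """Return the zero-based month Yahoo expects (January is 0, December is 11)."""
    return dt.month - 1


def build_yahoo_url(ticker, start, end):
    # Hypothetical helper for illustration; not DataReader's actual code.
    base = "http://ichart.yahoo.com/table.csv"
    return (
        f"{base}?s={ticker}"
        f"&a={yahoo_month(start)}&b={start.day}&c={start.year}"
        f"&d={yahoo_month(end)}&e={end.day}&f={end.year}"
        "&g=d&ignore=.csv"
    )


url = build_yahoo_url("^GSPC", datetime(2008, 6, 30), datetime(2010, 12, 31))
print(url)
```

With the dates from the example, this produces a=5 and d=11, which matches the URL that Yahoo generates manually (apart from the zero-padding of the month, which I have not checked makes a difference).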
