Skip to content

Regression: DatetimeIndex ignores timezones #11736

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hcarvalhoalves opened this issue Dec 1, 2015 · 6 comments · Fixed by #21822
Closed

Regression: DatetimeIndex ignores timezones #11736

hcarvalhoalves opened this issue Dec 1, 2015 · 6 comments · Fixed by #21822
Labels
Timezones Timezone data dtype
Milestone

Comments

@hcarvalhoalves
Copy link

In 0.16.2, DatetimeIndex functions used to handle the timezone correctly:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'x': ["2015-05-27 19:05:17.178000-02:00"]})
In [3]: pd.DatetimeIndex(df.x)
Out[3]: DatetimeIndex(['2015-05-27 19:05:17.178000-02:00'], dtype='datetime64[ns]', freq=None, tz='tzoffset(None, -7200)')

In 0.17.0, it's now ignoring timezones:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'x': ["2015-05-27 19:05:17.178000-02:00"]})

In [3]: pd.DatetimeIndex(df.x)
Out[3]: DatetimeIndex(['2015-05-27 21:05:17.178000'], dtype='datetime64[ns]', freq=None)

This slipped thru since the tests are very simple and don't cover many cases:

https://github.com/pydata/pandas/blob/cda9da82a68aed42421f7f788fb0c1e949e7a60c/pandas/tests/test_format.py#L3342

I'll attempt a patch later but if someone can give pointers where the breaking change happened I appreciate.

@jreback
Copy link
Contributor

jreback commented Dec 1, 2015

so there are actually quite a number of tests, but not in formatting; in fact this has nothing to do with formatting.

In 0.17.1, this 'works'

In [2]: from dateutil.tz import tzoffset

In [3]: pd.DatetimeIndex(["2015-05-27 19:05:17.178000-02:00"],tz=tzoffset(None, -7200))
Out[3]: DatetimeIndex(['2015-05-27 21:05:17.178000-02:00'], dtype='datetime64[ns, tzoffset(None, -7200)]', freq=None)

but is actually 'wrong'

In [4]: pd.DatetimeIndex(["2015-05-27 19:05:17.178000-02:00"]).tz_localize('UTC').tz_convert(tz)
Out[4]: DatetimeIndex(['2015-05-27 19:05:17.178000-02:00'], dtype='datetime64[ns, tzoffset(None, -7200)]', freq=None)

is the correct result. However, this parsing of a tz-aware then 'setting' its value via the tz arg is very very odd.

This is why pd.to_datetime doesn't even have a tz argument, because its too error prone.

E.g. what exactly do you mean by having a tz-aware date AND providing a tz? should it be localized, renamed, or converted?

I think the DatetimeIndex constructor should raise on this actually, rather than let this slip thru.

@jreback jreback added Error Reporting Incorrect or improved errors from pandas Timezones Timezone data dtype labels Dec 1, 2015
@hcarvalhoalves
Copy link
Author

E.g. what exactly do you mean by having a tz-aware date AND providing a tz? should it be localized, renamed, or converted?

Basically, pre 0.17.x behavior was correct - it doesn't throw away the tz info neither tries to covert when fed a TZ-aware ISO string. So, if I had analogous function using standard datetime API, it would be equivalent to:

some_parser('2015-05-27 19:05:17.178000-02:00') == datetime.datetime(2015, 5, 27, 19, 5, 17, 178, tzinfo=pytz.FixedOffset(-120)) # offset is in minutes

Which is sensible behavior for such ISO datetime parser [1].

0.17.0 behavior is to ignore the TZ and create a naive datetime series, which is bad because now we've lost tz information from the string. And I've now learned 0.17.1 behavior is even worse, since it's converting and keeping the tz info. Either you convert and turn into naive (drop tz info) or you localize (add the tz info), but never a mix of both.

so there are actually quite a number of tests, but not in formatting; in fact this has nothing to do with formatting.

I see. There are tests here (https://github.com/pydata/pandas/blob/master/pandas/tests/test_series.py#L112) but are not covering parsing an ISO datetime string either.

The parsing seems to be happening here (https://github.com/pydata/pandas/blob/master/pandas/tseries/tools.py#L279), I'll come up with some test cases.

[1] https://tools.ietf.org/html/rfc3339#section-4.2

@jreback
Copy link
Contributor

jreback commented Dec 1, 2015

@hcarvalhoalves no, the pre 0.17.x behavior is wrong. It is VERY confusing. This is why the main entry point is pd.to_datetime.

I don't know if passing a tz directly is really tested at all, look at tseries/test_timeseries, tseries/test_timezones, tseries/test_base and tests/test_index as locations for tests.

Since I have never seen this I doubt there are tests (or I would have had to resolve this when I added this feature).

You are essentially 'naming' a tz. This is not a normal things to do. Either you have a string that has a tz in it with an offset, in which case you convert to UTC, THEN convert it, or you construct it in local time, then localize it.

There are NO other operations.

@hcarvalhoalves
Copy link
Author

I see. You mean the expected usage is this?

In [16]: pd.to_datetime(['2015-05-27 19:05:17.178000-02:00'], utc=True)
Out[16]: DatetimeIndex(['2015-05-27 21:05:17.178000+00:00'], dtype='datetime64[ns, UTC]', freq=None)

If this is really intended and not a regression I'll close this specific issue - and another issue could be opened to not have DatetimeIndex fail silently when there's an offset in the string.

@jreback
Copy link
Contributor

jreback commented Dec 1, 2015

no I would leave this open. I think this path is not very well tested and needs exercising / clarification.

My point though is if you have text data, pd.to_datetime is the way (you can also set parse_dates=True when reading from a csv). But DatetimeIndex 'happens' to be able to parse strings and accept a tz which IMHO is just plain confusing.

@jreback jreback added this to the 0.18.0 milestone Dec 1, 2015
@jreback jreback added the Testing pandas testing functions or related to the test suite label Dec 1, 2015
@jreback jreback modified the milestones: Next Major Release, 0.18.0 Jan 30, 2016
jreback added a commit to jreback/pandas that referenced this issue Feb 1, 2016
jreback pushed a commit that referenced this issue Feb 1, 2016
closes #11693     xref #11736

Author: Varun <[email protected]>
Author: Jeff Reback <[email protected]>

Closes #12195 from jreback/concat_ts and squashes the following commits:

eaad93a [Jeff Reback] BUG: DatetimeIndex constructor to handle dtype & tz with conflicts
90d71f6 [Varun] BUG GH11693 Support NaT series concatenation
@mroeschke
Copy link
Member

xref #17697, keeping this issue open since it demonstrates the behavior with using DatetimeIndex instead of explicitly to_datetime

@mroeschke mroeschke removed Effort Low Error Reporting Incorrect or improved errors from pandas Testing pandas testing functions or related to the test suite labels Jul 7, 2018
@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Jul 25, 2018
aptiko added a commit to openmeteo/htimeseries that referenced this issue Nov 27, 2022
Work around an old time zone related bug
(pandas-dev/pandas#11736), which was fixed in
Pandas 0.24.
aptiko added a commit to openmeteo/htimeseries that referenced this issue Nov 27, 2022
Work around an old time zone related bug
(pandas-dev/pandas#11736), which was fixed in
Pandas 0.24.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants