
ENH/PERF: pyarrow timestamp & duration conversion consistency/performance #53326



Merged (4 commits) on May 22, 2023


lukemanley
Member

This PR makes a few related changes:

  1. ArrowExtensionArray.__getitem__(int) now returns a Timestamp/Timedelta for non-nanosecond timestamp/duration types, consistent with the nanosecond types. Previously, non-nano types returned Python-native datetime/timedelta objects.
  2. ArrowExtensionArray.__iter__ now yields Timestamp/Timedelta objects for non-nano types, consistent with the nanosecond types.
  3. ArrowExtensionArray.to_numpy now supports zero-copy conversion for timestamp/duration types.

Submitting this as a single PR because a number of tests require consistent behavior across these methods, and splitting the nano/non-nano behavior changes from the performance improvements is tricky.

These changes were partly motivated by the following benchmark:

```python
# IPython session (%timeit is an IPython magic)
import pandas as pd
import pyarrow as pa

N = 1_000_000
arr = pd.array(range(N), dtype=pd.ArrowDtype(pa.timestamp("s")))

%timeit arr.astype("M8[s]")
# 5.29 s ± 162 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)          -> main
# 137 µs ± 4.88 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)   -> PR

%timeit pd.DatetimeIndex(arr)
# 6.31 s ± 560 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)          -> main
# 67.6 µs ± 3.03 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)  -> PR
```

lukemanley added the Performance (Memory or execution speed performance), Arrow (pyarrow functionality), and Non-Nano (datetime64/timedelta64 with non-nanosecond resolution) labels May 21, 2023
lukemanley added this to the 2.1 milestone May 21, 2023
mroeschke merged commit 1e61215 into pandas-dev:main May 22, 2023
mroeschke
Member

Thanks @lukemanley

topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 22, 2023
…ance (pandas-dev#53326)

* ENH/PERF: pyarrow timestamp & duration conversion consistency

* gh refs

* typo

* whatsnew
lukemanley deleted the pyarrow-temporal-conversions branch May 30, 2023 22:16
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
…ance (pandas-dev#53326)

* ENH/PERF: pyarrow timestamp & duration conversion consistency

* gh refs

* typo

* whatsnew
Labels
Arrow (pyarrow functionality), Non-Nano (datetime64/timedelta64 with non-nanosecond resolution), Performance (Memory or execution speed performance)

2 participants