PERF, BENCH: Fix performance issue when indexing into non-unique DatetimeIndex/PeriodIndex. #27136
Force-pushed from a857b09 to 1881cb5.
pandas/core/indexes/period.py
Outdated
@@ -906,6 +923,10 @@ def base(self):
                       FutureWarning, stacklevel=2)
         return np.asarray(self._data)

     def memory_usage(self, deep=False):
         result = super(PeriodIndex, self).memory_usage(deep=deep)
         result += self._int64index.memory_usage(deep=deep)
this should only add the int64index if it’s actually been created
Checking for `'_int64index' in self._cache` now, which avoids triggering creation.
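The cache check described in this reply can be sketched roughly as follows. This is a toy illustration, not the pandas implementation: `LazyIndexSketch` and its tuple-based helper are hypothetical stand-ins, and only the `'_int64index' in self._cache` test is taken from the comment above.

```python
import sys


class LazyIndexSketch:
    """Toy stand-in for an index with a lazily built helper (hypothetical)."""

    def __init__(self, values):
        self._values = list(values)
        self._cache = {}  # filled on first access, like a cached property

    @property
    def _int64index(self):
        # Build the helper on first access and remember it in the cache.
        if "_int64index" not in self._cache:
            self._cache["_int64index"] = tuple(self._values)
        return self._cache["_int64index"]

    def memory_usage(self):
        result = sys.getsizeof(self._values)
        # Count the helper only if it already exists; reading the property
        # here would create it as a side effect of measuring memory.
        if "_int64index" in self._cache:
            result += sys.getsizeof(self._cache["_int64index"])
        return result


idx = LazyIndexSketch(range(1000))
before = idx.memory_usage()
created_by_query = "_int64index" in idx._cache  # stays False: nothing was built
idx._int64index  # force creation of the helper
after = idx.memory_usage()
```

Checking the cache by key rather than touching the attribute keeps `memory_usage()` free of side effects, which is the point the reviewer raised.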
Force-pushed from ea5d08d to e03dc41.
can you also rebase on master; note that you will need to update with black. `black .` will pretty much fix things (`pip install black`)
@@ -576,6 +576,30 @@ def test_indexing_over_size_cutoff():
     _index._SIZE_CUTOFF = old_cutoff


 def test_indexing_over_size_cutoff_period_index():
can you add a comment here with the issue number as well (the title is pretty good though)
Force-pushed from 26b2c1e to 3809224.
…timeIndex/PeriodIndex. Additionally, fix asv benchmark bugs that hid the issue and correct PeriodIndex.memory_usage().
lgtm. ping on green.
thanks @qwhelan
This PR fixes a number of issues involving non-unique non-numeric indexing:
- The `NonNumericSeriesIndexing` benchmark incorrectly assumes that `tm.makeStringIndex()` returns a monotonic `Index`. It does not, which unfortunately invalidates the existing results here.
- For the `PeriodIndex` and `non_monotonic` cases, the `setup()` function creates an index but does not trigger the engine mapping to be populated. This leads to noisy and erroneous results.
- A `PeriodIndex` or `DatetimeIndex` with non-unique, monotonic data creates an entirely new `Index` when being sliced. This means the engine mapping gets recomputed on every list-like query.
- A `PeriodIndex` is actually a `PeriodEngine` plus a separate `Int64Index` (with its associated `Int64Engine`). The former contains a mapping of `{Period: value}` while being a subclass of `Int64Engine`. This means calls to `_bin_search()` fail, as the underlying array consists of `Period` objects. This only occurs when the index is non-unique, monotonic, and `>= 1M` elements long.
  - `value.ordinal` was being passed to the `PeriodEngine`; as that is a `{Period: value}` mapping, all such calls fail.
  - This PR sends `value.ordinal` queries to the `Int64Engine`, where they succeed.
- `PeriodIndex.memory_usage()` does not account for the memory of its associated `Int64Index`. This is fixed by simply adding it to the total.

`git diff upstream/master -u -- "*.py" | flake8 --diff`
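The engine mismatch described above (integer ordinals sent to an object-keyed mapping) can be illustrated with a small standard-library sketch. It does not use pandas internals: `ToyPeriod`, `object_mapping`, and `positions_by_ordinal` are hypothetical names, loosely standing in for `Period`, the `{Period: value}` mapping, and a binary search over the integer index.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPeriod:
    """Hypothetical stand-in for a Period: a thin wrapper around an ordinal."""

    ordinal: int


# Non-unique, monotonic data, as in the failing case.
periods = [ToyPeriod(o) for o in (0, 0, 1, 1, 1, 2)]

# Object-keyed mapping, loosely analogous to the {Period: value} mapping.
object_mapping = {}
for pos, p in enumerate(periods):
    object_mapping.setdefault(p, []).append(pos)

# Integer view, loosely analogous to the separate integer index.
ordinals = [p.ordinal for p in periods]


def positions_by_ordinal(ordinal):
    """Binary-search the sorted integer view for all matching positions."""
    lo = bisect_left(ordinals, ordinal)
    hi = bisect_right(ordinals, ordinal)
    return list(range(lo, hi))


query = ToyPeriod(1)
missed = object_mapping.get(query.ordinal)   # int key against object keys: no hit
found = positions_by_ordinal(query.ordinal)  # int key against int view: all hits
```

An integer key never matches the object keys, so the first lookup comes back empty, while the same ordinal resolves to every duplicate position in the sorted integer view.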