lukemanley
Member

import pandas as pd
import numpy as np

N = 1_000_000
idx = pd.Index(np.arange(N))
indices = np.arange(N)

%timeit idx.take(indices)

# 11.1 ms ± 271 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)   -> main
# 1.33 ms ± 279 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  -> PR
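The benchmark above takes every position in order, so the result is the same index it started with. A rough sketch of the kind of shortcut the commit message "Index.take to check is_range_indexer" describes is given below, written against plain NumPy arrays. The helper names `is_range_indexer_sketch` and `take_with_fast_path` are hypothetical and chosen for illustration; this is not the pandas implementation, only an assumption about the general shape of the idea.

```python
import numpy as np


def is_range_indexer_sketch(indices: np.ndarray, n: int) -> bool:
    """Return True if `indices` is exactly 0, 1, ..., n - 1.

    Hypothetical helper for illustration; not the pandas internal check.
    """
    if indices.ndim != 1 or len(indices) != n:
        return False
    return bool(np.array_equal(indices, np.arange(n)))


def take_with_fast_path(values: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Return `values[indices]`, skipping the gather for an identity take."""
    if is_range_indexer_sketch(indices, len(values)):
        # Identity take: a plain copy stands in for the element-wise gather.
        return values.copy()
    # General case: fall back to the ordinary NumPy take.
    return values.take(indices)


values = np.array([10, 20, 30])
same = take_with_fast_path(values, np.arange(3))        # fast path
picked = take_with_fast_path(values, np.array([2, 0]))  # general path
```

The sketch compares against a freshly built `np.arange(n)`, which itself costs a pass over the data; a compiled check that stops at the first mismatch would presumably be cheaper, but that detail is outside what this thread shows.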

Motivating use-case:

idx1 = pd.Index(np.tile(np.arange(1000), 1000))
idx2 = pd.Index(np.arange(100))

%timeit idx1.join(idx2, how="left")

# 132 ms ± 1.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  -> main
# 110 ms ± 587 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)   -> PR

@lukemanley added the Performance and Index labels Jan 10, 2024
@lukemanley added this to the 2.3 milestone Jan 10, 2024
@mroeschke merged commit 17cdcd9 into pandas-dev:main Jan 10, 2024
@mroeschke
Member

Thanks @lukemanley

@lithomas1 modified the milestones: 2.3, 3.0 Jan 11, 2024
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
* Index.take to check is_range_indexer

* whatsnew

* MultiIndex.take to check is_range_indexer

* ensure 1-dim