Conversation

@jreback jreback commented Jun 10, 2014

closes #6496

It turns out that indexing into a preallocated array is faster than building up a list of pieces and concatenating them (though care is needed with dtype changes).

# this PR
In [11]: %timeit df['signal'].groupby(g).transform(np.mean)
10 loops, best of 3: 158 ms per loop

# master
In [11]: %timeit df['signal'].groupby(g).transform(np.mean)
1 loops, best of 3: 601 ms per loop
In [1]: np.random.seed(0)

In [2]: N = 120000

In [3]: N_TRANSITIONS = 1400

In [5]: transition_points = np.random.permutation(np.arange(N))[:N_TRANSITIONS]

In [6]: transition_points.sort()

In [7]: transitions = np.zeros((N,), dtype=bool)

In [8]: transitions[transition_points] = True

In [9]: g = transitions.cumsum()

In [10]: df = DataFrame({ 'signal' : np.random.rand(N)})
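The preallocation idea behind this PR can be sketched as follows. This is only a minimal illustration of the pattern, not the actual pandas internals; the names `values`, `labels`, `pieces`, and `out` are hypothetical:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
values = np.random.rand(20)
labels = np.repeat(np.arange(4), 5)  # 4 contiguous groups of 5

# Slow pattern: compute each group's result, collect pieces in a
# list, then concat at the end.
pieces = []
for lab in np.unique(labels):
    mask = labels == lab
    pieces.append(pd.Series(np.full(mask.sum(), values[mask].mean())))
slow = pd.concat(pieces, ignore_index=True)

# Fast pattern: preallocate one output array and assign each group's
# result into it by position. The dtype must be chosen up front
# (np.mean returns float64 even for integer input), which is the
# "careful of type changes" caveat above.
out = np.empty(len(values), dtype=np.float64)
for lab in np.unique(labels):
    mask = labels == lab
    out[mask] = values[mask].mean()
fast = pd.Series(out)
```

Both patterns produce the same transformed series; the second avoids allocating many small intermediates and the final concat.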

@jreback jreback added this to the 0.14.1 milestone Jun 10, 2014
jreback added a commit that referenced this pull request Jun 11, 2014
PERF: Series.transform speedups (GH6496)
@jreback jreback merged commit 53c6b08 into pandas-dev:master Jun 11, 2014
Labels
Groupby, Performance (memory or execution speed performance)
Development

Successfully merging this pull request may close these issues.

PERF: transform speedup