Conversation

@jreback jreback commented Jun 10, 2014

closes #6496

It turns out that indexing into a preallocated array is faster than building up a list of pieces and concatenating them (though care is needed with dtype changes).

# this PR
In [11]: %timeit df['signal'].groupby(g).transform(np.mean)
10 loops, best of 3: 158 ms per loop

# master
In [11]: %timeit df['signal'].groupby(g).transform(np.mean)
1 loops, best of 3: 601 ms per loop
In [1]: np.random.seed(0)

In [2]: N = 120000

In [3]: N_TRANSITIONS = 1400

In [5]: transition_points = np.random.permutation(np.arange(N))[:N_TRANSITIONS]

In [6]: transition_points.sort()

In [7]: transitions = np.zeros((N,), dtype=bool)

In [8]: transitions[transition_points] = True

In [9]: g = transitions.cumsum()

In [10]: df = DataFrame({ 'signal' : np.random.rand(N)})
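The preallocation idea behind this PR can be sketched as follows. This is only a minimal illustration of the pattern, not the actual pandas internals; the names `values`, `labels`, `pieces`, and `out` are hypothetical:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
values = np.random.rand(20)
labels = np.repeat(np.arange(4), 5)  # 4 contiguous groups of 5

# Slow pattern: compute each group's result, collect pieces in a
# list, then concat at the end.
pieces = []
for lab in np.unique(labels):
    mask = labels == lab
    pieces.append(pd.Series(np.full(mask.sum(), values[mask].mean())))
slow = pd.concat(pieces, ignore_index=True)

# Fast pattern: preallocate one output array and assign each group's
# result into it by position. The dtype must be chosen up front
# (np.mean returns float64 even for integer input), which is the
# "careful of type changes" caveat above.
out = np.empty(len(values), dtype=np.float64)
for lab in np.unique(labels):
    mask = labels == lab
    out[mask] = values[mask].mean()
fast = pd.Series(out)
```

Both patterns produce the same transformed series; the second avoids allocating many small intermediates and the final concat.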

@jreback jreback added this to the 0.14.1 milestone Jun 10, 2014
jreback added a commit that referenced this pull request Jun 11, 2014
PERF: Series.transform speedups (GH6496)
@jreback jreback merged commit 53c6b08 into pandas-dev:master Jun 11, 2014
Labels
Groupby, Performance (memory or execution speed performance)
Development

Successfully merging this pull request may close these issues.

PERF: transform speedup