Support boolean data in Cythonized groupby #315

@wesm

Reported by @bburan on the mailing list:


In [146]: frame = DataFrame({'a': np.random.randint(0, 5, 10), 'b': np.random.randint(0, 2, 10).astype('bool')})

In [147]: print frame
  a  b
0  1  False
1  2  True
2  2  True
3  1  True
4  3  True
5  3  True
6  3  True
7  1  True
8  2  True
9  1  False

In [150]: frame.groupby('a')['b'].mean()
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
c:\users\brad\projects\sane\src\sane\scripts\<ipython-input-150-fceb5b21892e> in <module>()
----> 1 frame.groupby('a')['b'].mean()

C:\Python27\lib\site-packages\pandas\core\groupby.pyc in mean(self)
   297         For multiple groupings, the result index will be a MultiIndex
   298         """
--> 299         return self._cython_agg_general('mean')
   300
   301     def size(self):

C:\Python27\lib\site-packages\pandas\core\groupby.pyc in _cython_agg_general(self, how)
   347             output[name] = result[mask]
   348
--> 349         return self._wrap_aggregated_output(output, mask)
   350
   351     def _get_multi_index(self, mask):

UnboundLocalError: local variable 'mask' referenced before assignment
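
The traceback shows `mask` being used at line 347 and again in the `return` at line 349. One plausible reading is that `mask` is only assigned inside a loop over columns, and that the boolean column is skipped before the assignment happens. The sketch below reproduces that failure pattern in plain Python. It is not the pandas source: `aggregate_numeric` and its skip rule are hypothetical and exist only to illustrate the pattern.

```python
def aggregate_numeric(columns):
    """Hypothetical stand-in for an aggregation loop; NOT the pandas code.

    Assumption: boolean columns are skipped, and `mask` is only ever
    assigned inside the loop body.
    """
    output = {}
    for name, values in columns.items():
        if any(isinstance(v, bool) for v in values):
            continue  # boolean column skipped, so `mask` is never assigned
        mask = [v is not None for v in values]
        output[name] = [v for v, keep in zip(values, mask) if keep]
    # If every column was skipped, `mask` does not exist at this point.
    return output, mask


# A numeric column goes through the loop body and works as expected.
numeric_output, numeric_mask = aggregate_numeric({"x": [1.0, None, 3.0]})

# A frame with only a boolean column never assigns `mask`.
try:
    aggregate_numeric({"b": [False, True, True]})
    caught = False
except UnboundLocalError:
    caught = True
```

If this reading is right, a fix would either handle boolean data in the aggregation path or fail with a clearer error when no column qualifies.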

Note that agg() does work:

In [153]: frame.groupby('a')['b'].agg(np.mean)
Out[153]:
1    0.5
2    1.0
3    1.0
Name: None, Length: 3
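
Until the Cythonized path handles booleans, one possible workaround besides `agg(np.mean)` is to cast the column to float before grouping. The sketch below hard-codes the values printed in the frame above so the result is reproducible. It assumes a recent pandas imported as `pd`, which is not the version in the traceback, and the name `as_float` is illustrative.

```python
import pandas as pd

# Same values as the frame printed above, hard-coded instead of random.
frame = pd.DataFrame({
    "a": [1, 2, 2, 1, 3, 3, 3, 1, 2, 1],
    "b": [False, True, True, True, True, True, True, True, True, False],
})

# Cast the boolean column to float so the aggregation sees numeric data,
# then group it by the values of column 'a'.
as_float = frame["b"].astype(float)
result = as_float.groupby(frame["a"]).mean()
print(result)
```

The group means should match the `agg(np.mean)` output shown above.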
