
Bug/unexpected behaviour when using groupby and aggregation functions with DataFrame #1697

@ichipper


Here is code to reproduce the bug/unexpected behavior:

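from pandas import DataFrame
from pandas import MultiIndex

# Two-level column index: first level f1..f3, second level s1/s2
midx = MultiIndex.from_tuples([('f1', 's1'), ('f1', 's2'), ('f2', 's1'), ('f2', 's2'), ('f3', 's1'), ('f3', 's2')])
df = DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]], columns=midx)
# Keep only the columns whose first-level label is f2 or f3
df1 = df.select(lambda u: u[0] in ['f2', 'f3'], axis=1)
# Group the columns by the first level of the column MultiIndex
df1_group = df1.groupby(axis=1, level=0)
print(df1_group.groups)
print(df1_group.sum())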
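`DataFrame.select` has since been removed from pandas, and passing `axis=1` to `groupby` is deprecated in recent releases, so the snippet above will not run on a current install as written. A rough modern equivalent, assuming pandas 2.x, replaces `select` with a boolean column mask and the column-wise groupby with transpose, group, transpose back:

```python
import pandas as pd

midx = pd.MultiIndex.from_tuples(
    [("f1", "s1"), ("f1", "s2"), ("f2", "s1"), ("f2", "s2"), ("f3", "s1"), ("f3", "s2")]
)
df = pd.DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]], columns=midx)

# Stand-in for df.select(lambda u: u[0] in ['f2', 'f3'], axis=1):
# keep the columns whose first-level label is f2 or f3.
mask = df.columns.get_level_values(0).isin(["f2", "f3"])
df1 = df.loc[:, mask]

# Stand-in for df1.groupby(axis=1, level=0).sum():
# transpose so columns become rows, group by level 0, transpose back.
result = df1.T.groupby(level=0).sum().T
print(result)
```

On recent versions I would expect `result` to contain only f2 and f3, because grouping by a MultiIndex level there discards level values that no row uses; this is worth confirming against whichever version is installed.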

When running the code, we can see that df is:

   f1      f2      f3
   s1  s2  s1  s2  s1  s2
0   1   2   3   4   5   6
1   7   8   9  10  11  12

and df1, which is selected as a sub-block of df's columns, is:

   f2      f3
   s1  s2  s1  s2
0   3   4   5   6
1   9  10  11  12

After grouping df1 by the first level of its column MultiIndex,
we can see that df1_group.groups is:

{'f2': [('f2', 's1'), ('f2', 's2')], 'f3': [('f3', 's1'), ('f3', 's2')]}
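Given these groups, summing within each group should produce only f2 and f3. As a cross-check, this sketch rebuilds df1's values from the table above and computes the per-group sums with plain column selection, without groupby (the variable name `expected` is only illustrative):

```python
import pandas as pd

# df1 rebuilt directly from the values shown above
midx = pd.MultiIndex.from_tuples(
    [("f2", "s1"), ("f2", "s2"), ("f3", "s1"), ("f3", "s2")]
)
df1 = pd.DataFrame([[3, 4, 5, 6], [9, 10, 11, 12]], columns=midx)

# df1[key] selects the sub-frame under one first-level label (columns s1, s2);
# summing across that sub-frame's columns gives the per-group total for each row.
expected = pd.DataFrame({key: df1[key].sum(axis=1) for key in ["f2", "f3"]})
print(expected)
#    f2  f3
# 0   7  11
# 1  19  23
```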

However, when applying a sum to aggregate the columns within each group, as in the example code,
df1_group.sum() returns:

    f1  f2  f3
0  NaN   7  11
1  NaN  19  23

It seems that the aggregation uses the columns of df instead of df1, so the columns of the resulting DataFrame
include the label 'f1', which doesn't exist in df1.
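One way to probe this on a current pandas install is to inspect df1's column index directly. Selecting a subset of columns from a MultiIndex keeps all of its defined levels, so 'f1' is plausibly still listed in `df1.columns.levels` even though no remaining column uses it, and grouping by `level=0` may have picked up that leftover level value. The sketch below assumes a recent pandas (a boolean `.loc` mask stands in for the removed `select`) and tries `MultiIndex.remove_unused_levels`, which exists only in later pandas releases, as a possible workaround; whether it would have avoided the NaN column on the version where this was reported is untested.

```python
import pandas as pd

midx = pd.MultiIndex.from_tuples(
    [("f1", "s1"), ("f1", "s2"), ("f2", "s1"), ("f2", "s2"), ("f3", "s1"), ("f3", "s2")]
)
df = pd.DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]], columns=midx)

# Stand-in for df.select(...): keep columns whose first-level label is f2 or f3
df1 = df.loc[:, df.columns.get_level_values(0).isin(["f2", "f3"])]

# The defined levels still include 'f1', although no remaining column uses it
levels_before = list(df1.columns.levels[0])
labels_used = list(df1.columns.get_level_values(0).unique())
print(levels_before)  # ['f1', 'f2', 'f3']
print(labels_used)    # ['f2', 'f3']

# Possible workaround: drop the unused level values before grouping
df1 = df1.set_axis(df1.columns.remove_unused_levels(), axis=1)
levels_after = list(df1.columns.levels[0])
print(levels_after)   # ['f2', 'f3']
```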
