Closed
Description
Here is the code to reproduce the bug/unexpected behavior:
from pandas import DataFrame
from pandas import MultiIndex
midx = MultiIndex.from_tuples([('f1', 's1'),('f1','s2'),('f2', 's1'),('f2', 's2'),('f3', 's1'),('f3','s2')])
df = DataFrame([[1,2,3,4,5,6],[7,8,9,10,11,12]], columns= midx)
df1 = df.select(lambda u: u[0] in ['f2', 'f3'], axis=1)
df1_group = df1.groupby(axis=1, level=0)
print df1_group.groups
print df1_group.sum()
When running the code, we can see that df is:
  f1      f2      f3
  s1  s2  s1  s2  s1  s2
0  1   2   3   4   5   6
1  7   8   9  10  11  12
And df1, which is selected from a subset of the columns of df, is:
  f2      f3
  s1  s2  s1  s2
0  3   4   5   6
1  9  10  11  12
After grouping df1 by the first level of its column MultiIndex,
we can see that df1_group.groups is:
{'f2': [('f2', 's1'), ('f2', 's2')], 'f3': [('f3', 's1'), ('f3', 's2')]}
However, when applying a sum function to aggregate the columns inside each group, as in the example code,
df1_group.sum() results in:
    f1  f2  f3
0  NaN   7  11
1  NaN  19  23
It seems that the aggregation uses the columns of df instead of those of df1, so the columns of the resulting DataFrame
include the label 'f1', which doesn't exist in df1.
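
One possible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the column MultiIndex of df1 still carries 'f1' in its first level even though no column uses it. The sketch below checks this on a recent pandas version. It replaces `select` with a boolean mask (an illustrative substitute, not the original call), drops the unused level with `MultiIndex.remove_unused_levels()`, and groups through a transpose instead of `axis=1`.

```python
import pandas as pd

midx = pd.MultiIndex.from_tuples(
    [("f1", "s1"), ("f1", "s2"), ("f2", "s1"),
     ("f2", "s2"), ("f3", "s1"), ("f3", "s2")]
)
df = pd.DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]], columns=midx)

# Select the 'f2' and 'f3' column blocks (stand-in for df.select(...)).
mask = df.columns.get_level_values(0).isin(["f2", "f3"])
df1 = df.loc[:, mask]

# The sliced MultiIndex keeps the full set of level values, including 'f1'.
stale_levels = list(df1.columns.levels[0])

# Dropping unused level values leaves only the labels that df1 really has.
clean_levels = list(df1.columns.remove_unused_levels().levels[0])

# Sum the columns within each first-level group by grouping the transpose.
result = df1.T.groupby(level=0).sum().T

print(stale_levels)
print(clean_levels)
print(result)
```

If the stale level is indeed the cause, assigning `df1.columns = df1.columns.remove_unused_levels()` before grouping may serve as a workaround on versions that still show the extra 'f1' column.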