Closed
Description
In specific situations involving categorical columns, a groupby() on two or more columns followed by an aggregation such as first() raises an error:
Code Sample
import pandas as pd
col = pd.Categorical([0, 1])
df = pd.DataFrame({'A': col, 'B': col, 'C': col})
grouped = df.groupby(['A', 'B']).first()
# ValueError: Shape of passed values is (4, 1), indices imply (2, 1)
Expected Output
Expected output would be something like:
>>> grouped
       C
A B
0 0    0
  1  NaN
1 0  NaN
  1    1
Instead an exception is thrown, with the error message shown above.
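The expected output is the full 2x2 product of the categories of A and B. With categorical keys and `observed=False`, groupby emits every combination of categories, and the two combinations absent from the data become empty groups for which first() returns NaN. The sketch below builds that frame by hand with `MultiIndex.from_product`, assuming categories `[0, 1]` for both keys as in the code sample. It only illustrates the shape of the expected result; it is not pandas' own construction.

```python
import numpy as np
import pandas as pd

# Both keys have categories [0, 1]; with observed=False every (A, B)
# pair of categories appears in the result index.
index = pd.MultiIndex.from_product(
    [pd.CategoricalIndex([0, 1]), pd.CategoricalIndex([0, 1])],
    names=['A', 'B'],
)

# Only (0, 0) and (1, 1) occur in the data; (0, 1) and (1, 0) are empty
# groups, so first() has no value to return and yields NaN there.
expected = pd.DataFrame(
    {'C': pd.Categorical([0, np.nan, np.nan, 1], categories=[0, 1])},
    index=index,
)
print(expected)
```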
Activity
rhshadrach commented on Jun 23, 2020
Thanks for reporting this; I get the same exception on master. Computing this as a Series via
df.groupby(['A', 'B']).C.first()
works and gives the correct result.
LucasG0 commented on Jul 4, 2020
take
pandas-dev#34951 bug fixed and test added
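A regression check for this bug could compare the failing DataFrame path against the Series workaround from the discussion, reshaped with `to_frame()`. The sketch below is a hypothetical check (the function name `check_groupby_first_two_categorical_keys` is made up here) and is not the test actually added in pandas-dev#34951. It passes `observed=False` explicitly, because that setting is what produces the unobserved `(0, 1)` and `(1, 0)` rows, and newer pandas releases warn that its default is changing.

```python
import pandas as pd
import pandas.testing as tm


def check_groupby_first_two_categorical_keys():
    # Same data as the code sample in the report.
    col = pd.Categorical([0, 1])
    df = pd.DataFrame({'A': col, 'B': col, 'C': col})
    keys = ['A', 'B']

    # DataFrame path: this is the call that raised
    # "ValueError: Shape of passed values ..." before the fix.
    result = df.groupby(keys, observed=False).first()

    # Series path: the workaround from the discussion, turned back
    # into a one-column DataFrame with the same MultiIndex.
    expected = df.groupby(keys, observed=False).C.first().to_frame()

    # Raises AssertionError if the two paths disagree.
    tm.assert_frame_equal(result, expected)
    return result


result = check_groupby_first_two_categorical_keys()
print(result)
```

On a pandas version that includes the fix, both paths should produce the four-row frame shown under Expected Output.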