
BUG: ValueError on groupby with categoricals #34951

Closed
@LukasGelbmann

Description


In specific situations involving categorical columns, a groupby() on two or more columns raises a ValueError:

Code Sample

import pandas as pd
col = pd.Categorical([0, 1])
df = pd.DataFrame({'A': col, 'B': col, 'C': col})
grouped = df.groupby(['A', 'B']).first()
# On affected versions this raises:
# ValueError: Shape of passed values is (4, 1), indices imply (2, 1)

Expected Output

Expected output would be something like:

>>> grouped
       C
A B     
0 0    0
  1  NaN
1 0  NaN
  1    1

Instead, a ValueError is raised with the error message shown above.

Activity

added this to the Contributions Welcome milestone on Jun 23, 2020

rhshadrach commented on Jun 23, 2020

Member

Thanks for reporting this. I get the same exception on master. Computing this as a Series via df.groupby(['A', 'B']).C.first() works and gives the correct result.
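The Series workaround from this comment can be sketched as follows. Passing observed=False explicitly is an assumption on my part: it was the default when this was reported, it keeps the unobserved (0, 1) and (1, 0) combinations shown in the expected output, and recent pandas releases emit a FutureWarning when it is left implicit for categorical groupers. The to_frame() call is only there to get back a one-column DataFrame shaped like the expected output.

```python
import pandas as pd

# Same input as the code sample in the report: three identical categorical columns.
col = pd.Categorical([0, 1])
df = pd.DataFrame({'A': col, 'B': col, 'C': col})

# Workaround: select column C before aggregating, so first() runs on a
# SeriesGroupBy rather than a DataFrameGroupBy.
# observed=False (assumption) keeps every category combination, observed or not.
result = df.groupby(['A', 'B'], observed=False)['C'].first()

# Convert back to a one-column DataFrame, matching the layout of the expected output.
frame = result.to_frame()
print(frame)
```

Unobserved combinations such as (0, 1) come back as NaN, as in the expected output in the report.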

removed the Needs Triage label on Jun 23, 2020

LucasG0 commented on Jul 4, 2020

Contributor

take

added a commit that references this issue on Jul 12, 2020

pandas-dev#34951 bug fixed and test added

modified the milestones: Contributions Welcome, 1.1 on Jul 13, 2020
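The commit message says a test was added, but its contents are not shown here, so the following is a hypothetical regression check in the same spirit, not the actual test from the commit. It assumes a pandas release that includes the fix; the 1.1 milestone suggests 1.1 or later, but that version is an inference from the milestone rather than something confirmed in this thread. It reruns the original DataFrame call and compares it with the Series workaround.

```python
import pandas as pd

# Same input as the code sample in the report.
col = pd.Categorical([0, 1])
df = pd.DataFrame({'A': col, 'B': col, 'C': col})

# Hypothetical regression check: the DataFrame path should now return one row
# per (A, B) category combination instead of raising ValueError.
grouped = df.groupby(['A', 'B'], observed=False).first()
assert grouped.shape == (4, 1)

# It should agree with the Series workaround on which groups are empty (NaN).
series = df.groupby(['A', 'B'], observed=False)['C'].first()
assert grouped['C'].isna().tolist() == series.isna().tolist()
print(grouped)
```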


Participants: @jreback, @LukasGelbmann, @LucasG0, @rhshadrach
