BUG: Groupby lost index, when one of the agg keys had no function allocated #33086
Conversation
@@ -439,7 +439,13 @@ def is_any_frame() -> bool:
# we have a dict of DataFrames
# return a MI DataFrame

- return concat([result[k] for k in keys], keys=keys, axis=1), True
+ keys_to_use = [k for k in keys if not result[k].empty]
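For illustration, a minimal sketch of what the changed line does, using hypothetical stand-ins for result and keys; the all-empty fallback follows the PR description at the bottom and may not match the merged code exactly:

import pandas as pd

# Hypothetical stand-ins: "b" had no aggregation function assigned, so its
# result frame is empty; "c" holds the result of the min aggregation.
result = {
    "b": pd.DataFrame(),
    "c": pd.DataFrame({"min": [6, 8]}, index=pd.Index([1, 2], name="a")),
}
keys = ["b", "c"]

# The patched line: keep only keys whose result frame is non-empty, so the
# empty frame cannot wipe out the group index during the concat below.
keys_to_use = [k for k in keys if not result[k].empty]
# Guard for the case where every frame is empty (sketch only).
keys_to_use = keys_to_use or keys

out = pd.concat([result[k] for k in keys_to_use], keys=keys_to_use, axis=1)
print(out.index.name)  # "a" - the group index keeps its name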
I think the bug is not actually here; rather, please see where concat actually mishandles this and adjust there. concat handles None / empty frames, so it must not be discarding the keys when that happens.
we don't want to compare vs [], rather not len(...)
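A tiny illustration of the emptiness check being asked for here (hypothetical variable, not the actual patch):

keys_to_use = []  # hypothetical result of the filtering step

if keys_to_use == []:      # discouraged: compares against a literal list
    print("empty, via == []")

if not len(keys_to_use):   # preferred form from the review
    print("empty, via not len(...)")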
@jreback The desired behavior for concat would be that it keeps all information and returns the DataFrame with the right index. Did I get that right?
Then I will look into this.
yes
@jreback
I looked into this now. I think everything in concat works as expected. We have the following starting point:
The concat function receives two DataFrames as input:
- the empty DataFrame with index Index([], dtype="object", name=None)
- the c-DataFrame with index Int64Index([1, 2], dtype="int64", name="a")

concat performs the following relevant steps in our case:
- It casts both indices to datatype object, because they do not match beforehand.
- It determines the name as follows: if both names are equal, the common name is returned. If the names differ, None is returned (our case, because None != "a"); see the sketch right after this list. You can look this up here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/ops/__init__.py#L139
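A minimal sketch of these two steps on indexes like the ones above (hypothetical, for illustration only; exact dtype behavior may vary between pandas versions):

import pandas as pd

idx_c = pd.Index([1, 2], name="a")        # index of the c-DataFrame
idx_empty = pd.Index([], dtype="object")  # index of the empty frame, name=None

combined = idx_c.union(idx_empty)
print(combined.dtype)  # object on the versions discussed here, since the dtypes differ
print(combined.name)   # None, because "a" != None - the name is dropped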
As far as I understood the code, we could do the following to change this behavior:
- Change the logic during the name definition to return the non-None name if one of them is None (this breaks tests such as test_maybe_match_name, so I think this idea is not that good; a rough sketch follows this list).
- We could also change the function which casts both indices to datatype object, to avoid dtype issues in the resulting index. We would have to modify the code there.
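A rough sketch of the first alternative, i.e. preferring the non-None name (a hypothetical helper, not pandas API, and not what was merged, since it breaks tests such as test_maybe_match_name):

def match_name_prefer_non_none(left_name, right_name):
    # Equal names are kept, as today.
    if left_name == right_name:
        return left_name
    # Proposed change: if exactly one side has a name, keep it.
    if left_name is None:
        return right_name
    if right_name is None:
        return left_name
    # Genuinely different names still resolve to None.
    return None

print(match_name_prefer_non_none("a", None))  # "a" instead of None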
Either change would affect the index handling itself, which is not directly related to our groupby issue.
Alternatively, we could modify the DataFrames in the concat code before the final index is defined, but I think that is not a good idea.
Could you tell me how to proceed?
I am not sure I agree with supporting this case. What does an empty list in an aggregation even mean?
Should the calculation fail as a whole in this case?
will look at this again - might be ok with your soln
can you merge master and will have a look
# Conflicts:
#	doc/source/whatsnew/v1.1.0.rst
@jreback merged master
kk ping on green.
@jreback green, can be merged
thanks @phofl
- black pandas
- git diff upstream/master -u -- "*.py" | flake8 --diff
The issue was the concatenation of an empty DataFrame with the result of the min function, which caused the index to be lost.
I changed the input for the concatenation so that only non-empty DataFrames are concatenated. We have to catch the case where all DataFrames are empty, because that would otherwise result in an error.
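For context, a minimal sketch of the kind of aggregation being discussed (hypothetical data; the exact reproducer from the original report may differ):

import pandas as pd

# One agg key ("b") gets an empty list of functions, the other ("c") gets min.
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5], "c": [6, 7, 8]})
result = df.groupby("a").agg({"b": [], "c": ["min"]})

print(result.index)
# On the affected pandas versions the group index reportedly came back without
# its values/name; with this fix it should be the "a" index with values [1, 2].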