BUG: Groupby lost index, when one of the agg keys had no function all… #33086

Merged: 5 commits merged on Jun 14, 2020

phofl
Member

@phofl phofl commented Mar 28, 2020

…ocated

The issue was that an empty DataFrame was concatenated with the result of the min function, which caused the index to be lost.

I changed the input to the concatenation so that only non-empty DataFrames are concatenated. We also have to handle the case where all DataFrames are empty, because this would otherwise result in an error.
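
A minimal sketch of the idea described above, filtering out empty frames before concatenating. The `result` dict, its contents, and the fallback to all keys when every frame is empty are assumptions made for illustration; they are not copied from the pandas source.

```python
import pandas as pd

# Hypothetical per-column aggregation results: "b" had no functions,
# so its frame is empty; "c" holds the result of "min".
result = {
    "b": pd.DataFrame(),
    "c": pd.DataFrame({"min": [1, 4]}, index=pd.Index([1, 2], name="a")),
}
keys = list(result)

# Only concatenate non-empty frames so the empty one cannot
# interfere with the resulting index.
keys_to_use = [k for k in keys if not result[k].empty]

# Assumed fallback: if every frame is empty, use all keys.
if not keys_to_use:
    keys_to_use = keys

combined = pd.concat([result[k] for k in keys_to_use], keys=keys_to_use, axis=1)
print(combined.index.name)
print(combined.columns.tolist())
```

With the empty frame left out, the index of the remaining frame, including its name, is carried through to the combined result.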

@@ -439,7 +439,13 @@ def is_any_frame() -> bool:
 # we have a dict of DataFrames
 # return a MI DataFrame

-return concat([result[k] for k in keys], keys=keys, axis=1), True
+keys_to_use = [k for k in keys if not result[k].empty]
Contributor

I think the bug is not actually here. rather pls see where concat actually mishandles this and adjust there. concat handles None / empty frames, must not be discarding the keys when that happens.

Contributor

we don't want to compare vs [], rather not len(...)

Member Author

@phofl phofl Mar 29, 2020

@jreback The desired behavior for concat would be that it keeps all information and returns the DataFrame with the right index. Did I get that right?

Then I will look into this.

Contributor

yes

Member Author

@phofl phofl Mar 29, 2020

@jreback
I looked into this now. I think everything in concat works as expected. We have the following starting point:

The concat function receives two DataFrames as input.

  • the empty DataFrame with index Index([], dtype="object", name=None)
  • the c-DataFrame with index Int64Index([1, 2], dtype="int64", name="a")

concat performs the following relevant steps in our case.

As far as I understood the code, we could do the following to change this behavior:

  • Change the logic that determines the name so that it returns the non-None name if one of them is None (this breaks tests such as test_maybe_match_name, so I think this idea is not that good).
  • We could also change the function that casts both indices to datatype object to avoid datatype issues in the resulting index. We would have to modify the code there.

Both parts would change the indexes part, which is not directly related to our group by issue.
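
The name and dtype handling discussed above can be observed directly on the two indexes. This is only a sketch: it assumes that concat combines the indexes in a way comparable to `Index.union`, and the variable names are made up for the example.

```python
import pandas as pd

# Index of the empty DataFrame: no entries, object dtype, no name.
empty_index = pd.Index([], dtype="object")

# Index of the c-DataFrame: the group keys, named "a".
named_index = pd.Index([1, 2], dtype="int64", name="a")

# Combine the two the way a union of indexes would.
combined = empty_index.union(named_index)

# Inspect what survived: the name is only kept when both names agree.
print(combined.name)
print(combined.dtype)
```

Printing the name and dtype shows what information is left after the two indexes are reconciled.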

Alternatively, we could modify the DataFrames in the concat part before defining the final index, but I think that is not a good idea.

Could you tell me how to proceed?

@jreback jreback added Bug Groupby Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 29, 2020
@jreback
Contributor

jreback commented Apr 20, 2020

I am not sure I agree with supporting this case. what does an empty list in an aggregation even mean?

@phofl
Member Author

phofl commented Apr 20, 2020

> I am not sure I agree with supporting this case. what does an empty list in an aggregation even mean?

Should the calculation fail as a whole in this case?
Alternatively, we could ignore the empty list entirely in the aggregate function.
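
To make the second option concrete, here is a sketch of what ignoring the empty list could look like from the caller's side. The DataFrame, the `spec` dict, and the `cleaned` name are hypothetical, and this filtering happens before calling agg; it is not the change made in this PR.

```python
import pandas as pd

# Hypothetical data: group by "a", aggregate "c", give "b" no functions.
df = pd.DataFrame({"a": [1, 1, 2], "b": [1, 2, 3], "c": [1, 2, 4]})
spec = {"b": [], "c": ["min"]}

# Drop entries whose function list is empty, checking the length
# instead of comparing against [].
cleaned = {col: funcs for col, funcs in spec.items() if len(funcs)}

result = df.groupby("a").agg(cleaned)
print(result)
```

The result keeps the group keys as an index named "a" and only contains the aggregated column.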

@jreback
Contributor

jreback commented Apr 23, 2020

will look at this again - might be ok with your soln

@jreback
Contributor

jreback commented Jun 14, 2020

can you merge master and will have a look

@phofl
Member Author

phofl commented Jun 14, 2020

@jreback merged master

@jreback jreback added this to the 1.1 milestone Jun 14, 2020
@jreback
Contributor

jreback commented Jun 14, 2020

kk ping on green.

@phofl
Member Author

phofl commented Jun 14, 2020

@jreback green, can be merged

@jreback jreback merged commit 2637cf5 into pandas-dev:master Jun 14, 2020
@jreback
Contributor

jreback commented Jun 14, 2020

thanks @phofl

@phofl phofl deleted the 32580_groupy_lost_index branch June 14, 2020 21:35
Labels
Bug Groupby Reshaping Concat, Merge/Join, Stack/Unstack, Explode
2 participants