CLN: Use cython algo for groupby var with ddof != 1 #48152

Merged: 2 commits, merged Aug 20, 2022

@phofl (Member) commented Aug 19, 2022

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

cc @rhshadrach

I think you implemented ddof at the Cython level a while back. Is there a reason why var takes a different path? Results look sensible and no tests fail.

This also gives a nice speedup.
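For readers unfamiliar with the Cython path, here is a minimal pure-Python sketch of a single-pass grouped variance that honours `ddof`. It assumes a Welford-style running mean / sum-of-squared-deviations update per group and returns NaN when a group has no more than `ddof` observations. `group_var_sketch` is a hypothetical name used for illustration only; it is not the pandas Cython kernel, whose exact code may differ.

```python
import math
from typing import List, Sequence


def group_var_sketch(
    values: Sequence[float], labels: Sequence[int], ngroups: int, ddof: int = 1
) -> List[float]:
    """Hypothetical one-pass grouped variance using a Welford-style update.

    Illustrative only: this is NOT the actual pandas Cython implementation.
    """
    counts = [0] * ngroups
    means = [0.0] * ngroups
    m2 = [0.0] * ngroups  # running sum of squared deviations per group

    for val, lab in zip(values, labels):
        if lab < 0 or math.isnan(val):
            continue  # skip rows with a missing group label or a NaN value
        counts[lab] += 1
        old_mean = means[lab]
        means[lab] += (val - old_mean) / counts[lab]
        m2[lab] += (val - means[lab]) * (val - old_mean)

    # Divide by (n - ddof); the estimate is undefined when n <= ddof.
    return [
        m2[g] / (counts[g] - ddof) if counts[g] > ddof else math.nan
        for g in range(ngroups)
    ]
```

With `ddof=0` the divisor is `n` (population variance); with the default `ddof=1` it is `n - 1`. At the pandas level the corresponding call is `df.groupby(key)[col].var(ddof=0)`, and the point of the change is that every `ddof` value can stay inside one compiled pass instead of falling back to a per-group Python callable.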

@mroeschke (Member) left a comment

From a quick look, I didn't see a test in pandas/tests/groupby with ddof != 1. Could you confirm, and add one if there isn't?

@phofl (Member, Author) commented Aug 19, 2022

We have numba tests that check consistency between the numba and Cython implementations, so I think we are covered. They test ddof=0.
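The idea behind such consistency tests can be sketched without numba: compute the grouped variance two independent ways and assert that they agree for each `ddof` of interest. Below, a hypothetical two-pass helper `two_pass_group_var` is checked against the standard library's `statistics.pvariance` (ddof=0) and `statistics.variance` (ddof=1). This only stands in for the real pandas tests, which compare the numba and Cython engines directly.

```python
import math
import statistics
from collections import defaultdict


def two_pass_group_var(values, labels, ddof):
    """Hypothetical two-pass grouped variance: group mean first, then squared deviations."""
    groups = defaultdict(list)
    for val, lab in zip(values, labels):
        groups[lab].append(val)
    result = {}
    for lab, vals in groups.items():
        n = len(vals)
        if n <= ddof:
            result[lab] = math.nan  # not enough observations for this ddof
            continue
        mean = sum(vals) / n
        result[lab] = sum((v - mean) ** 2 for v in vals) / (n - ddof)
    return result


def check_consistency(values, labels):
    """Assert the two-pass result matches the stdlib reference for ddof 0 and 1."""
    groups = defaultdict(list)
    for val, lab in zip(values, labels):
        groups[lab].append(val)
    references = {0: statistics.pvariance, 1: statistics.variance}
    for ddof, reference in references.items():
        got = two_pass_group_var(values, labels, ddof)
        for lab, vals in groups.items():
            if len(vals) <= ddof:
                assert math.isnan(got[lab])
            else:
                assert math.isclose(got[lab], reference(vals), rel_tol=1e-12)
    return True


# Group "c" has a single row: its variance is 0.0 for ddof=0 and NaN for ddof=1.
check_consistency([1.0, 2.0, 3.0, 4.0, 10.0, 7.5], ["a", "a", "a", "b", "b", "c"])
```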

@mroeschke mroeschke added this to the 1.5 milestone Aug 19, 2022
@mroeschke (Member) left a comment

Ah yes, I forgot there were Cython comparison tests for numba_supported_reductions.

@rhshadrach (Member) left a comment

Nice! From what I can tell, I added it to Cython but ran into issues with var, reworked the implementation, and forgot to add it back in.

This could use a line in the performance section of the whatsnew.

alt=lambda x: Series(x).var(ddof=ddof),
numeric_only=numeric_only,
ignore_failures=numeric_only is lib.no_default,
**{"ddof": ddof},
Member

Just do ddof=ddof?
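The suggestion is purely cosmetic: unpacking a one-entry dict literal passes exactly the same keyword argument as writing it out directly. A small sketch with a hypothetical stand-in `agg_general` (not the real pandas internal, whose full signature is not shown here):

```python
def agg_general(alt, **kwargs):
    """Hypothetical stand-in for an internal aggregation helper.

    It simply returns the extra keyword arguments it received.
    """
    return kwargs


def alt(values):
    """Placeholder for the Python-level fallback aggregation."""
    return None


ddof = 0

via_unpacking = agg_general(alt=alt, **{"ddof": ddof})
via_keyword = agg_general(alt=alt, ddof=ddof)

# Both spellings deliver identical keyword arguments.
assert via_unpacking == via_keyword == {"ddof": 0}
```

Dict unpacking is only needed when keyword names are computed at runtime; with a fixed name, `ddof=ddof` reads more clearly.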

Member (Author)

Good point. I also added a whatsnew entry.

@rhshadrach (Member) left a comment

lgtm

@mroeschke mroeschke merged commit 7b7beb9 into pandas-dev:main Aug 20, 2022
@mroeschke (Member)

Thanks @phofl

@phofl phofl deleted the cln_groupby_var branch August 29, 2022 21:47
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
* CLN: Use cython algo for groupby var with ddof != 1

* Adress review

3 participants