Skip to content

Conversation

troels
Copy link
Contributor

@troels troels commented Sep 9, 2018

When integer arrays contained integers that were outside
the range of int64, the conversion would overflow.
Instead only allow allow safe casting and if a safe cast can not
be done, cast to float64 instead.

@pep8speaks
Copy link

Hello @troels! Thanks for submitting the PR.

@codecov
Copy link

codecov bot commented Sep 10, 2018

Codecov Report

Merging #22653 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22653      +/-   ##
==========================================
+ Coverage   92.17%   92.17%   +<.01%     
==========================================
  Files         169      169              
  Lines       50708    50711       +3     
==========================================
+ Hits        46740    46743       +3     
  Misses       3968     3968
Flag Coverage Δ
#multiple 90.58% <100%> (ø) ⬆️
#single 42.35% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/groupby/ops.py 96.81% <100%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0976e12...5340c0b. Read the comment docs.

@@ -471,7 +471,12 @@ def _cython_operation(self, kind, values, how, axis, min_count=-1,
if (values == iNaT).any():
values = ensure_float64(values)
else:
values = values.astype('int64', copy=False)
try:
values = values.astype('int64', copy=False, casting='safe')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm well this certainly works though I'm wondering if there's not a more comprehensive way that we should be handling (ex: adding the casting="safe" call to our algos templates)

cc @jreback and @jbrockmendel for any input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is something that occurs many places, as e.g. int and uint64 can not always be cast into int64 and thinking they can, just for satisfying is_integer_dtype is wrong.

A few places around the code seems to have the same problem, e.g.:
pandas/core/algorithms.py:427
pandas/core/groupby/generic.py:1168

But as far as I can see, there isn't a general solution to this except for fixing the code in the respective places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u take this code and make a function out of it and put where we defined ensure_float64
define

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I've created a function in dtypes/common.py now.

@WillAyd WillAyd added Groupby Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Sep 11, 2018
@@ -471,7 +471,12 @@ def _cython_operation(self, kind, values, how, axis, min_count=-1,
if (values == iNaT).any():
values = ensure_float64(values)
else:
values = values.astype('int64', copy=False)
try:
values = values.astype('int64', copy=False, casting='safe')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u take this code and make a function out of it and put where we defined ensure_float64
define

…#22487)

When integer arrays contained integers that could were outside
the range of int64, the conversion would overflow.
Instead only allow allow safe casting and if a safe cast can not
be done, cast to float64 instead.
@jreback jreback added this to the 0.24.0 milestone Sep 18, 2018
@jreback
Copy link
Contributor

jreback commented Sep 18, 2018

lgtm. @troels there might be some other places you can now use this routine (in groupby), but can certainly be done as a followup.

back to you @WillAyd

@WillAyd WillAyd merged commit 79f7762 into pandas-dev:master Sep 18, 2018
@WillAyd
Copy link
Member

WillAyd commented Sep 18, 2018

Thanks @troels - very nice change!

aeltanawy pushed a commit to aeltanawy/pandas that referenced this pull request Sep 20, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame.gropuby().mean() incorrect result
5 participants