Skip to content

Conversation

swt2c
Copy link
Contributor

@swt2c swt2c commented Mar 7, 2025

This fixes this error when slicing massive dataframes:

Traceback (most recent call last):
File "", line 1, in
File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4093, in getitem
return self._getitem_bool_array(key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4155, in _getitem_bool_array
return self._take_with_is_copy(indexer, axis=0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4153, in _take_with_is_copy
result = self.take(indices=indices, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4133, in take
new_data = self._mgr.take(
^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 893, in take
new_labels = self.axes[axis].take(indexer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/indexes/datetimelike.py", line 839, in take
maybe_slice = lib.maybe_indices_to_slice(indices, len(self))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "lib.pyx", line 522, in pandas._libs.lib.maybe_indices_to_slice
OverflowError: value too large to convert to int

This fixes this error when slicing massive dataframes:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4093, in __getitem__
    return self._getitem_bool_array(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4155, in _getitem_bool_array
    return self._take_with_is_copy(indexer, axis=0)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4153, in _take_with_is_copy
    result = self.take(indices=indices, axis=axis)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4133, in take
    new_data = self._mgr.take(
               ^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 893, in take
    new_labels = self.axes[axis].take(indexer)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/indexes/datetimelike.py", line 839, in take
    maybe_slice = lib.maybe_indices_to_slice(indices, len(self))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib.pyx", line 522, in pandas._libs.lib.maybe_indices_to_slice
OverflowError: value too large to convert to int
@swt2c
Copy link
Contributor Author

swt2c commented Mar 7, 2025

This is a squash and rebase of #59537.

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also update pandas/_libs/lib.pyi like in https://github.com/pandas-dev/pandas/pull/59537/files?

@mroeschke mroeschke added the Internals Related to non-user accessible pandas implementation label Mar 7, 2025
@swt2c
Copy link
Contributor Author

swt2c commented Mar 7, 2025

Can you also update pandas/_libs/lib.pyi like in https://github.com/pandas-dev/pandas/pull/59537/files?

If I do that, the CI fails. See the CI run for the "Sort whatsnew entries" commit for the errors. It seems like the type hint needs to stay as int?

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's OK that the stubs aren't _perfectly aligned with the Cython signature.

@mroeschke mroeschke added this to the 3.0 milestone Mar 7, 2025
@mroeschke mroeschke merged commit 0acb9a0 into pandas-dev:main Mar 7, 2025
46 checks passed
@mroeschke
Copy link
Member

Thanks @swt2c

@swt2c swt2c deleted the fix_overflowerror_slicing_massive_dataframes branch March 7, 2025 23:36
gm-oo9 pushed a commit to gm-oo9/pandas that referenced this pull request Mar 10, 2025
)

* BUG: Fix OverflowError in lib.maybe_indices_to_slice()

This fixes this error when slicing massive dataframes:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4093, in __getitem__
    return self._getitem_bool_array(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4155, in _getitem_bool_array
    return self._take_with_is_copy(indexer, axis=0)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4153, in _take_with_is_copy
    result = self.take(indices=indices, axis=axis)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4133, in take
    new_data = self._mgr.take(
               ^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 893, in take
    new_labels = self.axes[axis].take(indexer)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/indexes/datetimelike.py", line 839, in take
    maybe_slice = lib.maybe_indices_to_slice(indices, len(self))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib.pyx", line 522, in pandas._libs.lib.maybe_indices_to_slice
OverflowError: value too large to convert to int

* Sort whatsnew entries

* Set type hint back to int

---------

Co-authored-by: benjamindonnachie <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: OverflowError: value too large to convert to int when manipulating very large dataframes
3 participants