BUG: Fix segfault in random.permutation(x) when x is a string. #14241

Merged 11 commits on Aug 22, 2019

Conversation

maxwell-aladago
Contributor

@maxwell-aladago maxwell-aladago commented Aug 9, 2019

Reference #14238

I made the assumption that anyone who passes a string x to np.random.permutation(x) most likely intends to shuffle the string in question. If this isn't the desired behaviour, I can raise an error instead of the segfault.

@maxwell-aladago maxwell-aladago changed the title from "Fixes error from numpy.random.permuation(x) where isinstance(x, str). Reference issue #14238" to "Fixes error from numpy.random.permuation(x) where isinstance(x, str). Referenc [ #14238](https://github.com/numpy/numpy/issues/14238)" Aug 9, 2019
@maxwell-aladago maxwell-aladago changed the title from "Fixes error from numpy.random.permuation(x) where isinstance(x, str). Referenc [ #14238](https://github.com/numpy/numpy/issues/14238)" to "Fixes error from numpy.random.permuation(x) where isinstance(x, str)." Aug 9, 2019
@seberg
Member

seberg commented Aug 9, 2019

To be honest, I do not think I like special-casing strings like that. This also does not fix the underlying segmentation fault, since the same error will still happen, for example, for floating point input.

The underlying issue is the cython boundscheck=False directive, which also applies to tuples. Either we have to mark this (and possibly some other places) explicitly using:

with cython.boundscheck(True):
    ...

or the other way around (could probably also mark the whole function).

Maybe the best option here specifically is to just raise a new ValueError("permutation of a 0 dimensional array is not defined.") or something nicer.
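
A minimal Python sketch of the explicit check suggested here. The wrapper name `checked_permutation` is hypothetical and stands in for the Cython change; the error message is the one proposed above.

```python
import numpy as np


def checked_permutation(rng, x):
    """Hypothetical wrapper: reject 0-dimensional input before permuting."""
    # Integers keep their documented meaning: permute range(x).
    if isinstance(x, (int, np.integer)):
        return rng.permutation(int(x))
    arr = np.asarray(x)
    # Strings and floats become 0-d arrays, so there is no first axis to shuffle.
    if arr.ndim == 0:
        raise ValueError("permutation of a 0 dimensional array is not defined.")
    return rng.permutation(arr)


rng = np.random.default_rng(0)
print(checked_permutation(rng, 5))  # some ordering of 0..4
for bad in ("abc", 1.5, np.array(5)):
    try:
        checked_permutation(rng, bad)
    except ValueError as exc:
        print(type(bad).__name__, "->", exc)
```

Note that this sketch also rejects `np.array(5)`; whether a 0-d integer array should instead behave like the integer 5 is a separate design question.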

@rkern
Member

rkern commented Aug 9, 2019

The old behavior is to raise an IndexError. We should probably do just that for RandomState.permutation(). We can think about what we want for Generator.permutation('some_string'), but we should have the IndexError as a backstop for Generator.permutation(1.5), etc.

[~]
|1> np.random.permutation('abc')
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-1-eab1612dcd44> in <module>()
----> 1 np.random.permutation('abc')

mtrand.pyx in mtrand.RandomState.permutation()

IndexError: tuple index out of range
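
For context, the "tuple index out of range" message is what plain Python produces when the first entry of an empty shape tuple is requested. A small sketch, assuming a hypothetical helper `first_axis_length` that mimics such a lookup:

```python
import numpy as np


def first_axis_length(x):
    """Hypothetical helper: look up the length of axis 0 the naive way."""
    return np.asarray(x).shape[0]


for x in ("abc", 1.5):
    print(repr(x), "-> shape", np.asarray(x).shape)  # shape is the empty tuple ()
    try:
        first_axis_length(x)
    except IndexError as exc:
        print("IndexError:", exc)  # tuple index out of range
```

With bounds checking disabled in compiled code, the same lookup has no such safety net.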

@seberg
Member

seberg commented Aug 9, 2019

Hmm, right, at least for the legacy API it's probably better to err on the safe side. Another option, which works for both, is to use np.AxisError (since effectively there is an axis=0 going on, even if the user cannot set it). That error subclasses both IndexError and ValueError for exactly that backward compatibility concern.
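
A short sketch of the compatibility property mentioned here. The helper `caught_as` is hypothetical, and the import fallback is an assumption about where `AxisError` lives in a given NumPy release.

```python
try:
    from numpy.exceptions import AxisError  # newer NumPy releases
except ImportError:
    from numpy import AxisError  # older releases expose it at top level


def caught_as(exc_type):
    """Return True if a raised AxisError is caught by `except exc_type`."""
    try:
        raise AxisError("x must be an integer or at least 1-dimensional")
    except exc_type:
        return True


# Code written against either exception type keeps working.
print(caught_as(IndexError), caught_as(ValueError))
```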

@maxwell-aladago
Contributor Author

Thanks for the feedback. The point about floating point numbers is true. Would you consider a PR that raises an np.AxisError, or would you prefer an explicit cython bounds check? @seberg

@seberg
Member

seberg commented Aug 10, 2019

Raising an error explicitly is good, I do not mind much either way, but I feel AxisError may be nicer than IndexError.
If you have time, we may want to check the file for similar issues (e.g. bad Axis input that used to raise an IndexError automatically, and now will segfault).

@maxwell-aladago
Contributor Author

> Raising an error explicitly is good, I do not mind much either way, but I feel AxisError may be nicer than IndexError.
> If you have time, we may want to check the file for similar issues (e.g. bad Axis input that used to raise an IndexError automatically, and now will segfault).

I do have some time. Will look at the bad Axis input too.

@eric-wieser
Member

My claim would be that AxisError is only correct if you can trigger it via a sane call to normalize_axis_index.

@maxwell-aladago
Contributor Author

> My claim would be that AxisError is only correct if you can trigger it via a sane call to normalize_axis_index

A check like `normalize_axis_index(0, arr.ndim)` will trigger an AxisError! And I think it's sane enough, since intuitively we expect all permutable inputs to be either ints or to have at least 1 axis.
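
A sketch of that check in isolation. The helper `first_axis_or_error` is hypothetical, and the import fallbacks are assumptions about where `normalize_axis_index` and `AxisError` are exposed in a given NumPy release.

```python
import numpy as np

try:
    from numpy.lib.array_utils import normalize_axis_index  # NumPy 2.x
except ImportError:
    from numpy.core.multiarray import normalize_axis_index  # NumPy 1.x

try:
    from numpy.exceptions import AxisError
except ImportError:
    from numpy import AxisError


def first_axis_or_error(x):
    """Hypothetical helper: confirm that axis 0 exists for array-like x."""
    arr = np.asarray(x)
    # Raises AxisError when arr.ndim == 0, e.g. for a string or a float.
    return normalize_axis_index(0, arr.ndim)


print(first_axis_or_error([1, 2, 3]))
try:
    first_axis_or_error("abc")
except AxisError as exc:
    print("AxisError:", exc)
```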

@maxwell-aladago maxwell-aladago changed the title from "Fixes error from numpy.random.permuation(x) where isinstance(x, str)." to "BUG: Addresses error from numpy.random.permuation(x) where isinstance(x, str)." Aug 10, 2019
@@ -3921,7 +3921,7 @@ cdef class Generator:
Randomly permute a sequence, or return a permuted range.

If `x` is a multi-dimensional array, it is only shuffled along its
first index.
first index.
Member

Please revert the whitespace additions

Member

The spurious space is still there.

Member

Still there. @maxwell-aladago, the space added at the end of this line should not be there.

@anntzer
Contributor

anntzer commented Aug 19, 2019

fwiw I think this should raise an error: this would be consistent with np.random.choice("abcd") which currently raises an error -- in fact, this would even be consistent with np.asarray("abcd") returning a scalar array of dtype <U4 rather than a shape-(4,) array.
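
To make the comparison concrete, a small sketch of how a string is seen by `np.asarray`, plus one way to shuffle its characters by first converting it to a list. The variable names are illustrative only.

```python
import numpy as np

s = "abcd"
print(np.asarray(s).shape)        # (): one 0-d string element
print(np.asarray(list(s)).shape)  # (4,): four one-character elements

# Shuffling characters works once the string is split into a sequence.
rng = np.random.default_rng(0)
shuffled = "".join(rng.permutation(list(s)))
print(shuffled)
```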

@maxwell-aladago
Contributor Author

> fwiw I think this should raise an error: this would be consistent with np.random.choice("abcd") which currently raises an error -- in fact, this would even be consistent with np.asarray("abcd") returning a scalar array of dtype <U4 rather than a shape-(4,) array.

np.random.choice("abc") currently gives a ValueError, whereas np.random.permutation("abc") results in a segfault. The current patch ensures it throws a sensible np.AxisError. To be consistent with np.random.choice("abc"), we'll have to update the patch to throw a ValueError instead.

@WarrenWeckesser
Member

Wouldn't a TypeError or ValueError be more appropriate? The argument for raising AxisError is not very strong, whereas the text added to the exception message, "x must be an integer or an array_like object.", suggests that the exception should be a TypeError or a ValueError.

@maxwell-aladago
Contributor Author

> Wouldn't a TypeError or ValueError be more appropriate? The argument for raising AxisError is not very strong, whereas the text added to the exception message, "x must be an integer or an array_like object.", suggests that the exception should be a TypeError or a ValueError.

Changed it to ValueError

@eric-wieser
Member

ValueError is fine. If we add an explicit axis argument later, we can switch to AxisError then, especially since issubclass(AxisError, ValueError).

@rkern
Member

rkern commented Aug 19, 2019

The appropriate comparison is not np.random.choice() but the 1.16 behavior of np.random.permutation(), which, as noted above, raises an IndexError.

@seberg
Member

seberg commented Aug 19, 2019

I think for the last case we could just stick with raising an error, but either will work. While we do generally allow 0D arrays instead of scalars, the function is overloaded in a way that I think makes it fair to assume it is unintentional when it happens. Plus 0D arrays are exceedingly rare in any case.

@maxwell-aladago
Contributor Author

Another question about fixing this issue. To what extent do we expect a scalar array (i.e. a zero-dimensional array) to act like an instance of its underlying dtype? Currently, the segfault occurs when a zero-dimensional array is passed to permutation:

In [2]: import numpy as np

In [3]: rng = np.random.Generator(np.random.PCG64())

In [4]: rng.permutation(5)
Out[4]: array([0, 3, 4, 1, 2])

In [5]: rng.permutation(np.int64(5))
Out[5]: array([2, 1, 4, 3, 0])

In [6]: rng.permutation(np.array(5))
Segmentation fault: 11

Should that last case act the same as the previous two, and return a permutation of range(5)?

My patch actually fixed the case for np.random.permutation(np.array(5)).
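
One possible way to treat a 0-d integer array like its scalar value, shown only as a sketch and not as what the patch does: `operator.index` accepts Python ints, NumPy integer scalars, and 0-d integer arrays, and rejects floats and strings. The helper name `as_count` is hypothetical.

```python
import operator

import numpy as np


def as_count(x):
    """Hypothetical helper: return x as a Python int, or None if x is not integer-like."""
    try:
        return operator.index(x)
    except TypeError:
        return None


for x in (5, np.int64(5), np.array(5), 1.5, "abc"):
    print(repr(x), "->", as_count(x))
```

A permutation routine could use such a helper to choose between the "permute range(n)" branch and the "shuffle along axis 0" branch.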

@eric-wieser
Member

eric-wieser commented Aug 19, 2019

I don't think we should use IndexError for the new Generator objects on a compatibility argument - part of the point of the new API was to allow us to break compatibility. ValueError was in my opinion more appropriate.

@maxwell-aladago
Contributor Author

> I don't think we should use IndexError for the new Generator objects - part of the point of the new API was to allow us to break compatibility. ValueError was in my opinion more appropriate.

So, you want me to roll it back to ValueError? @eric-wieser. ValueError looks right to me too because it really isn't an indexing problem.

@rkern
Member

rkern commented Aug 19, 2019

Ah, per the PR title, I thought we were talking about np.random.permutation() which is RandomState.permutation(). Do what you like with Generator.permutation(). AxisError would be the best exception there (and Generator.choice() to match) in the cases that you don't want to handle specially.

But it seems like this PR does not fix RandomState.permutation(), so np.random.permutation() remains unfixed as well. That still segfaults. The proper fix there is to raise an IndexError to match 1.16 behavior.

@maxwell-aladago
Contributor Author

maxwell-aladago commented Aug 19, 2019

> Ah, per the PR title, I thought we were talking about np.random.permutation() which is RandomState.permutation(). Do what you like with Generator.permutation(). AxisError would be the best exception there (and Generator.choice() to match) in the cases that you don't want to handle specially.
>
> But it seems like this PR does not fix RandomState.permutation(), so np.random.permutation() remains unfixed as well. That still segfaults. The proper fix there is to raise an IndexError to match 1.16 behavior.

This now fixes both RandomState.permutation() and Generator.permutation(). Let me know whether this is a good fix, @rkern, so I can change Generator.choice() to match too.

@WarrenWeckesser
Member

@rkern wrote

> Do what you like with Generator.permutation(). AxisError would be the best exception there (and Generator.choice() to match) in the cases that you don't want to handle specially.

What is the rationale for raising an AxisError instead of a ValueError? For an input such as x=1.5, is it something like "The argument must be int or array_like. If the argument is not, in fact, an int, then it must be array_like, and when we try to use the first axis of the scalar 1.5 we get an AxisError"?

>>> rng.permutation("abc")
Traceback (most recent call last):
...
numpy.AxisError: x must be an integer or at least 1-dimensional
Member

Not sure an error example helps too much here, but maybe if someone thinks they can shuffle strings or similar objects.

@rkern
Member

rkern commented Aug 19, 2019

> What is the rationale for raising an AxisError instead of a ValueError?

Broadly speaking, we are expecting to be indexing along the first of >=1 axes, in the usual case. The integer semantic branch complicates that, and the message should incorporate information about that, but I think that AxisError is the most reasonable (as it is both a ValueError and an IndexError).
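
A brief illustration of the "indexing along the first axis" view, using a small 2-D array chosen only for the example: rows move as whole units and nothing inside a row is reordered.

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.arange(6).reshape(3, 2)
p = rng.permutation(a)

# Rows are reordered as whole units; values within each row keep their order.
print(p)
```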

@charris
Member

charris commented Aug 19, 2019

The refguide check is failing.

@charris charris added this to the 1.17.1 release milestone Aug 19, 2019
@charris charris changed the title from "BUG: Addresses error from numpy.random.permuation(x) where isinstance(x, str)." to "BUG: Fix segfault in random.permutation(x) when x is a string." Aug 19, 2019
@charris
Member

charris commented Aug 20, 2019

Everyone good with this?

@charris
Member

charris commented Aug 22, 2019

@maxwell-aladago Could you make another PR against 1.16.x that only makes the fixes/tests for RandomState?

@charris charris merged commit 965ea2f into numpy:master Aug 22, 2019
charris pushed a commit to charris/numpy that referenced this pull request Aug 22, 2019
…py#14241)

* fixing segfault error in np.random.permutation(x) where x is str

* removed whitespace

* changing error type to ValueError

* changing error type to ValueError

* changing error type to ValueError

* tests

* changed error to IndexError for backward compatibility with numpy 1.16

* fixes numpy.randomstate.permutation segfault too

* Rolled back to ValueError for Generator.permutation() for all 0-dimensional

* fixes refuige erro and rolls backs to AxisError
@rkern
Member

rkern commented Aug 22, 2019

I don't think there's a segfault in 1.16. It was introduced in 1.17.

@charris
Member

charris commented Aug 22, 2019

> I don't think there's a segfault in 1.16. It was introduced in 1.17.

Ah, just checked and you are right.

@charris charris removed the "09 - Backport-Candidate" label Aug 22, 2019
@charris charris removed this from the 1.17.1 release milestone Aug 22, 2019
@seberg
Member

seberg commented Aug 22, 2019

Did anyone check for other similar errors, or should I go through it?


8 participants