MAINT: Warn users when calling np.ma.MaskedArray.partition function. #8669

Merged
merged 1 commit into from
Feb 28, 2017

MSeifert04
Contributor

@MSeifert04 MSeifert04 commented Feb 22, 2017

Calling np.median on a MaskedArray uses the non-overridden
partition method of a plain np.ndarray without error or warning (#7330).
This PR overrides the partition method on MaskedArray, but only to
emit a warning. This makes users aware that something ignores the
mask, without breaking backwards compatibility.
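
A minimal sketch of the pattern described above (warn, then delegate to the inherited method so behaviour stays the same). The class `MaskIgnoringArray` is a hypothetical stand-in for MaskedArray and the message text is illustrative; neither is taken from the merged diff.

```python
import warnings

import numpy as np


class MaskIgnoringArray(np.ndarray):
    """Hypothetical stand-in for MaskedArray (not NumPy's actual class)."""

    def partition(self, *args, **kwargs):
        # Warn first, then fall back to the plain ndarray behaviour so
        # existing callers keep getting the same result as before.
        warnings.warn(
            "'partition' will ignore the 'mask' of the {}.".format(
                self.__class__.__name__),
            stacklevel=2)
        return super().partition(*args, **kwargs)


arr = np.array([3, 1, 2]).view(MaskIgnoringArray)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    arr.partition(1)  # in place, exactly like ndarray.partition

print(caught[0].message)
print(arr[1])  # the element at index 1 is in its sorted position
```

Because the override only adds a warning, code that relied on the old result keeps working; it just becomes visible that the mask played no part.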

@eric-wieser
Member

I feel like we could fill in the same way as argsort and sort do/should do here. (See #8664 for how that doesn't quite work either.)

Also, it feels like this should touch argpartition too.

@MSeifert04
Contributor Author

MSeifert04 commented Feb 22, 2017

@eric-wieser I played around a bit and one could make it work "partially" by placing half of the masked items at the beginning of the axis and the other half at the end of the axis, then partitioning everything in between. However, this approach still gives wrong results if there's an odd number of masked items.

So there is no approach that does an exact partition in the general case which is probably why that method hasn't been implemented yet.

Also, it feels like this should touch argpartition too.

You're right!

Should I go forward and fix the failing test and put a Warning in argpartition too? Or should the approach be different? I don't like the Warning much, but I posted the issue (#7330) a year ago, and while a correctly working implementation would be much better (if there is one), a Warning at least gives the user some feedback on why the result is wrong.

@eric-wieser
Member

eric-wieser commented Feb 22, 2017

Yeah, adding the warning seems like a good interim solution. Perhaps it should mention marr.filled(...).partition as a workaround

I played around a bit and one could make it work "partially" by placing half of the masked items at the beginning of the axis and the other half at the end of the axis, then partitioning everything in between. However, this approach still gives wrong results if there's an odd number of masked items.

This also doesn't really work for np.percentile anyway. I think the only sensible solution would be to call .compressed() or force the user to pick a fill value
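
As a rough illustration of the two workarounds mentioned here, the sketch below partitions either the compressed data or a copy filled with an explicitly chosen value. The sample data and the choice of the integer maximum as the sentinel are assumptions made for the example only.

```python
import numpy as np

marr = np.ma.array([5, 1, 4, 2, 3], mask=[False, True, False, False, False])

# Option 1: drop the masked entries entirely, then partition what is left.
valid = marr.compressed()        # plain ndarray holding 5, 4, 2, 3
valid.partition(1)               # valid[1] is now the 2nd smallest valid value

# Option 2: choose a fill value explicitly. A very large sentinel pushes
# the masked slot towards the end of the partitioned array.
sentinel = np.iinfo(marr.dtype).max
filled = marr.filled(sentinel)   # plain ndarray copy, masked slot replaced
filled.partition(1)

print(valid[1], filled[1])
```

Both calls operate on plain ndarrays, so the mask can no longer be silently ignored: it has either been removed or replaced by a value the caller picked.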

@eric-wieser eric-wieser added the component: numpy.ma masked arrays label Feb 22, 2017
@charris
Member

charris commented Feb 22, 2017

@juliantaylor Comment?

numpy/ma/core.py Outdated
@@ -5640,6 +5640,12 @@ def ptp(self, axis=None, out=None, fill_value=None):
        np.subtract(out, min_value, out=out, casting='unsafe')
        return out

    def partition(self, *args, **kwargs):
        warnings.warn("Warning: partition will ignore masked items "
Member

"ignore the mask" is different to "ignore masked items". The former is what happens, the latter is what the user expects.

Contributor Author

👍

@juliantaylor
Contributor

Would it make more sense to make np.median masked-array aware and just call ma.median? It should have the same feature set by now.
It is a little ugly: unlike sort, median is not an array method, so it would be a special case that helps internal consistency but not external subclasses.
The alternative is to implement the partition method in masked arrays and get the already ugly median to work with that. That is probably possible too, but not necessarily worthwhile, as probably no external subclasses implement partition anyway, and getting it to work right in median is really hard.
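
For reference, a small sketch of what calling the masked-aware function looks like from the user's side. The sample values are made up for illustration.

```python
import numpy as np

# The masked entry (100) should not influence the result.
marr = np.ma.array([1, 100, 3], mask=[False, True, False])

result = np.ma.median(marr)  # median over the unmasked values 1 and 3
print(result)
```

Here the median is computed from the two unmasked values only, which is what a user of masked arrays would expect.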

@eric-wieser
Member

@juliantaylor: Can we just make median a method?

@MSeifert04
Contributor Author

MSeifert04 commented Feb 22, 2017

@juliantaylor @eric-wieser That np.median doesn't work with MaskedArrays is just a symptom of the problem. The main problem is that partition is a public, non-overridden method of MaskedArray that does not work correctly. It's bound to surprise someone, even if it would happen less often if np.median didn't call it.

Perhaps it should mention marr.filled(...).partition as a workaround

I don't think that workaround would be very helpful when the Warning arises from a np.median call. However, if you want, I'll put it in.

@@ -1039,14 +1039,14 @@ def test_empty(self):

        # axis 0 and 1
        b = np.ma.masked_array(np.array([], dtype=float, ndmin=2))
        assert_equal(np.median(a, axis=0), b)
        assert_equal(np.median(a, axis=1), b)
        assert_equal(np.ma.median(a, axis=0), b)
Contributor Author

@juliantaylor These np.median calls were introduced in ff4758f. Was it np.median instead of np.ma.median on purpose?

Contributor

@juliantaylor juliantaylor Feb 23, 2017

these tests are mostly copy pastes from median tests, likely forgot to update these. Should be ma.median.

@MSeifert04
Contributor Author

MSeifert04 commented Feb 23, 2017

ok, I'm really stuck here. The tests with np.ma.median fail because of some IndexError which indicates that np.ma.median behaves differently than np.median. However using np.median fails because of the Warning I included.

Could someone advise me how to proceed?

The Error:

======================================================================
ERROR: test_empty (test_extras.TestMedian)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/numpy/numpy/builds/venv/lib/python2.7/site-packages/numpy/ma/tests/test_extras.py", line 1049, in test_empty
    assert_equal(np.ma.median(a, axis=2), b)
  File "/home/travis/build/numpy/numpy/builds/venv/lib/python2.7/site-packages/numpy/ma/extras.py", line 692, in median
    overwrite_input=overwrite_input)
  File "/home/travis/build/numpy/numpy/builds/venv/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4004, in _ureduce
    r = func(a, **kwargs)
  File "/home/travis/build/numpy/numpy/builds/venv/lib/python2.7/site-packages/numpy/ma/extras.py", line 750, in _median
    low = asorted[tuple(ind)]
  File "/home/travis/build/numpy/numpy/builds/venv/lib/python2.7/site-packages/numpy/ma/core.py", line 3171, in __getitem__
    dout = self.data[indx]
IndexError: index -1 is out of bounds for axis 2 with size 0
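
The final line of the traceback can be reproduced in isolation. The sketch below assumes the test's `a` is an empty 3-D masked array (the definition of `a` is not shown in the excerpt) and indexes its underlying data directly rather than going through the median code.

```python
import numpy as np

# Assumption: `a` in the test is an empty 3-D masked array like this one.
a = np.ma.masked_array(np.array([], dtype=float, ndmin=3))
print(a.shape)  # the last axis has length zero

message = None
try:
    # Any integer index, including -1, is invalid on a zero-length axis.
    a.data[0, 0, -1]
except IndexError as exc:
    message = str(exc)

print(message)
```

This only shows why an index of -1 fails on a zero-length axis; it does not explain how the median code arrived at that index.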

@juliantaylor
Contributor

hm they should behave the same. Probably another bug not found due to the copy paste error in the test.

@MSeifert04
Contributor Author

MSeifert04 commented Feb 27, 2017

Is there some easy way to restart the tests (or one test) here? I think #8705 should've fixed the failing test.

@eric-wieser
Member

@MSeifert04: You'll need to rebase on master to actually use those changes

…tion.

Using the np.median function on MaskedArrays uses the not-overridden
partition method of a plain np.ndarray without error or warning. (#7330)
This PR overrides the partition method on MaskedArrays but simply to
throw a Warning. This will make users aware that something ignores the
mask without breaking backwards-compatibility. This also applies to
the argpartition method (even if it's not called by np.median).
@MSeifert04
Contributor Author

Oh, I thought that it should pick up changes even without rebase. I've rebased the branch and am hoping for green tests.

@eric-wieser
Member

@MSeifert04: I get the feeling the test is done in isolation, rather than merged with master, but I could be wrong.

    def argpartition(self, *args, **kwargs):
        warnings.warn("Warning: 'argpartition' will ignore the 'mask' "
                      "of the {}.".format(self.__class__.__name__),
                      stacklevel=2)
Member

Wait, why level 2?

Member

Nevermind, this is consistent with everything else

Contributor Author

Yep, that was simply copied from another warning.

Member

Python docs seemed to suggest that stacklevel=2 is what you should use in a warning helper function. But yeah, this is better for consistency
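
To make the effect of stacklevel=2 concrete, here is a small standard-library sketch. The function names `helper` and `caller` are hypothetical; the point is only that the recorded warning is attributed to the line that called the function issuing it.

```python
import warnings


def helper():
    # stacklevel=2 attributes the warning to whoever called helper(),
    # not to this line inside helper() itself.
    warnings.warn("helper() ignores something important", stacklevel=2)


def caller():
    helper()  # the warning is reported against this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    caller()

print(caught[0].lineno, caller.__code__.co_firstlineno + 1)
```

With the default stacklevel=1 the reported location would be the `warnings.warn` line inside `helper`, which is rarely useful to the person reading the warning.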

@eric-wieser eric-wieser merged commit ee3ab36 into numpy:master Feb 28, 2017
@MSeifert04 MSeifert04 deleted the partition_warning branch February 28, 2017 01:09
@eric-wieser
Member

Thanks @MSeifert04
