ENH: stats: add `alternative` to masked normality tests #13960
`skewtest` and `kurtosistest` were missing an `alternative` parameter in their masked version. It has been added and tested now.
Good start, but please review what we did in gh-13549 and apply some of that here. For instance, better document the meaning of `less` vs `greater`, and rely on `_normtest_finish` to raise the error for an invalid `alternative` argument.
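To make the request above concrete, here is a minimal sketch of how the three `alternative` options relate for the masked `skewtest`. It assumes SciPy 1.7 or later, where `scipy.stats.mstats.skewtest` accepts `alternative`; the sample, seed, mask pattern, and the invalid value `"sideways"` are arbitrary illustrations and are not taken from the PR.

```python
import numpy as np
from scipy.stats import mstats

# Illustrative data only: 50 normal draws with every 7th value masked.
rng = np.random.default_rng(0)
data = rng.normal(size=50)
mask = np.zeros(50, dtype=bool)
mask[::7] = True
x = np.ma.masked_array(data, mask=mask)

# 'less': skewness of the underlying distribution is less than that of a normal.
p_less = float(mstats.skewtest(x, alternative="less").pvalue)
# 'greater': skewness of the underlying distribution is greater than that of a normal.
p_greater = float(mstats.skewtest(x, alternative="greater").pvalue)
# 'two-sided': skewness differs from that of a normal in either direction.
p_two_sided = float(mstats.skewtest(x, alternative="two-sided").pvalue)

# An unrecognized alternative is expected to be rejected with a ValueError.
try:
    mstats.skewtest(x, alternative="sideways")
    raised = False
except ValueError:
    raised = True
```

Because the p-value comes from a normal approximation of the test statistic, the two one-sided p-values should sum to 1, and the two-sided p-value should be twice the smaller of them.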
Hi, @mdhaber. Thanks for the review!
Ah, I believe I didn't document the masked versions as its documentation points to the
On an unrelated note, relying on
I do think that the definitions should appear here; we should not just refer the user to For implementing this, it would be nice to define the text (e.g. as a variable) in one place and use it in many places, but for now let's copy-paste. There are a lot of other places where we share text that could be cleaned up at the same time (if we decide to go that route at some point).
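The "define the text in one place" idea mentioned above could look roughly like the sketch below. Everything here is hypothetical: `_ALTERNATIVE_DOC`, `with_shared_doc`, and the two toy functions are invented for illustration, and the wording of the shared text is not SciPy's actual docstring.

```python
import textwrap

# Hypothetical shared parameter description, written once.
_ALTERNATIVE_DOC = (
    "alternative : {'two-sided', 'less', 'greater'}, optional\n"
    "    Defines the alternative hypothesis. Default is 'two-sided'."
)


def with_shared_doc(func):
    """Fill the %(alternative)s placeholder in the docstring of `func`."""
    # Indent continuation lines to match the placeholder's own indentation;
    # the first line is already indented by the placeholder position.
    indented = textwrap.indent(_ALTERNATIVE_DOC, " " * 4).lstrip()
    func.__doc__ = func.__doc__ % {"alternative": indented}
    return func


@with_shared_doc
def skewtest_like(a, alternative="two-sided"):
    """Toy stand-in for a normality test (does no computation).

    Parameters
    ----------
    a : array_like
        Sample data.
    %(alternative)s
    """
    return None


@with_shared_doc
def kurtosistest_like(a, alternative="two-sided"):
    """Second toy stand-in sharing the same parameter text.

    Parameters
    ----------
    a : array_like
        Sample data.
    %(alternative)s
    """
    return None
```

With this arrangement, editing `_ALTERNATIVE_DOC` updates both docstrings at once, which is the maintenance benefit the comment is after; the copy-paste approach chosen for this PR avoids the extra machinery.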
I agree it's not ideal, but there is the same problem in the
* copy-paste docs explaining `less` and `greater` to `mstats` version.
* add test with masked arrays in `test_mstats_basic`
Got it. copy-pasted the docs in the latest commit.
Makes sense, done.
👍
@@ -964,7 +964,7 @@ def regression_test_9033(self):
     @pytest.mark.parametrize("test", ["skewtest", "kurtosistest"])
     @pytest.mark.parametrize("alternative", ["less", "greater"])
     def test_alternative(self, test, alternative):
-        x = stats.norm.rvs(loc=10, scale=2.5, size=20, random_state=123)
+        x = stats.norm.rvs(loc=10, scale=2.5, size=30, random_state=123)
This is fine - as long as it wasn't necessary to make the test pass...? What was the motivation?
Oh, yes. It wasn't done to make the test pass but because `skewtest` requires at least 20 samples. As I add nans to some samples, I had to increase the sample size.
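The effect described above, where inserting NaNs shrinks the number of usable observations, can be sketched as follows. The choice of which entries become NaN is an assumption made for illustration; it is not the pattern used in the PR's test.

```python
import numpy as np

# Illustrative sample of 30 values; every 5th entry is replaced by NaN.
rng = np.random.default_rng(123)
x = rng.normal(loc=10, scale=2.5, size=30)
x[::5] = np.nan  # indices 0, 5, 10, 15, 20, 25 -> 6 NaNs

# Masking the invalid entries leaves only the valid observations countable.
masked = np.ma.masked_invalid(x)
n_effective = masked.count()  # number of unmasked values
```

Starting from 30 values leaves 24 unmasked here, which stays above the 20-sample threshold cited in the comment; starting from 20 values with the same pattern would leave only 16.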
 Returns
 -------
-statistic : float
+statistic : array_like
More accurately, these would be "scalar or ndarray". (Could be scalar, and if it's array-like, it's going to be an ndarray)
> More accurately, these would be "scalar or ndarray". (Could be scalar, and if it's array-like, it's going to be an ndarray)

I think `array_like` includes both scalars and array outputs. So, I thought it would be better to change it that way.
I'm not sure if `array_like` is formally defined anywhere, so I guess we can use it to mean whatever we want. But we could be more specific as we push toward better documentation. (Not required for this PR, though. Not really worth another CI run IMO.)
> I'm not sure if `array_like` is formally defined anywhere, so I guess we can use it to mean whatever we want. But we could be more specific as we push toward better documentation. (Not required for this PR, though.)

I think NumPy does define `array_like` in https://numpydoc.readthedocs.io/en/latest/format.html#other-points-to-keep-in-mind. See gh-13621.
Oh, nice. Thanks for pointing that out. Note that it only specifically says it's for documenting arguments, though. We are flexible about arguments, but we know what the return types will be.
But even something as fundamental as e.g. `np.mean` doesn't document this perfectly. It says that the output type will be `ndarray`, but as we know:
isinstance(np.mean([[1, 2, 3]], axis=0), np.ndarray)  # True
isinstance(np.mean([1, 2, 3]), np.ndarray)  # False
I'll go ahead and merge this as-is, if it sounds good to you?
We can consider adding this to the huge list of things we'd like the documentation to be more consistent about.
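The same scalar-versus-array question can be checked for the masked tests themselves. The sketch below assumes SciPy 1.7 or later and uses arbitrary data and masks; it relies only on `np.ndim` and `np.shape`, so it makes no claim about the exact Python type of the returned statistic.

```python
import numpy as np
from scipy.stats import mstats

# Illustrative masked inputs: a 1-D sample and a 2-D sample with 3 columns.
rng = np.random.default_rng(0)
mask_1d = np.zeros(30, dtype=bool)
mask_1d[0] = True
one_d = np.ma.masked_array(rng.normal(size=30), mask=mask_1d)

mask_2d = np.zeros((30, 3), dtype=bool)
mask_2d[0, :] = True
two_d = np.ma.masked_array(rng.normal(size=(30, 3)), mask=mask_2d)

# A 1-D input reduces to a single statistic; a 2-D input gives one per column.
stat_1d = mstats.skewtest(one_d).statistic
stat_2d = mstats.skewtest(two_d, axis=0).statistic

ndim_1d = np.ndim(stat_1d)    # dimensionality of the 1-D result
shape_2d = np.shape(stat_2d)  # shape of the column-wise result
```

This is the distinction the "scalar or ndarray" wording would capture: the 1-D case yields a zero-dimensional result, and the 2-D case yields one value per column.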
> if it sounds good to you?
Yep, everything sounds good to me!
This PR adds support for an

import numpy as np
from scipy import stats
from scipy.stats import mstats

np.random.seed(0)
for test_name in ("kurtosistest", "skewtest"):
    for alternative in ("less", "greater", "two-sided"):
        for i in range(100):
            # Random sample with a random fraction (under 50%) of entries masked
            sample = stats.norm.rvs(size=np.random.randint(30, i+40))
            p = np.random.rand()
            mask = np.random.rand(*sample.shape) > (0.5 + p/2)
            test = getattr(stats, test_name)
            mtest = getattr(mstats, test_name)
            # Three equivalent views of the same data
            compressed_sample = sample[~mask]
            nan_sample = sample.copy()
            nan_sample[mask] = np.nan
            masked_sample = np.ma.masked_array(sample, mask=mask)
            # Pass the loop's alternative so that each option is exercised
            res1 = test(compressed_sample, alternative=alternative)
            res2 = test(nan_sample, nan_policy='omit', alternative=alternative)
            res3 = mtest(masked_sample, alternative=alternative)
            np.testing.assert_allclose(res2, res1)
            np.testing.assert_allclose(res3, res1)

Tests are fine (considering the existing tests of

Thanks @tirthasheshpatel!
Reference issue
Addresses gh-12506.
gh-13549 added the alternative parameter to some normality tests but forgot to add it to the masked version. This is a continuation of that work.
What does this implement/fix?
`skewtest` and `kurtosistest` were missing an `alternative` parameter in their masked version. It has been added and tested now.
Additional information
N/A