MAINT: stats.make_distribution: support more existing distributions #22040

Merged
merged 11 commits into from
Dec 10, 2024

Conversation

mdhaber
Contributor

@mdhaber mdhaber commented Dec 9, 2024

Reference issue

gh-21707

What does this implement/fix?

When adding stats.make_distribution, there were 10 old distributions that didn't seem to play nicely with the new infrastructure, and a few more that we omitted from the tests for one reason or another. This makes the little adjustments needed for make_distribution to support all but two distributions:

  • levy_stable because it isn't obvious what the problem is
  • vonmises because neither the old nor new infrastructure support circular distributions (yet)

@mdhaber mdhaber added the scipy.stats, maintenance, and backport-candidate labels Dec 9, 2024
@mdhaber mdhaber added this to the 1.15.0 milestone Dec 9, 2024
@mdhaber mdhaber requested a review from steppi December 9, 2024 19:05
@mdhaber mdhaber requested a review from ev-br as a code owner December 9, 2024 19:05
@lucascolley lucascolley changed the title MAINT: stats.make_distribution: support more existing distributions MAINT: stats.make_distribution: support more existing distributions Dec 9, 2024
- Known failures include 'genpareto', 'genextreme', 'genhalflogistic',
- 'irwinhall', 'kstwo', 'kappa4', 'levy_stable', 'norminvgauss',
- 'tukeylambda', and `vonmises`.
+ Known failures include 'levy_stable' and `vonmises`.
Member

backticks

Contributor Author

@mdhaber mdhaber Dec 9, 2024

Oops. Must have copy-pasted from a list that originally had quotes.

@mdhaber
Contributor Author

mdhaber commented Dec 9, 2024

I think I'll raise an error if the distribution is one of the failing two. Also, should probably check whether help works with the distributions and point to Normal for rendered docs. Update: done.

Contributor Author

@mdhaber mdhaber left a comment

Some commentary to aid the reviewer.

@@ -5717,6 +5717,10 @@ def _cdf(self, x, a, b):
         y = (1 + x / np.sqrt(a + b + x ** 2)) * 0.5
         return sc.betainc(a, b, y)
+
+    def _sf(self, x, a, b):
+        y = (1 + x / np.sqrt(a + b + x ** 2)) * 0.5
+        return sc.betaincc(a, b, y)
Contributor Author

@mdhaber mdhaber Dec 10, 2024

This is the fix for

 skip_logccdf = {'jf_skew_t', # check this out later

below. logccdf tries logexp if ccdf is defined and complement if logcdf is defined. jf_skew_t had neither, so it fell back to log-quadrature rather than taking a complement and then a logarithm. logpdf isn't defined either, so the quadrature performs poorly. Now that _sf is defined, logexp works fine. It might be worth checking whether adding logpdf helps quadrature, but that doesn't need to be done here. Separately, jf_skew_t.pdf returns erroneous results for sufficiently negative x.

stats.jf_skew_t(8, 4).pdf(-1e10)  # np.float64(0.0)
stats.jf_skew_t(8, 4).pdf(-1e100)  # np.float64(0.0)
stats.jf_skew_t(8, 4).pdf(-1e200)  # np.float64(0.186060145344313)
stats.jf_skew_t(8, 4).pdf(-1e300)  # np.float64(0.186060145344313)

A simple fix is to replace np.sqrt(a + b + x ** 2) with abs(x) for abs(x) sufficiently large.
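
A minimal sketch of that idea, using NumPy only. The helper names and the `cutoff` value are hypothetical; the sketch assumes `a + b` is small enough that `sqrt(a + b + x ** 2)` already rounds to `abs(x)` beyond the cutoff.

```python
import numpy as np

def ratio_naive(x, a, b):
    # x ** 2 overflows to inf for very large |x|, so the ratio collapses to 0.
    with np.errstate(over="ignore"):
        return x / np.sqrt(a + b + x ** 2)

def ratio_guarded(x, a, b, cutoff=1e100):
    x = np.asarray(x, dtype=np.float64)
    big = np.abs(x) > cutoff
    # Substitute a harmless value where |x| is large so x ** 2 cannot overflow.
    x_safe = np.where(big, 1.0, x)
    root = np.where(big, np.abs(x), np.sqrt(a + b + x_safe ** 2))
    return x / root

a, b = 8.0, 4.0
x = np.float64(-1e200)
naive = ratio_naive(x, a, b)      # ratio wrongly becomes 0, so y becomes 0.5
guarded = ratio_guarded(x, a, b)  # ratio stays at -1, so y becomes 0
print(naive, guarded)
```

With the guarded ratio, `y = (1 + ratio) * 0.5` goes to 0 in the far left tail instead of jumping back to 0.5, which is what produces the spurious nonzero density shown above.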

@@ -9406,6 +9410,7 @@ def _munp(self, order, n):
         # see https://link.springer.com/content/pdf/10.1007/s10959-020-01050-9.pdf
         # page 640, with m=n, j=n+order
         def vmunp(order, n):
+            n = np.asarray(n, dtype=np.int64)
Contributor Author

This was the main problem with irwinhall.

@@ -10777,7 +10782,7 @@ def cond_b(loc):


truncpareto = truncpareto_gen(a=1.0, name='truncpareto')
-truncpareto._support = (0.0, 'c')
+truncpareto._support = (1.0, 'c')
Contributor Author

This was just wrong. See docs.
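
The documented support can be checked against the existing (old-infrastructure) distribution. This is a small sketch assuming a SciPy version that includes `truncpareto`; the shape values are arbitrary.

```python
from scipy import stats

b, c = 2.0, 5.0  # arbitrary illustrative shape values

# With default loc=0 and scale=1, the support should start at 1, not 0.
lower, upper = stats.truncpareto(b, c).support()

# A point between 0 and 1 lies outside the support, so its density is zero.
density_below = stats.truncpareto(b, c).pdf(0.5)
print(lower, upper, density_below)
```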

fields = set(NumpyDocString.sections)
fields.remove('index')
if not include_examples:
    fields.remove('Examples')
Contributor Author

When generating docs (for help) for old distributions, we can't easily generate examples because the code for doing so doesn't know to get the parameter values from _distparams.py. That could be added as an enhancement later.

-    def _moment_raw_formula(self, n, **kwargs):
-        return dist._munp(int(n), **kwargs)
+    def _moment_raw_formula(self, order, **kwargs):
+        return dist._munp(int(order), **kwargs)
Contributor Author

@mdhaber mdhaber Dec 10, 2024

This should have been order like everywhere else. It also gave irwinhall some trouble because n is a shape parameter.

Currently, irwinhall doesn't work with order_statistic because of the name conflict. We might want to rename the shapes of OrderStatisticDistribution. This does not necessarily require renaming the inputs to order_statistic, but it would probably be easiest to choose names that are unlikely to conflict with distribution parameters. Alternatively, we can let this be and just address it when we address the name conflict issue in general.
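
The clash between a positional argument named `n` and a shape parameter named `n` can be reproduced in plain Python. The two functions below are stand-ins written for this illustration, not the SciPy methods themselves.

```python
def moment_old(n, **kwargs):
    # Stand-in for the old signature: the moment order is called `n`.
    return n, kwargs

def moment_new(order, **kwargs):
    # Stand-in for the new signature: the moment order is called `order`.
    return order, kwargs

shapes = {"n": 3}  # irwinhall's shape parameter is also called `n`

try:
    moment_old(2, **shapes)
    conflict = None
except TypeError as exc:
    # Python refuses to bind `n` both positionally and by keyword.
    conflict = str(exc)

result = moment_new(2, **shapes)
print(conflict)
print(result)
```

Renaming the first argument to `order` removes the collision for the raw moment, but any other method whose argument names overlap with shape names would run into the same `TypeError`.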

@@ -3572,7 +3679,12 @@ def _moment_standard_formula(self, order, **kwargs):
'_entropy': '_entropy_formula',
'_median': '_median_formula'}

# These are not desirable overrides for the new infrastructure
skip_override = {'norminvgauss': {'_sf', '_isf'}}
Contributor Author

These were just generic algorithms (which could just as well have been put at the rv_continuous level) applied to norminvgauss. They weren't playing nicely with the way we pass data around, so we might as well avoid them because we have our own (better) generic algorithms. In fact, for performance, it might be worth adding other distributions/methods to this dict wherever something similar exists.
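
One way such a skip dict could be applied is sketched below. Only `skip_override` and the `'_entropy'` and `'_median'` entries mirror the diff; the helper `select_overrides`, the `method_map` name, and the `'_sf'` and `'_isf'` target names are assumptions made for this illustration and may not match the actual implementation.

```python
# '_entropy' and '_median' entries mirror the diff; the '_sf' and '_isf'
# targets are assumed names used only for this illustration.
method_map = {
    '_entropy': '_entropy_formula',
    '_median': '_median_formula',
    '_sf': '_ccdf_formula',
    '_isf': '_iccdf_formula',
}
skip_override = {'norminvgauss': {'_sf', '_isf'}}

def select_overrides(dist_name, defined_methods):
    """Hypothetical helper: pick which old private methods to reuse."""
    skipped = skip_override.get(dist_name, set())
    return {
        new_name: old_name
        for old_name, new_name in method_map.items()
        if old_name in defined_methods and old_name not in skipped
    }

defined = {'_sf', '_isf', '_entropy'}
nig = select_overrides('norminvgauss', defined)
other = select_overrides('some_other_dist', defined)
print(nig)
print(other)
```

Under this sketch, norminvgauss keeps its `_entropy` override while its `_sf` and `_isf` are left to the generic algorithms of the new infrastructure.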

'levy_stable', # levy_stable does things differently...
'ksone', # tolerance issues
'norminvgauss', # private methods seem to have broadcasting issues
'levy_stable', # private methods seem to require >= 1d args
Contributor Author

I'll let someone else handle this. I tried a little, but I didn't want to get sucked into an endless game of whack-a-mole.

Contributor

@steppi steppi left a comment

Nice work. Looks ready to go in.

@steppi steppi merged commit 2ef9574 into scipy:main Dec 10, 2024
33 of 34 checks passed
tylerjereddy pushed a commit to tylerjereddy/scipy that referenced this pull request Dec 13, 2024
…scipy#22040)

* ENH: stats.make_distribution: support irwinhall

* ENH: stats.make_distribution: support norminvgauss

* ENH: stats.make_distribution: support ksone/studentized_range

* TST: stats.make_distribution: test support; fix truncpareto support

* ENH: stats.make_distribution: support distributions with _get_support override

* ENH: stats.jf_skew_t: add sf override

* MAINT: stats.make_distribution: make help work

* MAINT: stats.make_distribution: improve input validation

* DOC: stats.make_distribution: improve repr/docs of returned class

* MAINT: stats.make_distribution: refinements

* STY: stats.make_distribution: fix lint failure

[lint only]
@tylerjereddy tylerjereddy mentioned this pull request Dec 13, 2024
5 tasks
@tylerjereddy tylerjereddy removed the backport-candidate label Dec 20, 2024
Labels
maintenance, scipy.stats
4 participants