MAINT: stats.make_distribution: support more existing distributions #22040
-    Known failures include 'genpareto', 'genextreme', 'genhalflogistic',
-    'irwinhall', 'kstwo', 'kappa4', 'levy_stable', 'norminvgauss',
-    'tukeylambda', and `vonmises`.
+    Known failures include 'levy_stable' and `vonmises`.
Oops. Must have copy-pasted from a list that had quotes, originally.
I think I'll raise an error if the distribution is one of the failing two. Also, should probably check whether |
Some commentary to aid the reviewer.
@@ -5717,6 +5717,10 @@ def _cdf(self, x, a, b):
         y = (1 + x / np.sqrt(a + b + x ** 2)) * 0.5
         return sc.betainc(a, b, y)

+
+    def _sf(self, x, a, b):
+        y = (1 + x / np.sqrt(a + b + x ** 2)) * 0.5
+        return sc.betaincc(a, b, y)
This is the fix for `skip_logccdf = {'jf_skew_t',  # check this out later` below. `logccdf` tries `logexp` if `ccdf` is defined and `complement` if `logcdf` is defined, but we have neither, so it falls back to log-quadrature rather than doing a complement and logarithm. But `logpdf` isn't defined either, so that doesn't do super well. Now, `_sf` is defined, so `logexp` works fine. Might be worth seeing if adding `logpdf` helps quadrature, but that doesn't need to be done here.
`jf_skew_t.pdf` returns erroneous results for sufficiently negative `x`.
stats.jf_skew_t(8, 4).pdf(-1e10) # np.float64(0.0)
stats.jf_skew_t(8, 4).pdf(-1e100) # np.float64(0.0)
stats.jf_skew_t(8, 4).pdf(-1e200) # np.float64(0.186060145344313)
stats.jf_skew_t(8, 4).pdf(-1e300) # np.float64(0.186060145344313)
A simple fix is to replace `np.sqrt(a + b + x ** 2)` with `abs(x)` for `abs(x)` sufficiently large.
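A minimal sketch of why the values above go wrong and what the suggested guard could look like. It assumes the culprit is `x ** 2` overflowing to `inf` in double precision, which makes the ratio `x / np.sqrt(a + b + x ** 2)` collapse to zero instead of approaching -1. The helper names and the `big=1e100` threshold are hypothetical choices for illustration, not SciPy code.

```python
import numpy as np


def ratio_naive(x, a, b):
    """x / sqrt(a + b + x**2), computed directly as in the hunk above."""
    x = np.asarray(x, dtype=float)
    # x**2 overflows to inf once |x| exceeds roughly 1e154,
    # so the ratio becomes x / inf == +/-0.0 instead of +/-1.0.
    with np.errstate(over="ignore"):
        return x / np.sqrt(a + b + x**2)


def ratio_guarded(x, a, b, big=1e100):
    """Same ratio, but use abs(x) in place of the square root for large |x|.

    `big` is an assumed threshold; for moderate a + b the square root
    already equals abs(x) in double precision well below this point.
    """
    x = np.asarray(x, dtype=float)
    large = np.abs(x) > big
    # Substitute a harmless value where the sqrt branch is not used,
    # so that squaring never overflows.
    safe_x = np.where(large, 1.0, x)
    root = np.where(large, np.abs(x), np.sqrt(a + b + safe_x**2))
    return x / root


x = np.array([-1e300, -1e200, -1.0, 0.0, 1.0, 1e200])
print(ratio_naive(x, 8, 4))    # extreme entries collapse to zero
print(ratio_guarded(x, 8, 4))  # extreme entries stay at -1 or +1
```

With the guarded ratio, the factor `(1 + x / ...)` in the density goes to zero for very negative `x`, which is the behavior seen at `-1e10` and `-1e100` above.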
@@ -9406,6 +9410,7 @@ def _munp(self, order, n):
         # see https://link.springer.com/content/pdf/10.1007/s10959-020-01050-9.pdf
         # page 640, with m=n, j=n+order
         def vmunp(order, n):
+            n = np.asarray(n, dtype=np.int64)
This was the main problem with `irwinhall`.
@@ -10777,7 +10782,7 @@ def cond_b(loc):


 truncpareto = truncpareto_gen(a=1.0, name='truncpareto')
-truncpareto._support = (0.0, 'c')
+truncpareto._support = (1.0, 'c')
This was just wrong. See docs.
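For reference, a small sketch of why the lower endpoint is 1 rather than 0. It uses the standard-form density given in the `truncpareto` documentation, `b / (x**(b + 1) * (1 - c**-b))` on `1 <= x <= c`, and its closed-form CDF; the function name is a hypothetical helper, not part of SciPy.

```python
def truncpareto_cdf(x, b, c):
    """Closed-form CDF of the standard truncated Pareto on [1, c].

    Integrating b / (t**(b + 1) * (1 - c**-b)) from 1 to x gives
    (1 - x**-b) / (1 - c**-b), which is 0 at x = 1 and 1 at x = c.
    """
    if x <= 1.0:
        return 0.0
    if x >= c:
        return 1.0
    return (1.0 - x ** -b) / (1.0 - c ** -b)


b, c = 2.0, 5.0
print(truncpareto_cdf(1.0, b, c))  # all probability mass lies above 1
print(truncpareto_cdf(2.0, b, c))
print(truncpareto_cdf(c, b, c))    # and below c
```

The normalizing constant `1 - c**-b` only makes the density integrate to one when the lower limit is 1, so a support of `(0.0, 'c')` is inconsistent with the documented density.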
fields = set(NumpyDocString.sections)
fields.remove('index')
if not include_examples:
    fields.remove('Examples')
When generating docs (for `help`) for old distributions, we can't easily generate examples because the code for doing so doesn't know to get the parameter values from `_distparams.py`. That could be added as an enhancement later.
-    def _moment_raw_formula(self, n, **kwargs):
-        return dist._munp(int(n), **kwargs)
+    def _moment_raw_formula(self, order, **kwargs):
+        return dist._munp(int(order), **kwargs)
This should have been `order` like everywhere else. This also gave `irwinhall` some trouble because `n` is a shape parameter.
Currently, `irwinhall` doesn't work with `order_statistic` because of the name conflict. We might want to rename the shapes of `OrderStatisticDistribution`. This does not necessarily require renaming the inputs to `order_statistic`, but it would probably be easiest to choose names that are unlikely to conflict with distribution parameters. Alternatively, we can let this be and just address it when we address the name conflict issue in general.
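A minimal sketch of one way such a name clash can surface in plain Python: when the wrapper's positional parameter shares a name with a shape passed through `**kwargs`, the call fails before the formula is ever reached. All functions here are hypothetical stand-ins, and the arithmetic in `munp` is a placeholder rather than the Irwin-Hall moment formula.

```python
def munp(order, n):
    """Hypothetical raw-moment formula for a distribution with shape `n`."""
    return n ** order  # placeholder arithmetic, for illustration only


def moment_old(n, **kwargs):
    # The moment order is named `n`, the same as the shape parameter.
    return munp(int(n), **kwargs)


def moment_new(order, **kwargs):
    # The moment order is named `order`, so shape `n` passes through cleanly.
    return munp(int(order), **kwargs)


shapes = {"n": 3}

try:
    moment_old(2, **shapes)
except TypeError as exc:
    print("old signature:", exc)  # multiple values for argument 'n'

print("new signature:", moment_new(2, **shapes))
```

The same reasoning applies to any wrapper whose own argument names overlap with distribution shapes, which is why choosing names unlikely to collide is attractive.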
@@ -3572,7 +3679,12 @@ def _moment_standard_formula(self, order, **kwargs):
                         '_entropy': '_entropy_formula',
                         '_median': '_median_formula'}

+    # These are not desirable overrides for the new infrastructure
+    skip_override = {'norminvgauss': {'_sf', '_isf'}}
These were just generic algorithms (which could have just as well been put at the `rv_continuous` level) applied to `norminvgauss`. They weren't playing nicely with the way we pass data around, so might as well just avoid them because we have our own (better) generic algorithms. In fact, for performance, it might be worth adding additional distributions/methods to this dict where there is something similar.
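A rough sketch of how a per-distribution skip set like this could be applied when collecting method overrides. This is an assumed shape of the logic, not the actual `make_distribution` implementation: `Base`, `Legacy`, `collect_overrides`, and the `'_sf': '_ccdf_formula'` mapping entry are hypothetical, while `skip_override` and the `'_median'` entry are copied from the hunk above.

```python
class Base:
    """Hypothetical stand-in for a generic base class with default methods."""

    def _sf(self, x):
        return "generic sf"

    def _median(self):
        return "generic median"


class Legacy(Base):
    """Hypothetical old-style distribution that overrides both methods."""

    name = "norminvgauss"

    def _sf(self, x):
        return "legacy sf"

    def _median(self):
        return "legacy median"


# Old private method name -> name expected by the new infrastructure.
method_names = {"_sf": "_ccdf_formula", "_median": "_median_formula"}
skip_override = {"norminvgauss": {"_sf", "_isf"}}


def collect_overrides(dist):
    """Return the overrides worth forwarding, honoring `skip_override`."""
    overrides = {}
    for old, new in method_names.items():
        if old in skip_override.get(dist.name, set()):
            continue  # explicitly excluded for this distribution
        method = getattr(type(dist), old, None)
        # Forward only methods the subclass actually overrides.
        if method is not None and method is not getattr(Base, old, None):
            overrides[new] = getattr(dist, old)
    return overrides


print(sorted(collect_overrides(Legacy())))
```

Adding further distributions or methods for performance would then only require new entries in `skip_override`.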
-'levy_stable',  # levy_stable does things differently...
-'ksone',  # tolerance issues
-'norminvgauss',  # private methods seem to have broadcasting issues
+'levy_stable',  # private methods seem to require >= 1d args
I'll let someone else handle this. I tried a little, but I didn't want to get sucked into an endless game of whack-a-mole.
Nice work. Looks ready to go in.
…scipy#22040)
* ENH: stats.make_distribution: support irwinhall
* ENH: stats.make_distribution: support norminvgauss
* ENH: stats.make_distribution: support ksone/studentized_range
* TST: stats.make_distribution: test support; fix truncpareto support
* ENH: stats.make_distribution: support distributions with _get_support override
* ENH: stats.jf_skew_t: add sf override
* MAINT: stats.make_distribution: make help work
* MAINT: stats.make_distribution: improve input validation
* DOC: stats.make_distribution: improve repr/docs of returned class
* MAINT: stats.make_distribution: refinements
* STY: stats.make_distribution: fix lint failure [lint only]
Reference issue
gh-21707
What does this implement/fix?
When adding `stats.make_distribution`, there were 10 old distributions that didn't seem to play nicely with the new infrastructure, and a few more that we omitted from the tests for one reason or another. This makes the little adjustments needed for `make_distribution` to support all but two distributions:
* `levy_stable` because it isn't obvious what the problem is
* `vonmises` because neither the old nor new infrastructure supports circular distributions (yet)