MAINT: stats.dirichlet: fix interface inconsistency #16042
Some comments to facilitate review.
@@ -1578,6 +1590,128 @@ def rvs(self, size=1, random_state=None):
        return self._dist.rvs(self.alpha, size, random_state)


class multivariate_beta_gen(dirichlet_gen):
Copy-paste with minimal changes (`dirichlet` -> `multivariate_beta` as needed)
@@ -637,120 +638,125 @@ def test_moments(self):
N*num_cols,num_rows).T)
assert_allclose(sample_rowcov, U, atol=0.1)

class TestDirichlet:

class DirichletTest:
Will have two subclasses - one with `dist = dirichlet`, the other with `dist = multivariate_beta`.
@@ -1230,7 +1231,6 @@ def _dirichlet_check_parameters(alpha):


def _dirichlet_check_input(alpha, x):
    x = np.asarray(x)
`dirichlet` and `multivariate_beta` both have a new `_check_input` method.
`dirichlet` does `x = np.asarray(x)` then calls `_dirichlet_check_input(alpha, x)`.
`multivariate_beta` does `x = np.moveaxis(x, -1, 0)` then calls `_dirichlet_check_input(alpha, x)`.
That is the only thing that's different between the two distributions, other than documentation.
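To make the axis handling described above concrete, here is a minimal NumPy-only sketch. It does not call the PR's code; it uses NumPy's own `Generator.dirichlet` sampler as a stand-in source of samples (an assumption made purely for illustration) and shows what `np.moveaxis(x, -1, 0)` does to their shape.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 3.0])

# NumPy's sampler returns one sample per row: shape (n_samples, K).
samples = rng.dirichlet(alpha, size=5)

# Move the component axis (last) to the front: shape (K, n_samples).
moved = np.moveaxis(samples, -1, 0)
print(samples.shape, moved.shape)

# For 2D input this is exactly a transpose.
print(np.array_equal(moved, samples.T))

# For higher-dimensional input the two differ: moveaxis shifts only the
# last axis to the front, while .T reverses every axis.
batch = rng.dirichlet(alpha, size=(4, 5))  # shape (4, 5, 3)
print(np.moveaxis(batch, -1, 0).shape)     # (3, 4, 5)
print(batch.T.shape)                       # (3, 5, 4)
```

For 2D `x` the two operations agree, so the choice between them only matters if higher-dimensional input is ever supported.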
        return multivariate_beta_frozen(alpha, seed=seed)

    def _check_input(self, alpha, x):
        x = np.moveaxis(x, -1, 0)
Probably could have just done transpose. I didn't realize that `dirichlet` was only written for 2D `x`.
Hey, sorry I haven't gotten around to this yet. I'm in the middle of preparing a move, and am on low availability until ~middle of May. I'll try to take a look when I can, but no promises, unfortunately... 🙈
Only small nitpicks. Otherwise LGTM.
One question: Unlike univariate distributions, the infrastructure does not prevent us from adding new parameters to the `pdf` and `logpdf` methods. Instead of deprecating the whole distribution, we can alternatively add a new keyword to the `pdf` and `logpdf` methods (e.g. `transpose`) and default it to `True`. We can then emit a deprecation warning when it is `True` and default it to `False` in 1.11.0. To me, this sounds simpler than adding a new distribution. Have you considered doing this?
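A rough sketch of how the keyword-based alternative suggested above could behave. Everything here is hypothetical: `dirichlet_logpdf_sketch` is a hand-written stand-in, not SciPy's implementation, and the `transpose` keyword is only the name floated in the comment.

```python
import math
import warnings

import numpy as np


def dirichlet_logpdf_sketch(x, alpha, transpose=True):
    """Hypothetical stand-in for a logpdf with a deprecated `transpose` flag."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    if transpose:
        warnings.warn(
            "transpose=True is deprecated; pass x with components along "
            "the last axis and set transpose=False.",
            DeprecationWarning,
            stacklevel=2,
        )
        x = x.T  # old convention: components along the first axis
    # log of the normalizing constant: lgamma(sum(alpha)) - sum(lgamma(alpha))
    log_norm = math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
    return log_norm + np.sum((alpha - 1.0) * np.log(x), axis=-1)


alpha = [1.0, 2.0, 3.0]
x_new = np.array([[0.2, 0.3, 0.5], [0.1, 0.1, 0.8]])  # shape (n, K)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old = dirichlet_logpdf_sketch(x_new.T, alpha)                 # warns
    new = dirichlet_logpdf_sketch(x_new, alpha, transpose=False)  # silent

print(len(caught), np.allclose(old, new))
```

Under this scheme, callers opt in to the new layout explicitly during the deprecation period, and the default flips once the period ends.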
I don't remember! I think I got carried away after gh-15889 (which also might interest you) and maybe didn't stop to think. Good idea. On the other hand, we'd have the unfortunate choice of getting stuck with a Would you like to submit a PR for that, and if it's merged, we'd close this?
Actually this needs an email to the mailing list. I'll send one with both options and let the default be to merge this one if there are no comments in favor of the other? Might as well since the work is done.
Co-authored-by: Tirth Patel <[email protected]>
Email sent 5/16/2022.
OK, if that commit resolved the PEP8 issues, is this ready to merge after giving some time for people to respond to the email?
LGTM! Added just one small comment. We can wait till Sunday for feedback on the mail. If there is none, this should be good to go in!
+1 -0 from the mailing list, @tirthasheshpatel. Would you like to follow up with
Between the PR review and mailing list comments, I'll take this as +2 core developers in favor and CI is passing apart from a timeout.
I scanned through the diff and it looked well-done from a mechanical (non-stats-expert) point of view, but I'm mostly leaning on the current code review/feedback.
thanks
Thanks @mdhaber!
Sure, I can propose a PR for
I just picked this up in SymPy CI (sympy/sympy#23513). I went to follow the instruction to use
What should be the expected code to use if wanting to suppress the warning while supporting multiple SciPy versions? Something like this?

    from scipy import stats
    dirichlet = getattr(stats, 'multivariate_beta', None) or stats.dirichlet

And then in the future when we don't need to worry about old SciPy versions should we change the code again to just use
Generally I think that if there does not already exist an undeprecated alternative API then it is better to have a period of "soft deprecation" before emitting warnings or making any breaking change. By soft deprecation I mean something like:
It's not completely clear to me what is different between If we were using those methods then presumably we would need to change something else in the code rather than simply replacing
It's important when writing something like this to consider that the person (e.g. maintainer of large codebase) trying to fix the downstream code might know very little about the API and what it is used for and really needs clear instructions for what to do. On the other hand SymPy doesn't use the
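The `getattr` fallback from the comment above can be checked without depending on any particular SciPy version. In this sketch, `pick_dirichlet` is a hypothetical helper and the two `SimpleNamespace` objects are stand-ins for `scipy.stats` with and without a `multivariate_beta` attribute; whether a given SciPy release provides that attribute is not assumed here.

```python
from types import SimpleNamespace


def pick_dirichlet(stats_module):
    """Return `multivariate_beta` if the module has it, else `dirichlet`."""
    return getattr(stats_module, "multivariate_beta", None) or stats_module.dirichlet


# Stand-ins for two hypothetical versions (strings instead of real objects).
older = SimpleNamespace(dirichlet="dirichlet")
newer = SimpleNamespace(dirichlet="dirichlet", multivariate_beta="multivariate_beta")

print(pick_dirichlet(older))  # dirichlet
print(pick_dirichlet(newer))  # multivariate_beta
```

In real code the argument would be `scipy.stats`. Since the PR description says the two distributions differ in the layout `pdf` expects, a pure name swap like this is only a drop-in replacement for code that does not rely on that layout.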
Yes, that, or
That sounds like a project level decision. I don't think I've seen that used before.
For now, only There is a separate change that may be made to RVS soon to fix an old bug.
@tirthasheshpatel Thanks for offering to add You are also slated to address gh-7689, which would fix the shape of the
I will submit a PR for
Let's do separate ones. (I think the first of those is done here, though.) Thank you!
@tirthasheshpatel Actually, I would prioritize fixing the
Reference issue
Closes gh-6006
gh-4984
What does this implement/fix?
The `dirichlet` distribution interface is inconsistent with other multivariate distributions and even itself: the `pdf` method expects the transpose of what the `rvs` method produces. This PR introduces `multivariate_beta`, which is the same distribution without this inconsistency.
The plan is to deprecate `dirichlet` in favor of `multivariate_beta`. (@h-vetinari can you help with this, either by making a PR against my branch or a follow-up to this one? I can review it.) After the deprecation cycle, we can make `dirichlet` an alias for `multivariate_beta` if desired.
Additional information
There are many other things that could be improved about the distribution and its documentation (e.g. `pdf` input `x` can only be 2D). This PR does not fix all of them, but it does fix a defect reported as early as gh-4984. Let's get this messy stuff out of the way, then those other things can be cleaned up in future PRs.
The three commits are pretty clean. I'd suggest reviewing them separately and in order.
LMK if I should address gh-6474 the same sort of way... maybe `inv_wishart`?