Some minor fixes for stats.wishart addition. #4313
c65c8e6 to f93e8a3
Might also want to mention it in the release notes already.
Good point, done.
ff92bb8 to 0763413
… logdet(x)
These are subtly different:
    In [46]: 2 * np.sum(np.log(linalg.cholesky(scale, lower=True).diagonal()))
        ... - np.log(np.linalg.det(scale))
    Out[46]: 8.8817841970012523e-16
Use of ``np.linalg.slogdet`` is preferred. Related change in scipygh-4313.
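The comparison in the commit message above can be reproduced with a minimal sketch. The matrix scale below is an arbitrary positive definite example (an assumption, not the matrix from the original session), and np.linalg.cholesky stands in for linalg.cholesky(scale, lower=True); both return the lower-triangular factor.

```python
import numpy as np

# Arbitrary symmetric positive definite matrix (illustrative assumption,
# not the matrix used in the original IPython session).
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
scale = a @ a.T + 4 * np.eye(4)

# Cholesky route: scale = L L^T, so log det(scale) = 2 * sum(log(diag(L))).
chol = np.linalg.cholesky(scale)  # lower-triangular factor
logdet_chol = 2 * np.sum(np.log(chol.diagonal()))

# slogdet route: returns (sign, log|det|) and works for general matrices.
sign, logdet_slogdet = np.linalg.slogdet(scale)

# Naive route: det() itself can overflow or underflow for large matrices.
logdet_naive = np.log(np.linalg.det(scale))

# The differences are expected to be at the level of floating-point rounding.
print(logdet_chol - logdet_slogdet)
print(logdet_chol - logdet_naive)
```

The three routes agree only up to rounding error, which is why results can differ in the last digits depending on which one a code path uses.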
Added a couple more bug fixes.
The commit rgommers@e3150fe may be a performance regression. It adds extra …
It's indeed slower:
I don't think 30 µs for something that is usually not called in inner loops is important though. I much prefer the maintainability/readability of …
I agree that 30 microseconds wouldn't be important except in an inner loop. In addition to the nitpick that the slowdown would be more like 3x than 2x because the Cholesky decomposition is computed anyway, I'd vote for log likelihood functions as candidates in the "most likely to be used in an inner loop" competition. Regarding the largeness of …
FWIW, I'm also in favor of reusing the Cholesky decomposition here since it has been computed anyway. The readability cost is not too big IMO, and there is a cognitive burden of both "what is slogdet" and "why does it not use Cholesky, is there a reason for it".
In principle, both maximum likelihood models and Metropolis-Hastings Bayesian computation of models fitting a multivariate normal may require a potentially large number of … I haven't personally used such a model (the models I am using take advantage of Gibbs sampling methods, which require lots of calls instead to …). On the other hand, given that the Cholesky is already being computed, the determinant calculation is essentially constant (or grows very slowly) with respect to the size of scale, whereas … Thanks to both of you for your work getting this merged.
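The point about reusing an already computed Cholesky factor inside a log-likelihood can be illustrated with a rough sketch. The function mvn_logpdf below is a hypothetical helper written for this example and is not SciPy's implementation; it factors the covariance once and uses that factor for both the log-determinant and the quadratic form.

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal at a single point x.

    Hypothetical helper for illustration only. It assumes cov is
    symmetric positive definite and computes one Cholesky factor.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    dim = mean.shape[0]

    chol = np.linalg.cholesky(cov)  # cov = L L^T, L lower triangular

    # Solve L z = (x - mean); z @ z is then the squared Mahalanobis distance.
    # (scipy.linalg.solve_triangular would exploit the triangular structure.)
    z = np.linalg.solve(chol, x - mean)
    maha = z @ z

    # Reuse the same factor: log det(cov) = 2 * sum(log(diag(L))).
    logdet = 2.0 * np.sum(np.log(chol.diagonal()))

    return -0.5 * (dim * np.log(2 * np.pi) + logdet + maha)


cov = np.diag([4.0, 9.0])
print(mvn_logpdf([2.0, 3.0], [0.0, 0.0], cov))
```

Once the factor exists, the log-determinant costs only a sum over the diagonal, whereas a separate slogdet call would factor the matrix a second time.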
OK, it seems you all agree. Then let's fix it by removing …
Fix docstring formatting and use np.linalg.slogdet where indicated by TODOs.
… funcs. This raises DeprecationWarnings with recent numpy; should clearly be ints.
… logdet(x)
These are subtly different:
    In [46]: 2 * np.sum(np.log(linalg.cholesky(scale, lower=True).diagonal()))
        ... - np.log(np.linalg.det(scale))
    Out[46]: 8.8817841970012523e-16
Use of ``np.linalg.slogdet`` is preferred for readability/maintainability.
…gdet. This is faster, see discussion on scipygh-4313. Using it everywhere fixes the discrepancy between the normal and frozen distribution.
0763413 to 5047c4c
OK, updated. I think it would be useful at some point to either speed up …
Also make all plots render properly.
Looks good to me.
This speedup applies only to positive semidefinite matrices (and given the lack of singular Cholesky in numpy/scipy, only to positive definite matrices in practice). Stepping back, the larger issue is that although numpy and scipy are amazingly good at n-dimensional array hacking, interfacing with BLAS/LAPACK, and automatically dealing with dtypes and dispatching to the correct underlying functions (e.g. [sdcz]gemm), they have absolutely no dispatch scheme for matrix structure except for ad-hoc Python function name differences or possibly one-off keyword arguments. In other words, if LAPACK functions are named like …
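As a rough illustration of the "one-off keyword argument" style of structure dispatch mentioned above, here is a minimal sketch. The function logdet and its assume_a keyword are hypothetical names chosen for this example; they are not an existing numpy or scipy API for log-determinants.

```python
import numpy as np


def logdet(a, assume_a="gen"):
    """Return log|det(a)|, optionally using a structure hint.

    Hypothetical sketch: assume_a="pos" is the caller's promise that a is
    symmetric positive definite, in which case Cholesky is tried first.
    Any other value uses the general-purpose slogdet.
    """
    a = np.asarray(a, dtype=float)
    if assume_a == "pos":
        try:
            chol = np.linalg.cholesky(a)
        except np.linalg.LinAlgError:
            # The hint was wrong (not positive definite): fall through
            # to the general path below instead of failing.
            pass
        else:
            return 2.0 * np.sum(np.log(chol.diagonal()))
    sign, value = np.linalg.slogdet(a)
    return value


print(logdet([[2.0, 0.0], [0.0, 3.0]], assume_a="pos"))  # Cholesky path
print(logdet([[0.0, 1.0], [1.0, 0.0]], assume_a="pos"))  # falls back to slogdet
```

The caller carries the structural knowledge by hand here, which is exactly the ad-hoc arrangement the comment describes; the sketch does not verify symmetry.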
@argriffing thanks for the details, and interesting perspective on matrix structures.
I guess this PR would conflict with #4318; I'm not sure of the best way to deal with that.
I'd merge this one first; it's good to go and the other one is easy to rebase.
>>> plt.contourf(x, y, rv.pdf(pos))
>>> fig2 = plt.figure()
>>> ax2 = fig2.add_subplot(111)
>>> ax2.contourf(x, y, rv.pdf(pos))
Any reason not to add plt.show(), so that the plot actually renders in the docs?
I get a plot in the HTML docs without including plt.show().
Interesting. What is the complete (correct) way of building the HTML docs that you're using, and which version of Sphinx, etc.?
I use make html in the doc directory. The numpydoc extension in the numpy source code (i.e. numpy/doc/sphinxext) is in my Python path. Sphinx version is 1.2.3.
I do the same; make html is the easiest. I don't have numpy/doc/sphinxext in my path but have simply installed numpydoc.
plt.show indeed isn't necessary on my local build, but for the ones on docs.scipy.org it does seem to be. May depend on the version of the plot-directive extension used.
plt.show() is used in most docstring examples, I'll add it.
I do make html-scipyorg and I do seem to need plt.plot(). Not sure what is different between the two.
This is surely a rather minor point, but all else being equal I'd rather have it implicit in docstrings (ditto for import matplotlib.pyplot as plt).
In any case, it's OT for this PR; my original comment is moot.
Hmm, I just checked and for all multivariate distributions adding plt.show() breaks generating the plots. So I suggest leaving it as is in this PR, and opening a new issue for reconciling the different versions of building the docs (and/or a check on minimum Sphinx and numpydoc versions).
+1
Merging, thanks Ralf, all.
The issue for tracking the docs build: #4346
np.linalg.slogdet where appropriate