BUG/ENH: Removed non-standard scaling of the covariance matrix and added option to disable scaling completely. #11197
Conversation
Thanks for the pull request, and welcome! Needs a mention in the release notes. Also tests are missing for the new option. It would be nice to have a "demonstration of desired behavior" type of test that simply demonstrates the power of the new option, as well as a test for any new error modes. For instance, what happens if absolute_weights is True but no weights are given? Since we are changing default behavior, a heads-up to the numpy-discussion mailing list with a link to this commit is also necessary.
Hi Matti, thanks for the welcome. I changed the release notes and added some notes in the sections "changes" and "improvements". Best,
Overall, definitely a good idea, but I think the name should reflect what is actually done more closely.
Also, definitely include the test!
numpy/lib/polynomial.py (Outdated)
@@ -423,6 +424,19 @@ def polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False):
    cov : bool, optional
        Return the estimate and the covariance matrix of the estimate
        If full is True, then cov is not returned.
    absolute_weights: bool, optional
I realize it amounts to bike-shedding, but I find this name confusing, since I've never encountered this term. If we stick close to this, I'd very much prefer relative_weights=True, since I think that more clearly indicates that there is something weird about the weights. But really what this does is force the reduced chi2 to unity, so maybe that is what the parameter name should reflect? Indeed, in the actual code, the weights are not used at all. Now force_redchi2_to_unity is a bit long... Maybe rescale_covariance=True? Or just cov_scale or scale_cov?
Just to explain the current choice of absolute_weights: It was suggested by @josef-pkt on the mailing list. It is the analogue to scipy.optimize.curve_fit's absolute_sigma parameter. Its name was decided on in the lengthy discussion within this PR, which was continued in this PR. At least I somewhat like the analogy to the curve_fit terminology, but I don't know whether this really is a valid argument here.
OK, the comments in the first thread do argue specifically that scale_cov is bad ... I do think a bit of a mistake was made in scipy to not call it relative_sigma, but on the other hand there is an advantage for newly introduced flags to be False for the default of "old behaviour".

Let me try another suggestion, though: unlike scipy's curve_fit, right now we already have a flag to ask for the covariance matrix. Could we not broaden its purpose instead to also tell what type we want? If falsy, we do not return it as now, and if truthy, we do return it, but exactly what we return will depend on its value. Specifically, I suggest:
cov : bool or str, optional
    If given and not `False`, return not just the estimate but also its covariance matrix.
    By default, the covariance is scaled by chi2/sqrt(N-dof), i.e., the weights are presumed
    to be unreliable except in a relative sense and everything is scaled such that the reduced
    chi2 is unity. This scaling is omitted if ``cov='unscaled'``, as is relevant for the case that
    the weights are 1/sigma**2, with sigma known to be a reliable estimate of the uncertainty.
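For concreteness, here is a minimal sketch of how such a trivalent cov argument would be used, which matches the interface the PR eventually implemented; the data and numbers below are invented purely for illustration:

import numpy as np

# Invented example data: a quadratic trend plus Gaussian noise with known sigma.
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 50)
sigma = 0.5
y = 3.0 * x**2 - 2.0 * x + 1.0 + rng.normal(0, sigma, x.size)
w = np.full_like(x, 1.0 / sigma)   # weights = 1/sigma

# Default behaviour: covariance rescaled so that the reduced chi2 is unity.
coef, cov_scaled = np.polyfit(x, y, deg=2, w=w, cov=True)

# String value: no rescaling, the weights are trusted as absolute 1/sigma values.
coef, cov_unscaled = np.polyfit(x, y, deg=2, w=w, cov='unscaled')

print(np.sqrt(np.diag(cov_scaled)))    # parameter errors, chi2-scaled
print(np.sqrt(np.diag(cov_unscaled)))  # parameter errors taken at face value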
p.s. In the docstring proper, be careful with single back quotes - with those, there should be an actual link target, i.e., something like `False` works because it links to the python API.
What about this suggestion of using cov to indicate whether or not scaling should be done?
To be honest, I like the simple relative_weight better. Like @jotasi already mentioned, it behaves like absolute_sigma in curve_fit. Still, if you insist, I can implement your proposal. In this case it would be nice if you could point me to another function with a similar parameter, so I can have a look at what type of parameter check is performed.
numpy/lib/polynomial.py (Outdated)
@@ -552,6 +566,8 @@ def polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False):
        raise TypeError("expected 1D or 2D array for y")
    if x.shape[0] != y.shape[0]:
        raise TypeError("expected x and y to have same length")
    if absolute_weights and (w is None):
I don't see why this is necessary: the rescaling could be done or omitted (as is arguably meaningful) independent of whether weights are present. I'd remove this.
Wouldn't omitting the rescaling without specifying weights effectively mean that all points' standard deviations are considered 1, or did I misunderstand that?
Yes, for normal distributions that would be the case. But mostly I see no reason to force a user to pass w=np.ones(y.shape[0]) when the flag is set. The default is not really no weights, but weight equals 1.
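To illustrate that point with a tiny, made-up example: passing an explicit vector of ones as weights gives the same fit and covariance as passing no weights at all.

import numpy as np

x = np.arange(10.0)
y = 2.0 * x + 1.0 + np.random.RandomState(1).normal(0, 0.1, x.size)

c_default, cov_default = np.polyfit(x, y, deg=1, cov=True)
c_ones, cov_ones = np.polyfit(x, y, deg=1, w=np.ones(y.shape[0]), cov=True)

assert np.allclose(c_default, c_ones)
assert np.allclose(cov_default, cov_ones)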
In the case of relative weights (scaling), having all weights set to one means that the error on all data points is of equal magnitude. In the new case (absolute weights, no scaling), I'm not sure whether this is a sensible default. Why would a data point have the error sigma==1? I guess that would be just by coincidence, or in special cases where you draw from a distribution with known width (like your unit test example)?
@wummo - it indeed implies sigma=1 - which I agree is not necessarily all that meaningful, but I don't see a reason to specifically forbid someone from entering it - it just makes the code longer and more complex for no benefit.
OK, here is my last argument: Isn't sigma==1 a detail of the implementation that might (maybe) change? Then, giving no weights is something like "undefined behavior".
But it isn't ;-) After all, this is a possibly weighted least-squares fit, not a chi2 one, and the meaning is clear without the weights (the meaning of the covariance admittedly less so, but I don't think one has to hand-hold people that much).
In my opinion, the test is useful, but having very little experience with numpy development/guidelines, I followed your hints and removed the check and the corresponding unit test.
numpy/lib/polynomial.py (Outdated)
    else:
        if len(x) <= order:
            raise ValueError("the number of data points must exceed order "
                             "for estimate the covariance matrix")
This error message is not correct any more, it only needs to be true if rescaling is done. Maybe just replace "for estimate" (weird grammar anyway) with "to scale"
Still to be done: "for estimate" -> "to scale" (or "in order to be able to scale").
numpy/lib/tests/test_polynomial.py (Outdated)
                  [0, 1, 3], [0, 1, 3], deg=0, cov=True)
                  [1], [1], deg=0, cov=True)

    # Check exception when option absolute_weights is True, but no weights
This would need to be removed again...
numpy/lib/polynomial.py (Outdated)
            "for Bayesian estimate the covariance matrix")
        fac = resids / (len(x) - order - 2.0)
        if absolute_weights:
            fac = 1.
Best to just use 1 here - it cooperates better with Decimal, if that ever becomes supported.
OK, I changed this.
"for estimate the covariance matrix") | ||
# note, this used to be: fac = resids / (len(x) - order - 2.0) | ||
# it was deciced that the "- 2" (originally justified by "Bayesian | ||
# uncertainty analysis") is not was the user expects |
Add "(see gh- and gh-11197)"
Looks good, except for the two remaining issues:
@wummo - this function is a bit of a mess already... And IIRC it is in fact recommended to use the newer polynomial classes instead.
One argument in favor of rolling it into a trivariate cov option: (I find it annoying in some cases in statsmodels, where we have one flag to switch away from the default and another flag to choose an option for the alternative. It's easy to forget to switch the first keyword, and I often have to correct my initial code to fix it.)
I rebased the code to 1.16 to fix the merge conflicts. Is there anything I can do to move this pull request forward?
I just became aware of this issue. Also, since @josef-pkt expressed confusion about why a user would want this feature in the mailing list discussion, it's worth noting that it is common practice in physics (my field) to determine the measurement sample variance independently, then treat it as known when fitting a model to data from the same apparatus. Introductory textbooks typically focus on this case.
Maybe this would be more palatable as two PRs - one to remove the non-standard scaling, and one to add the new option.
Sorry that this has slipped so far. I'd still like the opinion of @charris, because it would be good to move this to the polynomial classes. Absent that, I'm happy to merge the …
@mhvk Do you think it makes sense to wait any longer, given that we already waited 1/2 year? If you decide that a string for the type of covariance is the better interface, I will implement this and the PR can be merged.
@wummo - fair enough. Yes, please do the string interface and we'll merge this.
@wummo - thanks for making the changes. Only some small left-overs...
doc/release/1.16.0-notes.rst (Outdated)
@@ -239,6 +239,15 @@ single elementary function for four related but different signatures,
The ``out`` argument to these functions is now always tested for memory overlap
to avoid corrupted results when memory overlap occurs.

New option ``absolute_weights`` in ``np.polyfit``
-------------------------------------------------
Like ``absolute_sigma`` in ``scipy.optimize.curve_fit`` a boolean option
Need to change the release notes as well...
weights are given by 1/sigma with sigma being the (known) standard errors of
(Gaussian distributed) data points, in which case the unscaled matrix is already
a correct estimate for the covariance matrix. In case ``absolute_weights`` is set
to true, but no weights are given, a ``ValueError`` is thrown.
Detailed docstrings for scalar numeric types
Note that the rebase has removed the empty line that should be here.
covariance matrix. Namely, rather than using the standard chisq/(M-N), it
scales it with chisq/(M-N-2) where M is the number of data points and N is the
number of parameters. This scaling is inconsistent with other fitting programs
such as e.g. ``scipy.optimize.curve_fit`` and was changed to chisq/(M-N).
And another empty line to be added back in.
except in a relative sense and everything is scaled such that the
reduced chi2 is unity. This scaling is omitted if ``cov='unscaled'``,
as is relevant for the case that the weights are 1/sigma**2, with
sigma known to be a reliable estimate of the uncertainty.
Very clear, thanks!
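As a cross-check of what the two variants mean, here is a rough sketch using the textbook weighted least-squares formulas (not the actual polyfit source; data and names are invented): the unscaled covariance is the inverse of the weighted normal matrix, and the default variant multiplies it by chi2/(M-N).

import numpy as np

rng = np.random.RandomState(2)
x = np.linspace(-1, 1, 30)
sigma = 0.2                                    # assumed known uncertainty
y = 1.5 * x + 0.3 + rng.normal(0, sigma, x.size)
w = np.full_like(x, 1.0 / sigma)
deg = 1

# Weighted design matrix with columns x**deg ... x**0, as np.polyfit orders them.
A = np.vander(x, deg + 1) * w[:, None]
b = y * w

coef, chi2, _, _ = np.linalg.lstsq(A, b, rcond=None)
cov_unscaled = np.linalg.inv(A.T @ A)                       # what cov='unscaled' returns
cov_scaled = cov_unscaled * chi2[0] / (len(x) - deg - 1)    # default: chi2/(M - N)

# The default np.polyfit covariance should agree with the scaled version.
coef_np, cov_np = np.polyfit(x, y, deg, w=w, cov=True)
assert np.allclose(cov_np, cov_scaled)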
numpy/lib/polynomial.py (Outdated)
    else:
        if len(x) <= order:
            raise ValueError("the number of data points must exceed order "
                             "for estimate the covariance matrix")
Still to be done: "for estimate" -> "to scale" (or "in order to be able to scale").
"for estimate the covariance matrix") | ||
# note, this used to be: fac = resids / (len(x) - order - 2.0) | ||
# it was deciced that the "- 2" (originally justified by "Bayesian | ||
# uncertainty analysis") is not was the user expects |
p.s. While making the last changes, could you also rebase & squash the commits? Thanks again, and apologies that this has all taken so long.
I've definitely considered adding a covariance computation to the polynomial package fitting functions; I've done it for myself in practice. I agree that … For …
@mhvk I fixed the problems with the documentation. If everything looks OK, I will do the rebase & squash.
doc/release/1.16.0-notes.rst (Outdated)
-----------------------------------------------------------

A further possible value has been added to the ``cov`` parameter of the
``np.polyfit`` function. With ``cov=unscaled`` the scaling of the covariance
One last small thing: missing quotes around unscaled, i.e., cov='unscaled'
Looks good modulo the missing quotes. Please go ahead and rebase/squash as well, and I'll merge.
@mhvk I did the rebase and squashing and just wanted to ask whether there is anything more that needs to be done.
@wummo - I hadn't seen that the branch was pushed - now all is OK so I'll merge. Thanks for the contribution and more thanks for your patience!
It's great that it got merged. Thanks a lot for your help.
Fixes #11196
As discussed in the bug report, polyfit uses a non-standard scaling factor for the covariance matrix; this is corrected.
Furthermore, an option is added to disable the scaling of the covariance matrix completely. It would be useful in cases where the weights are given by 1/sigma, with sigma being the (known) standard errors of (Gaussian distributed) data points, in which case the unscaled matrix is already a correct estimate for the covariance matrix.
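To make that use case concrete, here is a small sketch with invented numbers, using the cov='unscaled' spelling the PR converged on: the measurement errors are known in absolute terms, so the unscaled covariance can be used directly for the parameter uncertainties.

import numpy as np

rng = np.random.RandomState(3)
x = np.linspace(0, 5, 40)
sigma = 0.3                                   # known standard error of each point
y = 0.8 * x + 2.0 + rng.normal(0, sigma, x.size)

# weights = 1/sigma, and no extra rescaling of the covariance matrix
coef, cov = np.polyfit(x, y, 1, w=np.full_like(x, 1.0 / sigma), cov='unscaled')
slope_err, intercept_err = np.sqrt(np.diag(cov))
print(coef, slope_err, intercept_err)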