Fixes multiple bugs in mstats.theilslopes #3574
Changes Unknown when pulling 8010f14 on tomflannaghan:theilslopes-fix into * on scipy:master*. |
Hi @tomflannaghan, thanks for fixing this up. Looks good from a quick browse, but I haven't had time to check the original paper yet. Adding the reference to that paper to the docstring might be useful. And bonus points for a small example of course :) The implementation was hardly touched after it was introduced in 9f05b93. @pierregm was this code indeed based on the Sen paper, and does this look OK to you?
Also, I think this should be in the |
On Tue, Apr 29, 2014 at 5:30 PM, Ralf Gommers wrote:
+1 FWIW, there are several implementations of Theil-Sen estimators in R (and |
Hi @rgommers and @jseabold, thanks for the feedback! I agree that it should be in I have also added back the handling of ties in |
Add complete references, reST formatting, PEP8, etc.
@tomflannaghan I have a branch here with some minor fixes and an updated example: https://github.com/rgommers/scipy/tree/theilslopes-fix. Somehow GitHub doesn't let me send you a PR, but maybe you can cherry-pick those commits? The example does show an issue: the confidence interval is only for the slope and not for the offset, which makes the plots look quite weird. Do you think it makes sense to fix this? Finally, I don't have access to the original paper, so I didn't review the algorithm changes in detail. If you know of a public version, maybe you can direct me to it?
Changes Unknown when pulling a38b7d2 on tomflannaghan:theilslopes-fix into * on scipy:master*. |
@rgommers I've merged your commits with my branch. I'm not sure why GitHub didn't allow you to send me a PR, but I'm fairly new to it so maybe I messed something up. I'm happy with your changes and just merged them directly into this PR. The confidence interval for the intercept would probably be useful. However, I am not sure how to implement it properly. The Theil-Sen method only gives the slope, and there are various different methods to compute the intercept, let alone its confidence interval. I looked at an equivalent R package (zyp) and their confidence interval calculation doesn't agree at all well with a simple bootstrap test that I performed on your example. They compute it using the standard deviation of all possible intercepts, which is not robust to outliers and doesn't fit with the distribution-free philosophy of the rest of the method. So, in summary, I think not including it is probably better than including something potentially misleading. I suspect most people would bootstrap to get the confidence interval for the intercept if they really needed it (which is very computationally expensive). As for access to the paper, sadly I don't know any way to get it publicly. I can email you a copy though.
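For readers who want to try the bootstrap approach mentioned above, here is a minimal sketch using only the standard library. It is not the test that was actually run in this thread. The function names are hypothetical, the intercept is taken as `median(y) - slope * median(x)` (one of several conventions in use, assumed here), and the interval is a plain percentile interval.

```python
import random
from itertools import combinations
from statistics import median


def theil_sen_line(x, y):
    """Return (slope, intercept), or None if all x values coincide."""
    slopes = [(yj - yi) / (xj - xi)
              for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
              if xj != xi]  # skip pairs with no separation in x
    if not slopes:
        return None
    slope = median(slopes)
    # Assumed intercept convention; Sen (1968) does not define one.
    return slope, median(y) - slope * median(x)


def bootstrap_intercept_ci(x, y, level=0.95, n_boot=500, seed=0):
    """Percentile bootstrap interval for the intercept (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(x)
    intercepts = []
    for _ in range(n_boot):
        # Resample (x, y) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        fit = theil_sen_line([x[i] for i in idx], [y[i] for i in idx])
        if fit is not None:  # drop degenerate resamples
            intercepts.append(fit[1])
    if not intercepts:
        raise ValueError("no usable bootstrap resamples")
    intercepts.sort()
    m = len(intercepts)
    lo = intercepts[int((1 - level) / 2 * m)]
    hi = intercepts[min(m - 1, int((1 + level) / 2 * m))]
    return lo, hi
```

Each resample recomputes all pairwise slopes, so the cost grows roughly as `n_boot * n**2`, which is the expense referred to above.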
If you could send me the paper at After reading the stackexchange discussion you link to above, I get the impression that indeed a confidence interval on the intercept doesn't help. Maybe it's best to explain the issue with the intercept value (not well-defined), and that what the example does (which to me looked like the simplest way to use the slope confidence interval) makes little sense. |
I have changed the wording of the example a bit to reflect the fact that the intercept confidence interval is missing. I've also added a note explaining that the intercept is not defined by Sen, and that there are several definitions that are used. |
Fixes multiple bugs in mstats.theilslopes
All looks good to me, merged. Thanks Tom!
Thanks Ralf!
This fixes multiple bugs in `scipy.stats.mstats.theilslopes`. I believe that the original code is implementing the method given in Sen (1968) for the slope and confidence intervals (this is the standard method), and I have therefore corrected the implementation to follow Sen (1968) faithfully. I have also added a test that highlights some of the failures of the original code.

An outline of the changes:

- […] `slopes` array, which caused problems when the confidence intervals are calculated by array index. I now use `x.compressed()` and `y.compressed()` to remove masked values.
- […] `slopes` with one that only includes values where `deltax > 0`.
- […] `sigsq` does not divide the second term by 18 (line 722 in original code). It also includes a `yties` term for handling repeated `y` values (line 723 in original code) that does not appear in Sen (1968) equation (2.6).
- […] `slopes` array. In Sen (1968), these indices are 1-indexed, but Python is 0-indexed. Therefore, we must subtract 1 from the values computed by Sen (1968) to index the `slopes` array correctly (see lines 728-729 in the new code).
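
The changes outlined above can be illustrated with a small standard-library sketch. This is not the code merged in this PR. The function name is hypothetical, `None` stands in for masked values, only ties in `x` are corrected for, the intercept uses the `median(y) - slope * median(x)` convention, and the index arithmetic reflects my reading of the 1-indexed to 0-indexed conversion described above.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from statistics import NormalDist, median


def theil_sen_sketch(x, y, level=0.95):
    """Return (slope, intercept, lo_slope, hi_slope); illustrative only."""
    # Drop missing pairs first so they never reach the slopes list.
    pts = [(xi, yi) for xi, yi in zip(x, y)
           if xi is not None and yi is not None]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    # One slope per pair of points with distinct x, sorted for rank lookup.
    slopes = sorted((yj - yi) / (xj - xi)
                    for (xi, yi), (xj, yj) in combinations(pts, 2)
                    if xj != xi)
    slope = median(slopes)
    intercept = median(ys) - slope * median(xs)  # assumed convention

    n = len(pts)      # number of observations
    nt = len(slopes)  # number of usable pairwise slopes
    # Variance with a correction for tied x values; the whole
    # expression, including the tie term, is divided by 18.
    x_ties = sum(k * (k - 1) * (2 * k + 5) for k in Counter(xs).values())
    sigma = sqrt((n * (n - 1) * (2 * n + 5) - x_ties) / 18)

    z = NormalDist().inv_cdf((1 + level) / 2)
    # Ranks are 1-indexed in the paper, so shift by one for Python lists
    # and clamp to the valid index range.
    lower = max(round((nt - z * sigma) / 2) - 1, 0)
    upper = min(round((nt + z * sigma) / 2), nt - 1)
    return slope, intercept, slopes[lower], slopes[upper]
```

For example, with `x = list(range(10))` and `y = 2 * x + 1` where the last `y` value is replaced by a large outlier, the median slope stays at 2 because 36 of the 45 pairwise slopes are unaffected.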