Fixes multiple bugs in mstats.theilslopes #3574

tomflannaghan · 2014-04-24T16:16:00Z

This fixes multiple bugs in scipy.stats.mstats.theilslopes. I believe that the original code is implementing the method given in Sen (1968) for the slope and confidence intervals (this is the standard method), and I have therefore corrected the implementation to follow Sen (1968) faithfully. I have also added a test that highlights some of the failures of the original code.

An outline of the changes:

1. The original implementation erroneously always masked the last element of the input data (line 699 in the original code), so the last element was always omitted from the remaining calculations.
2. The code retained the masked values in the slopes array, which caused problems when the confidence intervals were calculated by array index. I now use x.compressed() and y.compressed() to remove masked values.
3. Slopes were being computed for repeated values of x, which is not in accordance with the Sen (1968) method. This is fixed by replacing the routine for computing slopes with one that only includes values where deltax > 0.
4. The original calculation of sigsq does not divide the second term by 18 (line 722 in the original code). It also includes a yties term for handling repeated y values (line 723 in the original code) that does not appear in Sen (1968) equation (2.6).
5. The confidence intervals are computed by computing indices in the slopes array. In Sen (1968), these indices are 1-indexed, but Python is 0-indexed. Therefore, we must subtract 1 from the values computed by Sen (1968) to index the slopes array correctly (see lines 728-729 in the new code).
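To make the points about repeated x values, the division by 18, and the index shift concrete, here is a minimal standard-library sketch. It is not the code in this PR: the function name `sen_slope_interval` is hypothetical, plain lists stand in for arrays, and the rank convention (lower rank M1 maps to index M1 - 1, upper rank M2 + 1 maps to index M2) is an assumption based on the description above. The tie term for y is included, as later restored in this thread.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import NormalDist, median


def sen_slope_interval(x, y, alpha=0.95):
    """Return (median slope, lower bound, upper bound); illustrative only."""
    # One slope per pair of points, skipping pairs that share an x value.
    slopes = sorted(
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
        if xj != xi
    )
    n_slopes = len(slopes)
    n = len(y)

    def tie_term(values):
        # Sum of k(k-1)(2k+5) over each group of k repeated values.
        return sum(k * (k - 1) * (2 * k + 5)
                   for k in Counter(values).values() if k > 1)

    # Variance term: the whole bracket is divided by 18.
    sigsq = (n * (n - 1) * (2 * n + 5) - tie_term(x) - tie_term(y)) / 18.0
    z = NormalDist().inv_cdf(0.5 + alpha / 2.0)
    half_width = z * math.sqrt(sigsq)

    # Assumed convention: ranks are 1-indexed, so shift to 0-indexed
    # positions and clip them to the valid range of `slopes`.
    lower = max(round((n_slopes - half_width) / 2.0) - 1, 0)
    upper = min(round((n_slopes + half_width) / 2.0), n_slopes - 1)
    return median(slopes), slopes[lower], slopes[upper]
```

With five points where only the last one is an outlier, for example `sen_slope_interval([0, 1, 2, 3, 4], [0, 1, 2, 3, 100])`, the median slope stays at 1.0 while the interval widens to cover the slopes involving the outlier.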

coveralls · 2014-04-24T17:08:15Z

Changes Unknown when pulling 8010f14 on tomflannaghan:theilslopes-fix into * on scipy:master*.

rgommers · 2014-04-29T21:27:35Z

Hi @tomflannaghan, thanks for fixing this up. Looks good from a quick browse, but I haven't had time to check the original paper yet.

Adding the reference to that paper to the docstring might be useful. And bonus points for a small example of course:)

The implementation was hardly touched after it was introduced in 9f05b93. @pierregm was this code indeed based on the Sen paper, and does this look OK to you?

rgommers · 2014-04-29T21:30:17Z

Also, I think this should be in the stats namespace instead of only in stats.mstats.

jseabold · 2014-04-29T21:42:59Z

On Tue, Apr 29, 2014 at 5:30 PM, Ralf Gommers wrote:

> Also, I think this should be in the stats namespace instead of only in
> stats.mstats.

+1

FWIW, there are several implementations of Theil-Sen estimators in R (and
BSD-licensed matlab code) to compare against, though the test cases here
look fairly straightforward.

tomflannaghan · 2014-05-02T16:00:06Z

Hi @rgommers and @jseabold, thanks for the feedback! I agree that it should be in stats too, so I have now added it. I have changed the mstats implementation to call stats.theilslopes after the masked arrays are handled to avoid repeating the code. I have also added a test for stats.theilslopes, and the reference and an example to the docstrings.

I have also added back the handling of ties in y (removed as part of my point 4 in the original pull request) as I realised that removing it was an error on my part.

rgommers · 2014-05-11T10:49:51Z

@tomflannaghan I have a branch here with some minor fixes and an updated example: https://github.com/rgommers/scipy/tree/theilslopes-fix. Somehow github doesn't let me send you a PR, but maybe you can cherry-pick those commits?

The example does show an issue - the confidence interval is only for slope and not for offset, which makes the plots look quite weird. Do you think it makes sense to fix this?

Finally, I don't have access to the original paper, so didn't review the algorithm changes in detail. If you know of a public version maybe you can direct me to it?

coveralls · 2014-05-14T15:09:06Z

Changes Unknown when pulling a38b7d2 on tomflannaghan:theilslopes-fix into * on scipy:master*.

tomflannaghan · 2014-05-14T15:45:48Z

@rgommers I've merged your commits with my branch. I'm not sure why github didn't allow you to send me a PR but I'm fairly new to it so maybe I messed something up. I'm happy with your changes and just merged them directly into this PR.

The confidence interval for the intercept would probably be useful. However, I am not sure how to implement it properly. The Theil-Sen method only gives the slope, and there are various different methods to compute the intercept, let alone its confidence interval. I looked at an equivalent R package (zyp), and their confidence interval calculation doesn't agree well at all with a simple bootstrap test that I performed on your example. They compute it using the standard deviation of all possible intercepts, which is not robust to outliers and doesn't fit with the distribution-free philosophy of the rest of the method.

So, in summary, I think not including it is probably better than including something potentially misleading. I suspect most people would bootstrap to get the confidence interval for the intercept if they really needed it (which is very computationally expensive).
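For reference, a bootstrap of the kind mentioned above could look roughly like the following. This is a standard-library sketch under stated assumptions, not the test that was actually run: the helper names `theil_sen_line` and `bootstrap_intercept_interval` are hypothetical, the intercept is taken as median(y) minus slope times median(x) (only one of several definitions in use), and a simple percentile interval is used.

```python
import random
from itertools import combinations
from statistics import median


def theil_sen_line(x, y):
    """Median pairwise slope and a median-based intercept (one common choice)."""
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
        if xj != xi
    ]
    slope = median(slopes)
    return slope, median(y) - slope * median(x)


def bootstrap_intercept_interval(x, y, n_resamples=1000, alpha=0.95, seed=0):
    """Percentile bootstrap interval for the intercept (hypothetical helper)."""
    if len(set(x)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    points = list(zip(x, y))
    intercepts = []
    while len(intercepts) < n_resamples:
        # Resample (x, y) pairs with replacement.
        sample = [rng.choice(points) for _ in points]
        xs, ys = zip(*sample)
        if len(set(xs)) < 2:
            continue  # degenerate resample: no slope can be computed
        intercepts.append(theil_sen_line(xs, ys)[1])
    intercepts.sort()
    tail = (1.0 - alpha) / 2.0
    lower = intercepts[int(tail * n_resamples)]
    upper = intercepts[min(int((1.0 - tail) * n_resamples), n_resamples - 1)]
    return lower, upper
```

Each resample recomputes all pairwise slopes, so the cost grows as the number of resamples times the square of the sample size, which is why this is expensive for anything but small data sets.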

As for access to the paper, sadly I don't know any way to get it publicly. I can email you a copy though.

rgommers · 2014-05-14T16:24:02Z

If you could send me the paper at ralf.gommers at gmail.com that would be great, thanks.

After reading the stackexchange discussion you link to above, I get the impression that indeed a confidence interval on the intercept doesn't help. Maybe it's best to explain the issue with the intercept value (not well-defined), and that what the example does (which to me looked like the simplest way to use the slope confidence interval) makes little sense.

tomflannaghan · 2014-05-14T18:43:49Z

I have changed the wording of the example a bit to reflect the fact that the intercept confidence interval is missing. I've also added a note explaining that the intercept is not defined by Sen, and that there are several definitions that are used.

rgommers · 2014-05-17T13:46:53Z

All looks good to me, merged. Thanks Tom!

tomflannaghan · 2014-05-17T20:41:34Z

Thanks Ralf!

Corrected the theilslopes implementation and added a test

8010f14

pv added the PR label Apr 27, 2014

rgommers added the scipy.stats label Apr 29, 2014

tomflannaghan added 2 commits May 2, 2014 11:03

Moved implementation to stats.theilslopes. Added tests to stats.

50a868c

Added y ties term to the confidence interval calculation.

e845913

PEP8 fixes

fe1ad8a

rgommers added this to the 0.15.0 milestone May 10, 2014

Ralf Gommers added 3 commits May 11, 2014 11:52

MAINT: minor cleanups to stats.theilslopes.

d381e99

Add complete references, reST formatting, PEP8, etc.

DOC: add example for stats.theilslopes.

6c1048f

DOC: remove docstring of stats.mstats.theilslopes, use stats version.

a38b7d2

tomflannaghan added 2 commits May 14, 2014 14:33

DOC: added notes about example for stats.theilslopes. Changed example…

425f3aa

… wording.

Minor changes to docs for stats.theilslopes

fcb2982

rgommers pushed a commit that referenced this pull request May 17, 2014

Merge pull request #3574 from tomflannaghan/theilslopes-fix

472fde8

Fixes multiple bugs in mstats.theilslopes

rgommers merged commit 472fde8 into scipy:master May 17, 2014

tomflannaghan deleted the theilslopes-fix branch May 17, 2014 20:40
