ENH - implement Cox estimator #157
@PABannier, WDYT about the input conversion ( Also, I would be grateful if you could check the datafit expression 🙏 |
@Badr-MOUFAD The input conversion is sound and greatly simplifies the expression of the datafit. However, I'm not sure about Breslow's estimate: I think you are missing a log in the second dot product. Could you compare it with lifelines' or PySurvival's? Make sure you don't have tied times, otherwise Efron's and Breslow's estimates differ. By the way, I don't think you need to compare it to an |
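The check suggested above can be sketched without any survival library. The snippet below is a minimal illustration, not the PR's code: it compares a sample-by-sample Breslow negative log-likelihood with the matrix form that has the log inside the second dot product. The function names, the `1/n` normalization, and the synthetic data are assumptions made for the example; `tm` and `s` follow the naming in the traceback further down (times and censoring indicators).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3
X = rng.standard_normal((n, p))
tm = rng.permutation(n).astype(float) + 1.0    # distinct times, hence no ties
s = rng.integers(0, 2, size=n).astype(float)   # 1 = event observed, 0 = censored
w = rng.standard_normal(p)

# Breslow's and Efron's estimates only coincide when all times are distinct.
assert np.unique(tm).size == tm.size


def cox_naive(X, tm, s, w):
    """Breslow negative log-likelihood, accumulated one sample at a time."""
    Xw = X @ w
    total = 0.0
    for i in range(len(tm)):
        at_risk = tm >= tm[i]  # samples still at risk at time tm[i]
        total += s[i] * (np.log(np.exp(Xw[at_risk]).sum()) - Xw[i])
    return total / len(tm)


def cox_vectorized(X, tm, s, w):
    """Same quantity written with the matrix B, B[i, j] = 1 if tm[j] >= tm[i]."""
    Xw = X @ w
    B = (tm >= tm[:, None]).astype(X.dtype)
    return (-(s @ Xw) + s @ np.log(B @ np.exp(Xw))) / len(tm)


print(cox_naive(X, tm, s, w), cox_vectorized(X, tm, s, w))
```

Dropping the `np.log` in `cox_vectorized` makes the two values diverge, which is the discrepancy pointed out in the comment.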
You are right @PABannier, thanks for having a look. |
When unit testing against
I pushed a @PABannier, @mathurinm, any thoughts? |
The script fails on my machine with:
In [1]: %run debug_cox_lifeline.py
---------------------------------------------------------------------------
TypingError Traceback (most recent call last)
File ~/workspace/skglm/debug_cox_lifeline.py:33
30 datafit = compiled_clone(Cox())
31 penalty = compiled_clone(L1(alpha))
---> 33 datafit.initialize(X, (tm, s))
35 w, _, _ = ProxNewton(fit_intercept=False).solve(
36 X, (tm, s), datafit, penalty
37 )
39 # fit lifeline estimator
File ~/mambaforge/lib/python3.10/site-packages/numba/experimental/jitclass/boxing.py:61, in _generate_method.<locals>.wrapper(*args, **kwargs)
59 @wraps(func)
60 def wrapper(*args, **kwargs):
---> 61 return method(*args, **kwargs)
File ~/mambaforge/lib/python3.10/site-packages/numba/core/dispatcher.py:468, in _DispatcherBase._compile_for_args(self, *args, **kws)
464 msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
465 f"by the following argument(s):\n{args_str}\n")
466 e.patch_message(msg)
--> 468 error_rewrite(e, 'typing')
469 except errors.UnsupportedError as e:
470 # Something unsupported is present in the user code, add help info
471 error_rewrite(e, 'unsupported_error')
File ~/mambaforge/lib/python3.10/site-packages/numba/core/dispatcher.py:409, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
407 raise e
408 else:
--> 409 raise e.with_traceback(None)
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
- Resolution failure for literal arguments:
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function getitem>) found for signature:
>>> getitem(array(float64, 1d, C), Tuple(slice<a:b>, none))
There are 22 candidate implementations:
- Of which 20 did not match due to:
Overload of function 'getitem': File: <numerous>: Line N/A.
With argument(s): '(array(float64, 1d, C), Tuple(slice<a:b>, none))':
No match.
- Of which 2 did not match due to:
Overload in function 'GetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 166.
With argument(s): '(array(float64, 1d, C), Tuple(slice<a:b>, none))':
Rejected as the implementation raised a specific error:
NumbaTypeError: unsupported array index type none in Tuple(slice<a:b>, none)
raised from /home/mathurin/mambaforge/lib/python3.10/site-packages/numba/core/typing/arraydecl.py:72
During: typing of intrinsic-call at /home/mathurin/workspace/skglm/skglm/datafits/single_task.py (593)
During: typing of static-get-item at /home/mathurin/workspace/skglm/skglm/datafits/single_task.py (593)
File "skglm/datafits/single_task.py", line 593:
def initialize(self, X, y):
<source elided>
tm, s = y
self.B = (tm >= tm[:, None]).astype(X.dtype)
^
- Resolution failure for non-literal arguments:
None
During: resolving callee type: BoundFunction((<class 'numba.core.types.misc.ClassInstanceType'>, 'initialize') for instance.jitclass.Cox#7f376353e2c0<B:array(float64, 2d, C)>)
During: typing of call at <string> (3)
Updating numba from 0.56.3 to 0.57 fixed the issue. If we can avoid requiring numba>=0.57, that is better; otherwise we should add it to the requirements.
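One candidate for avoiding the `None` index that numba 0.56.3 rejects is an explicit reshape to a column vector. The sketch below only checks, in plain NumPy, that both expressions build the same matrix; whether the reshaped form compiles inside the jitclass on numba 0.56.3 is an assumption that was not verified here.

```python
import numpy as np

tm = np.array([3.0, 1.0, 2.0])

# Form used in the traceback: indexing with None (np.newaxis).
B_newaxis = (tm >= tm[:, None]).astype(np.float64)

# Candidate replacement: reshape tm into an (n, 1) column before broadcasting.
B_reshape = (tm >= tm.reshape(-1, 1)).astype(np.float64)

print(np.array_equal(B_newaxis, B_reshape))
```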
My bad, forgot to require |
Here is a link to benchmark results on a dummy dataset. |
LGTM!
Open question: do we want to make it an estimator? Once Efron is implemented, I think it's worth implementing, especially if we aim at a wider adoption by the biostat community.
merge once the test for gradient and hessian has been added
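A gradient test of this kind could look like the sketch below, which compares a closed-form gradient with central finite differences. It is an illustration under assumptions, not the test added in the PR: the function names, the `1/n` normalization, and the random data are hypothetical, and the closed form is derived from the matrix expression of the datafit rather than taken from skglm.

```python
import numpy as np


def cox_value(X, tm, s, w):
    """Breslow datafit in matrix form, with B[i, j] = 1 if tm[j] >= tm[i]."""
    Xw = X @ w
    B = (tm >= tm[:, None]).astype(X.dtype)
    return (-(s @ Xw) + s @ np.log(B @ np.exp(Xw))) / len(tm)


def cox_grad(X, tm, s, w):
    """Closed-form gradient with respect to w."""
    Xw = X @ w
    B = (tm >= tm[:, None]).astype(X.dtype)
    exp_Xw = np.exp(Xw)
    # Derivative with respect to Xw, then chain rule through X.
    raw_grad = (-s + exp_Xw * (B.T @ (s / (B @ exp_Xw)))) / len(tm)
    return X.T @ raw_grad


def finite_diff_grad(f, w, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at w."""
    grad = np.zeros_like(w)
    for k in range(w.size):
        step = np.zeros_like(w)
        step[k] = eps
        grad[k] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
n, p = 10, 4
X = rng.standard_normal((n, p))
tm = rng.permutation(n).astype(float) + 1.0    # distinct times
s = rng.integers(0, 2, size=n).astype(float)   # censoring indicators
w = rng.standard_normal(p)

analytic = cox_grad(X, tm, s, w)
numeric = finite_diff_grad(lambda v: cox_value(X, tm, s, v), w)
np.testing.assert_allclose(analytic, numeric, rtol=1e-5, atol=1e-7)
```

The same pattern extends to the Hessian by differentiating `cox_grad` numerically, one coordinate at a time.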
This adds a Cox estimator. It proceeds by
lifelines.CoxPHFitter
lifelines
Problem setup
This problem falls in the survival analysis setup. Given
The minus log-likelihood to be minimized reads [1]
which represents the datafit to be considered.
Defining the matrix $\mathbf{B} \in \mathbb{R}^{n \times n}$ with $\mathbf{B}_{i,j} = 1 \ \mathrm{if} \ y_j \geq y_i$ and $0 \ \mathrm{otherwise}$, the datafit can be rewritten as
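For reference, here is a reconstruction of both expressions, consistent with the definition of $\mathbf{B}$ above and with the standard Breslow partial likelihood. The notation is an assumption: $\mathbf{X} \in \mathbb{R}^{n \times p}$ is the design matrix with rows $x_i$, $y \in \mathbb{R}^n$ holds the times, $s \in \{0, 1\}^n$ the censoring indicators, $\beta \in \mathbb{R}^p$ the coefficients, and the $1/n$ normalization is a convention chosen here. The sum form reads

$$
l(\beta) = \frac{1}{n} \sum_{i=1}^{n} \left[ -s_i \langle x_i, \beta \rangle + s_i \log \Big( \sum_{j \,:\, y_j \geq y_i} e^{\langle x_j, \beta \rangle} \Big) \right]
$$

and, since $(\mathbf{B} e^{\mathbf{X}\beta})_i = \sum_{j \,:\, y_j \geq y_i} e^{\langle x_j, \beta \rangle}$, the matrix form reads

$$
l(\beta) = -\frac{1}{n} \langle s, \mathbf{X}\beta \rangle + \frac{1}{n} \langle s, \log(\mathbf{B} e^{\mathbf{X}\beta}) \rangle
$$

where $\log$ and the exponential are applied entrywise.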
Cox Estimator
Referring to the skglm toolbox, the Cox estimator can be fit using a ProxNewton solver.
Benchmark
Links to results and repository to reproduce
Note
Revival of #124, and closes #121
References
[1] Lin, D. Y. "On the Breslow estimator." Lifetime data analysis 13 (2007): 471-480.