
Adding Brier-Score Calculation and Calibration Plots (smooth and binned version) #101

Draft
wants to merge 11 commits into base: master

Conversation

Contributor

@laqua-stack laqua-stack commented Mar 20, 2020

Hi @sebp,
This pull request adds an IPCW-based Brier score calculation to metrics.py.
For details about the calculation, see the comments in metrics.py.
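For exposition, the IPCW weighting can be sketched in self-contained NumPy code (this is an illustrative toy, not the code in this pull request; all function names here are made up): the Brier score at a horizon tau weights subjects with an observed event before tau by 1/G(t⁻) and subjects still at risk after tau by 1/G(tau), where G is the Kaplan-Meier estimate of the censoring distribution.

```python
import numpy as np

def censoring_survival(time, event):
    """Kaplan-Meier estimate G of the censoring survival function,
    obtained by treating censoring (event == 0) as the event of interest.
    Returns the unique time points and the step-function values."""
    uniq = np.unique(time)
    vals = np.ones_like(uniq, dtype=float)
    g = 1.0
    for i, u in enumerate(uniq):
        at_risk = np.sum(time >= u)
        d = np.sum((time == u) & (event == 0))
        g *= 1.0 - d / at_risk
        vals[i] = g
    return uniq, vals

def _eval_step(uniq, vals, s, left=False):
    """Evaluate the step function at s; left=True yields G(s-)."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    idx = np.searchsorted(uniq, s, side="left" if left else "right") - 1
    out = np.ones(len(s))
    pos = idx >= 0
    out[pos] = vals[idx[pos]]
    return out

def ipcw_brier_score(time, event, pred_surv, tau):
    """IPCW Brier score at horizon tau: subjects with an observed event
    before tau are weighted by 1/G(t_i-), subjects still at risk after
    tau by 1/G(tau); subjects censored before tau get weight zero."""
    uniq, vals = censoring_survival(time, event)
    died = (time <= tau) & (event == 1)
    alive = time > tau
    contrib = np.zeros(len(time))
    contrib[died] = pred_surv[died] ** 2 / _eval_step(uniq, vals, time[died], left=True)
    contrib[alive] = (1.0 - pred_surv[alive]) ** 2 / _eval_step(uniq, vals, tau)
    return contrib.mean()
```

With no censoring, G ≡ 1 and this reduces to the ordinary Brier score for the binary outcome "survived past tau".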

Moreover, a modified version of R's calPlot using loess smoothing on jackknife pseudovalues is suggested for visual assessment of model calibration.

For a usage scenario, I modified your Random Survival Forest example. Maybe you would like to add/modify the post on your blog?
TODOS:

  • Shipping an Apache license with it, as the Brier score calculation is partly a modified version from pysurvival, which comes with the Apache 2.0 license. Hence it may be modified and shared, but the license should be mentioned, shouldn't it?
  • Changing the product-limit estimator used (from lifelines) to the one in sksurv.nonparametric.
    However, for the binned version of the calibration plot, confidence intervals for the Kaplan-Meier estimates are needed.

In case of any questions, feel free to ask! I hope that I can respond in a timely manner ;-)

Best regards, Fabian Laqua

Owner

@sebp sebp left a comment


Thanks so much for helping to improve scikit-survival!

I haven't checked your code in detail yet, but one important thing that's currently missing is unit tests. You would need to provide a couple of examples to check that the implementation returns the desired results, especially for failure cases (e.g. all times are censored). If it's based on an R implementation, you can just take the outputs from there and add a unit test that checks that the Python implementation arrives at the same result.

@laqua-stack
Contributor Author

Dear @sebp,

thank you for your reply.
Yes, we need unit tests.
However, even though there is an R implementation of the calibration plot, I haven't translated the R code (I have to admit that I cannot read R ;-) ), but rather implemented what I found in the cited papers.
Unfortunately, I am currently not able to spend much time on this, since I am full-time involved in clinical routine work (COVID-19...). That is also why I opened the pull request with incompletely tested code. I hoped that someone (maybe you :-) ) could take a deep look at it and perform the tests.

calculate inverse probability of censoring at time t* for all obs, not events only
@sebp
Owner

sebp commented May 29, 2020

I had a closer look at your code. The implementation of the (integrated) Brier score seems straightforward. In addition, you implemented calibration_curve, which, according to the references you provided, is due to Graw et al. and Gerds et al. I'm not very familiar with these papers, but based on their abstracts, they seem to focus on evaluating competing risks models. Currently, scikit-survival doesn't provide any competing risks models, so I wanted to ask how these methods are relevant for evaluating models for right-censored data?

@laqua-stack
Contributor Author

Thank you for your effort. I hope that I can clarify some things.

Yes, their work deals with competing risks, but it can be applied to "normal" right-censored survival data with only one event and uninformative censoring. In this implementation we consider a special case of competing risks analysis where there is just one "competing" event of interest. Then the proposed Aalen-Johansen matrix estimator simplifies to the Kaplan-Meier product-limit estimator.

calibration_curve depicts either a binned calibration curve (pseudovalues = False), where the predicted survival is categorized into k-tiles (e.g. 10 is a common but arbitrary choice), the KM estimate for each of these bins is calculated (i.e. the actual survival), and it is plotted against the mean predicted survival in the respective bin. This is the visual analogue of the Hosmer-Lemeshow test.
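The binned variant can be sketched like this (an illustrative self-contained NumPy toy with quantile bins and a bare-bones Kaplan-Meier helper, not the PR's implementation; the function names are made up):

```python
import numpy as np

def km_survival_at(time, event, tau):
    """Bare-bones Kaplan-Meier estimate of S(tau) for right-censored data."""
    s = 1.0
    for u in np.unique(time[event == 1]):
        if u > tau:
            break
        at_risk = np.sum(time >= u)
        d = np.sum((time == u) & (event == 1))
        s *= 1.0 - d / at_risk
    return s

def binned_calibration(pred_surv, time, event, tau, n_bins=10):
    """Group subjects into quantile bins (k-tiles) of predicted S(tau) and
    pair the mean predicted survival per bin with the KM-observed survival
    of the subjects in that bin."""
    edges = np.quantile(pred_surv, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, pred_surv, side="right") - 1,
                  0, n_bins - 1)
    mean_pred, km_obs = [], []
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue  # skip empty bins (possible with many tied predictions)
        mean_pred.append(pred_surv[mask].mean())
        km_obs.append(km_survival_at(time[mask], event[mask], tau))
    return np.array(mean_pred), np.array(km_obs)
```

Plotting km_obs against mean_pred (with the diagonal as reference) gives the binned calibration plot; a well-calibrated model tracks the diagonal.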

Or (pseudovalues = True) jackknife pseudovalues are calculated in the given manner from the KM estimate for the whole population and for the subset where the observation is left out; a smoother (in this case a non-parametric loess) is applied to these pseudovalues, and the smoothed curve is plotted against the predicted survival.
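The jackknife construction can be sketched as follows (a self-contained toy illustration, not the PR's code): the pseudo-observation for subject i at time tau is theta_i = n·Ŝ(tau) − (n−1)·Ŝ₋ᵢ(tau), where Ŝ₋ᵢ is the KM estimate computed with subject i left out; the smoother is then fit to the pairs (predicted survival, theta_i).

```python
import numpy as np

def _km_at(time, event, tau):
    """Bare-bones Kaplan-Meier estimate of S(tau)."""
    s = 1.0
    for u in np.unique(time[event == 1]):
        if u > tau:
            break
        at_risk = np.sum(time >= u)
        s *= 1.0 - np.sum((time == u) & (event == 1)) / at_risk
    return s

def jackknife_pseudovalues(time, event, tau):
    """Pseudo-observation of S(tau) for each subject:
    theta_i = n * S_hat(tau) - (n - 1) * S_hat_{-i}(tau)."""
    n = len(time)
    full = _km_at(time, event, tau)
    pseudo = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i  # leave subject i out
        pseudo[i] = n * full - (n - 1) * _km_at(time[mask], event[mask], tau)
    return pseudo
```

A useful sanity check: with no censoring, theta_i reduces exactly to the indicator I(T_i > tau), so the smoothed pseudovalues estimate the conditional survival probability.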

More on pseudo-observations, including their properties in different settings: https://doi.org/10.1177%2F0962280209105020 and the cited papers by Graw et al. and Gerds et al.

Actually, the R function calPlot in the package <pec>, which this implementation is based on, works very well with "normal" right-censored survival data.

Once competing risks were implemented (hey, what about a competing risks random survival forest? :-) ), we would of course need to update the code (besides many other things to do) by exchanging the KM product-limit estimator for the Aalen-Johansen estimator.

I hope I have clarified what it does.
Feel free to ask again if something remains unclear.

sebp pushed a commit that referenced this pull request Jun 3, 2020
sebp pushed a commit that referenced this pull request Jun 4, 2020
@sebp
Owner

sebp commented Jun 5, 2020

I updated the code for brier_score and integrated_brier_score and merged it with master.

I haven't looked into calibration_curve yet, but I'm a little bit hesitant as it does add two additional dependencies: matplotlib and skmisc. In any case, it needs to be rebased onto the latest changes in master and accompanied by unit tests first.

@laqua-stack
Contributor Author

Yes, the matplotlib dependency could be resolved if we instead returned the points of the calibration curve, let the user do the plotting, and provided the matplotlib code as an example.

There would be an option to use sklearn.neighbors.KNeighborsRegressor instead of the non-parametric loess smoother. However, we would need a way to ascertain the optimal bandwidth, as in R's pec package. The beauty of loess is its simplicity with regard to the parameters that need to be set.
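For illustration, such a KNeighborsRegressor-based smoother might look like this (synthetic stand-in data and hypothetical parameter choices; n_neighbors plays the role of the loess bandwidth and would need tuning, e.g. by cross-validation, which is exactly the drawback of this alternative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins: predicted S(tau | x_i) and noisy pseudo-observations
# of the actual survival (here generated around a perfectly calibrated model).
pred = rng.uniform(0.05, 0.95, 200)
pseudo = pred + rng.normal(0.0, 0.1, 200)

# n_neighbors acts as the bandwidth: larger k yields a smoother curve,
# and unlike loess's single span parameter it has to be tuned explicitly.
smoother = KNeighborsRegressor(n_neighbors=25)
smoother.fit(pred.reshape(-1, 1), pseudo)

# Evaluate the smoothed calibration curve on a grid of predicted survival.
grid = np.linspace(0.05, 0.95, 50).reshape(-1, 1)
calibration = smoother.predict(grid)
```

Returning (grid, calibration) instead of plotting would also sidestep the matplotlib dependency discussed above.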

@sebp sebp marked this pull request as draft July 3, 2020 18:48
@sebp sebp mentioned this pull request Dec 12, 2020