
Adding Brier-Score Calculation and Calibration Plots (smooth and binned version) #101

Draft
wants to merge 11 commits into base: master

Conversation

Contributor

@laqua-stack laqua-stack commented Mar 20, 2020

Hi @sebp,
This pull request adds an IPCW-based Brier score calculation to metrics.py.
For details about the calculation, see the comments in metrics.py.
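For exposition, the IPCW weighting can be sketched in self-contained NumPy code (this is an illustrative toy, not the code in this pull request; all function names here are made up): the Brier score at a horizon tau weights subjects with an observed event before tau by 1/G(t⁻) and subjects still at risk after tau by 1/G(tau), where G is the Kaplan-Meier estimate of the censoring distribution.

```python
import numpy as np

def censoring_survival(time, event):
    """Kaplan-Meier estimate G of the censoring survival function,
    obtained by treating censoring (event == 0) as the event of interest.
    Returns the unique time points and the step-function values."""
    uniq = np.unique(time)
    vals = np.ones_like(uniq, dtype=float)
    g = 1.0
    for i, u in enumerate(uniq):
        at_risk = np.sum(time >= u)
        d = np.sum((time == u) & (event == 0))
        g *= 1.0 - d / at_risk
        vals[i] = g
    return uniq, vals

def _eval_step(uniq, vals, s, left=False):
    """Evaluate the step function at s; left=True yields G(s-)."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    idx = np.searchsorted(uniq, s, side="left" if left else "right") - 1
    out = np.ones(len(s))
    pos = idx >= 0
    out[pos] = vals[idx[pos]]
    return out

def ipcw_brier_score(time, event, pred_surv, tau):
    """IPCW Brier score at horizon tau: subjects with an observed event
    before tau are weighted by 1/G(t_i-), subjects still at risk after
    tau by 1/G(tau); subjects censored before tau get weight zero."""
    uniq, vals = censoring_survival(time, event)
    died = (time <= tau) & (event == 1)
    alive = time > tau
    contrib = np.zeros(len(time))
    contrib[died] = pred_surv[died] ** 2 / _eval_step(uniq, vals, time[died], left=True)
    contrib[alive] = (1.0 - pred_surv[alive]) ** 2 / _eval_step(uniq, vals, tau)
    return contrib.mean()
```

With no censoring, G ≡ 1 and this reduces to the ordinary Brier score for the binary outcome "survived past tau".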

Moreover, a modified version of R's calPlot using loess smoothing on jackknife pseudovalues is suggested for visual assessment of model calibration.

For a usage scenario, I modified your Random Survival Forest example. Maybe you would like to add/modify the post on your blog?
TODOS:

  • Shipping an Apache license with it, as the Brier score calculation is partly a modified version from pysurvival, which comes with the Apache 2.0 license. Hence it may be modified and shared, but the license should be mentioned, shouldn't it?
  • Changing the product-limit estimator used (from lifelines) to the one in sksurv.nonparametric.
    However, for the binned version of the calibration plot, confidence intervals for the Kaplan-Meier estimates are needed.

In case of any questions, feel free to ask! I hope that I can respond in a timely manner ;-)

Best regards, Fabian Laqua

Owner

@sebp sebp left a comment


Thanks so much for helping to improve scikit-survival!

I haven't checked your code in detail yet, but one important thing that's currently missing is unit tests. You would need to provide a couple of examples to check that the implementation returns the desired results, especially for failure cases (e.g. all times are censored). If it's based on an R implementation, you can just take the outputs from there and add a unit test that checks that the Python implementation arrives at the same result.

@laqua-stack
Contributor Author

Dear @sebp,

thank you for your reply.
Yes, we need unit tests.
However, even though there is an R implementation of the calibration plot, I haven't translated the R code (I have to admit that I cannot read R ;-) ), but rather implemented what I found in the cited papers.
Unfortunately, I am currently not able to spend much time on this, since I am full-time involved in clinical routine work (COVID-19...). That is also why I opened the pull request with incompletely tested code. I hoped that someone (maybe you :-) ) could take a deep look at it and perform the tests.

calculate inverse probability of censoring at time t* for all obs, not events only
@sebp
Owner

sebp commented May 29, 2020

I had a closer look at your code. The implementation of the (integrated) Brier score seems straightforward. In addition, you implemented calibration_curve, which, according to the references you provided, is due to Graw et al. and Gerds et al. I'm not very familiar with these papers, but based on their abstracts, they seem to focus on evaluating competing risks models. Currently, scikit-survival doesn't provide any competing risks models, so I wanted to ask how these methods are relevant for evaluating models for right-censored data?

@laqua-stack
Contributor Author

Thank you for your effort. I hope that I can clarify some things.

Yes, their work deals with competing risks, but it can be applied to "normal" right-censored survival data with only one event and uninformative censoring. In this implementation we consider a special case of competing risks analysis where there is just one "competing" event of interest. Then the proposed Aalen-Johansen matrix estimator simplifies to the Kaplan-Meier product-limit estimator.

calibration_curve depicts either a binned calibration curve (pseudovalues = False), where the predicted survival is categorized into k-tiles (e.g. 10 is a common but arbitrary choice), the KM estimate for each of these bins is calculated (i.e. the actual survival), and it is plotted against the mean predicted survival in the respective bin. This is the visual analogue of the Hosmer-Lemeshow test.
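The binned variant can be sketched like this (an illustrative self-contained NumPy toy with quantile bins and a bare-bones Kaplan-Meier helper, not the PR's implementation; the function names are made up):

```python
import numpy as np

def km_survival_at(time, event, tau):
    """Bare-bones Kaplan-Meier estimate of S(tau) for right-censored data."""
    s = 1.0
    for u in np.unique(time[event == 1]):
        if u > tau:
            break
        at_risk = np.sum(time >= u)
        d = np.sum((time == u) & (event == 1))
        s *= 1.0 - d / at_risk
    return s

def binned_calibration(pred_surv, time, event, tau, n_bins=10):
    """Group subjects into quantile bins (k-tiles) of predicted S(tau) and
    pair the mean predicted survival per bin with the KM-observed survival
    of the subjects in that bin."""
    edges = np.quantile(pred_surv, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, pred_surv, side="right") - 1,
                  0, n_bins - 1)
    mean_pred, km_obs = [], []
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue  # skip empty bins (possible with many tied predictions)
        mean_pred.append(pred_surv[mask].mean())
        km_obs.append(km_survival_at(time[mask], event[mask], tau))
    return np.array(mean_pred), np.array(km_obs)
```

Plotting km_obs against mean_pred (with the diagonal as reference) gives the binned calibration plot; a well-calibrated model tracks the diagonal.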

Or (pseudovalues = True) jackknife pseudovalues are calculated in the given manner from the KM estimate for the whole population and for the subset where the observation is left out; a smoother (in this case a non-parametric loess) is applied to these pseudovalues, and the smoothed curve is plotted against the predicted survival.
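The jackknife construction can be sketched as follows (a self-contained toy illustration, not the PR's code): the pseudo-observation for subject i at time tau is theta_i = n·Ŝ(tau) − (n−1)·Ŝ₋ᵢ(tau), where Ŝ₋ᵢ is the KM estimate computed with subject i left out; the smoother is then fit to the pairs (predicted survival, theta_i).

```python
import numpy as np

def _km_at(time, event, tau):
    """Bare-bones Kaplan-Meier estimate of S(tau)."""
    s = 1.0
    for u in np.unique(time[event == 1]):
        if u > tau:
            break
        at_risk = np.sum(time >= u)
        s *= 1.0 - np.sum((time == u) & (event == 1)) / at_risk
    return s

def jackknife_pseudovalues(time, event, tau):
    """Pseudo-observation of S(tau) for each subject:
    theta_i = n * S_hat(tau) - (n - 1) * S_hat_{-i}(tau)."""
    n = len(time)
    full = _km_at(time, event, tau)
    pseudo = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i  # leave subject i out
        pseudo[i] = n * full - (n - 1) * _km_at(time[mask], event[mask], tau)
    return pseudo
```

A useful sanity check: with no censoring, theta_i reduces exactly to the indicator I(T_i > tau), so the smoothed pseudovalues estimate the conditional survival probability.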

More on pseudo-observations, including their properties in different settings: https://doi.org/10.1177%2F0962280209105020 and the cited papers by Graw et al. and Gerds et al.

Actually, the R function calPlot in the package <pec>, which this implementation is based on, works very well with "normal" right-censored survival data.

Once competing risks were implemented (hey, what about a competing risks random survival forest? :-) ), we would of course need to update the code (besides many other things to do) by exchanging the KM product-limit estimator for the Aalen-Johansen estimator.

I hope I have clarified what it does.
Feel free to ask again if something remains unclear.

sebp pushed a commit that referenced this pull request Jun 3, 2020
sebp pushed a commit that referenced this pull request Jun 4, 2020
@sebp
Owner

sebp commented Jun 5, 2020

I updated the code for brier_score and integrated_brier_score and merged it with master.

I haven't looked into calibration_curve yet, but I'm a little bit hesitant as it does add two additional dependencies: matplotlib and skmisc. In any case, it needs to be rebased onto the latest changes in master and accompanied by unit tests first.

@laqua-stack
Contributor Author

Yes, the matplotlib dependency could be resolved if we instead returned the points of the calibration curve, let the user do the plotting, and provided the matplotlib code as an example.

There would be an option to use sklearn.neighbors.KNeighborsRegressor instead of the non-parametric loess smoother. However, we would need a way to ascertain the optimal bandwidth, as in R's pec package. The beauty of loess is its simplicity with regard to the parameters that need to be set.
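For illustration, such a KNeighborsRegressor-based smoother might look like this (synthetic stand-in data and hypothetical parameter choices; n_neighbors plays the role of the loess bandwidth and would need tuning, e.g. by cross-validation, which is exactly the drawback of this alternative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins: predicted S(tau | x_i) and noisy pseudo-observations
# of the actual survival (here generated around a perfectly calibrated model).
pred = rng.uniform(0.05, 0.95, 200)
pseudo = pred + rng.normal(0.0, 0.1, 200)

# n_neighbors acts as the bandwidth: larger k yields a smoother curve,
# and unlike loess's single span parameter it has to be tuned explicitly.
smoother = KNeighborsRegressor(n_neighbors=25)
smoother.fit(pred.reshape(-1, 1), pseudo)

# Evaluate the smoothed calibration curve on a grid of predicted survival.
grid = np.linspace(0.05, 0.95, 50).reshape(-1, 1)
calibration = smoother.predict(grid)
```

Returning (grid, calibration) instead of plotting would also sidestep the matplotlib dependency discussed above.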

@sebp sebp marked this pull request as draft July 3, 2020 18:48
@sebp sebp mentioned this pull request Dec 12, 2020