Objectives: Improving the accuracy of survival prediction models is an
important challenge: rapid diagnosis of events of concern, such as disease,
can reduce economic waste and patient pain and increase the probability of
survival. Real data often do not satisfy the proportional hazards assumption.
We therefore consider nine machine learning models, which are flexible and
highly accurate as predictive models, and use simulations to comprehensively
compare their predictive performance across various types of survival data.
We also compare and evaluate the existing performance evaluation methods for
survival prediction models.
Methods: This study assumes three scenarios: 1) the change in hazard is
constant, 2) the hazard changes rapidly, and 3) the change in hazard is
symmetric, similar to the normal distribution. The generated simulation data
were fitted using the Cox proportional hazards model and nine machine learning
models: the Cox proportional hazards deep neural network (DeepSurv) model, the
random survival forest model, the survival gradient boosting decision tree
(SurvXGBoost) model, the conditional inference forest time-varying (CIF-TV)
model, the relative risk forest time-varying (RRF-TV) model, the
transformation forest time-varying (TSF-TV) model, and three types of stacking
ensemble models. The models were then compared and evaluated using five
evaluation methods: the time-dependent Brier score, the Kaplan-Meier based
time-dependent AUROC, the average positive predictive value based
time-dependent AUROC, the c-index, and Greenwood-Nam-D’Agostino calibration.
This study also compares and evaluates whether these five evaluation methods
are reliable.
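As an illustration of one of the evaluation methods named above, the following is a minimal sketch of Harrell's c-index in plain Python. It is not the implementation used in this study; the function name `harrell_c_index` and its argument names are hypothetical, and the sketch assumes that a higher risk score means a shorter predicted survival time. A pair of subjects is treated as comparable only when the subject with the shorter observed time experienced the event.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if the subject was censored
    risk_scores -- predicted risk; higher is assumed to mean earlier event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the "earlier" member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event has the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): the third subject is censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
print(harrell_c_index(toy_times, toy_events, [0.9, 0.7, 0.5, 0.1]))
```

Because censored subjects only ever appear as the later member of a comparable pair, the number of usable pairs shrinks as the censoring rate rises.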
Results: The Cox proportional hazards model and the Cox proportional hazards
deep neural network (DeepSurv) model showed almost equally excellent
performance regardless of the change in hazard and the censoring rate. Even
when the proportional hazards assumption was not satisfied to a certain
extent, the performance of the Cox proportional hazards model did not
deteriorate. Among the machine learning models, the DeepSurv model had the
highest performance, followed by the transformation forest time-varying
(TSF-TV) model. As for the performance evaluation methods, the time-dependent
Brier score measured performance correctly except at the high censoring rate.
The average positive predictive value based time-dependent AUROC and the
Kaplan-Meier based time-dependent AUROC evaluated performance correctly
regardless of the censoring rate when the hazard changed rapidly, but their
performance decreased in two cases: when the change in hazard was constant and
the censoring rate was high, and when the change in hazard was symmetric,
similar to the normal distribution, and the censoring rate was low. The
c-index evaluated performance correctly regardless of the censoring rate when
the hazard changed rapidly, but its performance decreased when the change in
hazard was constant and the censoring rate was high (30%). Its performance
also decreased regardless of the censoring rate when the change in hazard was
symmetric, similar to the normal distribution. The calibration measure based
on the Greenwood-Nam-D’Agostino test showed decreased evaluation performance
regardless of the scenario and the censoring rate.
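To make concrete how censoring enters the time-dependent Brier score, the following is a minimal sketch of the inverse-probability-of-censoring-weighted (IPCW) form of the score at a single time point. It is not the code used in this study. The names `km_censoring` and `ipcw_brier_score` are hypothetical, the censoring distribution is estimated with a simple Kaplan-Meier estimator, and the sketch assumes there are no ties between event times and censoring times.

```python
def km_censoring(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    Censoring is treated as the "event". Returns a function G(t, strict=False);
    strict=True gives the left limit G(t-).
    """
    data = list(zip(times, events))
    censor_times = sorted({ti for ti, ei in data if ei == 0})

    def G(t, strict=False):
        surv = 1.0
        for u in censor_times:
            if u > t or (strict and u == t):
                break
            at_risk = sum(1 for ti, _ in data if ti >= u)
            censored = sum(1 for ti, ei in data if ti == u and ei == 0)
            surv *= 1.0 - censored / at_risk
        return surv

    return G


def ipcw_brier_score(times, events, surv_prob_at_t, t):
    """IPCW Brier score at time t.

    surv_prob_at_t -- predicted probability of surviving beyond t, per subject
    """
    G = km_censoring(times, events)
    total = 0.0
    for ti, ei, s in zip(times, events, surv_prob_at_t):
        if ti <= t and ei == 1:
            # Event observed by time t: true status is 0, weight 1 / G(T_i-)
            total += (0.0 - s) ** 2 / G(ti, strict=True)
        elif ti > t:
            # Still at risk at time t: true status is 1, weight 1 / G(t)
            total += (1.0 - s) ** 2 / G(t)
        # Subjects censored before t have unknown status and get weight 0
    return total / len(times)


# Toy data (made up for illustration): the second subject is censored at time 2.
print(ipcw_brier_score([1, 2, 3, 4], [1, 0, 1, 1], [0.5, 0.5, 0.5, 0.5], t=2.5))
```

Subjects censored before the evaluation time contribute nothing directly, and the remaining subjects are up-weighted to compensate, so with heavy censoring the score rests on fewer observations with larger weights.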
Conclusion: To find a better predictive model for complex survival data, we
propose using, among the machine learning models, the Cox proportional hazards
deep neural network (DeepSurv) model and the transformation forest
time-varying (TSF-TV) model. This study also suggests fitting the Cox
proportional hazards model alongside them, even if the proportional hazards
assumption is not fully satisfied. Because the ability of an evaluation method
to assess a survival prediction model can vary with the change in hazard and
the censoring rate, we propose using the time-dependent Brier score, the
average positive predictive value based time-dependent AUROC, the Kaplan-Meier
based time-dependent AUROC, and the c-index together to evaluate models from
multiple perspectives.