Victor E Staartjes, Julius M Kernbach
Various available metrics to describe model performance in terms of discrimination (area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score) and calibration (slope, intercept, Brier score, expected/observed ratio, Estimated Calibration Index, Hosmer-Lemeshow goodness-of-fit) are presented. Recalibration is introduced, with Platt scaling and isotonic regression as proposed methods. We also discuss considerations regarding the sample size required for optimal training of clinical prediction models, explaining why low sample sizes lead to unstable models, and offering the common rule of thumb of at least ten patients per class per input feature, as well as some more nuanced approaches...
2022: Acta Neurochirurgica. Supplement
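
As a rough illustration of two quantities named in the abstract, the Python sketch below computes the Brier score for binary outcomes and the minimum sample size implied by the rule of thumb of ten patients per class per input feature. The function names, the toy numbers, and the reading of the rule as "10 x number of features" patients in each class are assumptions made for illustration; they are not taken from the paper.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 means perfect predictions.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def min_patients_per_class(n_features, per_feature=10):
    """Rule-of-thumb minimum number of patients needed in EACH class.

    Assumed reading of the rule: at least `per_feature` patients per class
    for every input feature used by the model.
    """
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    return per_feature * n_features


if __name__ == "__main__":
    # Hypothetical toy data: observed outcomes and predicted probabilities.
    outcomes = [1, 0, 1, 0]
    predictions = [0.9, 0.1, 0.8, 0.3]
    print("Brier score:", brier_score(outcomes, predictions))

    # Hypothetical model with 5 input features and a binary outcome.
    per_class = min_patients_per_class(5)
    print("Minimum patients per class:", per_class)
    print("Minimum total for 2 classes:", 2 * per_class)
```

Under this reading, the smaller class drives the requirement: a rare outcome means the total cohort must be much larger than the per-class minimum suggests.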