VerSe: A Vertebrae Labelling and Segmentation Benchmark for Multi-detector CT Images

A. Sekuboyina, M. Husseini, A. Bayat, M. Löffler, H. Liebl, H. Li, G. Tetteh, J. Kukačka, C. Payer, D. Stern, M. Urschler, M. Chen, D. Cheng, N. Lessmann, Y. Hu, T. Wang, D. Yang, D. Xu, F. Ambellan, T. Amiranashvili, M. Ehlke, H. Lamecker, S. Lehnert, M. Lirio, N. de Olaguer, H. Ramm, M. Sahu, A. Tack, S. Zachow, T. Jiang, X. Ma, C. Angerman, X. Wang, K. Brown, A. Kirszenberg, É. Puybareau, D. Chen, Y. Bai, B. Rapazzo, T. Yeah, A. Zhang, S. Xu, F. Hou, Z. He, C. Zeng, Z. Xiangshang, X. Liming, T. Netherton, R. Mumme, L. Court, Z. Huang, C. He, L. Wang, S. Ling, L. Huynh, N. Boutry, R. Jakubicek, J. Chmelik, S. Mulay, M. Sivaprakasam, J. Paetzold, S. Shit, I. Ezhov, B. Wiestler, B. Glocker, A. Valentinitsch, M. Rempfler, B. Menze and J. Kirschke

arXiv:2001.09193 2020.


Vertebral labelling and segmentation are two fundamental tasks in an automated spine processing pipeline. Reliable and accurate processing of spine images is expected to benefit clinical decision-support systems for diagnosis, surgery planning, and population-based analysis of spine and bone health. However, designing automated algorithms for spine processing is challenging, predominantly due to considerable variations in anatomy and acquisition protocols and to a severe shortage of publicly available data. Addressing these limitations, the Large Scale Vertebrae Segmentation Challenge (VerSe) was organised in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2019 and 2020, with a call for algorithms for the labelling and segmentation of vertebrae. Two datasets containing a total of 374 multi-detector CT scans from 355 patients were prepared, and 4505 vertebrae were individually annotated at voxel level by a human-machine hybrid algorithm (https://osf.io/nqjyw/, https://osf.io/t98fz/). A total of 25 algorithms were benchmarked on these datasets. In this work, we present the results of this evaluation and further investigate the variation in performance at the vertebra level, at the scan level, and across different fields of view. We also evaluate the generalisability of the approaches to an implicit domain shift in data by evaluating the top-performing algorithms of one challenge iteration on data from the other iteration. The principal takeaway from VerSe: the performance of an algorithm in labelling and segmenting a spine scan hinges on its ability to correctly identify vertebrae in cases of rare anatomical variations. The content and code concerning VerSe can be accessed at: https://github.com/anjany/verse.
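Why correct identification matters so much can be illustrated with a small sketch of vertebra-level and scan-level scoring. This is not the VerSe evaluation code (that lives in the linked repository). It assumes the Dice coefficient as the overlap score, label maps flattened to equal-length integer sequences with 0 as background, and a scan score equal to the mean Dice over the vertebrae present in the ground truth. The function names `per_vertebra_dice` and `scan_dice` are hypothetical.

```python
from collections import Counter


def per_vertebra_dice(pred, truth):
    """Dice score for every vertebra label present in the ground truth.

    Assumptions (for illustration only): `pred` and `truth` are flattened
    label maps of equal length, and label 0 denotes background.
    """
    if len(pred) != len(truth):
        raise ValueError("label maps must have the same number of voxels")
    pred_counts = Counter(p for p in pred if p != 0)
    truth_counts = Counter(t for t in truth if t != 0)
    # Voxels where the predicted label matches the ground-truth vertebra label.
    overlap = Counter(t for p, t in zip(pred, truth) if t != 0 and p == t)
    scores = {}
    for label, n_truth in truth_counts.items():
        n_pred = pred_counts.get(label, 0)
        # n_truth > 0 here, so the denominator is never zero.
        scores[label] = 2.0 * overlap[label] / (n_pred + n_truth)
    return scores


def scan_dice(pred, truth):
    """Mean per-vertebra Dice over ground-truth vertebrae.

    Predicted labels absent from the ground truth are ignored in this sketch.
    """
    scores = per_vertebra_dice(pred, truth)
    return sum(scores.values()) / len(scores) if scores else float("nan")


truth = [0, 1, 1, 1, 2, 2, 2, 0]
shifted = [0, 2, 2, 2, 3, 3, 3, 0]  # same shapes, every label off by one
print(scan_dice(truth, truth))    # 1.0
print(scan_dice(shifted, truth))  # 0.0
```

In this toy example the shifted prediction delineates every vertebra perfectly yet scores zero, because each voxel carries the wrong vertebra label. This is one way to read the takeaway that performance hinges on correct identification.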