BACKGROUND: The development of deep learning (DL) models for prostate segmentation on magnetic resonance imaging (MRI) depends on expert-annotated data and reliable baselines, which are often not publicly available. This limits both reproducibility and comparability.
METHODS: Prostate158 consists of 158 expert-annotated biparametric 3T prostate MRIs comprising T2-weighted (T2w) sequences and diffusion-weighted sequences with apparent diffusion coefficient (ADC) maps. Two U-ResNets, trained to segment the anatomy (central gland, peripheral zone) and suspicious lesions for prostate cancer (PCa) with a PI-RADS score of ≥4, served as baseline algorithms. Segmentation performance was evaluated using the Dice similarity coefficient (DSC), the Hausdorff distance (HD), and the average surface distance (ASD). Differences in performance were tested with the Wilcoxon test with Bonferroni correction. The generalizability of the baseline model was assessed on the open Medical Segmentation Decathlon and PROSTATEx datasets.
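The evaluation metrics and the statistical test named above can be illustrated with a minimal pure-Python sketch. It is not the Prostate158 evaluation code, and all function names are hypothetical. It assumes masks are given as sets of (z, y, x) voxel indices and voxel spacing is supplied per axis. It also assumes that surfaces are defined by 6-connectivity, that HD is the full (maximum) symmetric Hausdorff distance rather than a percentile variant, and that ASD averages over both directed distance sets. The Wilcoxon signed-rank test shown is the exact two-sided version, valid only when there are no zero or tied differences.

```python
import math

Voxel = tuple[int, int, int]  # (z, y, x) index of a foreground voxel

# 6-connected neighbourhood used to decide whether a voxel lies on the surface
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a: set[Voxel], b: set[Voxel]) -> float:
    """Dice similarity coefficient 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def surface(mask: set[Voxel]) -> set[Voxel]:
    """Foreground voxels with at least one 6-connected neighbour outside the mask."""
    return {
        (z, y, x)
        for (z, y, x) in mask
        if any((z + dz, y + dy, x + dx) not in mask for dz, dy, dx in NEIGHBOURS)
    }


def directed_distances(src: set[Voxel], dst: set[Voxel], spacing) -> list[float]:
    """Brute force: physical distance from each voxel in src to its nearest voxel in dst."""
    def to_physical(v: Voxel) -> list[float]:
        return [i * s for i, s in zip(v, spacing)]

    dst_pts = [to_physical(v) for v in dst]
    return [min(math.dist(to_physical(u), q) for q in dst_pts) for u in src]


def hd_and_asd(a: set[Voxel], b: set[Voxel], spacing=(1.0, 1.0, 1.0)) -> tuple[float, float]:
    """Symmetric Hausdorff distance and average surface distance between two mask surfaces."""
    sa, sb = surface(a), surface(b)
    if not sa or not sb:
        raise ValueError("surface distances are undefined for an empty mask")
    d_ab = directed_distances(sa, sb, spacing)
    d_ba = directed_distances(sb, sa, spacing)
    hd = max(max(d_ab), max(d_ba))
    # ASD: mean over the pooled distances from both directions (assumed definition)
    asd = (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))
    return hd, asd


def wilcoxon_exact(x: list[float], y: list[float]) -> float:
    """Two-sided exact Wilcoxon signed-rank p-value for paired samples.

    Sketch only: rejects zero differences and tied absolute differences.
    """
    diffs = [a - b for a, b in zip(x, y)]
    if 0 in diffs or len({abs(d) for d in diffs}) != len(diffs):
        raise ValueError("this sketch does not handle zero or tied differences")
    n = len(diffs)
    total = n * (n + 1) // 2
    # W+ = sum of ranks (ordered by absolute value) of the positive differences
    w_plus = sum(r for r, d in enumerate(sorted(diffs, key=abs), start=1) if d > 0)
    t = min(w_plus, total - w_plus)
    # counts[s] = number of subsets of ranks {1..n} summing to s (null distribution of W+)
    counts = [1] + [0] * total
    for r in range(1, n + 1):
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    return min(1.0, 2 * sum(counts[: t + 1]) / 2**n)


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni correction: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Passing the physical spacing matters when voxels are anisotropic: the same one-voxel shift produces a different HD and ASD depending on the axis along which it occurs. The brute-force nearest-neighbour search is quadratic in the number of surface voxels, so full MRI volumes would more practically use a Euclidean distance transform such as `scipy.ndimage.distance_transform_edt`, whose `sampling` argument takes the voxel spacing. For the correction step, with three comparisons a raw p-value of 0.02 becomes 0.06.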
RESULTS: With Reader 1 as the reference, the models achieved a DSC/HD/ASD of 0.88/18.3/2.2 for the central gland, 0.75/22.8/1.9 for the peripheral zone, and 0.45/36.7/17.4 for PCa. With Reader 2 as the reference, the DSC/HD/ASD were 0.88/17.5/2.6 for the central gland, 0.73/33.2/1.9 for the peripheral zone, and 0.4/39.5/19.1 for PCa. Interrater agreement, measured in DSC/HD/ASD, was 0.87/11.1/1.0 for the central gland, 0.75/15.8/0.74 for the peripheral zone, and 0.6/18.8/5.5 for PCa. On the Medical Segmentation Decathlon and PROSTATEx, the DSC/HD/ASD were 0.82/22.5/3.4 and 0.86/18.6/2.5, respectively, for the central gland, and 0.64/29.2/4.7 and 0.71/26.3/2.2, respectively, for the peripheral zone.
CONCLUSIONS: We provide an openly accessible, expert-annotated 3T dataset of prostate MRI and a reproducible benchmark to foster the development of prostate segmentation algorithms.