EAU Plenary Gamechanging Session - Artificial Intelligence and Radiologists at Prostate Cancer Detection in MRI: Preliminary Results from the PI-CAI Challenge

J. Twilt, A. Saha, J. Bosma, B. van Ginneken, D. Yakar, M. Elschot, J. Veltman, J. Fütterer, H. Huisman and M. de Rooij

Annual European Association of Urology Congress 2023.

PURPOSE: The PI-CAI (Prostate Imaging: Cancer AI) Challenge is an international study with over 10,000 carefully curated prostate MRI exams, designed to validate modern AI algorithms and estimate radiologists’ performance at clinically significant prostate cancer (csPCa) detection and diagnosis. PI-CAI primarily consists of two sub-studies: an AI study (Grand Challenge) and a reader study. Ultimately, PI-CAI will benchmark state-of-the-art AI algorithms developed in the Grand Challenge against prostate radiologists participating in the reader study. The aim is to present the study design and share the preliminary results of both sub-studies. METHODS: For the AI study, an annotated multi-center, multi-vendor dataset of 1500 biparametric MRI (bpMRI) exams (including their basic clinical and acquisition variables) was made publicly available to all participating AI teams and the research community at large. Teams used this dataset to develop AI models, and at the end of this open development phase, all algorithms were ranked based on their performance on a hidden testing cohort of 1000 unseen scans. In the ongoing closed testing phase, organizers will retrain the five top-ranking AI algorithms using a larger dataset of 9107 bpMRI scans (including additional training scans from a private dataset). Finally, their performance will be re-evaluated on the hidden testing cohort. For the reader study, 59 international prostate radiologists assessed a subset of 400 scans from the hidden testing cohort. Readers and cases were divided into blocks of 100 cases. For each case, readers assessed bpMRI and multiparametric MRI (mpMRI) in sequence to mimic clinical routine. Suspected grade group (GG) ≥2 cancer findings were assigned a PI-RADS 3-5 score. Additionally, readers indicated a patient-level suspicion score (0-100) of harboring GG≥2 cancer. Multi-reader multi-case (MRMC) analysis was used to assess the patient-level added value of mpMRI over bpMRI. RESULTS: From the AI study, the ranked results from the open development phase will be presented.
From the reader study, preliminary results from the first 14 readers will be presented. Among these readers, with 2–15 years (median: 9) of experience, there was overall little improvement in GG≥2 detection from bpMRI to mpMRI readings, with AUROCs of 0.857 (95% CI: 0.83, 0.89) and 0.860 (95% CI: 0.83, 0.89), respectively. For individual readers, absolute differences in AUROC ranged from 0.00 to 0.03 (95% CI: 0.00, 0.01). CONCLUSION: The top 5 results from the open development phase of the AI study of the PI-CAI Challenge will be presented. Preliminary results from the PI-CAI reader study show that bpMRI assessments had GG≥2 detection performance similar to that of mpMRI assessments at a per-case level. The influence of factors such as experience, workflow, image quality and protocol familiarity still needs to be evaluated. LIMITATIONS: Preliminary results are limited by the sample size. mpMRI readings of the original data were used to guide histologic verification. Thirteen of the 14 readers had high expertise as per the 2020 ESUR/ESUI consensus statements.
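
As an illustration of the patient-level comparison reported above, the following minimal sketch computes a single reader's AUROC from 0-100 suspicion scores for a bpMRI and an mpMRI reading of the same cases, then takes the absolute difference. It is not the MRMC analysis used in the study: it handles one reader only and gives no confidence interval. The function name `auroc` and all labels and scores are hypothetical toy values, not PI-CAI data.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Ties count as half a win (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = GG>=2 cancer present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical patient-level suspicion scores (0-100) from one reader.
bp_scores = [90, 70, 40, 30, 10, 50, 20, 80]  # bpMRI reading
mp_scores = [95, 75, 55, 30, 10, 50, 20, 80]  # mpMRI reading, same cases

auc_bp = auroc(labels, bp_scores)
auc_mp = auroc(labels, mp_scores)
print(f"bpMRI AUROC: {auc_bp:.4f}")
print(f"mpMRI AUROC: {auc_mp:.4f}")
print(f"absolute difference: {abs(auc_mp - auc_bp):.4f}")
```

A full MRMC analysis would additionally model variability across both readers and cases, which this single-reader sketch deliberately omits.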