Federated learning for prostate cancer detection in biparametric MRI: optimization of rounds, epochs, and aggregation strategy

A. Moradi, F. Zerka, J. Sander Bosma, D. Yakar, J. Geerdink, H. Huisman, T. Frost Bathen and M. Elschot

Medical Imaging 2024: Computer-Aided Diagnosis 2024.

In 2020, prostate cancer (PCa) caused about 6.8% of male cancer-related deaths. Early detection of PCa and improved clinical procedures for suspected cases are therefore crucial. Biparametric magnetic resonance imaging (bpMRI), combined with computer-aided diagnosis (CAD) systems based on deep learning (DL), can achieve PCa detection rates comparable to those of radiologists. Achieving this requires access to a large amount of patient data; however, sharing patient-specific information raises privacy concerns. To address this, we present a federated learning (FL) framework, built on Flower, in which each site trains a local nnU-Net-based DL model for PCa detection. We investigated how different combinations of local epochs and communication rounds, as well as different server-side aggregation strategies, affect the performance of the FL framework, and compared the results with centralized learning (CL). We performed empirical experiments using the Prostate Imaging Cancer AI (PI-CAI) dataset comprising 1500 patients from three different institutions for training/validation/testing and further evaluated the results on an independent test set of 199 patients from a fourth institution. The trained models were evaluated using the PI-CAI score, which combines patient-level diagnostic performance (area under the receiver operating characteristic curve) and lesion-level detection performance (average precision). We found that FL can replicate the performance of CL and substantially improve the detection of PCa compared to local training approaches. We optimized the combination of epochs and rounds, along with the server-side aggregation strategy, for prostate lesion detection on the validation set. The resulting configuration (5 epochs, 200 rounds, FedAdagrad aggregation) achieved a PI-CAI score of 0.74 on the test set, compared to 0.73 and 0.72 for the CL and FL baselines, respectively.
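To make the server-side aggregation step concrete, the following is a minimal NumPy sketch of a FedAdagrad-style update applied once per communication round. It is not the authors' implementation: the class and function names, the flat weight vectors, and the hyperparameter values (eta, beta_1, tau) are illustrative assumptions, and a real Flower deployment would use the framework's own strategy classes. The helper pi_cai_score assumes the score is the plain mean of AUROC and average precision; the abstract only states that the two metrics are combined.

```python
import numpy as np


def fedavg(client_weights, num_examples):
    """Example-weighted average of client weight vectors (plain FedAvg)."""
    total = float(sum(num_examples))
    return sum(w * (n / total) for w, n in zip(client_weights, num_examples))


class FedAdagradServer:
    """Hypothetical server holding global weights and Adagrad state.

    eta, beta_1 and tau are illustrative defaults, not values from the paper.
    """

    def __init__(self, initial_weights, eta=0.1, beta_1=0.0, tau=1e-9):
        self.x = np.asarray(initial_weights, dtype=float)  # global weights
        self.m = np.zeros_like(self.x)  # momentum of the pseudo-gradient
        self.v = np.zeros_like(self.x)  # accumulated squared pseudo-gradient
        self.eta = eta
        self.beta_1 = beta_1
        self.tau = tau

    def aggregate(self, client_weights, num_examples):
        """Run one round: average the clients, then take an Adagrad step."""
        # Pseudo-gradient: how far the averaged client model moved
        # away from the current global model.
        delta = fedavg(client_weights, num_examples) - self.x
        self.m = self.beta_1 * self.m + (1.0 - self.beta_1) * delta
        self.v = self.v + delta ** 2
        self.x = self.x + self.eta * self.m / (np.sqrt(self.v) + self.tau)
        return self.x


def pi_cai_score(auroc, average_precision):
    """Assumed combination: unweighted mean of the two metrics."""
    return (auroc + average_precision) / 2.0


if __name__ == "__main__":
    server = FedAdagradServer(initial_weights=np.zeros(2))
    # Two hypothetical clients that finished their local epochs.
    clients = [np.array([1.0, 1.0]), np.array([3.0, 3.0])]
    sizes = [1, 3]
    print(server.aggregate(clients, sizes))
    print(pi_cai_score(0.8, 0.6))
```

In this sketch, the number of rounds corresponds to how many times aggregate is called, and the number of epochs to how long each client trains locally before sending its weights; the trade-off between the two is what the study tunes on the validation set.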