Performance and generalisability of a screening-trained deep learning model for pulmonary nodule malignancy risk estimation on a multicentre dataset of incidental nodules

R. Dinnessen, N. Antonissen, D. Peeters, H. Gietema, F. Mohamed Hoesein, E. Scholten, C. Schaefer-Prokop and C. Jacobs

European Congress of Radiology 2026.

Background: A deep learning (DL) model trained on screening low-dose CT for pulmonary nodule malignancy risk estimation previously demonstrated good discrimination on incidental nodules in a single-centre retrospective cohort. Here, we tested its performance on a multicentre dataset to explore its generalisability.

Methods: A retrospective, multicentre, case-control dataset of solid pulmonary nodules was collected. A non-university teaching hospital and a university hospital were included as external datasets. To cover the full range of nodule sizes, nodules were sampled in size bins (5-10 mm, 10-15 mm, 15-30 mm). Per size bin, we aimed for 10 malignant and 20 benign nodules per centre. Model performance was assessed using receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC), and compared with that of the Brock model on the full dataset and stratified per centre. AUCs were compared statistically using the DeLong method.
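The AUC comparison described in the methods can be illustrated with a minimal sketch of the DeLong test for two models scored on the same nodules (the paired case, as when the DL model and the Brock model are evaluated on one dataset). The function names and the toy scores below are hypothetical and are not taken from the study; comparing AUCs between centres involves independent samples and would need the unpaired variant, which is not shown.

```python
import math
import numpy as np


def placement_values(labels, scores):
    """Return per-case placement values (V10 for positives, V01 for negatives)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # psi[i, j] = 1 if positive i scores above negative j, 0.5 on ties, else 0
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_test(labels, scores_a, scores_b):
    """Paired DeLong test; returns (auc_a, auc_b, z, two-sided p)."""
    v10_a, v01_a = placement_values(labels, scores_a)
    v10_b, v01_b = placement_values(labels, scores_b)
    auc_a, auc_b = float(v10_a.mean()), float(v10_b.mean())
    m, n = len(v10_a), len(v01_a)  # number of positives and negatives

    # 2x2 covariance of the two AUC estimates (np.cov uses ddof=1 by default)
    s10 = np.cov(np.vstack([v10_a, v10_b]))
    s01 = np.cov(np.vstack([v01_a, v01_b]))
    s = s10 / m + s01 / n

    var_diff = s[0, 0] + s[1, 1] - 2.0 * s[0, 1]
    if var_diff <= 0:
        # Degenerate case (e.g. identical scores): the statistic is undefined
        return auc_a, auc_b, float("nan"), float("nan")

    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p


# Toy data only (hypothetical): 1 = malignant, 0 = benign
toy_labels = [1, 1, 1, 0, 0, 0]
toy_model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
toy_model_b = [0.9, 0.2, 0.7, 0.8, 0.3, 0.1]
auc_a, auc_b, z, p = delong_test(toy_labels, toy_model_a, toy_model_b)
print(f"AUC A={auc_a:.2f}, AUC B={auc_b:.2f}, z={z:.2f}, p={p:.2f}")
```

The pairwise comparison matrix is quadratic in memory, which is acceptable for datasets of a few hundred nodules; larger datasets would call for a rank-based formulation.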

Results: The full dataset contained 177 solid nodules (60 malignant). Patients from the non-university centre were older (median 68 years, IQR 61-76) than those from the university centre (median 64 years, IQR 58-70; p<0.01). Otherwise, patient and nodule characteristics did not differ between centres or by malignancy status. The DL model achieved an AUC of 0.75 and outperformed the Brock model (AUC 0.54, p<0.01). Per-centre AUCs were similar: 0.75 (non-university) and 0.76 (university) (p=0.88).

Conclusion: The DL model performed well on a multicentre, cancer-enriched dataset and was robust across centres. It also outperformed the clinically established Brock model. These preliminary results suggest that the model has the potential to generalise across different patient populations with incidental nodules.

Limitations: Both centres are located in the same country, and the case-control design hinders analyses that are influenced by prevalence.