Retrospective validation of nodule management based on deep learning-based malignancy thresholds in lung cancer screening

N. Antonissen, K. Venkadesh, H. Gietema, R. Vliegenthart, Z. Saghir, E. Scholten, M. Prokop, C. Schaefer-Prokop and C. Jacobs

European Congress of Radiology 2023.

Purpose: We previously developed and validated a deep learning (DL) algorithm for malignancy risk estimation of screen-detected nodules. The nodule risk cut-off for a positive screen, which triggers more intensive follow-up (short-term follow-up, PET-CT or biopsy), varies between existing nodule management protocols: 1-2% for Lung-RADS (category 3) and 6% for PanCan2b (category 3). In this study, we investigated two DL-based malignancy thresholds for defining a positive screen and compared them with existing nodule management protocols.

Methods and materials: All baseline CT scans from the Danish Lung Cancer Screening Trial were linked to lung cancer diagnoses within 2 years, resulting in 2,019 non-cancer and 18 cancer cases. The DL-based malignancy risk was computed for all screen-detected nodules, and two malignancy risk cut-off points (6% and 10%) were evaluated as thresholds for a positive screen. For both Lung-RADS and PanCan2b, we used the published nodule risk cut-offs for a positive screen. Sensitivity and false positive rate (FPR) were calculated for all baseline scans (n=2,037) using the risk-dominant nodule per scan.
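
The scan-level evaluation described above can be sketched as follows. This is a minimal illustration and not the study code: the `Scan` class, the function names, and the toy data are hypothetical. It assumes that the risk-dominant nodule is the nodule with the highest estimated risk in a scan, that a scan is positive when this risk is greater than or equal to the threshold, and that a scan without nodules is negative.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    """Hypothetical container for one baseline scan."""
    nodule_risks: list  # per-nodule malignancy risk estimates in [0, 1]
    cancer: bool        # lung cancer diagnosed within the follow-up window


def is_positive(scan: Scan, threshold: float) -> bool:
    """Positive screen if the risk-dominant nodule reaches the threshold.

    Assumption: the risk-dominant nodule is the highest-risk nodule.
    """
    if not scan.nodule_risks:
        return False  # assumption: no nodules means a negative screen
    return max(scan.nodule_risks) >= threshold


def sensitivity_and_fpr(scans, threshold):
    """Return (sensitivity, false positive rate) at scan level."""
    cancers = [s for s in scans if s.cancer]
    non_cancers = [s for s in scans if not s.cancer]
    if not cancers or not non_cancers:
        raise ValueError("need at least one cancer and one non-cancer scan")
    true_pos = sum(is_positive(s, threshold) for s in cancers)
    false_pos = sum(is_positive(s, threshold) for s in non_cancers)
    return true_pos / len(cancers), false_pos / len(non_cancers)


# Toy data, invented purely for illustration (not trial data).
toy_scans = [
    Scan([0.02, 0.12], cancer=True),
    Scan([0.04], cancer=True),
    Scan([0.07], cancer=False),
    Scan([0.01, 0.03], cancer=False),
    Scan([], cancer=False),
]

sens_6, fpr_6 = sensitivity_and_fpr(toy_scans, 0.06)
sens_10, fpr_10 = sensitivity_and_fpr(toy_scans, 0.10)
```

In this toy set, raising the threshold from 0.06 to 0.10 removes the one false positive while the detected cancer stays above the cut-off, which is the kind of trade-off the two DL thresholds were chosen to probe.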

Results: At a threshold of 6%, DL achieved the highest sensitivity at 88.9%, compared with 83.3% for Lung-RADS and 77.8% for PanCan2b. DL and PanCan2b yielded comparable FPRs of 3.6% and 4.1%, respectively, while Lung-RADS had a higher FPR of 8.7%. Increasing the DL threshold to >=10% resulted in a sensitivity of 88.9% and an FPR of 2.5%.
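
The abstract reports percentages only. As a reading aid, the sketch below lists which integer case counts are consistent with a reported percentage, given the 18 cancer and 2,019 non-cancer cases from the methods. The counts are inferred from rounding and are not reported by the authors; the function name is hypothetical.

```python
N_CANCER = 18        # cancer cases, from the methods
N_NON_CANCER = 2019  # non-cancer cases, from the methods


def consistent_counts(reported_percent: float, denominator: int) -> list:
    """List integer counts k for which 100 * k / denominator rounds
    (to one decimal place) to the reported percentage."""
    return [
        k
        for k in range(denominator + 1)
        if abs(round(100 * k / denominator, 1) - reported_percent) < 1e-9
    ]


# Sensitivities: with only 18 cancers, each percentage maps to one count.
dl_detected = consistent_counts(88.9, N_CANCER)
lung_rads_detected = consistent_counts(83.3, N_CANCER)
pancan_detected = consistent_counts(77.8, N_CANCER)

# FPRs: with 2,019 non-cancers, several counts can share one rounded value,
# so the exact number of false positives cannot be recovered from the text.
dl_false_pos_candidates = consistent_counts(3.6, N_NON_CANCER)
```

Under this reading, the reported sensitivities correspond to 16, 15 and 14 of the 18 cancers for DL, Lung-RADS and PanCan2b, respectively.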

Conclusion: DL-based nodule risk cut-offs achieved the highest sensitivity and lowest FPR for defining a positive screen that triggers more intensive diagnostic work-up. Increasing the risk cut-off from 6% to 10% further decreased the FPR without reducing sensitivity.

Limitations: This study is a retrospective analysis of data from a single screening trial and a single screening round. Further external validation is needed, including validation on incidence screening rounds.