Retrospective validation and comparison of deep learning based risk thresholds versus growth-centric protocols in pulmonary nodule assessment in screening

N. Antonissen, K. Venkadesh, H. Gietema, R. Vliegenthart, Z. Saghir, E. Scholten, M. Prokop, C. Schaefer-Prokop and C. Jacobs

Annual Meeting of the European Society of Thoracic Imaging 2024.

Purpose/Objectives:

We previously developed a deep learning (DL) algorithm that uses a current and a prior low-dose CT scan to estimate the 3-year malignancy risk of persisting screen-detected pulmonary nodules. Existing nodule management guidelines apply diverse growth-based criteria for intensified follow-up actions (short-term follow-up, PET-CT, or biopsy). Key criteria include more than 1.5 mm diameter growth within 12 months for Lung-RADS or subsequent scans in the International Lung Screen Trial (ILST) protocol, and 25% volume growth for nodules over 100 mm³ in the updated NELSON protocol, among other more detailed criteria. This study compares the DL-based malignancy risk estimate for persisting nodules with these growth-centric protocols.
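To make the two headline growth criteria concrete, the following is a minimal Python sketch, not the protocols' actual decision logic. The `NodulePair` type, the function names, the choice to apply the 100 mm³ size condition to the current scan, and the treatment of exactly 25% growth as positive are all illustrative assumptions; the published protocols contain further criteria that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class NodulePair:
    """Hypothetical container: one persisting nodule measured on two scans."""
    prior_diameter_mm: float
    current_diameter_mm: float
    prior_volume_mm3: float
    current_volume_mm3: float


def diameter_growth_positive(n: NodulePair, min_growth_mm: float = 1.5) -> bool:
    """Simplified diameter rule: more than 1.5 mm growth between the scans."""
    return (n.current_diameter_mm - n.prior_diameter_mm) > min_growth_mm


def volume_growth_positive(
    n: NodulePair,
    min_volume_mm3: float = 100.0,
    min_relative_growth: float = 0.25,
) -> bool:
    """Simplified volume rule: 25% growth for nodules over 100 mm3.

    Assumptions: the size condition is checked on the current scan, and
    growth of exactly 25% counts as positive.
    """
    if n.current_volume_mm3 <= min_volume_mm3 or n.prior_volume_mm3 <= 0:
        return False
    relative_growth = (
        n.current_volume_mm3 - n.prior_volume_mm3
    ) / n.prior_volume_mm3
    return relative_growth >= min_relative_growth


# Toy measurements, invented for illustration only.
clearly_growing = NodulePair(6.0, 8.0, 113.0, 268.0)
volume_only = NodulePair(6.0, 6.5, 113.0, 144.0)
stable_small = NodulePair(4.0, 4.2, 34.0, 39.0)
```

In this toy example, `volume_only` grows by 0.5 mm in diameter but by roughly 27% in volume, so the two simplified rules disagree on it, which illustrates why the protocols can yield different screening outcomes for the same nodule.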

Methods and materials:

For this study, we used 679 pairs of annual low-dose CT scans from the Danish Lung Cancer Screening Trial. The data set was constructed by selecting the scans preceding lung cancer diagnosis for malignant cases and scans from equivalent periods for benign cases in individuals without lung cancer; it included 1,116 screen-annotated nodules across 639 non-cancer and 40 cancer cases. The DL-based malignancy risk was computed for all screen-detected nodules, with a malignancy risk of 5% as the threshold for a positive screen. This 5% threshold aligns with the recommendations of the American College of Chest Physicians (ACCP), under which nodules with less than 5% risk are deemed to have a very low risk of malignancy. We applied the published growth criteria from the Lung-RADS, ILST and updated NELSON protocols to define a positive screening outcome. Sensitivity and specificity were calculated for all cases (n=679) using the risk-dominant nodule per scan.
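The case-level evaluation described above can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the study's code: the function names, the input layout (pairs of case identifier and nodule risk), and the convention that a risk of exactly 5% counts as positive are hypothetical, and the toy values are invented.

```python
import math


def risk_dominant_scores(nodule_risks):
    """Map each case to the highest risk among its nodules.

    nodule_risks: iterable of (case_id, risk) pairs, risk in [0, 1].
    """
    dominant = {}
    for case_id, risk in nodule_risks:
        dominant[case_id] = max(risk, dominant.get(case_id, 0.0))
    return dominant


def sensitivity_specificity(dominant, is_cancer, threshold=0.05):
    """Case-level sensitivity and specificity at a risk threshold.

    Assumption: a case is screen-positive when its risk-dominant
    nodule has a risk greater than or equal to the threshold.
    """
    tp = fn = tn = fp = 0
    for case_id, risk in dominant.items():
        positive = risk >= threshold
        if is_cancer[case_id]:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Toy data, invented for illustration: two cancer and three benign cases.
toy_risks = [
    ("A", 0.02), ("A", 0.40),  # cancer case with two nodules
    ("B", 0.03),               # cancer case missed at the 5% threshold
    ("C", 0.01), ("C", 0.04),  # benign case, both nodules below 5%
    ("D", 0.07),               # benign case, false positive
    ("E", 0.02),               # benign case
]
toy_labels = {"A": True, "B": True, "C": False, "D": False, "E": False}

dominant = risk_dominant_scores(toy_risks)
sens, spec = sensitivity_specificity(dominant, toy_labels)
```

Taking the maximum risk per case means a single high-risk nodule is enough to make the whole scan positive, which mirrors the use of the risk-dominant nodule per scan.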

Results:

The DL model, at the 5% threshold, achieved a sensitivity of 90%, surpassing Lung-RADS (77.5%) and the NELSON protocol (82.5%) and closely approaching ILST (92.5%). Its specificity was 95.5%, higher than that of Lung-RADS (88.3%), ILST (80.9%), and the NELSON protocol (88%).

Conclusion:

In this study, a DL algorithm for lung cancer screening that analyses both current and prior low-dose CT scans achieved a sensitivity of 90%, exceeding Lung-RADS and NELSON and closely approaching ILST. More notably, its specificity of 95.9% surpassed all compared protocols. These findings suggest that the algorithm considers factors beyond nodule growth when estimating malignancy risk, and that it may not only detect lung cancer but also reduce false positives. In conclusion, integrating prior CT scans with deep learning potentially provides a more precise assessment than traditional growth rate-based protocols.