Objective: AI models have shown promising results in estimating lung nodule malignancy risk on CT but have not been integrated into management guidelines [1,2]. We investigated how an in-house AI model [1] could be used to manage incidentally detected pulmonary nodules and compared the resulting recommendations with those of the British Thoracic Society (BTS) guidelines for pulmonary nodules [3].
Methods: A retrospective dataset of 549 solid nodules was collected (104 malignant, 445 benign), including a subset of 345 indeterminate solid nodules measuring 5-15 mm (46 malignant, 345 benign). For each nodule, the management recommendation according to the BTS guidelines was determined: CT surveillance after 1 year (1y-CT), CT surveillance after 3 months (3m-CT), or PET/CT. For the AI-based recommendations, two cut-off values on the AI risk score were determined: the 1y-CT threshold was set at 100% sensitivity, and the 3m-CT threshold was set so that sensitivity was at least equal to that of the Brock model.
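The 100% sensitivity cut-off described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function names, the 0/1 label encoding, and the toy scores are hypothetical, and a nodule is assumed to count as "flagged" when its risk is at or above the threshold.

```python
def threshold_at_full_sensitivity(risks, labels):
    """Highest cut-off at which every malignant nodule is still flagged.

    Assumes labels are 1 (malignant) or 0 (benign) and that a nodule is
    flagged when its risk is >= the threshold (hypothetical convention).
    """
    malignant_risks = [r for r, y in zip(risks, labels) if y == 1]
    if not malignant_risks:
        raise ValueError("at least one malignant nodule is required")
    return min(malignant_risks)


def specificity_below(risks, labels, threshold):
    """Fraction of benign nodules with a risk strictly below the threshold."""
    benign_risks = [r for r, y in zip(risks, labels) if y == 0]
    if not benign_risks:
        raise ValueError("at least one benign nodule is required")
    return sum(r < threshold for r in benign_risks) / len(benign_risks)


# Toy scores for illustration only; these are not study data.
risks = [0.005, 0.01, 0.03, 0.02, 0.20, 0.40]
labels = [0, 0, 0, 1, 1, 1]

t = threshold_at_full_sensitivity(risks, labels)
spec = specificity_below(risks, labels, t)
```

Because the cut-off is tied to the lowest-scoring malignant nodule in the sample, it is sensitive to the composition of the dataset it is derived from.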
Results: Thresholds for AI-based recommendations were <1.6% for 1y-CT, 1.6% to <12% for 3m-CT, and >=12% for PET/CT. Figure 1 shows a flowchart of the AI-based recommendations. At 100% sensitivity on the full dataset, the AI model achieved a specificity of 49%, compared to 20% for the Brock model used in the BTS guidelines. In the indeterminate subset, the specificities of the AI and Brock models at 100% sensitivity were 54% and 21%, respectively. AI-based recommendations resulted in 52% fewer 3m-CT recommendations than the BTS guidelines. Of these nodules, 150/198 shifted to 1y-CT; all of the shifted nodules were benign. AI-based recommendations resulted in 15% more PET/CT recommendations than the BTS guidelines, with a lower proportion of lung cancer among the referred nodules (AI: 53%; BTS: 60%). Similarly, in the indeterminate subset, a 56% decrease in 3m-CT (AI-based and BTS: 0% malignant) and a 46% increase in PET/CT (AI: 41% malignant; BTS: 46% malignant) were seen compared to BTS. Table 1 shows the number of nodules per category.
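As a rough illustration, the reported cut-offs can be read as three contiguous risk bands. The sketch below assumes that reading, with the AI risk expressed as a percentage; the function name and parameter names are hypothetical, and the flowchart in Figure 1 remains the authoritative description.

```python
def ai_recommendation(risk_percent, low=1.6, high=12.0):
    """Map an AI malignancy risk (in percent) to a management category.

    Assumed bands: below `low` -> 1y-CT, from `low` up to but not
    including `high` -> 3m-CT, at or above `high` -> PET/CT.
    """
    if not 0.0 <= risk_percent <= 100.0:
        raise ValueError("risk_percent must be between 0 and 100")
    if risk_percent < low:
        return "1y-CT"
    if risk_percent < high:
        return "3m-CT"
    return "PET/CT"
```

For example, under these assumptions a nodule with an AI risk of 5% would fall in the 3m-CT band, while one at exactly 12% would be referred for PET/CT.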
Conclusion: In this first exploration, AI-enabled nodule management assigned fewer nodules to 3m-CT follow-up than the BTS guidelines without missing any cancers. Future studies should analyse which thresholds to use on separate datasets with representative prevalence and determine the impact AI could have on clinical practice.