Purpose: Multi-center evaluation of the stand-alone performance of commercially available lung nodule detection software (Lunit INSIGHT CXR3).
Methods and Materials: A set of 300 posteroanterior (PA) and lateral chest radiographs from four medical centers in the Netherlands was collected. Solitary lung nodules ranging from 5 to 35 mm in size were present in 111 of the cases. All nodules were confirmed by CT within three months of the radiograph acquisition. Control radiographs were selected on the basis of a negative CT within six months. Five radiologists and three radiology residents scored the set to provide context for the algorithm performance. All PA radiographs were processed by Lunit INSIGHT CXR3, a commercial software product that detects ten common abnormalities in chest radiographs. Area under the receiver operating characteristic curve (AUC) and sensitivity at 90% specificity were used to measure performance. Multi-reader multi-case ROC analysis based on U-statistics (iMRMC-v4 software) was applied to compare CXR3 with the readers. Subanalyses were performed by nodule size (small < 15 mm, large > 15 mm) and conspicuity level (well visible, moderately visible, subtle, very subtle).
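The two performance measures named above can be illustrated with a minimal sketch. It assumes each radiograph receives a single continuous suspicion score and uses made-up toy scores, not study data. The function names are hypothetical, and the sketch reproduces only the point estimates: the iMRMC software additionally estimates variance across readers and cases, which is not shown here.

```python
from typing import Sequence


def auc_u_statistic(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as a normalized Mann-Whitney U statistic.

    Counts the fraction of (nodule, normal) pairs in which the nodule case
    has the higher score; ties count as half a win.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(
    pos: Sequence[float], neg: Sequence[float], target_spec: float = 0.90
) -> float:
    """Highest sensitivity among thresholds with specificity >= target_spec.

    A case is called positive when its score is strictly above the threshold.
    """
    best = 0.0
    for t in sorted(set(pos) | set(neg)):
        spec = sum(n <= t for n in neg) / len(neg)
        if spec >= target_spec:
            sens = sum(p > t for p in pos) / len(pos)
            best = max(best, sens)
    return best


# Toy scores (illustrative assumptions only, not taken from the study).
nodule_scores = [0.9, 0.8, 0.3]
normal_scores = [0.1, 0.1, 0.2, 0.2, 0.2, 0.25, 0.25, 0.3, 0.35, 0.85]

auc = auc_u_statistic(nodule_scores, normal_scores)
sens90 = sensitivity_at_specificity(nodule_scores, normal_scores, 0.90)
print(f"AUC = {auc:.3f}, sensitivity at 90% specificity = {sens90:.3f}")
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for a data set of a few hundred radiographs; a rank-based computation would give the same AUC more efficiently.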
Results: Of the 300 radiographs, 7 could not be processed by CXR3, leaving 104 nodule cases and 189 normal cases for evaluation. The CXR3 AUC was 0.93, significantly higher than the mean reader AUC of 0.82 (p<0.001). CXR3 also significantly outperformed the best reader, whose AUC was 0.88 (p=0.028). At a specificity of 90%, sensitivity was 83.2% for CXR3 and 63.3% (SD ±7.5%) for the reader average. By nodule conspicuity, CXR3 AUCs were 0.99 for well visible, 0.94 for moderately visible, 0.94 for subtle, and 0.78 for very subtle nodules. No significant difference in CXR3 performance was observed between the detection of small (AUC 0.91) and large nodules (AUC 0.93).
Conclusions: Lunit INSIGHT CXR3 significantly outperforms the comparison group of eight readers in nodule detection on chest radiographs.
Clinical Relevance/Application: Generalizability of artificial intelligence algorithms is not trivial. Performance studies increase users' confidence in algorithms, especially for users with similar patient populations.