"Independent combination of multiple readers for the detection of lung nodules in chest radiographs: setting a benchmark for computer-aided detection"

S. Schalekamp, N. Karssemeijer, C. Schaefer-Prokop and B. van Ginneken

Annual Meeting of the Radiological Society of North America 2013.

PURPOSE: Detection performance for lung nodules in chest radiography shows large interreader variability. High miss rates have been reported for lung cancers that were judged visible in retrospect. Computer intelligence has been shown to surpass human performance even in complex tasks (e.g., Watson). The purpose of our study was to explore the potential gain in performance from the independent combination of multiple observers. In this way, we aimed to define the upper boundary of "visual" detectability that a computer-aided detection (CAD) system should ideally achieve.
METHODS: The study group consisted of 111 digital chest radiographs (CXRs), each containing a single small nodule (average diameter 16 mm), and 189 normal controls. Nodules had to be visible on the frontal radiograph; 42% were judged to be of low or very low conspicuity. Twelve observers were asked to localize the lung nodules in the CXRs with the help of bone-suppressed images. Location-based ROC analysis was used. Nodule localization performance was measured as the mean sensitivity over the false positive fraction range from 0 to 0.2. This was computed for each observer separately and subsequently for combinations of multiple observers (up to 12). Findings of different observers were averaged when they were located within 1.5 cm of each other. When an observer had no finding at the location of another observer's finding, a score of zero was assigned for that observer in the averaging calculation.
RESULTS: The mean sensitivity over the false positive fraction range from 0 to 0.2 was 64.0% for single reading (range 45.5% to 78.2%). Combining the readings of two observers improved lung nodule detection to a mean sensitivity of 73.1% on average. Adding more observers led to a further increase in performance, up to a mean sensitivity of 82.3% for 12 observers. On average, 26 nodules were missed by single observers, 15 by a combination of 2 observers, and only 5 when combining 12 observers.
CONCLUSION: The variable and, in part, low baseline performance underlines the limitation of the "single observer". If CAD can reach the combined performance of multiple readers, a dramatic increase in nodule localization performance can be expected, with a drastic reduction in miss rates.
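
The combination rule described in METHODS (average findings lying within 1.5 cm of each other, counting a zero for every observer without a finding at that location) can be illustrated with a minimal Python sketch. The abstract does not say how findings were grouped or which score scale was used, so the greedy grouping around the highest-scoring finding, the coordinates in cm, the example scores, and the name `combine_findings` are assumptions made for illustration, not the authors' implementation.

```python
from math import hypot

MERGE_RADIUS_CM = 1.5  # merging distance stated in the abstract


def combine_findings(readings, radius_cm=MERGE_RADIUS_CM):
    """Combine the findings of several observers on one radiograph.

    readings: one list per observer of (x_cm, y_cm, score) tuples.
    Returns a list of (x_cm, y_cm, combined_score), highest score first.
    Grouping is greedy around the highest-scoring finding (an assumption).
    """
    n_observers = len(readings)
    # Flatten to (observer, x, y, score), most confident findings first.
    marks = sorted(
        (
            (obs, x, y, score)
            for obs, obs_marks in enumerate(readings)
            for x, y, score in obs_marks
        ),
        key=lambda m: -m[3],
    )
    clusters = []  # each cluster: seed position plus one score per observer
    for obs, x, y, score in marks:
        for c in clusters:
            near = hypot(x - c["x"], y - c["y"]) <= radius_cm
            if near and obs not in c["scores"]:
                c["scores"][obs] = score
                break
        else:
            clusters.append({"x": x, "y": y, "scores": {obs: score}})
    # Dividing by the number of observers is the same as averaging with a
    # zero score for every observer who has no finding in the cluster.
    combined = [
        (c["x"], c["y"], sum(c["scores"].values()) / n_observers)
        for c in clusters
    ]
    return sorted(combined, key=lambda f: -f[2])


# Hypothetical readings of one image by three observers.
readings = [
    [(10.0, 12.0, 80.0)],                     # observer 1: one finding
    [(10.5, 12.4, 60.0), (3.0, 20.0, 30.0)],  # observer 2: two findings
    [],                                       # observer 3: no findings
]
combined = combine_findings(readings)
```

In this hypothetical case the two findings near (10, 12) merge into one combined finding, while the isolated finding of observer 2 keeps only a third of its score. A finding reported by a single reader is therefore down-weighted rather than discarded.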