A Novel Multiple-Instance Learning-Based Approach to Computer-Aided Detection of Tuberculosis on Chest X-Rays

J. Melendez, B. van Ginneken, P. Maduskar, R. Philipsen, K. Reither, M. Breuninger, I. Adetifa, R. Maane, H. Ayles and C. Sánchez

IEEE Transactions on Medical Imaging 2015;34(1):179-192.

To reach performance levels comparable to those of human experts, computer-aided detection (CAD) systems are typically optimized by means of a supervised learning approach that relies on large training databases comprising manually annotated lesions. However, manually outlining those lesions is a difficult and time-consuming process, so data with detailed annotations are often hard to obtain. In this paper, we investigate an alternative pattern classification approach, namely multiple-instance learning (MIL), that does not require such detailed information for a CAD system to be optimized. We have applied MIL to a CAD system aimed at detecting textural lesions associated with tuberculosis. Only the case (or image) condition (normal or abnormal), which was determined by radiological means, was required during training. Based upon the well-known miSVM technique, we propose a novel algorithm, specifically designed for our CAD application, that overcomes serious drawbacks of miSVM related to underestimation of the positive instances and costly iteration. The key idea of the proposed method is to use probability estimates instead of decision values to guide the MIL procedure. In addition, we include countermeasures that deal with the uncertainty resulting from instance relabeling. To show the advantages of our MIL-based approach as compared with a traditional supervised one, experiments with three different image databases were conducted. The area under the receiver operating characteristic curve was used as the performance measure. With the first database, for which training lesion annotations were available, the supervised system was not much better than our MIL-based method (0.88 vs. 0.86). Thus, the proposed approach achieved highly competitive results without resorting to lesion-level information and the associated annotation process.
When evaluating the remaining databases, given their large difference with respect to the previous image set, the most appealing strategy to maintain good performance was to retrain the CAD systems considering the new data. However, since only the image condition was available in this case, only the MIL-based system could be retrained. This scenario, which is common in real-world applications, clearly demonstrates the better adaptation capabilities of the proposed approach. After retraining, our MIL-based system significantly outperformed the supervised one (0.86 vs. 0.79 and 0.91 vs. 0.85, p < 0.0001 and p = 0.0002, respectively).
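
The idea of guiding instance relabeling with probability estimates, using only image-level labels, can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: a small logistic regression stands in for an SVM with probability outputs, the 0.5 threshold and the rule that keeps at least one positive instance per abnormal image are assumptions taken from the generic miSVM scheme, and all names (`fit_logistic`, `mil_relabel`, the toy one-dimensional features) are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, epochs=500, lr=0.5):
    """Batch gradient descent for logistic regression (stand-in classifier)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def mil_relabel(bags, bag_labels, max_iter=10):
    """miSVM-style loop driven by probability estimates (assumed variant).

    bags: list of images, each a list of instance feature vectors.
    bag_labels: 0 (normal) or 1 (abnormal) per image.
    """
    # Start by giving every instance the label of its image.
    labels = [[lab] * len(bag) for bag, lab in zip(bags, bag_labels)]
    model = None
    for _ in range(max_iter):
        X = [x for bag in bags for x in bag]
        y = [l for ls in labels for l in ls]
        model = fit_logistic(X, y)
        new_labels = []
        for bag, lab in zip(bags, bag_labels):
            if lab == 0:
                # Instances of normal images are always negative.
                new_labels.append([0] * len(bag))
                continue
            probs = [predict_proba(model, x) for x in bag]
            ls = [1 if p >= 0.5 else 0 for p in probs]
            if sum(ls) == 0:
                # Keep at least one positive instance per abnormal image.
                ls[probs.index(max(probs))] = 1
            new_labels.append(ls)
        if new_labels == labels:
            break  # instance labels are stable
        labels = new_labels
    return model, labels


# Toy data: normal-looking instances near -1, lesion-like instances near +1.
bags = [
    [[-1.2], [-1.0], [-0.8]],
    [[-0.9], [-1.1], [-1.0]],
    [[-0.7], [-1.3], [-1.0]],
    [[-1.0], [1.0], [-1.1]],   # abnormal image, one lesion-like instance
    [[0.8], [-0.9], [1.2]],    # abnormal image, two lesion-like instances
]
bag_labels = [0, 0, 0, 1, 1]
model, labels = mil_relabel(bags, bag_labels)
print(labels)
```

In this sketch the normal-looking instances inside abnormal images are relabeled as negative after the first pass, so the final classifier is trained on instance labels that were never annotated by hand.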