Optimization Strategies for Interactive Classification of Interstitial Lung Disease Textures

T. Kockelkorn, R. Ramos, J. Ramos, P. de Jong, C. Schaefer-Prokop, R. Wittenberg, A. Tiehuis, J. Grutters, M. Viergever and B. van Ginneken

Frontiers in ICT 2016;3:33.

For computerized analysis of textures in interstitial lung disease, manual annotations of lung tissue are necessary. Since making these annotations is labor intensive, we previously proposed an interactive annotation framework. In this framework, observers iteratively trained a classifier to distinguish the different texture types by correcting its classification errors. In this work, we investigated three ways to extend this approach, in order to decrease the amount of user interaction required to annotate all lung tissue in a computed tomography scan. First, we conducted automatic classification experiments to test how data from previously annotated scans can be used for classification of the scan under consideration. We compared the performance of a classifier trained on data from one observer, a classifier trained on data from multiple observers, a classifier trained on consensus training data, and an ensemble of classifiers, each trained on data from different sources. Experiments were conducted without and with texture selection (ts). In the former case, training data from all eight textures was used. In the latter, only training data from the texture types present in the scan were used, and the observer would have to indicate textures contained in the scan to be analyzed. Second, we simulated interactive annotation to test the effects of (1) asking observers to perform ts before the start of annotation, (2) the use of a classifier trained on data from previously annotated scans at the start of annotation, when the interactive classifier is untrained, and (3) allowing observers to choose which interactive or automatic classification results they wanted to correct. Finally, various strategies for selecting the classification results that were presented to the observer were considered. Classification accuracies for all possible interactive annotation scenarios were compared. 
Using the best-performing protocol, in which observers select the textures that should be distinguished in the scan and in which they can choose which classification results to use for correction, a median accuracy of 88% was reached. The results obtained using this protocol were significantly better than results obtained with other interactive or automatic classification protocols.
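
The effect of texture selection (ts) can be illustrated with a minimal sketch. This is not the classifier or feature set used in the paper: the nearest-centroid model, the two-dimensional feature vectors, the texture label names, and the helper functions `select_textures`, `fit_centroids`, and `classify` are all assumptions chosen for illustration. The sketch shows only the idea described above: with ts, training data are restricted to the textures the observer indicates are present in the scan, so absent textures cannot be predicted.

```python
from collections import defaultdict
from math import dist


def select_textures(samples, present):
    """Keep only training samples whose label is in `present` (the ts step)."""
    return [(x, y) for x, y in samples if y in present]


def fit_centroids(samples):
    """Compute the mean feature vector per class (hypothetical stand-in model)."""
    grouped = defaultdict(list)
    for x, y in samples:
        grouped[y].append(x)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Toy 2-D feature vectors with hypothetical texture labels.
training = [
    ((0.0, 0.0), "normal"),
    ((0.2, 0.1), "normal"),
    ((1.0, 1.0), "ground_glass"),
    ((1.1, 0.9), "ground_glass"),
    ((0.5, 0.6), "honeycombing"),
    ((0.6, 0.5), "honeycombing"),
]
query = (0.55, 0.5)

# Without ts: every texture in the training data competes.
label_all = classify(fit_centroids(training), query)

# With ts: the observer indicates that only these textures occur in the scan.
present = {"normal", "ground_glass"}
label_ts = classify(fit_centroids(select_textures(training, present)), query)

print(label_all, label_ts)
```

In this toy setup the query region is assigned to a texture that the observer has ruled out when ts is not applied, and to one of the indicated textures when it is.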