Semi-Supervised Learning to Automate Tumor Bud Detection in Cytokeratin-Stained Whole-Slide Images of Colorectal Cancer

J. Bokhorst, I. Nagtegaal, I. Zlobec, H. Dawson, K. Sheahan, F. Simmer, R. Kirsch, M. Vieth, A. Lugli, J. van der Laak and F. Ciompi

Cancers 2023;15(7):2079.


Simple Summary: Tumor budding is a promising and cost-effective histological biomarker with strong prognostic value in colorectal cancer. It is defined by the presence of single tumor cells or small clusters of cells within the tumor or at the tumor-invasion front. Deep learning-based tumor bud assessment can potentially improve diagnostic reproducibility and efficiency. This study aimed to develop a deep learning algorithm that automatically detects tumor buds in cytokeratin-stained images. We used a semi-supervised learning technique to overcome the limitations of a small dataset. Validation of our model showed a sensitivity of 91% and a fairly strong correlation between a human annotator and our deep learning method. We demonstrate that the automated tumor bud count achieves a prognostic value similar to visual estimation. We also investigate new metrics for quantifying buds, such as density and dispersion, and report on their predictive value.

Abstract: Tumor budding is a histopathological biomarker associated with metastases and adverse survival outcomes in colorectal carcinoma (CRC) patients. It is characterized by the presence of single tumor cells or small clusters of cells within the tumor or at the tumor-invasion front. To obtain a tumor budding score for a patient, a pathologist must first visually identify the region with the highest tumor bud density and then count the buds in the chosen hotspot field. Automating this process is expected to increase efficiency and reproducibility. Here, we present a deep learning convolutional neural network model that automates the above procedure. For model training, we used a semi-supervised learning method to maximize detection performance despite the limited amount of labeled training data. The model was tested on an independent dataset in which human- and machine-selected hotspots were mapped in relation to each other, and manually and automatically detected tumor bud counts in the manually selected fields were compared. We report the results of the proposed method in comparison with visual assessment by pathologists. We show that the automated tumor bud count achieves a prognostic value comparable with visual estimation, while being based on an objective and reproducible quantification. We also explore novel metrics to quantify buds, such as density and dispersion, and report their prognostic value. We have made the model available for research use on the grand-challenge platform.
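The hotspot step described in the abstract (find the region with the highest bud density, then count the buds inside it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `find_hotspot`, the circular field shape, the default field area of 0.785 mm², and the grid step are all assumptions made for illustration, and the input is assumed to be a list of already-detected bud coordinates in millimeters.

```python
import numpy as np


def find_hotspot(bud_xy_mm, field_area_mm2=0.785, step_mm=0.1):
    """Return (center, count) for the circular field holding the most buds.

    bud_xy_mm      : iterable of (x, y) bud positions in mm (assumed input format)
    field_area_mm2 : area of the counting field (assumed default, not from the paper)
    step_mm        : spacing of candidate field centers (hypothetical parameter)
    """
    pts = np.asarray(bud_xy_mm, dtype=float)
    if pts.size == 0:
        return None, 0

    # Radius of a circle with the requested area.
    radius = np.sqrt(field_area_mm2 / np.pi)

    # Candidate centers on a regular grid covering the bounding box of the buds.
    xs = np.arange(pts[:, 0].min(), pts[:, 0].max() + step_mm, step_mm)
    ys = np.arange(pts[:, 1].min(), pts[:, 1].max() + step_mm, step_mm)
    gx, gy = np.meshgrid(xs, ys)
    centers = np.column_stack([gx.ravel(), gy.ravel()])

    # Squared distance from every candidate center to every bud (brute force).
    d2 = ((centers[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    counts = (d2 <= radius ** 2).sum(axis=1)

    # First candidate with the maximum count wins ties.
    best = int(counts.argmax())
    center = (float(centers[best, 0]), float(centers[best, 1]))
    return center, int(counts[best])


# Toy example: five clustered buds plus two isolated ones.
buds = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1), (1.05, 1.05),
        (5.0, 5.0), (8.0, 2.0)]
center, count = find_hotspot(buds)
print(center, count)
```

The brute-force distance matrix keeps the sketch short but scales with the number of candidate centers times the number of buds; a whole-slide pipeline would more likely use a density map or a spatial index.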