Artificial Intelligence Assistance Significantly Improves Gleason Grading of Prostate Biopsies by Pathologists

W. Bulten, M. Balkenhol, J. Belinga, A. Brilhante, A. Çakır, L. Egevad, M. Eklund, X. Farré, K. Geronatsiou, V. Molinié, G. Pereira, P. Roy, G. Saile, P. Salles, E. Schaafsma, J. Tschui, A. Vos, B. Delahunt, H. Samaratunga, D. Grignon, A. Evans, D. Berney, C. Pan, G. Kristiansen, J. Kench, J. Oxley, K. Leite, J. McKenney, P. Humphrey, S. Fine, T. Tsuzuki, M. Varma, M. Zhou, E. Comperat, D. Bostwick, K. Iczkowski, C. Magi-Galluzzi, J. Srigley, H. Takahashi, T. van der Kwast, H. van Boven, R. Vink, J. van der Laak, C. Hulsbergen-van de Kaa and G. Litjens

Modern Pathology 2020.

The Gleason score is the most important prognostic marker for prostate cancer patients, but it suffers from significant observer variability. Artificial intelligence (AI) systems based on deep learning can achieve pathologist-level performance at Gleason grading. However, the performance of such systems can degrade in the presence of artifacts, foreign tissue, or other anomalies. Pathologists integrating their expertise with feedback from an AI system could result in a synergy that outperforms both the individual pathologist and the system. Despite the hype around AI assistance, existing literature on this topic within the pathology domain is limited. We investigated the value of AI assistance for grading prostate biopsies. A panel of 14 observers graded 160 biopsies with and without AI assistance. Using AI, the agreement of the panel with an expert reference standard increased significantly (quadratically weighted Cohen's kappa, unassisted vs. assisted, 0.799 vs. 0.872; p = 0.019). On an external validation set of 87 cases, the panel showed a significant increase in agreement with a panel of international experts in prostate pathology (quadratically weighted Cohen's kappa, 0.733 vs. 0.786; p = 0.003). In both experiments, at the group level, AI-assisted pathologists outperformed both the unassisted pathologists and the standalone AI system. Our results show the potential of AI systems for Gleason grading but, more importantly, show the benefits of pathologist-AI synergy.
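The agreement figures above are quadratically weighted Cohen's kappa values. The sketch below shows how that statistic can be computed for ordinal grades. It is an illustration of the metric, not the authors' evaluation code, and it assumes (hypothetically) that grades are encoded as integers 0..k-1, for example 0 for benign and 1 to 5 for ISUP grade groups. Quadratic weights penalize a disagreement by the squared distance between grades, so confusing grade group 1 with 5 costs far more than confusing 2 with 3.

```python
from typing import Sequence


def quadratic_weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int], num_classes: int) -> float:
    """Cohen's kappa with quadratic weights for ordinal labels 0..num_classes-1.

    Illustrative sketch; the integer grade encoding is an assumption.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if num_classes < 2:
        raise ValueError("need at least two ordinal classes")
    n = len(rater_a)
    k = num_classes

    # Observed confusion matrix: observed[i][j] counts cases rated i by A and j by B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal grade histograms of each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic disagreement weight, normalized to [0, 1].
            w = (i - j) ** 2 / (k - 1) ** 2
            # Expected count under chance agreement (independent marginals).
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += w * observed[i][j]
            weighted_expected += w * expected

    if weighted_expected == 0:
        # Both raters used one identical grade for every case: kappa is undefined.
        raise ValueError("kappa undefined: no expected disagreement")
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Hypothetical toy grades for two raters (0 = benign, 1-5 = grade groups).
    pathologist = [0, 1, 2, 3, 5, 4]
    reference = [0, 1, 3, 3, 5, 5]
    print(round(quadratic_weighted_kappa(pathologist, reference, num_classes=6), 3))
```

For real analyses, scikit-learn's `cohen_kappa_score(y1, y2, weights="quadratic")` computes the same statistic; the hand-rolled version only makes the weighting and the chance-agreement term explicit.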