Common Limitations of Image Processing Metrics: A Picture Story

A. Reinke, M. Eisenmann, M. Tizabi, C. Sudre, T. Radsch, M. Antonelli, T. Arbel, S. Bakas, M. Cardoso, V. Cheplygina, K. Farahani, B. Glocker, D. Heckmann-Notzel, F. Isensee, P. Jannin, C. Kahn, J. Kleesiek, T. Kurc, M. Kozubek, B. Landman, G. Litjens, K. Maier-Hein, B. Menze, H. Muller, J. Petersen, M. Reyes, N. Rieke, B. Stieltjes, R. Summers, S. Tsaftaris, B. van Ginneken, A. Kopp-Schneider, P. Jager and L. Maier-Hein

arXiv preprint arXiv:2104.05642, 2021.


While the importance of automatic image analysis is increasing at an enormous pace, recent meta-research has revealed major flaws in algorithm validation. Performance metrics are key to objective, transparent and comparative performance assessment, yet relatively little attention has been given to the practical pitfalls of using specific metrics for a given image analysis task. A common mission of several international initiatives is therefore to provide researchers with guidelines and tools for choosing performance metrics in a problem-aware manner. This dynamically updated document aims to illustrate important limitations of performance metrics commonly applied in the field of image analysis. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts.