Benchmarking radiologists and AI for indeterminate lung nodule malignancy risk estimation on screening CT: the LUNA25 Challenge

N. Antonissen, D. Peeters, B. Obreja, R. Dinnessen, Z. Saghir, M. Silva, U. Pastorino, E. Scholten, F. Mohamed Hoesein, R. Vliegenthart, H. Gietema, C. Schaefer-Prokop, M. Prokop and C. Jacobs

Annual Meeting of the Radiological Society of North America 2025.

Purpose: The global expansion of lung cancer screening will significantly increase radiologist workloads, particularly because indeterminate lung nodules (5-15 mm) often require short-term follow-up to rule out malignancy. Accurate risk classification of these nodules can reduce unnecessary procedures, improve management, and ease radiologist burden. Artificial intelligence (AI) may assist in malignancy risk estimation, but rigorous benchmarking against human readers is limited. The LUNA25 challenge provides a validated benchmark to compare AI and radiologist performance in malignancy risk estimation of indeterminate nodules on low-dose CT (LDCT).

Methods and Materials: The LUNA25 challenge consists of an AI competition and a reader study. AI teams develop algorithms using a publicly available dataset of 4,069 LDCT scans from the National Lung Screening Trial (NLST), containing 555 malignant and 5,608 benign nodules. Final evaluation uses a hidden external test set of indeterminate solid and part-solid nodules from baseline scans of the Danish Lung Cancer Screening Trial (DLCST), the Dutch-Belgian NELSON trial, and the Multicentric Italian Lung Detection (MILD) trial. The reader study, launching in May 2025, uses a subset of 300 nodules from this hidden test set. The subset is enriched for malignancies, with malignant nodules size-matched to benign nodules. Radiologists assess each case by assigning a malignancy risk score (0-100) and a management recommendation: low risk (routine screening), intermediate risk (short-term follow-up), or high risk (referral). Performance metrics include area under the ROC curve (AUC), sensitivity, and specificity at clinically relevant thresholds.

Results: Dataset curation for AI evaluation and the reader study is complete. Over 80 radiologists from diverse international centers have enrolled, with recruitment ongoing. The reader study is expected to conclude by late August 2025. Results, including a head-to-head comparison of radiologist performance and the best-performing AI algorithm in malignancy risk estimation and management recommendations, will be available by year-end for presentation at RSNA.

Conclusions: The LUNA25 challenge will deliver the first international benchmark comparing radiologist and AI performance for malignancy risk estimation of indeterminate lung nodules on screening CT. It will provide critical insights into the potential for AI to improve early cancer detection and minimize unnecessary follow-ups.

Clinical Relevance/Application: As lung cancer screening is adopted globally, validated AI tools could help manage diagnostic burden. The LUNA25 challenge offers evidence on the comparative performance of AI and radiologists for indeterminate nodules, supporting safe clinical use and regulatory evaluation.
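
Illustrative sketch: the performance metrics named under Methods and Materials (AUC, plus sensitivity and specificity at a threshold) could be computed from 0-100 risk scores roughly as shown below. This is a minimal sketch, not the challenge's evaluation code. The function names, the toy scores and labels, and the cutoff of 50 are all assumptions made for illustration; the abstract does not specify which thresholds count as clinically relevant.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a malignant nodule (label 1) receives a
    higher score than a benign one (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Treat score >= threshold as a positive call and return
    (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented for illustration only (not LUNA25 results).
toy_scores = [90, 75, 40, 60, 20, 10, 35, 5]  # reader or AI risk scores, 0-100
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]         # 1 = malignant, 0 = benign

toy_auc = auc(toy_scores, toy_labels)
# The cutoff of 50 is a hypothetical threshold, not one taken from the abstract.
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_labels, threshold=50)
print(f"AUC={toy_auc:.3f} sensitivity={toy_sens:.3f} specificity={toy_spec:.3f}")
```

The pairwise AUC formulation is used here because it needs no external library and handles tied scores explicitly, which matters when readers assign integer scores on a 0-100 scale. For larger datasets a rank-based or library implementation would be more efficient.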