Achieving expert level performance in quantifying 13 distinctive features of neovascular age-related macular degeneration on optical coherence tomography

B. Liefers, P. Taylor, C. González-Gonzalo, A. Tufail and C. Sánchez

European Society of Retina Specialists 2020.

Purpose:

To develop and validate an automatic model for volumetric quantification of the 13 most common abnormalities associated with neovascular age-related macular degeneration (nAMD) on optical coherence tomography (OCT).

Setting:

Clinical data and associated imaging were collected from five UK secondary care providers between February 2002 and September 2017. We identified 680 treatment-naive patients with no recent cataract surgery, at least one anti-VEGF injection, a diagnosis of nAMD, and associated OCT imaging (Topcon, Tokyo, Japan).

Methods:

A deep convolutional neural network (CNN) was used to produce a volumetric segmentation of 13 retinal abnormalities. The CNN architecture was based on a deep encoder-decoder structure that combines information from adjacent B-scans. The model was trained on 2,712 B-scans from 307 OCT volumes, with manual voxel-level labels provided for all abnormalities by eight graders. Only abnormalities found in more than 80 B-scans were modelled. The performance of the model and graders was assessed on an independent set of 112 B-scans from 112 OCT volumes of nAMD cases, for which four graders independently provided annotations. The reference standard was created by combining the outputs of three graders and was defined as the voxels on which at least two of the three agreed. Grader accuracy was calculated by treating each grader, in turn, as the observer. The Dice similarity metric was used to compare overlap, calculated per A-scan or per voxel where appropriate. Free-response receiver operating characteristic (FROC) analysis was used for the detection of small abnormalities. The intraclass correlation coefficient (ICC) was used to measure agreement on area or volume measures, with the reference area or volume defined as the average of the three graders.
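The majority-vote reference standard and the per-voxel Dice comparison described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `majority_reference` and `dice`, the toy masks, and the choice to return NaN when both masks are empty are all assumptions made for illustration.

```python
import numpy as np

def majority_reference(masks, min_votes=2):
    """Binary reference: voxels marked by at least `min_votes` graders."""
    stacked = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return stacked.sum(axis=0) >= min_votes

def dice(a, b):
    """Per-voxel Dice overlap between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Both masks empty: Dice is undefined (returning NaN is an assumption).
        return float("nan")
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy 2x4 masks for one abnormality from three hypothetical graders.
g1 = np.array([[1, 1, 0, 0], [0, 1, 0, 0]])
g2 = np.array([[1, 0, 0, 0], [0, 1, 1, 0]])
g3 = np.array([[1, 1, 0, 0], [0, 0, 1, 0]])
reference = majority_reference([g1, g2, g3])

# Mask from the observer (a fourth grader or the model) being evaluated.
observer = np.array([[1, 1, 1, 0], [0, 1, 0, 0]])
score = dice(observer, reference)
print(score)  # 0.75
```

In this toy case the reference contains four voxels, the observer marks four, and three overlap, giving a Dice score of 2 × 3 / (4 + 4) = 0.75.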

Results:

Included abnormalities were: intraretinal fluid (IRF), subretinal fluid (SRF), pigment epithelial detachment (PED), subretinal hyperreflective material (SHRM), fibrosis, drusen and drusenoid PED, epiretinal membrane (ERM), outer plexiform layer (OPL) descent, ellipsoid loss, retinal pigment epithelium (RPE) loss or attenuation, hyper-transmission, hyperreflective dots and subretinal drusenoid deposits - reticular pseudodrusen (SDD-RPD). For OPL descent and fibrosis there were insufficient examples in the test set for a reliable performance estimate. For the other features, the model obtained an average Dice score of 0.63 ± 0.15 (median 0.64), compared to 0.61 ± 0.17 (median 0.60) for the observers. The average ICC for the model was 0.66 ± 0.22 (median 0.69), compared to 0.62 ± 0.21 (median 0.55) for the observers. For individual features, differences between model and observer Dice scores were within the 95% confidence interval for all features except ellipsoid loss, where model performance was slightly better (p=0.03). Regarding ICC, model performance was slightly better for IRF (p=0.04) and ellipsoid loss (p=0.006), slightly worse for drusen and drusenoid PED (p=0.03), and within the 95% confidence interval for the other features. For hyperreflective dots and SDD-RPD, FROC analysis showed that the model achieved a sensitivity per false positive similar to that of the observers.

Conclusions:

We present a deep-learning-based model that provides accurate volumetric quantification of a comprehensive set of relevant pathological components of nAMD. Grader agreement varied considerably between abnormalities. Nevertheless, model performance was comparable to, and in many cases exceeded, human performance, both in terms of overlap and quantification. The model generates a precise, quantitative morphological signature of the retinal pathology that can facilitate the development of prediction models for treatment response and planning of personalized treatment intervals, as well as further research into structure/function correlation. In clinical care it can facilitate structured reporting, reducing subjectivity in clinicians' assessments and enabling implementation of refined treatment guidelines. The presented model accelerates interpretation of OCT volumes and surpasses manual reading, both in terms of attainable level of extracted information and consistency. This can potentially reduce the cost of image interpretation in clinical trials and improve personalized clinical care.