Memory-efficient 2.5D convolutional transformer networks for multi-modal deformable registration with weak label supervision applied to whole-heart CT and MRI scans

A. Hering, S. Kuckertz, S. Heldmann and M. Heinrich

International Journal of Computer Assisted Radiology and Surgery, 2019.


PURPOSE: Despite their potential to benefit from supervision, deep learning-based registration approaches are difficult to train for large deformations in 3D scans due to excessive memory requirements.

METHODS: We propose a new 2.5D convolutional transformer architecture that enables us to learn a memory-efficient, weakly supervised deep learning model for multi-modal image registration. Furthermore, we are the first to integrate a volume change control term into the loss function of a deep learning-based registration method to penalize foldings in the deformation field.

RESULTS: Our approach succeeds at learning large deformations across multi-modal images. We evaluate it on 100 pair-wise registrations of CT and MRI whole-heart scans and demonstrate a considerably higher Dice score of 0.74, compared to 0.71 for a state-of-the-art unsupervised discrete registration framework (deeds).

CONCLUSION: Our proposed memory-efficient registration method outperforms state-of-the-art conventional registration methods. Using a volume change control term in the loss function considerably reduces the number of foldings on new registration cases.
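As a rough illustration of how a weakly supervised registration loss with a volume change control term could be assembled, the PyTorch sketch below combines a soft Dice term on warped and fixed heart segmentations with a Jacobian-determinant penalty on the displacement field. The function names, the squared-log form of the penalty, and the weighting `alpha` are assumptions for illustration only, not the exact formulation from the paper.

```python
import torch

def soft_dice_loss(warped_seg, fixed_seg, eps=1e-6):
    """Weak label supervision: 1 - soft Dice between the warped moving
    segmentation and the fixed segmentation (one-hot, B x C x D x H x W)."""
    dims = (2, 3, 4)
    intersection = (warped_seg * fixed_seg).sum(dims)
    union = warped_seg.sum(dims) + fixed_seg.sum(dims)
    dice = (2.0 * intersection + eps) / (union + eps)
    return 1.0 - dice.mean()

def jacobian_determinant(displacement):
    """Jacobian determinant of phi(x) = x + u(x).
    displacement u: B x 3 x D x H x W, in voxel units."""
    # finite-difference gradients du_i/dx_j -> B x 3 x 3 x D x H x W
    grads = torch.stack(
        [torch.stack(torch.gradient(displacement[:, i], dim=(1, 2, 3)), dim=1)
         for i in range(3)], dim=1)
    # Jacobian of the identity plus the displacement field
    eye = torch.eye(3, device=displacement.device).view(1, 3, 3, 1, 1, 1)
    jac = (grads + eye).permute(0, 3, 4, 5, 1, 2)  # B x D x H x W x 3 x 3
    return torch.det(jac)                          # B x D x H x W

def volume_change_penalty(displacement, eps=1e-6):
    """Volume change control term (illustrative choice): squared log of the
    Jacobian determinant, so strong compression/expansion is penalized and
    folded voxels (det <= 0, clamped here) receive a large penalty."""
    det = jacobian_determinant(displacement).clamp_min(eps)
    return (torch.log(det) ** 2).mean()

def registration_loss(warped_seg, fixed_seg, displacement, alpha=0.1):
    """Combined weakly supervised loss; alpha is a hypothetical trade-off weight."""
    return soft_dice_loss(warped_seg, fixed_seg) + alpha * volume_change_penalty(displacement)
```

In such a setup, the network predicts the displacement field, the moving heart labels are warped with it, and the two terms trade off label overlap against plausible, fold-free volume changes via `alpha`.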