Reparameterized multi-scale transformer for deformable retinal image registration.
Published in Machine Intelligence Research, 2024
Co-first author (listed first) of this paper.
Abstract
Deformable retinal image registration is crucial in clinical diagnosis and longitudinal studies of retinal diseases. Most existing deep deformable retinal image registration methods focus on fully convolutional network (FCN) architecture design, which fails to model long-range dependencies among pixels, a significant factor in deformable retinal image registration. Transformers, built on the self-attention mechanism, can capture global context dependencies and thus complement local convolution. However, multi-scale spatial feature fusion and pixel-wise position selection, which are also crucial for deformable retinal image registration, are often ignored by both FCNs and transformers. To fully leverage the merits of FCNs, multi-scale spatial attention, and transformers, we propose a hierarchical hybrid architecture, the Reparameterized Multi-scale Transformer (RMFormer), for deformable retinal image registration. In RMFormer, we develop a reparameterized multi-scale spatial attention that adaptively fuses multi-scale spatial features with the assistance of the reparameterization technique, thereby highlighting informative pixel-wise positions in a lightweight manner. Experimental results on two publicly available datasets demonstrate the superiority of RMFormer over state-of-the-art methods and show that it is data-efficient in the limited-data regime typical of medical imaging. Additionally, we are the first to provide a visualization analysis explaining how the proposed method affects the deformable retinal image registration process. The source code of our work is available at https://github.com/Tloops/RMFormer.
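Below is a minimal PyTorch sketch of the core idea the abstract describes: a multi-scale spatial attention block whose parallel convolution branches can be structurally reparameterized into a single equivalent convolution at inference time. This is an illustrative assumption based on RepVGG-style reparameterization; the module name `MultiScaleSpatialAttention`, the kernel sizes, and the pooling scheme are all hypothetical and do not reproduce the paper's actual implementation (see the repository linked above for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSpatialAttention(nn.Module):
    """Sketch of a reparameterizable multi-scale spatial attention block.

    Training: parallel convolutions at several kernel sizes produce a
    pixel-wise attention map. Inference: the branches are merged into one
    equivalent convolution (RepVGG-style), so multi-scale fusion costs a
    single conv. Names and shapes are illustrative assumptions only.
    """

    def __init__(self, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.kernel_sizes = kernel_sizes
        # Each branch maps pooled features (2 channels: avg + max) to 1 channel.
        self.branches = nn.ModuleList(
            nn.Conv2d(2, 1, k, padding=k // 2, bias=False) for k in kernel_sizes
        )
        self.merged = None  # populated by reparameterize()

    def _pool(self, x):
        # Channel-wise average and max pooling -> (B, 2, H, W).
        return torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)

    def reparameterize(self):
        # Zero-pad each smaller kernel up to the largest size, then sum:
        # a sum of parallel convs equals one conv with summed (padded) kernels.
        k_max = max(self.kernel_sizes)
        merged = nn.Conv2d(2, 1, k_max, padding=k_max // 2, bias=False)
        weight = torch.zeros_like(merged.weight)
        for branch, k in zip(self.branches, self.kernel_sizes):
            pad = (k_max - k) // 2
            weight += F.pad(branch.weight, [pad] * 4)
        merged.weight.data.copy_(weight)
        self.merged = merged

    def forward(self, x):
        pooled = self._pool(x)
        if self.merged is not None:      # fast single-conv inference path
            attn = self.merged(pooled)
        else:                            # multi-branch training path
            attn = sum(branch(pooled) for branch in self.branches)
        return x * torch.sigmoid(attn)   # highlight informative positions


# Usage: check the merged conv is numerically equivalent to the branches.
m = MultiScaleSpatialAttention().eval()
x = torch.randn(1, 16, 32, 32)
y_train = m(x)
m.reparameterize()
y_merged = m(x)
assert torch.allclose(y_train, y_merged, atol=1e-5)
```

The equivalence check at the end verifies that the merged convolution reproduces the multi-branch output, which is the property that lets multi-scale fusion remain lightweight at inference.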