A Spatial-Temporal Deformable Attention based Framework for Breast Lesion Detection in Videos

Qin, Chao; Cao, Jiale; Fu, Huazhu; Anwer, Rao Muhammad; Khan, Fahad Shahbaz

arXiv:2309.04702 (cs)

[Submitted on 9 Sep 2023]

Abstract: Detecting breast lesions in videos is crucial for computer-aided diagnosis. Existing video-based breast lesion detection approaches typically perform temporal feature aggregation of deep backbone features based on the self-attention operation. We argue that such a strategy struggles to perform deep feature aggregation effectively and ignores useful local information. To tackle these issues, we propose a spatial-temporal deformable attention based framework, named STNet. STNet introduces a spatial-temporal deformable attention module to perform local spatial-temporal feature fusion. This module enables deep feature aggregation in each stage of both the encoder and the decoder. To further speed up detection, we introduce an encoder feature shuffle strategy for multi-frame prediction during inference. In this strategy, we share the backbone and encoder features, and shuffle the encoder features for the decoder to generate predictions for multiple frames. Experiments on the public breast lesion ultrasound video dataset show that STNet achieves state-of-the-art detection performance while running inference twice as fast. The code and model are available at this https URL.
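
The encoder feature shuffle idea can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `shuffle_for_target` and `predict_all_frames`, the `encode_once` and `decode` callables, and the choice to move the target frame's features to the front are all assumptions made for illustration. The only point taken from the abstract is that the backbone and encoder run once per clip, and the decoder is then applied to reordered encoder features to produce a prediction for each frame.

```python
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")  # per-frame encoder feature (any type)
P = TypeVar("P")  # per-frame prediction (any type)


def shuffle_for_target(features: Sequence[F], target: int) -> List[F]:
    """Reorder features so that frame `target` comes first (assumed convention)."""
    others = [f for i, f in enumerate(features) if i != target]
    return [features[target]] + others


def predict_all_frames(
    frames: Sequence[object],
    encode_once: Callable[[Sequence[object]], Sequence[F]],
    decode: Callable[[Sequence[F]], P],
) -> List[P]:
    """Run the hypothetical backbone + encoder once, then decode every frame."""
    features = encode_once(frames)  # shared across all target frames
    return [
        decode(shuffle_for_target(features, t))  # one decoder pass per frame
        for t in range(len(features))
    ]


# Toy stand-ins: the "encoder" upper-cases frame labels and the "decoder"
# simply reports the feature placed in the target slot.
toy_predictions = predict_all_frames(
    ["a", "b", "c"],
    encode_once=lambda fs: [f.upper() for f in fs],
    decode=lambda feats: feats[0],
)
```

With these toy stand-ins, `toy_predictions` holds one entry per input frame, while `encode_once` is called only once for the whole clip; that sharing is the source of the speed-up the abstract describes.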

Comments:	Accepted by MICCAI 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.04702 [cs.CV]
	(or arXiv:2309.04702v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.04702

Submission history

From: Chao Qin
[v1] Sat, 9 Sep 2023 07:00:10 UTC (4,221 KB)
