
A Spatial-Temporal Deformable Attention Based Framework for Breast Lesion Detection in Videos

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 (MICCAI 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14221)

Abstract

Detecting breast lesions in videos is crucial for computer-aided diagnosis. Existing video-based breast lesion detection approaches typically perform temporal aggregation of deep backbone features based on the self-attention operation. We argue that such a strategy struggles to effectively aggregate deep features and ignores useful local information. To tackle these issues, we propose a spatial-temporal deformable attention based framework, named STNet. Our STNet introduces a spatial-temporal deformable attention module to perform local spatial-temporal feature fusion, enabling deep feature aggregation in each stage of both the encoder and the decoder. To further accelerate detection, we introduce an encoder feature shuffle strategy for multi-frame prediction during inference: the backbone and encoder features are shared across frames, and the encoder features are shuffled for the decoder to generate predictions for multiple frames. Experiments on the public breast lesion ultrasound video dataset show that our STNet obtains state-of-the-art detection performance while operating at twice the inference speed. The code and model are available at https://github.com/AlfredQin/STNet.
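The core idea of deformable attention, as the abstract describes it, is to aggregate features locally: each query position attends to a small set of learned sampling offsets across neighboring frames, rather than to every position globally. The following NumPy sketch illustrates that sampling scheme under stated assumptions; the function names, shapes, and the single-head simplification are illustrative and are not taken from the authors' implementation.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample feature map feat (H, W, C) at fractional location (y, x),
    using bilinear interpolation with zero padding outside the map."""
    H, W, C = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = np.zeros(C)
    for dy in (0, 1):
        for dx in (0, 1):
            yi, xi = y0 + dy, x0 + dx
            if 0 <= yi < H and 0 <= xi < W:
                # Interpolation weight shrinks with distance to the corner.
                out += (1 - abs(y - yi)) * (1 - abs(x - xi)) * feat[yi, xi]
    return out

def st_deformable_attention(query_yx, frames, offsets, weights):
    """Single-head spatial-temporal deformable attention for one query.

    frames:  (T, H, W, C) feature maps of T neighboring frames
    offsets: (T, K, 2)    learned (dy, dx) sampling offsets per frame
    weights: (T, K)       attention weights, assumed softmax-normalized
    Returns the (C,) aggregated feature for the query location.
    """
    T, K, _ = offsets.shape
    qy, qx = query_yx
    out = np.zeros(frames.shape[-1])
    for t in range(T):
        for k in range(K):
            dy, dx = offsets[t, k]
            # Sample at the offset location and accumulate the weighted value.
            out += weights[t, k] * bilinear_sample(frames[t], qy + dy, qx + dx)
    return out
```

With constant feature maps and normalized weights, the output reproduces the constant value, which is a quick sanity check that the interpolation and weighting are consistent. In the full model, the offsets and weights would be predicted from the query feature by small linear layers, which is what makes the sampling locations learnable.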



Acknowledgment

This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG2-TC-2021-003) and Agency for Science, Technology and Research (A*STAR) Central Research Fund (CRF).

Author information


Correspondence to Chao Qin.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Qin, C., Cao, J., Fu, H., Anwer, R.M., Khan, F.S. (2023). A Spatial-Temporal Deformable Attention Based Framework for Breast Lesion Detection in Videos. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14221. Springer, Cham. https://doi.org/10.1007/978-3-031-43895-0_45


  • DOI: https://doi.org/10.1007/978-3-031-43895-0_45


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43894-3

  • Online ISBN: 978-3-031-43895-0

  • eBook Packages: Computer Science, Computer Science (R0)
