ZigMa: A DiT-style Zigzag Mamba Diffusion Model

Hu, Vincent Tao; Baumann, Stefan Andreas; Gui, Ming; Grebenkova, Olga; Ma, Pingchuan; Fischer, Johannes; Ommer, Björn

arXiv:2403.13802 (cs)

[Submitted on 20 Mar 2024 (v1), last revised 1 Apr 2024 (this version, v2)]

Abstract: The diffusion model has long been plagued by scalability and quadratic complexity issues, especially within transformer-based structures. In this study, we aim to leverage the long sequence modeling capability of a State-Space Model called Mamba to extend its applicability to visual data generation. Firstly, we identify a critical oversight in most current Mamba-based vision methods, namely the lack of consideration for spatial continuity in the scan scheme of Mamba. Secondly, building upon this insight, we introduce a simple, plug-and-play, zero-parameter method named Zigzag Mamba, which outperforms Mamba-based baselines and demonstrates improved speed and memory utilization compared to transformer-based baselines. Lastly, we integrate Zigzag Mamba with the Stochastic Interpolant framework to investigate the scalability of the model on large-resolution visual datasets, such as FacesHQ $1024\times 1024$ and UCF101, MultiModal-CelebA-HQ, and MS COCO $256\times 256$. Code will be released at this https URL
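
To make the notion of spatial continuity in a scan scheme concrete, the following is a minimal sketch in NumPy and not the authors' implementation. It compares a plain row-major (raster) flattening of an h x w patch grid with a zigzag (boustrophedon) ordering in which consecutive tokens are always grid neighbours. The function names `raster_order`, `zigzag_order`, and `max_step`, as well as the grid size and feature width, are assumptions chosen for illustration only.

```python
import numpy as np


def raster_order(h: int, w: int) -> np.ndarray:
    """Row-major scan: every row is read left to right."""
    return np.arange(h * w)


def zigzag_order(h: int, w: int) -> np.ndarray:
    """Boustrophedon scan: odd rows are read right to left."""
    idx = np.arange(h * w).reshape(h, w)
    idx[1::2] = idx[1::2, ::-1].copy()  # reverse every second row
    return idx.reshape(-1)


def max_step(order: np.ndarray, w: int) -> int:
    """Largest Manhattan distance between consecutive tokens in the scan."""
    rows, cols = np.divmod(order, w)
    return int((np.abs(np.diff(rows)) + np.abs(np.diff(cols))).max())


# Hypothetical sizes: a 4 x 4 patch grid with 8-dimensional tokens.
h, w, d = 4, 4, 8
tokens = np.random.default_rng(0).normal(size=(h * w, d))

order = zigzag_order(h, w)
inverse = np.argsort(order)    # permutation that undoes the reordering

scanned = tokens[order]        # sequence a 1D sequence model would consume
restored = scanned[inverse]    # back to raster layout afterwards

print(max_step(raster_order(h, w), w), max_step(order, w))
```

In the raster scan, the jump from the end of one row to the start of the next spans the full grid width, whereas the zigzag ordering only ever moves to an adjacent patch. Because the reordering is a fixed permutation of indices, it introduces no learnable parameters, which is consistent with the "zero-parameter" description in the abstract.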

Comments:	Project Page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2403.13802 [cs.CV]
	(or arXiv:2403.13802v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.13802

Submission history

From: Tao Hu
[v1] Wed, 20 Mar 2024 17:59:14 UTC (18,702 KB)
[v2] Mon, 1 Apr 2024 17:58:02 UTC (34,652 KB)
