Is a revolution coming to the deep learning architecture world? Is this an early signal? Take a look: https://lnkd.in/geWuTJkA. It beats Mamba easily. Although still at an early stage, it is a potential candidate to change the transformer world. One side note: this paper is the result of more than a year of effort from the team. Salute to the authors 👏🙏. #deeplearning #RNN #transformer #llm
I’m excited to share a project I’ve been working on for over a year, one I believe will fundamentally change how we approach language models. We’ve designed a new architecture that unlocks linear-complexity models with expressive memory, allowing us to train LLMs with millions (someday billions) of tokens in context. See the arXiv paper to learn more: https://lnkd.in/gSCczEkF
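To make the phrase "linear complexity with expressive memory" concrete, here is a minimal, hypothetical sketch of the general idea: a recurrent layer whose hidden state is itself a tiny model (a weight matrix) updated by one gradient step per token, so cost grows linearly with sequence length instead of quadratically as in attention. The function name, inner loss, and learning rate below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def ttt_style_scan(tokens, d, lr=0.1):
    """Illustrative sketch (NOT the paper's exact algorithm):
    the hidden state is a weight matrix W -- an 'expressive memory' --
    updated by one gradient step per token on a simple self-supervised
    reconstruction loss. Per-token cost is O(d^2), so a length-T
    sequence costs O(T * d^2): linear in T, unlike O(T^2) attention."""
    W = np.zeros((d, d))               # memory = parameters of a tiny model
    outputs = []
    for x in tokens:                   # x: vector of size d
        pred = W @ x                   # inner model's prediction of x
        grad = np.outer(pred - x, x)   # d/dW of 0.5 * ||W x - x||^2
        W -= lr * grad                 # one gradient step = state update
        outputs.append(W @ x)          # read out with the updated memory
    return np.array(outputs), W

rng = np.random.default_rng(0)
seq = rng.standard_normal((16, 4))
outs, W = ttt_style_scan(list(seq), d=4)
```

Because the state update is a fixed-size gradient step, the memory footprint stays constant no matter how long the context grows, which is what makes very long contexts plausible under this family of designs.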