Is a revolution coming to the deep learning architecture world? Is this an early signal? Take a look: https://lnkd.in/geWuTJkA. It beats Mamba easily. Although still at an early stage, it is a potential candidate to change the transformer world. One side note: this paper is the result of more than a year of effort from the team. Salute to the authors 👏🙏. #deeplearning #RNN #transformer #llm
I’m excited to share a project I’ve been working on for over a year, one I believe will fundamentally change how we approach language models. We’ve designed a new architecture that unlocks linear-complexity models with expressive memory, allowing us to train LLMs with millions (someday billions) of tokens in context. See the arXiv paper to learn more: https://lnkd.in/gSCczEkF
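To make the phrase "linear complexity with expressive memory" concrete, here is a minimal, hypothetical sketch of the general idea: a recurrent layer whose hidden state is itself a tiny model (a weight matrix) updated by one gradient step per token, so cost grows linearly with sequence length instead of quadratically as in attention. The function name, inner loss, and learning rate below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def ttt_style_scan(tokens, d, lr=0.1):
    """Illustrative sketch (NOT the paper's exact algorithm):
    the hidden state is a weight matrix W -- an 'expressive memory' --
    updated by one gradient step per token on a simple self-supervised
    reconstruction loss. Per-token cost is O(d^2), so a length-T
    sequence costs O(T * d^2): linear in T, unlike O(T^2) attention."""
    W = np.zeros((d, d))               # memory = parameters of a tiny model
    outputs = []
    for x in tokens:                   # x: vector of size d
        pred = W @ x                   # inner model's prediction of x
        grad = np.outer(pred - x, x)   # d/dW of 0.5 * ||W x - x||^2
        W -= lr * grad                 # one gradient step = state update
        outputs.append(W @ x)          # read out with the updated memory
    return np.array(outputs), W

rng = np.random.default_rng(0)
seq = rng.standard_normal((16, 4))
outs, W = ttt_style_scan(list(seq), d=4)
```

Because the state update is a fixed-size gradient step, the memory footprint stays constant no matter how long the context grows, which is what makes very long contexts plausible under this family of designs.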