Ben Dickson's post: With a few tweaks, LSTMs and GRUs can leverage parallel training and compete with Transformer models https://lnkd.in/eAP3sg3S

More relevant posts:
-
"The 'Were RNNs all we needed?' paper shows that LSTMs & GRUs are trainable via the parallel scan algorithm by removing the hidden-state dependencies from their gates, removing the constraints on their output range, and ensuring their output is time-independent in scale, arriving at the models called minLSTM & minGRU. These minimal versions are claimed to: (1) address the computational limitations of their traditional counterparts, (2) be as computationally efficient as Mamba, and (3) be competitive in performance with recent sequence models." I hope this summary is not all you need; the implementation details are in the appendices! https://lnkd.in/di8VtPVG
Were RNNs All We Needed?
arxiv.org
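For concreteness, here is a minimal PyTorch sketch of the minGRU idea summarized above (my own illustrative code, not the authors' implementation): because the gate and the candidate depend only on x_t, the recurrence h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t is linear in h, so every gate can be computed for the whole sequence at once and the recurrence itself reduces to a scan.

```python
import torch
import torch.nn as nn

class MinGRU(nn.Module):
    """Sketch of a minGRU-style layer: gates depend only on x_t (no h_{t-1} term)
    and there is no tanh on the candidate, so the recurrence is linear in h."""
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.to_z = nn.Linear(d_in, d_hidden)   # update gate from x_t only
        self.to_h = nn.Linear(d_in, d_hidden)   # candidate state from x_t only

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_in)
        z = torch.sigmoid(self.to_z(x))         # gates for every step at once
        h_tilde = self.to_h(x)
        a, b = 1.0 - z, z * h_tilde             # h_t = a_t * h_{t-1} + b_t

        # Closed-form unroll of the linear recurrence (h_0 = 0) via cumulative
        # products/sums, just to show the parallelism.
        A = torch.cumprod(a, dim=1)
        h = A * torch.cumsum(b / A.clamp_min(1e-12), dim=1)
        return h                                # (batch, time, d_hidden)
```

The cumprod/cumsum unroll is only the simplest way to expose the parallelism; a numerically stabler parallel scan is what you would use in practice, and the paper's appendices cover the actual implementation details.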
-
This one is groundbreaking: LSTMs could now power LLMs that prove better than transformers in terms of speed and accuracy. The lack of parallel training was the LSTM's main issue. In the future, I expect transformer and LSTM layers to be fused and stacked together, which could make LLMs more capable, with even larger context windows. I also expect more capable models that can generate algorithms on their own, based on patterns, instead of just doing similarity search. Then there will be no need for traditional ML and DL.
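To make the fusion idea concrete, here is a purely speculative sketch (my own, not from any paper) of stacking an LSTM layer with a Transformer encoder layer in a single block:

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Speculative sketch of the 'fusion' idea above: an LSTM layer stacked
    with a Transformer encoder layer. Names and wiring are my own."""
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.attn = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, x):                 # x: (batch, time, d_model)
        h, _ = self.lstm(x)               # recurrent pass for sequential structure
        return self.attn(x + h)           # attention pass for global mixing

# y = HybridBlock()(torch.randn(2, 16, 256))   # -> (2, 16, 256)
```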
-
Interesting tests: Thompson sampling for action exploration over an LSTM Q-network during Double DQN training.
Forward tests:
- GRU for deep Q-values
- attention, convnets, etc., and combinations
- actor-critic methods, PPO, etc.
- history summarization over replay buffers
Already handled:
- dynamic action space
- sparse rewards 😝
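A rough sketch of what that exploration could look like, assuming the Thompson sampling is approximated with bootstrapped Q-heads on top of the LSTM (the class and function names below are hypothetical):

```python
import random
import torch
import torch.nn as nn

class BootstrappedLSTMQNet(nn.Module):
    """Hypothetical LSTM Q-network with K bootstrapped heads; Thompson-style
    exploration samples one head per episode and acts greedily under it."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128, n_heads: int = 5):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.heads = nn.ModuleList([nn.Linear(hidden, n_actions) for _ in range(n_heads)])

    def forward(self, obs_seq, state=None):
        out, state = self.lstm(obs_seq, state)                  # (B, T, hidden)
        q = torch.stack([head(out) for head in self.heads])     # (K, B, T, A)
        return q, state

def thompson_action(qnet, obs_seq, head_idx, state=None):
    """Greedy action for the last time step under the episode's sampled head."""
    with torch.no_grad():
        q, state = qnet(obs_seq, state)
    return q[head_idx, 0, -1].argmax().item(), state

# At the start of each episode: head_idx = random.randrange(len(qnet.heads)),
# then act with thompson_action(...); the Double DQN update still uses the
# online net to pick argmax actions and the target net to evaluate them.
```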
-
Understanding the maths behind Long Short-Term Memory (LSTM) Networks: What happens inside an LSTM cell? #lstm #schmidhuber #rnn #hochreiter #tumünchen #omr24 #transformer #deeplearning #omr #ai https://lnkd.in/gwnY4tut
Understanding the maths behind Long Short-Term Memory (LSTM) Networks: What happens inside an LSTM cell?
deeplearningnerds.com
-
Forget Gate in LSTM Networks 💥💥
GET FULL SOURCE CODE AT THIS LINK 👇👇
👉 https://lnkd.in/e6RvPAHD
The forget gate in LSTM (Long Short-Term Memory) networks is a crucial concept in deep learning. It controls the flow of information within the memory cells, allowing the network to learn long-term dependencies. In this video, we delve into the inner workings of the forget gate, exploring its role in the LSTM architecture and how it affects the overall performance of the network.
As we examine the forget gate, we discuss its relationship with the input gate and the output gate, and how they work together to update the memory cells. We also look at the activations involved: the sigmoid function that produces the gate values and the tanh function applied to the candidate cell state.
The forget gate plays a vital role in the LSTM network's ability to retain relevant information over long periods. Learning what to discard from the cell state can lead to significant improvements in the network's performance, especially on sequence data.
To reinforce your understanding of the forget gate and its applications, consider exploring the following topics:
* A detailed analysis of the LSTM architecture
* Implementations of LSTMs in frameworks such as TensorFlow and PyTorch
* Applications of LSTMs in natural language processing and time series forecasting
Find this and all other slideshows for free on our website: https://lnkd.in/e6RvPAHD
#stem #lstm #deeplearning #ai #machinelearning #artificialintelligence #forgetgate #longshorttermmemory #memorycells #encoding #recurrentneuralnetworks #rnn
https://lnkd.in/ecSeWtKj
Forget Gate in LSTM Networks
https://www.youtube.com/
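For reference, here is what that gate interaction looks like in code: a hand-rolled sketch of a single LSTM cell step (the weight layout below is a common convention, not taken from the video). The forget gate uses a sigmoid to decide how much of the previous cell state survives, while tanh shapes the candidate values.

```python
import torch

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell step, gate by gate. Shapes (illustrative):
    W: (4*hidden, input), U: (4*hidden, hidden), b: (4*hidden,),
    with gates ordered input, forget, candidate, output."""
    gates = x_t @ W.T + h_prev @ U.T + b
    i, f, g, o = gates.chunk(4, dim=-1)
    i = torch.sigmoid(i)              # input gate: how much new content to write
    f = torch.sigmoid(f)              # forget gate: how much of c_prev to keep
    g = torch.tanh(g)                 # candidate cell state
    o = torch.sigmoid(o)              # output gate: how much memory to expose
    c_t = f * c_prev + i * g          # forget old memory, add gated new content
    h_t = o * torch.tanh(c_t)
    return h_t, c_t
```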
-
Exploring KV Cache Quantization: Discover how reducing bit precision can optimize memory usage in LLMs, making them more accessible and efficient. 🦙#AI #MachineLearning #DataScience #LLM
Memory Optimization in LLMs: Leveraging KV Cache Quantization for Efficient Inference
link.medium.com
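As a back-of-the-envelope illustration (a toy example of my own, not the article's code), quantizing a fp16 KV cache to int8 roughly halves its memory footprint:

```python
import torch

def quantize_kv_int8(kv: torch.Tensor):
    """Toy per-channel int8 quantization of a KV-cache tensor
    (illustrative only, not any specific library's API)."""
    scale = kv.abs().amax(dim=-1, keepdim=True).clamp_min(1e-8) / 127.0
    q = torch.clamp((kv / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# A fp16 KV cache for one layer: (batch, heads, seq_len, head_dim)
kv = torch.randn(1, 8, 4096, 128, dtype=torch.float16)
q, scale = quantize_kv_int8(kv.float())
print(kv.numel() * kv.element_size() / 2**20, "MiB in fp16")   # ~8 MiB
print(q.numel() * q.element_size() / 2**20, "MiB in int8")     # ~4 MiB, plus small scales
```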
-
I'm back with another publication! This week, I dove a little deeper into LSTMs and how to make LSTM models more accurate. Hope you all can give it a read!
Making LSTM Models More Accurate
link.medium.com
-
Really interesting paper showing that, by simplifying traditional RNN architectures, it's possible to create efficient, parallelisable models that perform as well as state-of-the-art sequence models like Transformers and State Space Models. These minimal versions of traditional LSTMs and GRUs address the computational inefficiencies that partly led to the rise of Transformers, while maintaining strong performance across various tasks. This speaks to the idea that, in the deep learning paradigm, it is less about the architecture and more about the dataset: apart from the inductive biases implicit in architectures, which help with learning efficiency and generalisability for specific data types (e.g. sequential vs. spatial data), many architectures converge to the same performance when the data is large and of high quality. https://lnkd.in/eUV9DyB3
Were RNNs All We Needed?
arxiv.org
-
🚀 New Blog Post Alert! 🚀 I'm excited to share my latest Medium article: "RNN vs. LSTM vs. GRU: A Comprehensive Guide to Sequential Data Modeling". In this post, I delve into the intricacies of Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs), explaining their differences and applications in sequential data modeling. This guide is perfect for anyone looking to understand these powerful tools in machine learning. Join me in exploring these essential models and their impact on sequential data analysis! 🌟 #MachineLearning #DataScience #AI #RNN #LSTM #GRU #SequentialData #Blogging #Medium #TechEducation #MLStudents
RNN vs. LSTM vs. GRU: A Comprehensive Guide to Sequential Data Modeling
link.medium.com
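As a quick companion to the guide (my own snippet, not from the article), the three architectures can be compared side by side in PyTorch; they share one interface but differ in gating and in the state they carry:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 10, 32)                     # (batch, time, features)

rnn = nn.RNN(32, 64, batch_first=True)         # single tanh update, hidden state only
gru = nn.GRU(32, 64, batch_first=True)         # reset + update gates, hidden state only
lstm = nn.LSTM(32, 64, batch_first=True)       # input/forget/output gates + cell state

out_rnn, h_rnn = rnn(x)
out_gru, h_gru = gru(x)
out_lstm, (h_lstm, c_lstm) = lstm(x)           # the LSTM also returns a cell state
print(out_rnn.shape, out_gru.shape, out_lstm.shape)   # each: torch.Size([4, 10, 64])
```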
-
Since the 1990s, the recurrent neural network (RNN) architecture Long Short-Term Memory (LSTM) has been cruising along without any radical changes. But guess what? We've shaken things up! Did we just slap an "x" on the front and call it a day? Extended Long Short-Term Memory (xLSTM)? Nah, just kidding! Check out the full list of changes here: arxiv.org/abs/2405.04517.
xLSTM: Extended Long Short-Term Memory
arxiv.org