RNN and LSTM: YANG Jiancheng


RNN and LSTM

(Oct 12, 2016)

YANG Jiancheng
Outline

• I. Vanilla RNN
• II. LSTM
• III. GRU and Other Structures
• I. Vanilla RNN
GREAT Intro: Understanding LSTM Networks

"In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice, RNNs don’t seem to be able to learn them."
• I. Vanilla RNN

WILDML has a series of articles introducing RNNs (4 articles, 2 GitHub repos).
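To make the recurrence those articles cover concrete, here is a minimal sketch of a vanilla RNN forward pass in NumPy; the names (W_xh, W_hh, b_h) and toy shapes are illustrative assumptions, not taken from the slides.

import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    # Vanilla RNN: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h)
    h = np.zeros(W_hh.shape[0])                  # initial hidden state h_0
    hs = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # one recurrent step
        hs.append(h)
    return hs

# Toy usage: hidden size 4, input size 3, sequence length 5.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3))
W_hh = rng.normal(size=(4, 4))
b_h = np.zeros(4)
xs = [rng.normal(size=3) for _ in range(5)]
print(rnn_forward(xs, W_xh, W_hh, b_h)[-1])      # final hidden state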


• I. Vanilla RNN
• Back Prop Through Time (BPTT)
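As a reminder of what BPTT actually computes (the notation roughly follows [2]; treat the exact symbols as my assumption), the gradient of the loss at step t with respect to the shared recurrent weights sums contributions from every earlier step, chained through the hidden states:

\frac{\partial E_t}{\partial W} = \sum_{k=0}^{t} \frac{\partial E_t}{\partial h_t} \left( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \right) \frac{\partial h_k}{\partial W},
\qquad
\frac{\partial h_j}{\partial h_{j-1}} = \mathrm{diag}\!\left(1 - h_j^{2}\right) W_{hh}
\quad \text{for } h_j = \tanh(W_{hh} h_{j-1} + W_{xh} x_j + b).

The long product of Jacobians is exactly the term behind the vanishing-gradient discussion on the next slide: each tanh-derivative factor lies in (0, 1], so the product tends to shrink unless W_{hh} compensates.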
• I. Vanilla RNN
• Gradient Vanishing Problem
Unrolled through time, RNNs become very deep networks

(Figure: tanh and its derivative. Source: http://nn.readthedocs.org/en/rtd/transfer/)
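A tiny numerical sketch of the effect (the matrices and sizes here are made up purely for illustration): pushing a gradient backwards through many tanh steps multiplies it repeatedly by diag(1 - h^2) W_hh, and with saturating activations and modest recurrent weights its norm decays quickly.

import numpy as np

rng = np.random.default_rng(0)
hidden = 20
W_hh = rng.normal(scale=0.1, size=(hidden, hidden))   # smallish recurrent weights
grad = rng.normal(size=hidden)                        # gradient arriving at the last step

for step in range(1, 21):
    h = np.tanh(rng.normal(size=hidden))              # stand-in hidden activation
    jac = np.diag(1.0 - h**2) @ W_hh                  # d h_j / d h_{j-1} for a tanh RNN
    grad = jac.T @ grad                               # push the gradient one step back
    if step % 5 == 0:
        print(step, np.linalg.norm(grad))             # the norm shrinks roughly geometrically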


• II. LSTM
• Differences of LSTM and Vanilla RNN
• II. LSTM
• Core Idea Behind LSTMs

• Cell state
• Gates
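The core idea in one line (using the notation of [1], with ⊙ for elementwise multiplication): the cell state is modified only by two gated, elementwise operations, which is what lets information and gradients flow along it largely unchanged:

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \qquad f_t, i_t \in (0, 1),

where f_t (forget gate) and i_t (input gate) are sigmoid layers and \tilde{C}_t is the candidate update.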


• II. LSTM
• Step-by-Step Walk Through

(Figures: each gate is a sigmoid layer, so its outputs lie in the range 0~1.)
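As a companion to the walk-through, here is a minimal single LSTM step in NumPy following the equations of [1]; the weight names and the use of a concatenated [h, x] input are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    # One LSTM step; each W_* acts on the concatenation [h_prev, x].
    hx = np.concatenate([h_prev, x])
    f = sigmoid(W_f @ hx + b_f)          # forget gate, values in 0~1
    i = sigmoid(W_i @ hx + b_i)          # input gate, values in 0~1
    c_tilde = np.tanh(W_c @ hx + b_c)    # candidate cell values, in -1~1
    c = f * c_prev + i * c_tilde         # new cell state (elementwise)
    o = sigmoid(W_o @ hx + b_o)          # output gate, values in 0~1
    h = o * np.tanh(c)                   # new hidden state
    return h, c

# Toy shapes: with hidden size 4 and input size 3, each W_* is (4, 7) and each b_* is (4,).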
• III. GRU and Other Structures
• Gated Recurrent Unit (GRU)

• Combines the forget and input gates into a single “update gate.”
• Merges the cell state and hidden state
• Other changes (the standard GRU equations are sketched after this list)
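For reference, the standard GRU equations (in the notation of [1], which omits biases; z_t is the update gate and r_t the reset gate):

z_t = \sigma(W_z\,[h_{t-1}, x_t])
r_t = \sigma(W_r\,[h_{t-1}, x_t])
\tilde{h}_t = \tanh(W\,[r_t \odot h_{t-1}, x_t])
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t

The single update gate z_t takes over the roles of both the forget and input gates, and there is no separate cell state: the hidden state h_t carries everything.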
• III. GRU and Other Structures
• Variants on Long Short Term Memory

Greff, et al. (2015) do a nice comparison of popular variants, finding that they’re all about the same.
Bibliography

• [1] Understanding LSTM Networks


• [2] Back Propagation Through Time and Vanishing Gradients
Thanks for listening!
