Very Deep Learning
Lecture 09
(Figure: "No objects, just pixels" / "Single Object" / "Multiple Object"; image CC0 public domain)
◼ One to One
◼ One to Many
◼ Many to One
Image Source: Chen-Wen et al., Outpatient Text Classification Using Attention-Based Bidirectional LSTM for Robot-Assisted Servicing in Hospital
◼ Many to Many (e.g., sequence-to-sequence, such as machine translation)
◼ Many to Many (synced, e.g., per-frame classification of a video)
◼ Basic RNNs
^ We use t as the time index (in feedforward networks we used i as the layer index)
^ Important: f_h and f_y do not change over time, unlike the layers of a feedforward net
^ The general form does not specify the hidden mapping f_h or the output mapping f_y
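The general form is h_t = f_h(h_{t-1}, x_t) and y_t = f_y(h_t). As one concrete and common choice, here is a minimal NumPy sketch with a tanh hidden mapping and a linear output mapping; the parameter names (W_xh, W_hh, W_hy, ...) are illustrative, not the lecture's notation.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    # Hidden mapping f_h: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    # Output mapping f_y: y_t = W_hy h_t + b_y (linear here)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

def rnn_forward(xs, h0, params):
    # The same parameters (i.e., the same f_h and f_y) are reused at every time step t.
    h, ys = h0, []
    for x_t in xs:
        h, y_t = rnn_step(x_t, h, *params)
        ys.append(y_t)
    return ys, h
```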
◼ Multilayer RNNs
^ Deeper multi-layer RNNs can be constructed by stacking RNN layers
^ An alternative is to make each individual computation (=RNN cell) deeper
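A sketch of the stacking idea for one time step, assuming each layer is a tanh RNN cell as above; an output mapping would then read off the top layer's state. All names are illustrative.

```python
import numpy as np

def stacked_rnn_step(x_t, h_prev, layer_params):
    # h_prev is a list with one hidden state per layer; the hidden state of
    # layer l at time t becomes the input of layer l+1 at the same time step.
    h_new, inp = [], x_t
    for h_l, (W_xh, W_hh, b_h) in zip(h_prev, layer_params):
        h_l = np.tanh(W_xh @ inp + W_hh @ h_l + b_h)
        h_new.append(h_l)
        inp = h_l
    return h_new
```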
◼ For w_hh > 1 gradients will explode (become very large, cause divergence):
^ Example: for w_hh = 1.1 and k = 100 we have w_hh^k = 13781
^ This problem is often addressed in practice using gradient clipping (see the sketch after this list)
^ Forward values do not explode due to the bounded tanh(·) activation function
◼ For w_hh < 1 gradients will vanish (no learning in earlier time steps):
^ Example: for w_hh = 0.9 and k = 100 we have w_hh^k = 0.0000266
^ Avoiding this problem requires an architectural change
^ But residual connections do not work here, as the parameters are shared across time and the input and desired output at each time step are different
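As referenced above, a minimal sketch of gradient clipping by global norm; the threshold of 5.0 is an illustrative choice, not a value from the lecture. In PyTorch, torch.nn.utils.clip_grad_norm_ provides this behaviour.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    # Rescale all gradients jointly so their global L2 norm is at most max_norm.
    # This tames exploding gradients; it does nothing against vanishing ones.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads
```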
◼ Reset gate controls which parts of the state are used to compute the next target state
◼ Update gate controls how much information to pass from the previous time step
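The reset/update-gate architecture above is the gated recurrent unit (GRU). Below is a NumPy sketch of one common formulation matching those gate roles; note that the update-gate convention (which term is weighted by z versus 1 - z) differs between papers, and the parameter dictionary p is illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    # Reset gate r: selects which parts of h_{t-1} enter the candidate state.
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])
    # Update gate z: controls how much information is passed along from t-1.
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])
    # Candidate (next target) state computed from the reset-gated previous state.
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev) + p["b_h"])
    # Interpolate between the old state and the candidate state.
    return (1.0 - z) * h_prev + z * h_cand
```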
◼ The LSTM passes along an additional cell state c in addition to the hidden state h and has 3 gates:
◼ Forget gate determines information to erase from cell state
◼ Input gate determines which values of cell state to update
◼ Output gate determines which elements of cell state to reveal at time t
◼ Remark: Cell update tanh(·) creates new target values s_t for the cell state
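A NumPy sketch of a standard (peephole-free) LSTM step following the gate roles above; the parameter names in p are illustrative, not the lecture's notation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, p):
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])   # forget gate
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])   # input gate
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])   # output gate
    s = np.tanh(p["W_s"] @ x_t + p["U_s"] @ h_prev + p["b_s"])   # cell update: new target values s_t
    c_t = f * c_prev + i * s    # erase via the forget gate, write selected new values via the input gate
    h_t = o * np.tanh(c_t)      # output gate determines which elements of the cell state are revealed
    return h_t, c_t
```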