The term "recurrent neural network" is used to refer to the class of networks with
an infinite impulse response, whereas "convolutional neural network" refers to the
class of finite impulse response. Both classes of networks exhibit temporal dynamic
behavior.[8] A finite impulse recurrent network is a directed acyclic graph that
can be unrolled and replaced with a strictly feedforward neural network, while an
infinite impulse recurrent network is a directed cyclic graph that can not be
unrolled.
Both finite impulse and infinite impulse recurrent networks can have additional
stored states, and the storage can be under the direct control of the neural network.
The storage can also be replaced by another network or graph if it incorporates
time delays or has feedback loops. Such controlled states are referred to as gated
states or gated memory, and are part of long short-term memory networks (LSTMs) and
gated recurrent units. Networks with such feedback loops are sometimes also called
feedback neural networks (FNNs).
Contents
1 History
1.1 LSTM
2 Architectures
2.1 Fully recurrent
2.2 Elman networks and Jordan networks
2.3 Hopfield
2.4 Echo state
2.5 Independently RNN (IndRNN)
2.6 Recursive
2.7 Neural history compressor
2.8 Second order RNNs
2.9 Long short-term memory
2.10 Gated recurrent unit
2.11 Bi-directional
2.12 Continuous-time
2.13 Hierarchical
2.14 Recurrent multilayer perceptron network
2.15 Multiple timescales model
2.16 Neural Turing machines
2.17 Differentiable neural computer
2.18 Neural network pushdown automata
2.19 Memristive Networks
3 Training
3.1 Gradient descent
3.2 Global optimization methods
4 Related fields and models
5 Libraries
6 Applications
7 References
8 Further reading
9 External links
History
Recurrent neural networks were based on David Rumelhart's work in 1986.[9] Hopfield
networks – a special kind of RNN – were (re-)discovered by John Hopfield in 1982.
In 1993, a neural history compressor system solved a "Very Deep Learning" task that
required more than 1000 subsequent layers in an RNN unfolded in time.[10]
LSTM
Long short-term memory (LSTM) networks were invented by Hochreiter and Schmidhuber
in 1997 and set accuracy records in multiple application domains.[11]
LSTM broke records for improved machine translation,[20] language modeling,[21] and
multilingual language processing.[22] LSTM combined with convolutional neural
networks (CNNs) improved automatic image captioning.[23]
Architectures
Main article: Layer (deep learning)
RNNs come in many variants.
Fully recurrent
Compressed (left) and unfolded (right) basic recurrent neural network.
Fully recurrent neural networks (FRNN) connect the outputs of all neurons to the
inputs of all neurons. This is the most general neural network topology because all
other topologies can be represented by setting some connection weights to zero to
simulate the lack of connections between those neurons. The illustration to the
right may be misleading to many because practical neural network topologies are
frequently organized in "layers" and the drawing gives that appearance. However,
what appears to be layers are, in fact, different steps in time of the same fully
recurrent neural network. The left-most item in the illustration shows the
recurrent connections as the arc labeled 'v'. It is "unfolded" in time to produce
the appearance of layers.
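As a minimal sketch of this weight sharing, the following NumPy snippet (all sizes and names are illustrative, not taken from the article) applies one fully recurrent layer to a short input sequence; each loop iteration corresponds to one of the apparent "layers" produced by unfolding, while the matrices W, U and the bias b are reused throughout.
```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, T = 3, 5, 4          # illustrative sizes

W = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input -> hidden weights
U = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden weights (the recurrent arc)
b = np.zeros(n_hidden)

x_seq = rng.normal(size=(T, n_in))   # a toy input sequence
h = np.zeros(n_hidden)               # initial state

# "Unfolding" in time: each iteration looks like a layer,
# but W, U and b are shared across all time steps.
for t in range(T):
    h = np.tanh(W @ x_seq[t] + U @ h + b)
print(h)
```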
Elman networks and Jordan networks
Jordan networks are similar to Elman networks, but their context units are fed from
the output layer instead of the hidden layer. The context units in a Jordan network
are also referred to as the state layer. They have a recurrent connection to
themselves.[24]
Elman and Jordan networks are also known as "Simple recurrent networks" (SRN).
Elman network[25]
h_t = \sigma_h(W_h x_t + U_h h_{t-1} + b_h)
y_t = \sigma_y(W_y h_t + b_y)
Jordan network[26]
h_t = \sigma_h(W_h x_t + U_h y_{t-1} + b_h)
y_t = \sigma_y(W_y h_t + b_y)
Variables and functions
x_t: input vector
h_t: hidden layer vector
y_t: output vector
W, U and b: parameter matrices and vector
\sigma_h and \sigma_y: activation functions
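A direct transcription of the two update rules above, as a sketch only (the tanh hidden activation and identity output activation are assumptions; any differentiable σ_h, σ_y would do):
```python
import numpy as np

def elman_step(x_t, h_prev, W_h, U_h, b_h, W_y, b_y):
    # h_t = sigma_h(W_h x_t + U_h h_{t-1} + b_h)
    h_t = np.tanh(W_h @ x_t + U_h @ h_prev + b_h)
    # y_t = sigma_y(W_y h_t + b_y); identity output activation for illustration
    y_t = W_y @ h_t + b_y
    return h_t, y_t

def jordan_step(x_t, y_prev, W_h, U_h, b_h, W_y, b_y):
    # Identical, except the context fed back is the previous *output* y_{t-1}
    h_t = np.tanh(W_h @ x_t + U_h @ y_prev + b_h)
    y_t = W_y @ h_t + b_y
    return h_t, y_t
```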
Hopfield
Main article: Hopfield network
The bidirectional associative memory (BAM) network, a variant of the Hopfield
network introduced by Bart Kosko, has two layers, either of which can be driven as
an input to recall an association and produce an output on the other layer.[29]
Echo state
Main article: Echo state network
The echo state network (ESN) has a sparsely connected random hidden layer. The
weights of output neurons are the only part of the network that can change (be
trained). ESNs are good at reproducing certain time series.[30] A variant for
spiking neurons is known as a liquid state machine.[31]
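A sketch of an ESN along these lines (reservoir size, sparsity, spectral-radius scaling and the ridge-regression readout are typical choices assumed here, not prescribed by the article):
```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_res, T = 1, 100, 500

# Fixed random, sparsely connected reservoir; only W_out is trained.
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W_res = rng.normal(size=(n_res, n_res)) * (rng.random((n_res, n_res)) < 0.1)
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))   # keep spectral radius below 1

u = np.sin(np.linspace(0, 20, T))[:, None]          # toy input series
target = np.roll(u, -1, axis=0)                     # predict the next value

X = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W_in @ u[t] + W_res @ x)            # reservoir update (untrained)
    X[t] = x

# Train only the linear readout, here via ridge regression.
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ target)
prediction = X @ W_out
```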
Recursive
Main article: Recursive neural network
A recursive neural network[33] is created by applying the same set of weights
recursively over a differentiable graph-like structure by traversing the structure
in topological order. Such networks are typically also trained by the reverse mode
of automatic differentiation.[34][35] They can process distributed representations
of structure, such as logical terms. A special case of recursive neural networks is
the RNN whose structure corresponds to a linear chain. Recursive neural networks
have been applied to natural language processing.[36] The Recursive Neural Tensor
Network uses a tensor-based composition function for all nodes in the tree.[37]
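As an illustration of applying the same weights recursively over a structure in topological order (the binary-tree encoding and tanh composition below are simplifying assumptions, not the exact published models):
```python
import numpy as np

rng = np.random.default_rng(2)
d = 4                                      # embedding size (illustrative)
W = rng.normal(scale=0.1, size=(d, 2 * d))
b = np.zeros(d)

def compose(node):
    """Recursively apply the shared weights W, b over a binary tree.

    A node is either a leaf vector (np.ndarray) or a (left, right) pair.
    """
    if isinstance(node, np.ndarray):
        return node
    left, right = node
    children = np.concatenate([compose(left), compose(right)])
    return np.tanh(W @ children + b)

# ((w1, w2), w3): a tiny parse tree over three word vectors
w1, w2, w3 = (rng.normal(size=d) for _ in range(3))
root_representation = compose(((w1, w2), w3))
```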
Neural history compressor
A neural history compressor is an unsupervised stack of RNNs.[38] The system
effectively minimises the description length or the negative logarithm of
of the probability of the data.[39] Given a lot of learnable predictability in the
incoming data sequence, the highest level RNN can use supervised learning to easily
classify even deep sequences with long intervals between important events.
It is possible to distill the RNN hierarchy into two RNNs: the "conscious" chunker
(higher level) and the "subconscious" automatizer (lower level).[38] Once the
chunker has learned to predict and compress inputs that are unpredictable by the
automatizer, then the automatizer can be forced in the next learning phase to
predict or imitate through additional units the hidden units of the more slowly
changing chunker. This makes it easy for the automatizer to learn appropriate,
rarely changing memories across long intervals. In turn, this helps the automatizer
to make many of its once unpredictable inputs predictable, such that the chunker
can focus on the remaining unpredictable events.[38]
Long short-term memory
Main article: Long short-term memory
Many applications use stacks of LSTM RNNs[45] and train them by Connectionist
Temporal Classification (CTC)[46] to find an RNN weight matrix that maximizes the
probability of the label sequences in a training set, given the corresponding input
sequences. CTC achieves both alignment and recognition.
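A minimal sketch of such a setup in PyTorch (the article does not prescribe a library; the layer sizes, batch shape and blank index below are placeholders):
```python
import torch
import torch.nn as nn

T, N, n_feat, n_hidden, n_classes = 50, 4, 13, 64, 28   # illustrative sizes

lstm = nn.LSTM(n_feat, n_hidden, num_layers=2)           # a small stack of LSTMs
proj = nn.Linear(n_hidden, n_classes)                    # per-frame class scores
ctc = nn.CTCLoss(blank=0)                                # CTC aligns frames to labels

x = torch.randn(T, N, n_feat)                            # (time, batch, features)
out, _ = lstm(x)
log_probs = proj(out).log_softmax(dim=-1)                # (T, N, n_classes)

targets = torch.randint(1, n_classes, (N, 10))           # toy label sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                          # gradients w.r.t. the RNN weights
```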
Bi-directional
Main article: Bidirectional recurrent neural networks
Bi-directional RNNs use a finite sequence to predict or label each element of the
sequence based on the element's past and future contexts. This is done by
concatenating the outputs of two RNNs, one processing the sequence from left to
right, the other one from right to left. The combined outputs are the predictions
of the teacher-given target signals. This technique has been proven to be
especially useful when combined with LSTM RNNs.[52][53]
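A sketch of the forward/backward combination with a plain tanh cell (illustrative sizes; practical systems typically use LSTM cells, as noted above):
```python
import numpy as np

rng = np.random.default_rng(3)
T, n_in, n_hidden = 6, 3, 5

def run_rnn(x_seq, W, U, b):
    h, states = np.zeros(n_hidden), []
    for x in x_seq:
        h = np.tanh(W @ x + U @ h + b)
        states.append(h)
    return np.stack(states)

shapes = [(n_hidden, n_in), (n_hidden, n_hidden), (n_hidden,)]
params_f = [rng.normal(scale=0.1, size=s) for s in shapes]   # forward-direction weights
params_b = [rng.normal(scale=0.1, size=s) for s in shapes]   # backward-direction weights

x_seq = rng.normal(size=(T, n_in))
h_fwd = run_rnn(x_seq, *params_f)              # left-to-right pass (past context)
h_bwd = run_rnn(x_seq[::-1], *params_b)[::-1]  # right-to-left pass, re-aligned (future context)

# Each position t now sees both past (h_fwd[t]) and future (h_bwd[t]) context.
features = np.concatenate([h_fwd, h_bwd], axis=1)   # shape (T, 2 * n_hidden)
```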
Continuous-time
A continuous-time recurrent neural network (CTRNN) uses a system of ordinary
differential equations to model the effects on a neuron of the incoming inputs.
Note that, by the Shannon sampling theorem, discrete-time recurrent neural networks
can be viewed as continuous-time recurrent neural networks where the differential
equations have been transformed into equivalent difference equations.[57] This
transformation can be thought of as occurring after the post-synaptic node
activation functions y_i(t) have been low-pass filtered but prior to sampling.
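For reference, a standard way of writing the CTRNN dynamics for neuron i (the notation below is the conventional one from the literature, not taken verbatim from this article) is

\tau_i \,\dot{y}_i(t) = -y_i(t) + \sum_{j} w_{ji}\,\sigma\big(y_j(t) - \Theta_j\big) + I_i(t),

where \tau_i is the neuron's time constant, w_{ji} is a connection weight, \sigma is a sigmoidal activation function, \Theta_j is a bias, and I_i(t) is an external input.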
Hierarchical
Hierarchical RNNs connect their neurons in various ways to decompose hierarchical
behavior into useful subprograms.[38][58] Such hierarchical structures of cognition
are present in theories of memory presented by philosopher Henri Bergson, whose
philosophical views have inspired hierarchical models.[59]
Memristive Networks
Greg Snider of HP Labs describes a system of cortical computing with memristive
nanodevices.[66] The memristors (memory resistors) are implemented by thin film
materials in which the resistance is electrically tuned via the transport of ions
or oxygen vacancies within the film. DARPA's SyNAPSE project has funded IBM
Research and HP Labs, in collaboration with the Boston University Department of
Cognitive and Neural Systems (CNS), to develop neuromorphic architectures which may
be based on memristive systems. Memristive networks are a particular type of
physical neural network with properties very similar to (Little-)Hopfield networks:
they have continuous dynamics, a limited memory capacity, and they relax naturally
by minimizing a function that is asymptotic to the Ising model. In this sense, the
dynamics of a memristive circuit have the advantage, compared with a
resistor-capacitor network, of exhibiting richer non-linear behavior. From this
point of view, engineering analog memristive networks amounts to a peculiar type of
neuromorphic engineering in which the device behavior depends on the circuit
wiring, or topology.[67][68]
Training
Gradient descent
Main article: Gradient descent
Gradient descent is a first-order iterative optimization algorithm for finding the
minimum of a function. In neural networks, it can be used to minimize the error
term by changing each weight in proportion to the derivative of the error with
respect to that weight, provided the non-linear activation functions are
differentiable. Various methods for doing so were developed in the 1980s and early
1990s by Werbos, Williams, Robinson, Schmidhuber, Hochreiter, Pearlmutter and
others.
A major problem with gradient descent for standard RNN architectures is that error
gradients vanish exponentially quickly with the size of the time lag between
important events.[40][79] LSTM combined with a BPTT/RTRL hybrid learning method
attempts to overcome these problems.[11] This problem is also addressed in the
independently recurrent neural network (IndRNN)[32] by reducing a neuron's context
to its own past state; cross-neuron information can then be explored in the
following layers. Memories of different ranges, including long-term memory, can be
learned without the vanishing and exploding gradient problem.
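The geometric nature of this decay can be seen in a toy calculation: backpropagation through time multiplies one recurrent Jacobian per time step, so for a linearized cell the sensitivity of the state at time T to the state at time 0 is a T-fold matrix product. A small NumPy illustration (the linear-cell simplification and all sizes are assumptions made for clarity):
```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
U = rng.normal(scale=0.3, size=(n, n))    # recurrent weight matrix

# For a linearized RNN h_t = U h_{t-1}, the sensitivity of h_T to h_0 is U^T
# (T-fold product); its norm shrinks or grows roughly geometrically with T.
grad = np.eye(n)
for t in range(1, 51):
    grad = U @ grad
    if t in (1, 10, 25, 50):
        print(t, np.linalg.norm(grad))    # norm decays rapidly when the spectral radius < 1
```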
Global optimization methods
Genetic algorithms are the most common global optimization method for training
RNNs, especially in unstructured networks.[83][84][85]
Initially, the neural network weights are encoded in a chromosome in a predefined
manner, where one gene represents one weight link. The whole network is represented
as a single chromosome. The fitness function is evaluated as follows:
Each weight encoded in the chromosome is assigned to the respective weight link of
the network.
The training set is presented to the network which propagates the input signals
forward.
The mean-squared-error is returned to the fitness function.
This function drives the genetic selection process.
Many chromosomes make up the population; therefore, many different neural networks
are evolved until a stopping criterion is satisfied. A common stopping scheme is:
When the neural network has learnt a certain percentage of the training data or
When the minimum value of the mean-squared-error is satisfied or
When the maximum number of training generations has been reached.
The stopping criterion is evaluated by the fitness function, which receives the
reciprocal of the mean-squared-error from each network during training. Therefore,
the goal of the genetic algorithm is to maximize the fitness function and thereby
reduce the mean-squared-error.
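A compact sketch of this training loop (a toy RNN with a hand-rolled genetic algorithm; the scalar readout, mutation scale and selection scheme are illustrative assumptions, and no neuroevolution library is implied):
```python
import numpy as np

rng = np.random.default_rng(5)
n_in, n_hidden, T, pop_size = 2, 4, 20, 30
n_weights = n_hidden * n_in + n_hidden * n_hidden + n_hidden   # one gene per weight link

x_seq = rng.normal(size=(T, n_in))
target = np.sin(np.arange(T))                                  # toy one-dimensional targets

def unpack(chrom):
    W = chrom[:n_hidden * n_in].reshape(n_hidden, n_in)
    U = chrom[n_hidden * n_in:n_hidden * n_in + n_hidden * n_hidden].reshape(n_hidden, n_hidden)
    b = chrom[-n_hidden:]
    return W, U, b

def fitness(chrom):
    W, U, b = unpack(chrom)                  # assign genes to weight links
    h, preds = np.zeros(n_hidden), []
    for x in x_seq:                          # propagate the training inputs forward
        h = np.tanh(W @ x + U @ h + b)
        preds.append(h.sum())                # scalar readout, for illustration
    mse = np.mean((np.array(preds) - target) ** 2)
    return 1.0 / (mse + 1e-9)                # reciprocal of the mean-squared-error

pop = rng.normal(scale=0.5, size=(pop_size, n_weights))
for generation in range(200):                # stop after a fixed number of generations
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[-pop_size // 2:]]               # keep the fittest half
    children = parents + rng.normal(scale=0.05, size=parents.shape)  # mutate
    pop = np.vstack([parents, children])
best = pop[np.argmax([fitness(c) for c in pop])]
```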
Related fields and models
Recurrent neural networks are in fact recursive neural networks with a particular
structure: that of a linear chain. Whereas recursive neural networks operate on any
hierarchical
structure, combining child representations into parent representations, recurrent
neural networks operate on the linear progression of time, combining the previous
time step and a hidden representation into the representation for the current time
step.
In particular, RNNs can appear as nonlinear versions of finite impulse response and
infinite impulse response filters and also as a nonlinear autoregressive exogenous
model (NARX).[86]
Libraries
Apache Singa
Caffe: Created by the Berkeley Vision and Learning Center (BVLC). It supports both
CPU and GPU. Developed in C++, and has Python and MATLAB wrappers.
Chainer: The first stable deep learning library that supports dynamic, define-by-
run neural networks. Fully in Python, production support for CPU, GPU, distributed
training.
Deeplearning4j: Deep learning in Java and Scala on multi-GPU-enabled Spark. A
general-purpose deep learning library for the JVM production stack running on a C++
scientific computing engine. Allows the creation of custom layers. Integrates with
Hadoop and Kafka.
Flux: includes interfaces for RNNs, including GRUs and LSTMs, written in Julia.
Keras: High-level, easy to use API, providing a wrapper to many other deep learning
libraries.
Microsoft Cognitive Toolkit
MXNet: a modern open-source deep learning framework used to train and deploy deep
neural networks.
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU
acceleration.
TensorFlow: Apache 2.0-licensed Theano-like library with support for CPU, GPU,
Google's proprietary TPU,[87] and mobile devices.
Theano: The reference deep-learning library for Python with an API largely
compatible with the popular NumPy library. Allows the user to write symbolic
mathematical expressions, then automatically generates their derivatives, saving
the user from having to code gradients or backpropagation. These symbolic
expressions are automatically compiled to CUDA code for a fast, on-the-GPU
implementation.
Torch (www.torch.ch): A scientific computing framework with wide support for
machine learning algorithms, written in C and Lua. The main author is Ronan
Collobert, and it is now used at Facebook AI Research and Twitter.
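As a small illustration of how a recurrent model is defined in one of these libraries (here the Keras API bundled with TensorFlow; the vocabulary size, layer widths and task are placeholders):
```python
import tensorflow as tf

# A toy sequence classifier: embed token ids, run an LSTM, predict one label.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),   # placeholder vocabulary size
    tf.keras.layers.LSTM(128),                                   # the recurrent layer
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# One dummy batch of token ids, just to show the expected shapes.
dummy_batch = tf.random.uniform((8, 20), maxval=10000, dtype=tf.int32)  # (batch, time)
print(model(dummy_batch).shape)   # (8, 1)
```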
Applications
Applications of recurrent neural networks include:
Machine translation[20]
Robot control[88]
Time series prediction[89][90][91]
Speech recognition[92][93][94]
Speech synthesis[95]
Brain–computer interfaces[96]
Time series anomaly detection[97]
Rhythm learning[98]
Music composition[99]
Grammar learning[100][101][102]
Handwriting recognition[103][104]
Human action recognition[105]
Protein homology detection[106]
Predicting subcellular localization of proteins[53]
Several prediction tasks in the area of business process management[107]
Prediction in medical care pathways[108]
References
Dupond, Samuel (2019). "A thorough review on the current advance of neural network
structures". Annual Reviews in Control. 14: 200–230.
Abiodun, Oludare Isaac; Jantan, Aman; Omolara, Abiodun Esther; Dada, Kemi
Victoria; Mohamed, Nachaat Abdelatif; Arshad, Humaira (2018-11-01). "State-of-the-
art in artificial neural network applications: A survey". Heliyon. 4 (11): e00938.
doi:10.1016/j.heliyon.2018.e00938. ISSN 2405-8440. PMC 6260436. PMID 30519653.
Tealab, Ahmed (2018-12-01). "Time series forecasting using artificial neural
networks methodologies: A systematic review". Future Computing and Informatics
Journal. 3 (2): 334–340. doi:10.1016/j.fcij.2018.10.003. ISSN 2314-7288.
Graves, Alex; Liwicki, Marcus; Fernandez, Santiago; Bertolami, Roman; Bunke,
Horst; Schmidhuber, Jürgen (2009). "A Novel Connectionist System for Improved
Unconstrained Handwriting Recognition" (PDF). IEEE Transactions on Pattern Analysis
and Machine Intelligence. 31 (5): 855–868. CiteSeerX 10.1.1.139.4502.
doi:10.1109/tpami.2008.137. PMID 19299860. S2CID 14635907.
Sak, Haşim; Senior, Andrew; Beaufays, Françoise (2014). "Long Short-Term Memory
recurrent neural network architectures for large scale acoustic modeling" (PDF).
Li, Xiangang; Wu, Xihong (2014-10-15). "Constructing Long Short-Term Memory based
Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition".
arXiv:1410.4281 [cs.CL].
Hyötyniemi, Heikki (1996). "Turing machines are recurrent neural networks".
Proceedings of STeP '96/Publications of the Finnish Artificial Intelligence
Society: 13–24.
Miljanovic, Milos (Feb–Mar 2012). "Comparative analysis of Recurrent and Finite
Impulse Response Neural Networks in Time Series Prediction" (PDF). Indian Journal
of Computer and Engineering. 3 (1).
Williams, Ronald J.; Hinton, Geoffrey E.; Rumelhart, David E. (October 1986).
"Learning representations by back-propagating errors". Nature. 323 (6088): 533–536.
Bibcode:1986Natur.323..533R. doi:10.1038/323533a0. ISSN 1476-4687. S2CID 205001834.
Schmidhuber, Jürgen (1993). Habilitation thesis: System modeling and optimization
(PDF). Page 150 ff demonstrates credit assignment across the equivalent of 1,200
layers in an unfolded RNN.
Hochreiter, Sepp; Schmidhuber, Jürgen (1997-11-01). "Long Short-Term Memory".
Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276.
S2CID 1915014.
Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). An Application of
Recurrent Neural Networks to Discriminative Keyword Spotting. Proceedings of the
17th International Conference on Artificial Neural Networks. ICANN'07. Berlin,
Heidelberg: Springer-Verlag. pp. 220–229. ISBN 978-3-540-74693-5.
Schmidhuber, Jürgen (January 2015). "Deep Learning in Neural Networks: An
Overview". Neural Networks. 61: 85–117. arXiv:1404.7828.
doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
Graves, Alex; Schmidhuber, Jürgen (2009). Koller, D.; Schuurmans, D.; Bengio, Y.;
Bottou, L. (eds.). "Offline Handwriting Recognition with Multidimensional Recurrent
Neural Networks". Advances in Neural Information Processing Systems. Neural
Information Processing Systems (NIPS) Foundation. 21: 545–552.
"2000 HUB5 English Evaluation Speech - Linguistic Data Consortium".
catalog.ldc.upenn.edu.
Hannun, Awni; Case, Carl; Casper, Jared; Catanzaro, Bryan; Diamos, Greg; Elsen,
Erich; Prenger, Ryan; Satheesh, Sanjeev; Sengupta, Shubho (2014-12-17). "Deep
Speech: Scaling up end-to-end speech recognition". arXiv:1412.5567 [cs.CL].
Fan, Bo; Wang, Lijuan; Soong, Frank K.; Xie, Lei (2015) "Photo-Real Talking Head
with Deep Bidirectional LSTM", in Proceedings of ICASSP 2015
Zen, Heiga; Sak, Haşim (2015). "Unidirectional Long Short-Term Memory Recurrent
Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis" (PDF).
Google.com. ICASSP. pp. 4470–4474.
Sak, Haşim; Senior, Andrew; Rao, Kanishka; Beaufays, Françoise; Schalkwyk, Johan
(September 2015). "Google voice search: faster and more accurate".
Sutskever, Ilya; Vinyals, Oriol; Le, Quoc V. (2014). "Sequence to Sequence
Learning with Neural Networks" (PDF). Electronic Proceedings of the Neural
Information Processing Systems Conference. 27: 5346. arXiv:1409.3215.
Bibcode:2014arXiv1409.3215S.
Jozefowicz, Rafal; Vinyals, Oriol; Schuster, Mike; Shazeer, Noam; Wu, Yonghui
(2016-02-07). "Exploring the Limits of Language Modeling". arXiv:1602.02410
[cs.CL].
Gillick, Dan; Brunk, Cliff; Vinyals, Oriol; Subramanya, Amarnag (2015-11-30).
"Multilingual Language Processing From Bytes". arXiv:1512.00103 [cs.CL].
Vinyals, Oriol; Toshev, Alexander; Bengio, Samy; Erhan, Dumitru (2014-11-17).
"Show and Tell: A Neural Image Caption Generator". arXiv:1411.4555 [cs.CV].
Cruse, Holk; Neural Networks as Cybernetic Systems, 2nd and revised edition
Elman, Jeffrey L. (1990). "Finding Structure in Time". Cognitive Science. 14 (2):
179–211. doi:10.1016/0364-0213(90)90002-E.
Jordan, Michael I. (1997-01-01). "Serial Order: A Parallel Distributed Processing
Approach". Neural-Network Models of Cognition - Biobehavioral Foundations. Advances
in Psychology. Neural-Network Models of Cognition. Vol. 121. pp. 471–495.
doi:10.1016/s0166-4115(97)80111-2. ISBN 9780444819314.
Kosko, Bart (1988). "Bidirectional associative memories". IEEE Transactions on
Systems, Man, and Cybernetics. 18 (1): 49–60. doi:10.1109/21.87054. S2CID 59875735.
Rakkiyappan, Rajan; Chandrasekar, Arunachalam; Lakshmanan, Subramanian; Park, Ju
H. (2 January 2015). "Exponential stability for markovian jumping stochastic BAM
neural networks with mode-dependent probabilistic time-varying delays and impulse
control". Complexity. 20 (3): 39–65. Bibcode:2015Cmplx..20c..39R.
doi:10.1002/cplx.21503.
Rojas, Rául (1996). Neural networks: a systematic introduction. Springer. p. 336.
ISBN 978-3-540-60505-8.
Jaeger, Herbert; Haas, Harald (2004-04-02). "Harnessing Nonlinearity: Predicting
Chaotic Systems and Saving Energy in Wireless Communication". Science. 304 (5667):
78–80. Bibcode:2004Sci...304...78J. CiteSeerX 10.1.1.719.2301.
doi:10.1126/science.1091277. PMID 15064413. S2CID 2184251.
Maass, Wolfgang; Natschläger, Thomas; Markram, Henry (2002-08-20). "A fresh look
at real-time computation in generic recurrent neural circuits". Technical report.
Institute for Theoretical Computer Science, Technische Universität Graz.
Li, Shuai; Li, Wanqing; Cook, Chris; Zhu, Ce; Yanbo, Gao (2018). "Independently
Recurrent Neural Network (IndRNN): Building a Longer and Deeper RNN".
arXiv:1803.04831 [cs.CV].
Goller, Christoph; Küchler, Andreas (1996). Learning task-dependent distributed
representations by backpropagation through structure. IEEE International Conference
on Neural Networks. Vol. 1. p. 347. CiteSeerX 10.1.1.52.4759.
doi:10.1109/ICNN.1996.548916. ISBN 978-0-7803-3210-2. S2CID 6536466.
Linnainmaa, Seppo (1970). The representation of the cumulative rounding error of
an algorithm as a Taylor expansion of the local rounding errors. M.Sc. thesis (in
Finnish), University of Helsinki.
Griewank, Andreas; Walther, Andrea (2008). Evaluating Derivatives: Principles and
Techniques of Algorithmic Differentiation (Second ed.). SIAM. ISBN 978-0-89871-776-
1.
Socher, Richard; Lin, Cliff; Ng, Andrew Y.; Manning, Christopher D., "Parsing
Natural Scenes and Natural Language with Recursive Neural Networks" (PDF), 28th
International Conference on Machine Learning (ICML 2011)
Socher, Richard; Perelygin, Alex; Wu, Jean Y.; Chuang, Jason; Manning, Christopher
D.; Ng, Andrew Y.; Potts, Christopher. "Recursive Deep Models for Semantic
Compositionality Over a Sentiment Treebank" (PDF). Emnlp 2013.
Schmidhuber, Jürgen (1992). "Learning complex, extended sequences using the
principle of history compression" (PDF). Neural Computation. 4 (2): 234–242.
doi:10.1162/neco.1992.4.2.234. S2CID 18271205.
Schmidhuber, Jürgen (2015). "Deep Learning". Scholarpedia. 10 (11): 32832.
Bibcode:2015SchpJ..1032832S. doi:10.4249/scholarpedia.32832.
Hochreiter, Sepp (1991), Untersuchungen zu dynamischen neuronalen Netzen, Diploma
thesis, Institut f. Informatik, Technische Univ. Munich, Advisor Jürgen Schmidhuber
Giles, C. Lee; Miller, Clifford B.; Chen, Dong; Chen, Hsing-Hen; Sun, Guo-Zheng;
Lee, Yee-Chun (1992). "Learning and Extracting Finite State Automata with Second-
Order Recurrent Neural Networks" (PDF). Neural Computation. 4 (3): 393–405.
doi:10.1162/neco.1992.4.3.393. S2CID 19666035.
Omlin, Christian W.; Giles, C. Lee (1996). "Constructing Deterministic Finite-
State Automata in Recurrent Neural Networks". Journal of the ACM. 45 (6): 937–972.
CiteSeerX 10.1.1.32.2364. doi:10.1145/235809.235811. S2CID 228941.
Gers, Felix A.; Schraudolph, Nicol N.; Schmidhuber, Jürgen (2002). "Learning
Precise Timing with LSTM Recurrent Networks" (PDF). Journal of Machine Learning
Research. 3: 115–143. Retrieved 2017-06-13.
Bayer, Justin; Wierstra, Daan; Togelius, Julian; Schmidhuber, Jürgen (2009-09-14).
Evolving Memory Cell Structures for Sequence Learning (PDF). Artificial Neural
Networks – ICANN 2009. Lecture Notes in Computer Science. Vol. 5769. Berlin,
Heidelberg: Springer. pp. 755–764. doi:10.1007/978-3-642-04277-5_76. ISBN 978-3-
642-04276-8.
Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). "Sequence labelling
in structured domains with hierarchical recurrent neural networks". Proc. 20th
International Joint Conference on Artificial Intelligence, Ijcai 2007: 774–779.
CiteSeerX 10.1.1.79.1887.
Graves, Alex; Fernández, Santiago; Gomez, Faustino J. (2006). "Connectionist
temporal classification: Labelling unsegmented sequence data with recurrent neural
networks". Proceedings of the International Conference on Machine Learning: 369–
376. CiteSeerX 10.1.1.75.6306.
Gers, Felix A.; Schmidhuber, Jürgen (November 2001). "LSTM recurrent networks
learn simple context-free and context-sensitive languages". IEEE Transactions on
Neural Networks. 12 (6): 1333–1340. doi:10.1109/72.963769. ISSN 1045-9227. PMID
18249962. S2CID 10192330.
Heck, Joel; Salem, Fathi M. (2017-01-12). "Simplified Minimal Gated Unit
Variations for Recurrent Neural Networks". arXiv:1701.03452 [cs.NE].
Dey, Rahul; Salem, Fathi M. (2017-01-20). "Gate-Variants of Gated Recurrent Unit
(GRU) Neural Networks". arXiv:1701.05923 [cs.NE].
Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014).
"Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling".
arXiv:1412.3555 [cs.NE].
Britz, Denny (October 27, 2015). "Recurrent Neural Network Tutorial, Part 4 –
Implementing a GRU/LSTM RNN with Python and Theano – WildML". Wildml.com. Retrieved
May 18, 2016.
Graves, Alex; Schmidhuber, Jürgen (2005-07-01). "Framewise phoneme classification
with bidirectional LSTM and other neural network architectures". Neural Networks.
IJCNN 2005. 18 (5): 602–610. CiteSeerX 10.1.1.331.5800.
doi:10.1016/j.neunet.2005.06.042. PMID 16112549.
Thireou, Trias; Reczko, Martin (July 2007). "Bidirectional Long Short-Term Memory
Networks for Predicting the Subcellular Localization of Eukaryotic Proteins".
IEEE/ACM Transactions on Computational Biology and Bioinformatics. 4 (3): 441–446.
doi:10.1109/tcbb.2007.1015. PMID 17666763. S2CID 11787259.
Harvey, Inman; Husbands, Phil; Cliff, Dave (1994), "Seeing the light: Artificial
evolution, real vision", 3rd international conference on Simulation of adaptive
behavior: from animals to animats 3, pp. 392–401
Quinn, Matthew (2001). "Evolving communication without dedicated communication
channels". Advances in Artificial Life. Lecture Notes in Computer Science. Vol.
2159. pp. 357–366. CiteSeerX 10.1.1.28.5890. doi:10.1007/3-540-44811-X_38. ISBN
978-3-540-42567-0.
Beer, Randall D. (1997). "The dynamics of adaptive behavior: A research program".
Robotics and Autonomous Systems. 20 (2–4): 257–289. doi:10.1016/S0921-
8890(96)00063-2.
Sherstinsky, Alex (2018-12-07). Bloem-Reddy, Benjamin; Paige, Brooks; Kusner,
Matt; Caruana, Rich; Rainforth, Tom; Teh, Yee Whye (eds.). Deriving the Recurrent
Neural Network Definition and RNN Unrolling Using Signal Processing. Critiquing and
Correcting Trends in Machine Learning Workshop at NeurIPS-2018.
Paine, Rainer W.; Tani, Jun (2005-09-01). "How Hierarchical Control Self-organizes
in Artificial Adaptive Systems". Adaptive Behavior. 13 (3): 211–225.
doi:10.1177/105971230501300303. S2CID 9932565.
"Burns, Benureau, Tani (2018) A Bergson-Inspired Adaptive Time Constant for the
Multiple Timescales Recurrent Neural Network Model. JNNS".
Tutschku, Kurt (June 1995). Recurrent Multilayer Perceptrons for Identification
and Control: The Road to Applications. Institute of Computer Science Research
Report. Vol. 118. University of Würzburg Am Hubland. CiteSeerX 10.1.1.45.3527.
Yamashita, Yuichi; Tani, Jun (2008-11-07). "Emergence of Functional Hierarchy in a
Multiple Timescale Neural Network Model: A Humanoid Robot Experiment". PLOS
Computational Biology. 4 (11): e1000220. Bibcode:2008PLSCB...4E0220Y.
doi:10.1371/journal.pcbi.1000220. PMC 2570613. PMID 18989398.
Alnajjar, Fady; Yamashita, Yuichi; Tani, Jun (2013). "The hierarchical and
functional connectivity of higher-order cognitive mechanisms: neurorobotic model to
investigate the stability and flexibility of working memory". Frontiers in
Neurorobotics. 7: 2. doi:10.3389/fnbot.2013.00002. PMC 3575058. PMID 23423881.
"Proceedings of the 28th Annual Conference of the Japanese Neural Network Society
(October, 2018)" (PDF).
Graves, Alex; Wayne, Greg; Danihelka, Ivo (2014). "Neural Turing Machines".
arXiv:1410.5401 [cs.NE].
Sun, Guo-Zheng; Giles, C. Lee; Chen, Hsing-Hen (1998). "The Neural Network
Pushdown Automaton: Architecture, Dynamics and Training". In Giles, C. Lee; Gori,
Marco (eds.). Adaptive Processing of Sequences and Data Structures. Lecture Notes
in Computer Science. Berlin, Heidelberg: Springer. pp. 296–345. CiteSeerX
10.1.1.56.8723. doi:10.1007/bfb0054003. ISBN 9783540643418.
Snider, Greg (2008), "Cortical computing with memristive nanodevices", Sci-DAC
Review, 10: 58–65
Caravelli, Francesco; Traversa, Fabio Lorenzo; Di Ventra, Massimiliano (2017).
"The complex dynamics of memristive circuits: analytical results and universal slow
relaxation". Physical Review E. 95 (2): 022140. arXiv:1608.08651.
Bibcode:2017PhRvE..95b2140C. doi:10.1103/PhysRevE.95.022140. PMID 28297937. S2CID
6758362.
Caravelli, Francesco (2019-11-07). "Asymptotic Behavior of Memristive Circuits".
Entropy. 21 (8): 789. Bibcode:2019Entrp..21..789C. doi:10.3390/e21080789.
PMID 33267502.
Werbos, Paul J. (1988). "Generalization of backpropagation with application to a
recurrent gas market model". Neural Networks. 1 (4): 339–356. doi:10.1016/0893-
6080(88)90007-x. S2CID 205001834.
Rumelhart, David E. (1985). Learning Internal Representations by Error
Propagation. San Diego (CA): Institute for Cognitive Science, University of
California.
Robinson, Anthony J.; Fallside, Frank (1987). The Utility Driven Dynamic Error
Propagation Network. Technical Report CUED/F-INFENG/TR.1. Department of
Engineering, University of Cambridge.
Williams, Ronald J.; Zipser, D. (1 February 2013). "Gradient-based learning
algorithms for recurrent networks and their computational complexity". In Chauvin,
Yves; Rumelhart, David E. (eds.). Backpropagation: Theory, Architectures, and
Applications. Psychology Press. ISBN 978-1-134-77581-1.
Schmidhuber, Jürgen (1989-01-01). "A Local Learning Algorithm for Dynamic
Feedforward and Recurrent Networks". Connection Science. 1 (4): 403–412.
doi:10.1080/09540098908915650. S2CID 18721007.
Príncipe, José C.; Euliano, Neil R.; Lefebvre, W. Curt (2000). Neural and adaptive
systems: fundamentals through simulations. Wiley. ISBN 978-0-471-35167-2.
Yann, Ollivier; Tallec, Corentin; Charpiat, Guillaume (2015-07-28). "Training
recurrent networks online without backtracking". arXiv:1507.07680 [cs.NE].
Schmidhuber, Jürgen (1992-03-01). "A Fixed Size Storage O(n3) Time Complexity
Learning Algorithm for Fully Recurrent Continually Running Networks". Neural
Computation. 4 (2): 243–248. doi:10.1162/neco.1992.4.2.243. S2CID 11761172.
Williams, Ronald J. (1989). "Complexity of exact gradient computation algorithms
for recurrent neural networks". Technical Report NU-CCS-89-27. Boston (MA):
Northeastern University, College of Computer Science.
Pearlmutter, Barak A. (1989-06-01). "Learning State Space Trajectories in
Recurrent Neural Networks". Neural Computation. 1 (2): 263–269.
doi:10.1162/neco.1989.1.2.263. S2CID 16813485.
Hochreiter, Sepp; et al. (15 January 2001). "Gradient flow in recurrent nets: the
difficulty of learning long-term dependencies". In Kolen, John F.; Kremer, Stefan
C. (eds.). A Field Guide to Dynamical Recurrent Networks. John Wiley & Sons. ISBN
978-0-7803-5369-5.
Campolucci, Paolo; Uncini, Aurelio; Piazza, Francesco; Rao, Bhaskar D. (1999).
"On-Line Learning Algorithms for Locally Recurrent Neural Networks". IEEE
Transactions on Neural Networks. 10 (2): 253–271. CiteSeerX 10.1.1.33.7550.
doi:10.1109/72.750549. PMID 18252525.
Wan, Eric A.; Beaufays, Françoise (1996). "Diagrammatic derivation of gradient
algorithms for neural networks". Neural Computation. 8: 182–201.
doi:10.1162/neco.1996.8.1.182. S2CID 15512077.
Campolucci, Paolo; Uncini, Aurelio; Piazza, Francesco (2000). "A Signal-Flow-Graph
Approach to On-line Gradient Calculation". Neural Computation. 12 (8): 1901–1927.
CiteSeerX 10.1.1.212.5406. doi:10.1162/089976600300015196. PMID 10953244. S2CID
15090951.
Gomez, Faustino J.; Miikkulainen, Risto (1999), "Solving non-Markovian control
tasks with neuroevolution" (PDF), IJCAI 99, Morgan Kaufmann, retrieved 5 August
2017
Syed, Omar (May 1995). "Applying Genetic Algorithms to Recurrent Neural Networks
for Learning Network Parameters and Architecture". M.Sc. thesis, Department of
Electrical Engineering, Case Western Reserve University, Advisor Yoshiyasu
Takefuji.
Gomez, Faustino J.; Schmidhuber, Jürgen; Miikkulainen, Risto (June 2008).
"Accelerated Neural Evolution Through Cooperatively Coevolved Synapses". Journal of
Machine Learning Research. 9: 937–965.
Siegelmann, Hava T.; Horne, Bill G.; Giles, C. Lee (1995). "Computational
Capabilities of Recurrent NARX Neural Networks". IEEE Transactions on Systems, Man
and Cybernetics, Part B (Cybernetics). 27 (2): 208–15. CiteSeerX 10.1.1.48.7468.
doi:10.1109/3477.558801. PMID 18255858.
Metz, Cade (May 18, 2016). "Google Built Its Very Own Chips to Power Its AI Bots".
Wired.
Mayer, Hermann; Gomez, Faustino J.; Wierstra, Daan; Nagy, Istvan; Knoll, Alois;
Schmidhuber, Jürgen (October 2006). A System for Robotic Heart Surgery that Learns
to Tie Knots Using Recurrent Neural Networks. 2006 IEEE/RSJ International
Conference on Intelligent Robots and Systems. pp. 543–548. CiteSeerX
10.1.1.218.3399. doi:10.1109/IROS.2006.282190. ISBN 978-1-4244-0258-8. S2CID
12284900.
Wierstra, Daan; Schmidhuber, Jürgen; Gomez, Faustino J. (2005). "Evolino: Hybrid
Neuroevolution/Optimal Linear Search for Sequence Learning". Proceedings of the
19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh:
853–858.
Petneházi, Gábor (2019-01-01). "Recurrent neural networks for time series
forecasting". arXiv:1901.00069 [cs.LG].
Hewamalage, Hansika; Bergmeir, Christoph; Bandara, Kasun (2020). "Recurrent Neural
Networks for Time Series Forecasting: Current Status and Future Directions".
International Journal of Forecasting. 37: 388–427. arXiv:1909.00590.
doi:10.1016/j.ijforecast.2020.06.008. S2CID 202540863.
Graves, Alex; Schmidhuber, Jürgen (2005). "Framewise phoneme classification with
bidirectional LSTM and other neural network architectures". Neural Networks. 18 (5–
6): 602–610. CiteSeerX 10.1.1.331.5800. doi:10.1016/j.neunet.2005.06.042. PMID
16112549.
Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). An Application of
Recurrent Neural Networks to Discriminative Keyword Spotting. Proceedings of the
17th International Conference on Artificial Neural Networks. ICANN'07. Berlin,
Heidelberg: Springer-Verlag. pp. 220–229. ISBN 978-3540746935.
Graves, Alex; Mohamed, Abdel-rahman; Hinton, Geoffrey E. (2013). "Speech
Recognition with Deep Recurrent Neural Networks". Acoustics, Speech and Signal
Processing (ICASSP), 2013 IEEE International Conference on: 6645–6649.
arXiv:1303.5778. Bibcode:2013arXiv1303.5778G. doi:10.1109/ICASSP.2013.6638947. ISBN
978-1-4799-0356-6. S2CID 206741496.
Chang, Edward F.; Chartier, Josh; Anumanchipalli, Gopala K. (24 April 2019).
"Speech synthesis from neural decoding of spoken sentences". Nature. 568 (7753):
493–498. Bibcode:2019Natur.568..493A. doi:10.1038/s41586-019-1119-1. ISSN 1476-
4687. PMID 31019317. S2CID 129946122.
Moses, David A.; Metzger, Sean L.; Liu, Jessie R.; Anumanchipalli, Gopala K.;
Makin, Joseph G.; Sun, Pengfei F.; Chartier, Josh; et al. (2021-07-15).
"Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria".
New England Journal of Medicine. 385 (3): 217–227. doi:10.1056/NEJMoa2027540.
Malhotra, Pankaj; Vig, Lovekesh; Shroff, Gautam; Agarwal, Puneet (April 2015).
"Long Short Term Memory Networks for Anomaly Detection in Time Series" (PDF).
European Symposium on Artificial Neural Networks, Computational Intelligence and
Machine Learning — ESANN 2015.
Gers, Felix A.; Schraudolph, Nicol N.; Schmidhuber, Jürgen (2002). "Learning
precise timing with LSTM recurrent networks" (PDF). Journal of Machine Learning
Research. 3: 115–143.
Eck, Douglas; Schmidhuber, Jürgen (2002-08-28). Learning the Long-Term Structure
of the Blues. Artificial Neural Networks — ICANN 2002. Lecture Notes in Computer
Science. Vol. 2415. Berlin, Heidelberg: Springer. pp. 284–289. CiteSeerX
10.1.1.116.3620. doi:10.1007/3-540-46084-5_47. ISBN 978-3540460848.
Schmidhuber, Jürgen; Gers, Felix A.; Eck, Douglas (2002). "Learning nonregular
languages: A comparison of simple recurrent networks and LSTM". Neural Computation.
14 (9): 2039–2041. CiteSeerX 10.1.1.11.7369. doi:10.1162/089976602320263980. PMID
12184841. S2CID 30459046.
Gers, Felix A.; Schmidhuber, Jürgen (2001). "LSTM Recurrent Networks Learn Simple
Context Free and Context Sensitive Languages" (PDF). IEEE Transactions on Neural
Networks. 12 (6): 1333–1340. doi:10.1109/72.963769. PMID 18249962.
Pérez-Ortiz, Juan Antonio; Gers, Felix A.; Eck, Douglas; Schmidhuber, Jürgen
(2003). "Kalman filters improve LSTM network performance in problems unsolvable by
traditional recurrent nets". Neural Networks. 16 (2): 241–250. CiteSeerX
10.1.1.381.1992. doi:10.1016/s0893-6080(02)00219-8. PMID 12628609.
Graves, Alex; Schmidhuber, Jürgen (2009). "Offline Handwriting Recognition with
Multidimensional Recurrent Neural Networks". Advances in Neural Information
Processing Systems 22, NIPS'22. Vancouver (BC): MIT Press: 545–552.
Graves, Alex; Fernández, Santiago; Liwicki, Marcus; Bunke, Horst; Schmidhuber,
Jürgen (2007). Unconstrained Online Handwriting Recognition with Recurrent Neural
Networks. Proceedings of the 20th International Conference on Neural Information
Processing Systems. NIPS'07. Curran Associates Inc. pp. 577–584. ISBN
9781605603520.
Baccouche, Moez; Mamalet, Franck; Wolf, Christian; Garcia, Christophe; Baskurt,
Atilla (2011). Salah, Albert Ali; Lepri, Bruno (eds.). "Sequential Deep Learning
for Human Action Recognition". 2nd International Workshop on Human Behavior
Understanding (HBU). Lecture Notes in Computer Science. Amsterdam, Netherlands:
Springer. 7065: 29–39. doi:10.1007/978-3-642-25446-8_4. ISBN 978-3-642-25445-1.
Hochreiter, Sepp; Heusel, Martin; Obermayer, Klaus (2007). "Fast model-based
protein homology detection without alignment". Bioinformatics. 23 (14): 1728–1736.
doi:10.1093/bioinformatics/btm247. PMID 17488755.
Tax, Niek; Verenich, Ilya; La Rosa, Marcello; Dumas, Marlon (2017). Predictive
Business Process Monitoring with LSTM neural networks. Proceedings of the
International Conference on Advanced Information Systems Engineering (CAiSE).
Lecture Notes in Computer Science. Vol. 10253. pp. 477–492. arXiv:1612.02130.
doi:10.1007/978-3-319-59536-8_30. ISBN 978-3-319-59535-1. S2CID 2192354.
Choi, Edward; Bahadori, Mohammad Taha; Schuetz, Andy; Stewart, Walter F.; Sun,
Jimeng (2016). "Doctor AI: Predicting Clinical Events via Recurrent Neural
Networks". Proceedings of the 1st Machine Learning for Healthcare Conference. 56:
301–318. arXiv:1511.05942. Bibcode:2015arXiv151105942C. PMC 5341604. PMID 28286600.
Further reading
Mandic, Danilo P. & Chambers, Jonathon A. (2001). Recurrent Neural Networks for
Prediction: Learning Algorithms, Architectures and Stability. Wiley. ISBN 978-0-
471-49517-8.
External links
Recurrent Neural Networks with over 60 RNN papers by Jürgen Schmidhuber's group at
Dalle Molle Institute for Artificial Intelligence Research
Elman Neural Network implementation for WEKA