Unit IV


BAPATLA ENGINEERING COLLEGE :: BAPATLA

(Autonomous)

Deep Learning (20ECJ44)


By
Dr. Naga Raju Challa
Assistant Professor,
Department of ECE,
Bapatla Engineering College,
(Autonomous)
Bapatla.
UNIT- IV
Sequence Modelling–Recurrent
and Recursive Nets
Sequence Learning Problem

 In the previous chapters, feedforward neural networks and CNNs were discussed in detail.
 However, in those networks the size of the input was always fixed.
 For example, we fed fixed-size (32×32) images to a CNN for image classification.
 Further, each input to the network was independent of the previous or future inputs.
 For many applications the inputs are not of fixed size, and successive inputs may not be independent of each other.
Sequence Learning Problem

 Let us consider an example: the task of auto-completion, such as typing a WhatsApp message.
 Suppose we type the word “deep”: given the first character ‘d’, we want to predict the next character ‘e’, and so on. This is our task.
Sequence Learning Problem

 This problem is known as a sequence learning problem.
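To make the setup concrete, here is a minimal sketch (in Python; the encoding choices are assumptions for illustration) of the input–target pairs such a character-level auto-completion task would use, with the word “deep” taken from the example above:

```python
# Character-level next-character prediction for the word "deep".
# At each step the input is the current character and the target is the next one.
word = "deep"
vocab = sorted(set(word))                        # ['d', 'e', 'p'] (assumed toy vocabulary)
char_to_idx = {ch: i for i, ch in enumerate(vocab)}

inputs  = [char_to_idx[ch] for ch in word[:-1]]  # indices of 'd', 'e', 'e'
targets = [char_to_idx[ch] for ch in word[1:]]   # indices of 'e', 'e', 'p'

print(list(zip(word[:-1], word[1:])))            # [('d', 'e'), ('e', 'e'), ('e', 'p')]
```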


Recurrent Neural Networks

 How do we model such a sequence learning problem?


 Wish list:
 Account for dependence between inputs
 Because we have argued that the output actually depends on multiple inputs, not on a single input.
 Account for a variable number of inputs
 Because a video could be 300 seconds long or only 20–25 seconds, a sentence could have an arbitrary number of words, and so on.
 Make sure the function executed at each time step is the same
 Because at every time step we have to perform the same task.
 So we will focus on each of these items from our wish list and try to arrive at a model for dealing with such problems.
Recurrent Neural Networks

Where X = input layer, S = hidden layer, Y = output layer.


The function executed at each time step is
s_i = σ(U x_i + b)
y_i = O(V s_i + c)
where σ is the hidden-layer activation, O is the output activation function, and i denotes the time step.
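A minimal NumPy sketch of this per-step computation (the dimensions, sigmoid for σ, and softmax for O are assumptions for illustration); note that without a recurrent term each output depends only on the current input:

```python
import numpy as np

def step(x_i, U, b, V, c):
    """One time step: s_i = sigma(U x_i + b), y_i = O(V s_i + c)."""
    s_i = 1.0 / (1.0 + np.exp(-(U @ x_i + b)))              # hidden state (sigma = sigmoid, assumed)
    z = V @ s_i + c
    y_i = np.exp(z - z.max()) / np.exp(z - z.max()).sum()   # output (O = softmax, assumed)
    return s_i, y_i

# Assumed sizes: 3-dimensional one-hot input, 4 hidden units, 3 output classes.
rng = np.random.default_rng(0)
U, b = rng.normal(size=(4, 3)), np.zeros(4)
V, c = rng.normal(size=(3, 4)), np.zeros(3)
s, y = step(np.array([1.0, 0.0, 0.0]), U, b, V, c)
print(y.round(3))                                            # a distribution over the next character
```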
Recurrent Neural Networks
 How do we account for dependence between the inputs?
 The solution is to add a recurrent connection to the network:
s_i = σ(U x_i + W s_{i-1} + b)
y_i = O(V s_i + c)
Equivalently, y_i = f(x_i, s_{i-1}, W, U, V, b, c).
 s_i is the state of the network at time step i.
 The parameters W, U, V, b, c are shared across time steps, so the same function is executed at every time step.
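A minimal NumPy sketch of the unrolled forward pass once the recurrent term W s_{i-1} is added (the tanh/softmax choices and the sizes are assumptions); the same U, W, V, b, c are reused at every time step:

```python
import numpy as np

def rnn_forward(xs, U, W, V, b, c):
    """Unrolled RNN: s_i = tanh(U x_i + W s_{i-1} + b), y_i = softmax(V s_i + c)."""
    s = np.zeros(b.shape)                     # s_0: initial state (assumed zero)
    states, outputs = [], []
    for x_i in xs:                            # the same parameters are used at every step
        s = np.tanh(U @ x_i + W @ s + b)      # new state depends on the input AND the previous state
        z = V @ s + c
        y = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
        states.append(s)
        outputs.append(y)
    return states, outputs

rng = np.random.default_rng(0)
d_in, d_h, d_out = 3, 4, 3                    # assumed dimensions
U, W = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
V, b, c = rng.normal(size=(d_out, d_h)), np.zeros(d_h), np.zeros(d_out)
xs = [np.eye(d_in)[i] for i in (0, 1, 1)]     # one-hot inputs for 'd', 'e', 'e'
states, outputs = rnn_forward(xs, U, W, V, b, c)
```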
Recurrent Neural Networks
 A more compact way of representing an RNN is to draw the recurrent connection as a loop on the hidden layer instead of unrolling the network over time; the equations and the shared parameters W, U, V, b, c remain the same as above.
Recurrent Neural Networks
 Let us revisit the sequence learning problem.
Recurrent Neural Networks
 Recurrent Neural Networks (RNNs) are a type of artificial neural network designed for sequence data.
Unlike traditional feedforward neural networks, RNNs have connections that form a directed cycle,
allowing them to maintain a hidden state that captures information about previous inputs in the sequence.
This makes RNNs particularly well-suited for tasks involving sequential or time-dependent data.
 Structure of an RNN:
 In an RNN, each node in the network represents a time step, and the connections between nodes carry
information from one time step to the next. The hidden state at each time step is updated based on the
input at that time step and the previous hidden state.
 Input Layer: The input layer represents the features of the data at each time step in the sequence. The
input at each time step is fed into the network.
 Hidden Layer: The hidden layer maintains a hidden state vector that captures information from previous
time steps. The hidden state at each time step is updated based on the current input and the previous
hidden state. The connections between the hidden layer at one time step and the next form a directed
cycle, allowing the network to maintain memory of past inputs.
Recurrent Neural Networks
 Output Layer: The output layer processes the hidden state at each time step and produces the output for
that time step. The output can be used for tasks such as sequence prediction, classification, or generation.
 Activation Function: Each node in the hidden layer typically uses a non-linear activation function, such
as the hyperbolic tangent (tanh) or rectified linear unit (ReLU), to introduce non-linearity to the model.
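As a concrete illustration of this input layer → recurrent hidden layer → output layer structure, here is a small PyTorch sketch (PyTorch, the layer sizes, and the character task are assumptions for illustration, not part of the slides):

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    """Input layer -> recurrent hidden layer (tanh) -> output layer, as described above."""
    def __init__(self, vocab_size=3, hidden_size=8):
        super().__init__()
        self.rnn = nn.RNN(input_size=vocab_size, hidden_size=hidden_size,
                          nonlinearity="tanh", batch_first=True)   # hidden layer with memory
        self.out = nn.Linear(hidden_size, vocab_size)              # output layer

    def forward(self, x, h0=None):
        # x: (batch, time, vocab_size) one-hot inputs; the hidden state carries
        # information from earlier time steps to later ones.
        h_all, h_last = self.rnn(x, h0)
        return self.out(h_all), h_last                             # one prediction per time step

model = CharRNN()
x = torch.eye(3)[torch.tensor([[0, 1, 1]])]      # one-hot sequence for 'd', 'e', 'e'
logits, h = model(x)
print(logits.shape)                              # torch.Size([1, 3, 3])
```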
 Advantages of RNN
 Sequential Information Handling: RNNs can model and capture sequential dependencies in data,
making them suitable for tasks like time series prediction, speech recognition, and natural language
processing.
 Flexibility in Input/Output Length: RNNs can handle input sequences of varying lengths and produce
output sequences, making them versatile for tasks with variable-length input or output.
 Parameter Sharing: RNNs use the same set of weights for each time step, which allows them to share
parameters across different parts of the sequence. This parameter sharing can lead to more efficient
training, especially when dealing with long sequences.
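A short NumPy sketch of the last two points: one set of shared weights processes input sequences of any length (the sizes and random data are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 3, 4                                  # assumed sizes
U, W, b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)

def encode(xs):
    """Apply the same step function, with the same weights, at every position."""
    s = np.zeros(d_h)
    for x in xs:
        s = np.tanh(U @ x + W @ s + b)            # identical parameters at each time step
    return s                                      # final state summarises the whole sequence

short_seq = rng.normal(size=(3, d_in))            # a length-3 input sequence
long_seq = rng.normal(size=(20, d_in))            # a length-20 sequence, same parameters
print(encode(short_seq).shape, encode(long_seq).shape)   # (4,) (4,)
```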
Recurrent Neural Networks
 Disadvantages of RNN:
 Vanishing and Exploding Gradient Problems: RNNs often suffer from vanishing and exploding gradient
problems, which can make it difficult for the network to learn long-term dependencies. This occurs when gradients
become too small or too large during backpropagation through time.
 Limited Memory: Standard RNNs have a short memory, meaning they may struggle to capture information from
earlier time steps when the gap is too large. Long Short-Term Memory (LSTM) networks and Gated Recurrent
Unit (GRU) networks are designed to address this issue.
 Computational Intensity: Training RNNs can be computationally intensive, especially for long sequences. The
sequential nature of RNNs makes parallelization challenging, limiting their efficiency on certain hardware.
 Difficulty in Capturing Long-Term Dependencies: Despite efforts to address the vanishing gradient problem,
RNNs may still face challenges in capturing long-term dependencies in sequences, especially when the
dependencies span a large number of time steps.
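A rough numerical illustration of the vanishing/exploding gradient point (not a full derivation; the tanh Jacobian, dimensions, and weight scales are assumptions): the gradient reaching early time steps contains a product of per-step Jacobians, so its norm shrinks or blows up as the unrolled depth grows.

```python
import numpy as np

# Backpropagating through one tanh step multiplies the gradient by
# W^T diag(1 - s_i^2). Repeating this T times shrinks or inflates the gradient
# depending on how large the recurrent weights W are.
rng = np.random.default_rng(0)
d_h, T = 4, 50                                    # assumed hidden size and sequence length

for scale in (0.5, 1.5):                          # "small" vs "large" recurrent weights
    W = scale * rng.normal(size=(d_h, d_h)) / np.sqrt(d_h)
    grad = np.ones(d_h)
    for _ in range(T):
        s = np.tanh(rng.normal(size=d_h))         # a stand-in hidden state
        grad = W.T @ ((1.0 - s**2) * grad)        # one step of the chain rule
    print(f"scale={scale}: gradient norm after {T} steps = {np.linalg.norm(grad):.2e}")
```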
Backpropagation through Time for training RNN

 Look at the dimensions of the parameters

 We train this network using the backpropagation through time (BPTT) algorithm.
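A minimal NumPy sketch of what backpropagation through time looks like for the RNN defined earlier, accumulating gradients for the shared parameters over all time steps (the cross-entropy loss, tanh/softmax choices, and dimensions are assumptions, not the exact derivation on the slides):

```python
import numpy as np

def bptt(xs, ts, U, W, V, b, c):
    """Forward pass, then backpropagate the loss through every time step."""
    # Forward: s_i = tanh(U x_i + W s_{i-1} + b), y_i = softmax(V s_i + c)
    s, states, probs = np.zeros_like(b), [], []
    for x in xs:
        s = np.tanh(U @ x + W @ s + b)
        z = V @ s + c
        p = np.exp(z - z.max()); p /= p.sum()
        states.append(s); probs.append(p)

    # Backward: gradients are summed over time steps because W, U, V, b, c are shared.
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    db, dc = np.zeros_like(b), np.zeros_like(c)
    ds_next = np.zeros_like(b)                        # gradient arriving from step i+1
    for i in reversed(range(len(xs))):
        dz = probs[i].copy(); dz[ts[i]] -= 1.0        # softmax + cross-entropy gradient
        dV += np.outer(dz, states[i]); dc += dz
        ds = V.T @ dz + ds_next                       # from the output and from step i+1
        da = (1.0 - states[i] ** 2) * ds              # back through tanh
        dU += np.outer(da, xs[i]); db += da
        dW += np.outer(da, states[i - 1] if i > 0 else np.zeros_like(b))
        ds_next = W.T @ da                            # pass the gradient back to step i-1
    return dU, dW, dV, db, dc

rng = np.random.default_rng(0)
d_in, d_h, d_out = 3, 4, 3                            # assumed dimensions
U, W = 0.1 * rng.normal(size=(d_h, d_in)), 0.1 * rng.normal(size=(d_h, d_h))
V, b, c = 0.1 * rng.normal(size=(d_out, d_h)), np.zeros(d_h), np.zeros(d_out)
xs = [np.eye(d_in)[i] for i in (0, 1, 1)]             # 'd', 'e', 'e'
ts = [1, 1, 2]                                        # targets 'e', 'e', 'p'
grads = bptt(xs, ts, U, W, V, b, c)
```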


