
Transfer Learning of Neural

Networks in NLP

D Ravi Shankar

Advisor: Dr V Susheela Devi


Introduction
Transfer Learning definition:
● Def: Given a source domain D_S and learning
task T_S, and a target domain D_T and learning task
T_T, transfer learning aims to help improve the
learning of the target predictive function f_T(·) in
D_T using the knowledge in D_S and T_S, where
D_S ≠ D_T or T_S ≠ T_T.
Problem Statement
● Transfer Learning techniques have been
effectively used in fields like Image Processing
and were able to achieve good results.
● However, in NLP, Transfer Learning has been
loosely applied and the conclusions are not
consistent.
● In this project we aim to explore different neural
network based Transfer Learning schemes on a
variety of datasets.
Datasets
● IMDB: A large dataset for binary sentiment classification (positive vs. negative) - 25k
sentences.
● MR: A small dataset for binary sentiment classification ∼ 10k sentences.
● QC: A small dataset for 6-way question classification (e.g., location, time, and
number) ∼ 5000 questions.
● SNLI: A large dataset for sentence entailment recognition. The classification
objectives are entailment, contradiction, and neutral ∼ 500k pairs.
● SICK: A small dataset with exactly the same classification objective as SNLI ∼ 10k
pairs.
● MSRP: A small dataset for paraphrase detection. The objective is binary
classification: judging whether two sentences have the same meaning ∼ 5000 pairs.
● Quora dataset: It contains duplicate question pairs with labels indicating whether
the two questions request the same information ∼ 400k question pairs.
Related Work
How Transferable are Neural Networks in NLP
Applications? - Lili Mou, et al.
● Transfer Knowledge to
– 1) Semantically similar task.
– 2) Semantically different task.
● Methods Used:
1) INIT
2) MULT
Transfer Methods:
1) INIT:
● First train the network on the source dataset.
● Then directly use the tuned parameters to
initialize the network for the target dataset.
2) MULT:
● Simultaneously train on both the source and target
domains.
● Cost Function:
J = λ·J_T + (1 − λ)·J_S
where J_T and J_S are the individual cost functions
and λ ∈ (0, 1) is a hyperparameter balancing
the two domains.
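For concreteness, a minimal PyTorch-style sketch of both schemes follows; the shared encoder, layer sizes, λ value, and output heads are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Shared sentence encoder transferred between tasks (illustrative sizes).
class Encoder(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):
        _, (h, _) = self.lstm(self.embed(tokens))
        return h[-1]  # final hidden state as the sentence representation

criterion = nn.CrossEntropyLoss()

# INIT: train on the source task first, then copy the tuned encoder weights
# into the target network; the task-specific output layer stays freshly initialized.
source_encoder = Encoder()
# ... train source_encoder (with a source output layer) on the source data ...
target_encoder = Encoder()
target_encoder.load_state_dict(source_encoder.state_dict())  # INIT transfer
target_head = nn.Linear(128, 6)  # e.g. 6-way question classification
# ... fine-tune target_encoder and target_head on the target data ...

# MULT: train both domains simultaneously with the interpolated cost
# J = lam * J_T + (1 - lam) * J_S.
def mult_step(encoder, target_head, source_head,
              target_batch, source_batch, optimizer, lam=0.7):
    t_tokens, t_labels = target_batch
    s_tokens, s_labels = source_batch
    loss_t = criterion(target_head(encoder(t_tokens)), t_labels)
    loss_s = criterion(source_head(encoder(s_tokens)), s_labels)
    loss = lam * loss_t + (1 - lam) * loss_s  # J = λ·J_T + (1 − λ)·J_S
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```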
Architecture:
Conclusions:
● Whether a neural network is transferable in
NLP depends largely on how semantically
similar the tasks are.
● MULT and INIT appear to be generally
comparable to each other; combining these two
methods does not result in further gain.
Transfer Learning for Sequence Tagging with Hierarchical Recurrent
Networks - Zhilin Yang, et al.
Efficient Transfer Learning Schemes for
Personalized Language Modeling using
Recurrent Neural Network - Seunghyun Yoon, et al.

● Proposed efficient transfer learning methods for
training a personalized language model using an RNN
with LSTM architecture.
● A general language model is updated into a personalized
language model using a small amount of user data and
limited computing resources. Two applications are considered:
1) A sentence-completion language model, which
completes a sentence given a sequence of n words.
2) A message-reply prediction language model,
which generates a response sentence for a
given message.
● Proposed three schemes for transferring the
parameters.
● The best results were obtained by adding a new
surplus layer, training only its parameters, and
freezing the previous layers (sketched below).
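A minimal sketch of that best-performing scheme, assuming a PyTorch LSTM language model; the layer sizes and the names of the added layers are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Pretrained general language model (illustrative structure and sizes).
embed = nn.Embedding(30000, 256)
lstm = nn.LSTM(256, 512, batch_first=True)
# ... assume embed and lstm already carry the general model's trained weights ...

# Freeze the pretrained layers so the small user dataset cannot overwrite them.
for module in (embed, lstm):
    for param in module.parameters():
        param.requires_grad = False

# New surplus layer on top of the frozen stack; only it (plus the output
# projection) is trained on the user data.
surplus = nn.Linear(512, 512)
proj = nn.Linear(512, 30000)  # projection back onto the vocabulary

optimizer = torch.optim.Adam(
    list(surplus.parameters()) + list(proj.parameters()), lr=1e-3)

def personalized_logits(tokens):
    h, _ = lstm(embed(tokens))            # frozen general representation
    return proj(torch.relu(surplus(h)))   # personalized layers only
```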
Results
Conducted three experiments:
● Experiment 1: Trained an LSTM model on IMDB and
transferred the parameters to the MR and QC datasets.
● Experiment 2: Trained a CNN model on the SNLI dataset
and transferred the parameters to the SICK and MSRP
datasets.
● Experiment 3: Experimented with both the INIT and
MULT methods on the SNLI (source) and Quora (target)
datasets.
Transfer of parameters in semantically similar
and semantically different tasks.

Experiment 1
Experiment 2
Transfer of parameters layer-wise
Experiment 3
1) Transfer Learning for various amounts of target
data.
2) INIT vs MULT.
When to transfer from source to target?
Conclusions
● Transfer Learning is successful when we are dealing with
semantically similar tasks.
● Transfer Learning also depends on what layers we are
transferring.
● It is helpful when the target dataset is small and also helps
in faster convergence.
● The INIT method performs slightly better than MULT.
● Do we lose general information if the model is
trained on the source data to its best accuracy? The
answer seems to be no.
Future Work
● Conduct experiments and analyse which part of
the network can be transferred depending on
the type of application.
● Quantitatively measure the similarity between the
source and target datasets.
● An NMT system first reads the source sentence using an
encoder to build a "context" vector, a sequence of numbers
that represents the sentence meaning; a decoder then
processes this vector to emit a translation.
● The context vector thus contains sufficient lexical and
semantic information to fully reconstruct a sentence in
another language.
● Transfer this knowledge (the context vector) from NMT and
use it in a variety of applications like sentiment analysis,
paraphrase detection, etc., and see how much it helps in
improving performance on the target dataset (a sketch of
this idea follows below).
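One possible realization of this idea, with an illustrative GRU encoder standing in for the NMT encoder; all class names, sizes, and the downstream head are assumptions, not part of the proposal.

```python
import torch
import torch.nn as nn

class NMTEncoder(nn.Module):
    """Encoder half of a seq2seq NMT model (illustrative, not the actual system)."""
    def __init__(self, vocab_size=30000, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):
        _, h = self.rnn(self.embed(tokens))
        return h[-1]  # "context" vector summarizing the source sentence

# Assume the encoder was already trained jointly with an NMT decoder.
encoder = NMTEncoder()
for param in encoder.parameters():
    param.requires_grad = False  # keep the translation knowledge fixed

# Downstream head, e.g. binary sentiment analysis or paraphrase detection.
classifier = nn.Linear(512, 2)

def classify(tokens):
    with torch.no_grad():
        context = encoder(tokens)  # reuse the NMT context vector as features
    return classifier(context)
```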
References
1) Lili Mou, Zhao Meng, Rui Yan, Ge Li, Yan Xu, Lu Zhang, Zhi Jin. How
Transferable are Neural Networks in NLP Applications? In Proceedings
of the 2016 Conference on Empirical Methods in Natural Language
Processing (EMNLP), pages 478–489, 2016.
2) Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen. Transfer Learning
for Sequence Tagging with Hierarchical Recurrent Networks. ICLR 2017.
3) Seunghyun Yoon, Hyeongu Yun, Yuna Kim, Gyu-tae Park, Kyomin
Jung. Efficient Transfer Learning Schemes for Personalized Language
Modeling using Recurrent Neural Network. CoRR 2017, volume
abs/1701.03578.
4) Sinno Jialin Pan and Qiang Yang. A Survey on Transfer Learning.
IEEE Transactions on Knowledge and Data Engineering 2010,
22(10), 1345–1359.
5) Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to
Sequence Learning with Neural Networks. In Advances in Neural
Information Processing Systems 2014, pages 3104–3112.
6) Kyunghyun Cho, Bart van Merrienboer, Çağlar Gülçehre, Dzmitry
Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio.
Learning Phrase Representations using RNN Encoder-Decoder for
Statistical Machine Translation. In Proceedings of EMNLP 2014,
pages 1724–1734.
7) Engineering at Quora.
https://engineering.quora.com/Semantic-Question-Matching-with-Deep-Learning
8) Sebastian Ruder blog.
http://ruder.io/transfer-learning/
Thank you
