
Journal Pre-proof

A survey on deep learning for textual emotion analysis in social networks

Sancheng Peng, Lihong Cao, Yongmei Zhou, Zhouhao Ouyang, Aimin Yang,
Xinguang Li, Weijia Jia, Shui Yu

PII: S2352-8648(21)00083-3
DOI: https://doi.org/10.1016/j.dcan.2021.10.003
Reference: DCAN 325

To appear in: Digital Communications and Networks

Received Date: 27 December 2020


Revised Date: 22 September 2021
Accepted Date: 8 October 2021

Please cite this article as: S. Peng, L. Cao, Y. Zhou, Z. Ouyang, A. Yang, X. Li, W. Jia, S. Yu, A survey
on deep learning for textual emotion analysis in social networks, Digital Communications and Networks,
https://doi.org/10.1016/j.dcan.2021.10.003.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2021 Chongqing University of Posts and Telecommunications. Production and hosting by Elsevier
B.V. on behalf of KeAi Communications Co. Ltd.
Digital Communications and Networks (DCN)

journal homepage: www.elsevier.com/locate/dcan

A survey on deep learning for textual emotion analysis in social networks

Sancheng Peng a, Lihong Cao b,∗, Yongmei Zhou c, Zhouhao Ouyang d, Aimin Yang e, Xinguang Li a, Weijia Jia f, Shui Yu g

a Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou, 510006, China
b School of English Education, Guangdong University of Foreign Studies, Guangzhou, 510006, China
c School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, 510006, China
d School of Computing, University of Leeds, Woodhouse Lane, Leeds, West Yorkshire, LS2 9JT, United Kingdom
e School of Computer, Guangdong University of Technology, Guangzhou, 510006, China
f BNU-UIC Institute of Artificial Intelligence and Future Networks, Beijing Normal University (BNU Zhuhai), Zhuhai, 519087, China
g School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, 510006, China

Abstract

Textual Emotion Analysis (TEA) aims to extract and analyze user emotional states in texts. There has been rapid development of various Deep Learning (DL) methods that have proven successful in many domains such as audio, image, and natural language processing. This trend has drawn increasing numbers of researchers away from traditional machine learning to DL for their scientific research. In this paper, we provide an overview of TEA based on DL methods. After introducing a background for emotion analysis that includes defining emotion, emotion classification methods, and application domains of emotion analysis, we summarize DL technology and word/sentence representation learning methods. We then categorize existing TEA methods based on text structures and linguistic types: text-oriented monolingual methods, text conversation-oriented monolingual methods, text-oriented cross-linguistic methods, and emoji-oriented cross-linguistic methods. We close by discussing emotion analysis challenges and future research trends. We hope that our survey will assist interested readers in understanding the relationship between TEA and DL methods while also improving TEA development.

© 2015 Published by Elsevier Ltd.

KEYWORDS:
Text, Emotion analysis, Deep learning, Sentiment analysis, Pre-training

1. Introduction

Textual Emotion Analysis (TEA) is the task of extracting and analyzing user emotional states in texts. TEA not only acts as a standalone tool for information extraction but also plays an important role in various Natural Language Processing (NLP) applications, including e-commerce [1], public opinion analysis [2], big search [3], information prediction [4], personalized recommendation [5], healthcare [6], and online teaching [7].

According to the idiom, the "seven emotions and six desires" are joy, love, anger, sadness, fear, evil, and desire. Among them, only a few are positive emotions and the rest are negative, indicating that people are naturally more sensitive to negative emotions. In real life, negative emotions are also easier to propagate than positive emotions. The 2011 Annual Report of China's Internet Public Opinion Index [8, 9] stated: "In 2011, negative events accounted for more than 80% of the total number of topics. The negative events on the Microblog and Tianya forum accounted for 75.6% and 95.8%, respectively, which are higher than those of other social media."

∗ Corresponding author. E-mail addresses: [email protected] (S. Peng), [email protected] (L. Cao), [email protected] (Y. Zhou), [email protected] (Z. Ouyang), [email protected] (A. Yang), [email protected] (X. Li), [email protected] (W. Jia), [email protected] (S. Yu).

With the rapid development of social networks [10, 11, 12, 13], people have changed from general users into producers of network information. According to the 38th statistical report on the development of China's Internet published by the China Internet Network Information Center [14], the number of Internet users in China had reached 710 million, and the Internet penetration rate had reached 51.7%, by June 2016. Among them, 656 million were mobile Internet users, 242 million used Microblog, and more than 100 million wrote daily blogs. Among this massive volume of short text messaging, negative emotions are the most prevalent.

Emotion analysis [15] aims to automatically extract user emotional states from their social network text activity (e.g., blogs, tweets). Early research focused on either a positive/negative bipartition or a positive/negative/neutral tripartition of emotion analysis [16, 17]. However, such partitioning ignores subtle user emotion changes and their psychological states, preventing a full expression of people's complex inner emotional world. This gave rise to bipartition-oriented emotion analysis being named "sentiment analysis" [18, 19], while more encompassing emotion analysis was dubbed "fine-grained sentiment analysis".

In 2012, Deep Learning (DL) methods [20, 21] were introduced to NLP after they achieved successful object recognition on ImageNet [22]. DL methods improved on statistical learning results in many fields. At present, neural network-based NLP frameworks have achieved new levels of quality and become the dominant technology for NLP tasks such as sentiment analysis, machine translation, and question answering systems.

Popular DL methods used to model emotion analysis include Deep Averaging Networks (DANs) [23], Denoising Autoencoders (DAEs) [24], Convolutional Neural Networks (CNNs) [25], Recurrent Neural Networks (RNNs) [26], Long Short-term Memory (LSTM) networks [27], Bi-directional Long Short-term Memory (Bi-LSTM) networks [28], Gated Recurrent Units (GRUs) [29], attention [30], and Multi-head Attention (MHA) [31, 32]. In general, researchers may integrate multiple methods into a specific model (rather than just one) to improve emotion analysis performance.

To our knowledge, this is the first paper to provide a comprehensive survey on combining TEA and DL. We collected papers from various sources such as the ACL Anthology, AAAI, IEEE, ACM, Elsevier, and Springer. We confined our collection to papers published from January 2015 to November 2020 to gauge recent DL popularity. We used keywords (such as "deep learning", "emotion", "pre-training", "embeddings", "natural language processing", "short text", and "sentiment") to retrieve 210 relevant papers. Removing duplicate papers and those unrelated to TEA reduced the number to 130. We then conducted a manual review to obtain the 70 most relevant papers from the remainder. The main contributions of our survey are summarized as follows.

1. We provide a comparative study covering recent DL-based TEA published between January 2015 and November 2020 by analyzing basic characteristics of typical TEA methods.
2. We make the first attempt to provide a comprehensive review of the related TEA methods.
3. We provide a detailed overview of different definitions and classification models of emotion.
4. We provide a detailed overview of the related TEA applications.
5. We provide a detailed description and comparative analysis of related pre-training TEA methods.
6. We provide a detailed description and comparative analysis of existing DL-based TEA methods.

The remainder of this paper is organized as follows: In Section 2, we provide an overview of the term "emotion"; we then survey emotion analysis applications in Section 3. In Section 4, we provide an overview of DL methodology, before surveying pre-training methods in Section 5. In Section 6, we discuss emotion analysis methods based on DL, and then present emotion analysis challenges in Section 7. In Section 8, we discuss future trends, before concluding this paper in Section 9.

2. Emotion Overview

Due to the variability and sensitivity of human emotions, people have different understandings of emotion, causing the term to be classified differently by different fields. At present, there is no unified academic standard for defining and classifying emotion; however, researchers have performed in-depth studies of emotion classification, presenting multiple definition and classification models.

2.1. Defining Emotion

In this subsection, we explain the meaning of emotion. The basic understandings of emotion are summarized as follows.

Definition 1: Emotion [33] is defined by the Merriam-Webster Dictionary as "A conscious mental reaction (e.g., anger or fear) subjectively experienced as strong feeling usually directed toward a specific object and typically accompanied by physiological and behavioral changes in the body".

Definition 2: Emotion [34] denotes people's attitude experience and corresponding behavioral responses to objective things.

Definition 3: Emotions [35] are "generated states in humans that reflect evaluative judgments of the environment, the self and other social agents".

Definition 4: Emotion [36] is the feeling or reaction that people have due to a certain event. "Happy", "sad", "angry", and "fear" are a few examples of emotions that can be expressed. "Emotions" and "sentiments" are often considered interchangeable, yet the latter represent emotional polarity or general emotional states (i.e., positive, negative, or neutral).

Definition 5: Emotion [37] is a mental state that arises spontaneously rather than through a conscious effort; it is also often accompanied by physiological changes.

Definition 6: Emotion [38] is often defined as an individual's mental state associated with thoughts, feelings, and behavior.

2.2. Classifying Emotion

In this subsection, we explain the classification of emotion. The basic understandings of emotion types are surveyed as follows.

Ekman [39] categorized emotions into six basic types: anger, disgust, fear, happiness, sadness, and surprise.

Parrott [40] proposed an emotion classification model based on a tree structure with six kinds of emotions: love, joy, surprise, anger, sadness, and fear.

Plutchik [41] visualized a wheel-shaped emotion classifier (based on Ekman's model [39] with two additional categories) with four bipolar sets: joy and sadness; anger and fear; trust and disgust; and surprise and anticipation.

Lin Chuanding [42], a modern Chinese psychologist, divided emotions into 18 categories based on the Shuowen: joy, quiet, caress, worry, fright, pity, fear, grief, shame, sorrow, anger, vexation, reverence, hatred, arrogance, greed, jealousy, and shame.

These existing emotion category approaches focus on modeling emotions based on distinct emotion classes or labels. These models assume that discrete emotion categories exist.

3. Emotion Analysis Application

Emotion analysis has been widely studied in psychology, neuroscience, and behavioral science, as emotions are an important element of human nature. Such analysis plays an important role in many application fields, including e-commerce, public opinion analysis, big search, information prediction (e.g., financial prediction, presidential election prediction), personalized recommendation, healthcare (e.g., depression screening), and online teaching.

3.1. E-commerce

With mobile Internet development, online shopping has become more popular, with users often providing personal comments on products purchased via Taobao, Jingdong, Amazon, and other e-commerce platforms. Real-time emotion analysis of this source of product reviews [1] yields useful emotional and behavioral consumer characteristics, enabling the prediction of trend changes in consumer preferences. Such information would help the majority of consumers deeply understand the quality of goods, pre-sale and after-sales services, logistics services, and other related information, guiding them through their future purchases. Manufacturers would also benefit from first-hand consumer feedback, timely product shortage warnings, and improved product quality and design. Sellers would benefit from knowing consumer psychological states as they relate to available commodities and related services. Sellers who can capture consumer psychology can make timely sales, purchase, and marketing decisions, allowing them to reach market dominance.

3.2. Public Opinion Analysis

Network public opinion [2] refers to the different views on popular social issues expressed on the Internet. This public form of social opinion carries appreciable influence and allows tendentious opinions on issues to affect reality by way of the Internet. Public opinion analysis is used to objectively reflect the state of public opinion by collecting and sorting out people's attitudes as well as discovering relevant opinion tendencies. Many irrational emotions (such as negative feelings towards the rich, officials, the powerful, or the market) are expressed and strengthened by means of Internet violence or entertainment, driving people towards more extreme emotional reactions. Irrational netizen emotion creates national and societal security risks. Thus, relevant national management departments require knowledge of network public opinion trends to guide that opinion properly and in a timely fashion. However, when such information is obtained through various channels, its complexity prevents manual processing. This shortfall makes developing accurate and effective emotion analysis systems significant, and makes the automatic processing of network public opinion information necessary to maintain national security and social stability.

3.3. Big Search

With network space expansion, the development of network application modes, and the arrival of the big data era, the Internet has become ubiquitous and given rise to big search technology [3]. Big search is becoming a strong tool and catalyst for network development. Big search, the next-generation search engine for cyberspace, is becoming an urgent need. Compared with traditional search, big search can understand user search intentions on a semantic level while also perceiving user needs according to their spatio-temporal location, emotional state, and historical preferences. Big search can also remove false data and protect user privacy. In addition, big search solutions can provide intelligent answers to users, making retrieval technology based on user emotion analysis an important research task for big search.

3.4. Information Prediction

As the Internet has developed, growing numbers of people rely on it for information and communication sharing, particularly for social network interactions (e.g., Microblog, Wechat, stock, and futures forums). Emotion analysis technology can be used to analyze the impact of social networks upon user lives and to predict developing trends by way of commentary, news articles, and other content. The main applications of information prediction include the following three aspects.

3.4.1. Financial prediction:
Growing numbers of financial investors are turning to their networks for financial information and investment opinion exchanges. This makes professional forums on subjects such as stocks and futures rich with financial data and investor sentiment information (i.e., important factors affecting investor behavior and psychology). Behavioral finance dictates that the psychology and behavior of irrational stock investors affect stock market situations, causing stock prices to deviate from their correct value. Thus, we can predict stock market volatility by reviewing investor emotion and behavior information drawn from a stock forum.

3.4.2. Election prediction:
Emotion analysis plays an increasingly important role in the prediction of democratic elections. Paul et al. [4] presented a framework, called Compass, that used the 2016 U.S. presidential election as an example to analyze election-related crowd emotion. They built a spatial-temporal sentiment map through Compass for the election, and used that map to match election results to an extent. Their study showed that any political event can be described by its popularity in negative and positive senses. In addition, Ceron et al. [43] used emotion analysis to calculate Twitter support rates for political leadership candidates in the 2011 Italian parliamentary election and the 2012 French presidential election.

3.4.3. Other prediction:
Emotion analysis can also be used to predict public opinion regarding various policy events (such as personal income tax adjustment, medical insurance reform, and retirement delays) as well as to provide support for national policy formulation. In addition, emotion analysis can be applied to natural disaster prediction and judgment, including epidemics [44] and earthquakes. With these applications of information prediction, emotion analysis technology has received greater attention. By using emotion analysis technology to analyze Internet news, blogs, and other information sources, developing event trends can be predicted accurately.

3.5. Personalized Recommendation

The emergence of personalized recommendation systems [45, 46, 47] has provided users with a tool to address information overload. However, traditional recommendation technology only considers overall user scores while ignoring the emotion information contained in user comments. Such commentary usually contains subjective user views, preferences, and emotions regarding certain attributes of things, reflecting user emotional tendencies for those attributes. Mining and exploiting user commentary to its fullest extent gives rise to more accurate personalized recommendation while helping resolve issues such as cold starts, data sparsity, and low recommendation accuracy.

3.6. Healthcare

Based on text recorded from social networks on psychological counseling, the emotional state information of patients suspected to be depressed can be analyzed and screened using emotion analysis technology [6]. Yang et al. [48] used the distress analysis interview corpus from the University of Southern California to analyze whether patients with poor mental health (e.g., those diagnosed with anxiety or PTSD) also suffered from depression. To lighten the psychological burden on interviewees, interviews were conducted by human-controlled robots. Collected data also included recorded texts and PHQ-8 questionnaires to determine possible depression conditions.

3.7. Online Teaching

With the popular application of Massive Open Online Courses (MOOCs) [49, 50], a large number of online courses and reviews have been generated. Most reviews allow students to express their emotions and opinions. Tucker et al. [7] found a positive correlation between student emotional tendencies in their forum-based reviews and their learning performance on the MOOC platform. Thus, using emotion analysis technology to analyze comment information on the MOOC platform allowed the authors to obtain course-related emotion information. Such information can help teachers find problems in curriculum arrangement, knowledge systems, and teaching methods, enabling timely optimization of teaching plans and methods to further improve teaching quality and student learning efficiency.

4. DL Methodology

4.1. DL Overview

The core task of a DL method is feature learning. In essence, it is a method of learning complex feature representations, based on the original feature input, through multi-layer nonlinear processing. When combined with specific domain tasks, DL can construct new classifiers or generative tools from the automatically learned feature representations, and realize domain-oriented classification or other tasks. The specific steps of the algorithm for a DL model are listed as follows [51, 52]:

Step 1: Construct a learning network with random initialization, set the total number of network training layers n, initialize unlabeled data as the input set of network training, and initialize the training network layer i = 1.

Step 2: Based on the input set, an unsupervised learning algorithm is used to pre-train the learning network of the current layer.

Step 3: The training results of each layer are used as input for the next layer, constructing the input set once again.

Step 4: If i is less than n, then i = i + 1, and return to Step 2; otherwise, proceed to Step 5.

Step 5: The supervised learning method is used to adjust the network parameters of all layers, forcing any errors to meet practical requirements.

Step 6: Complete classifier construction (such as neural network classifiers) or complete deep generation model construction (such as a Deep Neural Network (DNN)).
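To make the procedure concrete, the following is a minimal sketch of the greedy layer-wise scheme in PyTorch. The framework choice, the use of small autoencoders for the unsupervised pre-training in Step 2, and all names and hyperparameters are our own illustrative assumptions rather than a prescription from [51, 52]:

import torch
import torch.nn as nn

def layerwise_pretrain(dims, x, epochs=5):
    # Steps 1-4: pre-train one layer at a time as a small autoencoder
    # (an illustrative choice), then feed its output to the next layer.
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        enc, dec = nn.Linear(d_in, d_out), nn.Linear(d_out, d_in)
        opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()))
        for _ in range(epochs):                      # Step 2: unsupervised pre-training
            opt.zero_grad()
            loss = nn.functional.mse_loss(dec(torch.sigmoid(enc(x))), x)
            loss.backward()
            opt.step()
        x = torch.sigmoid(enc(x)).detach()           # Step 3: output becomes the next input set
        layers += [enc, nn.Sigmoid()]
    return nn.Sequential(*layers)                    # Steps 5-6: append a head and fine-tune

stack = layerwise_pretrain([784, 256, 64], torch.rand(128, 784))

After this unsupervised phase, Step 5 would append a task-specific classifier and fine-tune the whole stack with labeled data.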

4.2. DL-related Methods

In this subsection, we provide an overview of related DL methods. The basic understandings of the related methods are summarized as follows.

4.2.1. DAN:
DANs are constructed by stacking nonlinear layers over traditional neural bag-of-words models. For each document, a DAN takes the arithmetic mean of the word vectors as input, and passes it through one or more feed-forward layers until there is a softmax for classification. The framework is shown in Fig. 1.

[Fig. 1: A DAN with two feed-forward layers]

For text classification, a DAN needs to map an input sequence of n tokens to one of k labels, and it requires the following three steps to function:

Step 1: Take the vector average of the embeddings associated with the input sequence of n tokens:

av = \frac{1}{n} \sum_{i=1}^{n} c_i    (1)

where c_i denotes the word embedding sequence.

Step 2: Pass the average through one or more feed-forward layers. If there is one layer, the average word embedding is transformed as follows:

h_1 = f(w_1 \cdot av + b_1)    (2)

where b denotes the offset term and w denotes the k × d weight matrix. If there are more layers, the transformation is:

h_i = f(w_i \cdot h_{i-1} + b_i)    (3)

Step 3: Conduct (linear) classification on the representation of the final layer:

softmax(q) = \frac{\exp(q)}{\sum_{j=1}^{k} \exp(q_j)}    (4)

4.2.2. DAE:
A DAE is an unsupervised learning algorithm that acts as an autoencoder modification. It can form a DL network with multiple stacked layers. A denoising autoencoder (shown in Fig. 2) consists of an encoder, a hidden layer, and a decoder.

[Fig. 2: The framework of DAE]

The encoder f(\bar{x}) is used to reduce the dimensionality of high-dimensional input. Noise is added to the input x to obtain a corrupted version \bar{x}, which is fed into f(\bar{x}). The implicit coding result y is then obtained through a linear transformation and an activation function. The decoder g(y) is used to obtain a reconstructed vector z. The specific computations for y and z are described as follows:

y = f(\bar{x}) = S_f(W \bar{x} + b_y)    (5)

z = g(y) = S_g(W^T y + b_z)    (6)

S_f = sigmoid(x) = \frac{1}{1 + e^{-x}}    (7)

S_g = sigmoid(y) = \frac{1}{1 + e^{-y}}    (8)

where S_f denotes the activation function of the linear transformation, S_g denotes the activation function of the decoder, b denotes the offset term, and W denotes the weight matrix.
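As a worked example of Eqs. (5)-(8), the following is a minimal PyTorch sketch of a single denoising autoencoder with tied weights (W^T in the decoder); the Gaussian corruption and all dimensions are illustrative assumptions:

import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    # Corrupt x to x_bar, encode y = S_f(W x_bar + b_y), Eq. (5),
    # decode z = S_g(W^T y + b_z), Eq. (6), with sigmoid activations
    # as in Eqs. (7)-(8); train against the clean input x.
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d_hidden, d_in) * 0.01)  # tied weights
        self.b_y = nn.Parameter(torch.zeros(d_hidden))
        self.b_z = nn.Parameter(torch.zeros(d_in))

    def forward(self, x, noise_std=0.1):
        x_bar = x + noise_std * torch.randn_like(x)     # corrupted version of x
        y = torch.sigmoid(x_bar @ self.W.T + self.b_y)  # Eq. (5)
        z = torch.sigmoid(y @ self.W + self.b_z)        # Eq. (6), W^T tied
        return z

model = DenoisingAutoencoder(d_in=784, d_hidden=128)
x = torch.rand(32, 784)
loss = nn.functional.mse_loss(model(x), x)  # reconstruction error L_H(x, z)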

4.2.3. CNN:
A CNN is a kind of feed-forward neural network, mainly consisting of an input layer, a convolution layer, a pooling layer, a fully connected layer, and an output layer. The basic framework is shown in Fig. 3.

[Fig. 3: The framework of CNN]

In the field of NLP, the application steps of a CNN mainly include: taking a vectorized sentence matrix as the input; convolving the input with multiple convolution kernels through the convolution layer to obtain multiple feature representations of the original input; sending the extracted features through the pooling layer, where they are sampled to obtain more abstract features; and, finally, using the corresponding classification function in the fully connected layer to classify the results and complete the corresponding task.

Input layer: This layer is responsible for the vectorization of input data. For a given sentence of length n, the matrix of the input layer can be expressed as:

E \in \mathbb{R}^{n \times m}    (9)

where m denotes the dimensionality of a word vector.

Convolution layer: This layer uses different convolution kernels to perform the convolution operation on the input matrix, extract local features from the input, and obtain the feature maps of the convolution kernels. The specific representation is described as follows:

X_j^l = f\left(\sum_{i=1}^{M} X_i^{l-1} * K_{ij}^l + b_j^l\right)    (10)

where X_j^l denotes the output of the convolution operation between the input and the j-th convolution kernel, X_i^{l-1} denotes the output of the previous layer (i.e., the i-th local receptive field of the current convolution layer l), K_{ij}^l denotes the j-th convolution kernel of l, b_j^l denotes the offset term of the current convolution kernel, and M denotes all the local receptive fields that the convolution kernel needs to traverse.

Pooling layer: This layer uses the corresponding sampling function to sample the characteristics of the matrix generated by the convolution operation, and extracts the more important features to reduce matrix dimensionality. In addition, it simplifies the calculation process while preventing key features from being discarded. Common pooling algorithms include max pooling and average pooling.

Fully connected layer: This layer classifies the input, obtains the classification results, and is responsible for passing those results to the output layer.

4.2.4. RNN:
An RNN is a neural network with a ring structure and a specific memory function. Its input includes the current input sample as well as information obtained at the previous time step, so that information can be cycled in the network at any time. The framework of an RNN is shown in Fig. 4, in which x denotes the input layer, o denotes the output layer, s denotes the hidden layer, and U, V, and W denote the weights of each respective layer.

[Fig. 4: The framework of RNN]

The output of the hidden layer is described by:

h_t = \sigma(W_{xh} x_t + W_{hh} h_{t-1} + b_h)    (11)

The output of the output layer is described by:

o_t = \sigma(W_{hy} h_t + b_y)    (12)

y_t = softmax(o_t)    (13)

where y_t denotes the output at time t, W_{hh}, W_{xh}, and W_{hy} denote the weight matrices, b_h and b_y denote the bias terms, h_t denotes the hidden state at time t, and x_t denotes the input data.
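The recurrence in Eqs. (11)-(13) can be sketched as follows in PyTorch; the weight shapes and the zero initial state are illustrative assumptions:

import torch

def rnn_forward(X, W_xh, W_hh, W_hy, b_h, b_y):
    # Unrolled vanilla RNN: the hidden state mixes the current input
    # with the previous hidden state, Eq. (11); each step's output-layer
    # activation, Eq. (12), is normalized by a softmax, Eq. (13).
    h = torch.zeros(W_hh.shape[0])
    ys = []
    for x_t in X:                                        # one step per token
        h = torch.sigmoid(W_xh @ x_t + W_hh @ h + b_h)   # Eq. (11)
        o = torch.sigmoid(W_hy @ h + b_y)                # Eq. (12)
        ys.append(torch.softmax(o, dim=0))               # Eq. (13)
    return torch.stack(ys)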



4.2.5. LSTM:
An LSTM network is a kind of RNN that can learn long-term dependency relationships, characterize the information of a time sequence, and effectively solve the gradient vanishing or gradient exploding problems faced during RNN training. It was first proposed by Hochreiter and Schmidhuber in 1997 [27]. Since then, many researchers have optimized and improved upon it, causing rapid development and leading to its wide use across various NLP tasks.

Each unit of an LSTM network consists of four components: a memory cell, an input gate, an output gate, and a forget gate. The memory cells are connected circularly with each other. The three nonlinear gate cells can be used to adjust the information of the memory cell's input and output flows. The framework of an LSTM network is shown in Fig. 5.

[Fig. 5: The framework of LSTM]

The forward computation of an LSTM network is described as follows.

Input gate:

i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)    (14)

Forget gate:

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)    (15)

Memory cell:

c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)    (16)

Output gate:

o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)    (17)

Result output:

h_t = o_t \odot \tanh(c_t)    (18)

where x_t denotes the input vector (such as a word vector) at time t; f, i, and o denote the activation vectors of the forget gate, input gate, and output gate, respectively; c denotes the memory unit vector; h denotes the output vector of the LSTM unit; W_i, U_i, W_f, U_f, W_c, U_c, W_o, and U_o denote the weight matrices; b_i, b_f, b_c, and b_o denote the bias vectors; and \sigma and \tanh denote the activation functions.

4.2.6. Bi-LSTM:
A Bi-LSTM is an improved LSTM model. A one-directional LSTM uses only previous information to deduce subsequent information, which prevents it from accessing future context or integrating contextual information, and thus affects system prediction performance. A Bi-LSTM trains two LSTM networks together, which start their respective sequences from opposite ends while being connected back to the same output layer. Thus, it can integrate the past and future information of each point. A Bi-LSTM includes forward and backward calculations. The horizontal direction represents the bi-directional flow of the time sequence, and the vertical direction represents the one-directional flow from the input layer to the hidden layer and on to the output layer. The network structure of a Bi-LSTM network is shown in Fig. 6.

[Fig. 6: The framework of Bi-LSTM]

The forward calculation of the hidden vector \vec{h} is described as follows:

\vec{h}_t = LSTM(x_t, \vec{h}_{t-1})    (19)

The backward calculation of the hidden vector \overleftarrow{h} is described as follows:

\overleftarrow{h}_t = LSTM(x_t, \overleftarrow{h}_{t-1})    (20)

The output is described as follows:

y_t = g(W_{\vec{h}y} \vec{h}_t + W_{\overleftarrow{h}y} \overleftarrow{h}_t + b_y)    (21)

where x_t denotes the input data, y_t denotes the output at time t, W_{\vec{h}y} and W_{\overleftarrow{h}y} denote the weight matrices, and b_y denotes the bias term.

4.2.7. GRU:
A GRU is an LSTM variant. It is well known that LSTM can overcome the gradient vanishing or gradient exploding problems when dealing with long-distance dependence, and that it can keep both long-distance and short-distance dependency relationships for temporal data. A GRU retains these advantages while offering a simpler network structure. Compared with an LSTM network's three-gate structure, a GRU has only two gates: an update gate and a reset gate. The network structure of a GRU is shown in Fig. 7.

[Fig. 7: The framework of GRU]

The forward calculation of a GRU is described as follows:

Update gate:

u_t = \sigma(W_u x_t + V_u h_{t-1} + b_u)    (22)

Reset gate:

r_t = \sigma(W_r x_t + V_r h_{t-1} + b_r)    (23)

Memory cell:

\tilde{h}_t = \tanh(W_h x_t + V_h (r_t \odot h_{t-1}) + b_h)    (24)

Output:

h_t = u_t \odot h_{t-1} + (1 - u_t) \odot \tilde{h}_t    (25)

where W_u, W_r, W_h, V_u, V_r, and V_h denote the weight matrices, b_u, b_r, and b_h denote the bias vectors, and \tanh denotes the activation function.
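A single GRU step following Eqs. (22)-(25) can be sketched as below (PyTorch, with illustrative parameter names):

import torch

def gru_step(x_t, h_prev, W_u, V_u, b_u, W_r, V_r, b_r, W_h, V_h, b_h):
    u = torch.sigmoid(W_u @ x_t + V_u @ h_prev + b_u)           # update gate, Eq. (22)
    r = torch.sigmoid(W_r @ x_t + V_r @ h_prev + b_r)           # reset gate, Eq. (23)
    h_tilde = torch.tanh(W_h @ x_t + V_h @ (r * h_prev) + b_h)  # candidate state, Eq. (24)
    return u * h_prev + (1 - u) * h_tilde                       # interpolated output, Eq. (25)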

4.2.8. Attention:
An attention mechanism imitates human visual processing (i.e., it aligns internal experience with external sensation to increase the observation precision of certain areas). For example, when browsing a picture, people first scan the global image quickly to obtain a target area (i.e., an attention point) that requires focus. More attention is then devoted to that point to obtain more detailed information while other useless information is suppressed. The specific framework is shown in Fig. 8.

[Fig. 8: The framework of attention]

The calculation process is mainly divided into the following three steps.

Step 1: The similarity between the query and each key is calculated to obtain the weight. Common similarity functions include the dot product, concatenation, and perceptron:

f(Q, K) = \begin{cases} Q^T K, & dot \\ Q^T W_a K, & general \\ W_a [Q; K], & concat \\ v_a^T \tanh(W_a Q + U_a K), & perceptron \end{cases}    (26)

Step 2: Generally, a softmax function (see Eq. (27)) is used to normalize these weights:

a_i = softmax(f(Q, K_i)) = \frac{\exp(f(Q, K_i))}{\sum_{j=1}^{n} \exp(f(Q, K_j))}    (27)

Step 3: The final attention is obtained by weighting and summing the weights and the corresponding values:

attention(Q, K, V) = \sum_i a_i V_i    (28)

At present, NLP research tends to use identical keys and values.

4.2.9. MHA:
MHA is an attention mechanism variant that uses multiple queries to extract multiple groups of different information in parallel from the input information and concatenates them. The multi-head attention mechanism is shown in Fig. 9.

[Fig. 9: The framework of MHA]

First, a linear transformation is applied to the query, key, and value. Then, they are input into the Scaled Dot-Product Attention (SDA) mechanism, and the same operation is repeated h times; the input of each repetition is a linear transformation of the original input. The SDA framework is shown in Fig. 10.

[Fig. 10: The framework of SDA]

An SDA is an attention mechanism that calculates similarity using a dot product over a series of queries, a series of keys with dimension d_k, and a series of values with dimension d_v. The calculation process is described as follows:

attention(Q, K, V) = softmax\left(\frac{Q K^T}{\sqrt{d_k}}\right) V    (29)

"Multi-head" denotes that one head is calculated each time, that the parameters W of the linear transformations for Q, K, and V are different each time, and that the results of the h SDA computations are concatenated. One more linear transformation is then applied to obtain the final value, which is the MHA result. The calculation is described as follows:

multihead(Q, K, V) = concat(head_1, head_2, \cdots, head_h) W^O    (30)

head_i = attention(Q W_i^Q, K W_i^K, V W_i^V)    (31)

where W_i^Q \in \mathbb{R}^{d \times d_k}, W_i^K \in \mathbb{R}^{d \times d_k}, W_i^V \in \mathbb{R}^{d \times d_v}, and W^O \in \mathbb{R}^{h d_v \times d} denote the projection matrices, and attention denotes a single attention function with d_k-dimensional keys and queries and d_v-dimensional values.
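Eqs. (29)-(31) translate almost directly into code; the following minimal PyTorch sketch (with illustrative per-head projection lists) shows SDA and its multi-head combination:

import torch

def sda(Q, K, V):
    # Scaled dot-product attention, Eq. (29)
    d_k = K.shape[-1]
    weights = torch.softmax(Q @ K.transpose(-2, -1) / d_k ** 0.5, dim=-1)
    return weights @ V

def multi_head(Q, K, V, W_q, W_k, W_v, W_o):
    # Eqs. (30)-(31): h independently projected SDA heads are
    # concatenated and passed through one more linear transformation.
    heads = [sda(Q @ Wq_i, K @ Wk_i, V @ Wv_i)
             for Wq_i, Wk_i, Wv_i in zip(W_q, W_k, W_v)]
    return torch.cat(heads, dim=-1) @ W_o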

5. Pre-training Method Overview

The emergence and development of pre-training methods has brought NLP into a new era. The related methods can be divided into two categories: word-oriented representation learning and sentence-oriented representation learning.

5.1. Word-oriented Representation Learning

In this subsection, we provide an overview of word-oriented representation learning. The basic understandings of the related approaches are summarized as follows.

5.1.1. word2vec:
The word2vec technique [53] consists of two models: continuous bag-of-words (CBOW) and continuous skip-gram. The CBOW model uses the average/sum of context words as input to predict the current word. The skip-gram model uses the current word as input to predict each contextual word. Word2vec has fewer dimensions than previous embedding methods, making it faster, more versatile, and usable by various NLP tasks. Although it has strong generality, it cannot be dynamically optimized for specific tasks or solve the problem of polysemy.

5.1.2. Bilingual Word Embeddings (BWEs):
BWEs [54] is an unsupervised neural model for learning bilingual semantic embeddings of words across Chinese and English. It is a bilingual word embedding method that applies initialization and optimization constraints while using machine translation alignments. However, it also cannot solve the problem of polysemy.

5.1.3. GloVe:
GloVe [55] is a word representation tool based on counts and global corpus statistics. It first calculates global co-occurrence statistics using a fixed-size context window, and then minimizes its least squares objective function using stochastic gradient descent, which essentially factorizes the log co-occurrence matrix. It supports parallelization and has appreciable speed, but also requires more memory resources than word2vec.

5.1.4. BiDRL:
The Bilingual Document Representation Learning (BiDRL) method [56] is used for cross-lingual sentiment classification. It can learn vector representations for both words and documents in bilingual texts.

5.1.5. Emoji2Vec:
Emoji2Vec [57] is a method for learning emoji representations that contains a complete set of Unicode emoji representations, can serve as a pre-trained embedding for all Unicode emoji methods, and can be used to pre-train on a large text corpus. It can provide emoji embeddings, but does not capture the context-dependent definitions of emoji (e.g., sarcasm, appropriation via other cultural phenomena).

5.1.6. Sentiment-specific Word Embedding (SSWE):
SSWE [58] can encode both positive/negative sentiment and syntactic contextual information in a vector space. Compared with other word embedding methods, it effectively incorporates sentiment labels into word-level information for sentiment-related tasks. However, it only focuses on binary labels, weakening its generalization ability on other affect tasks.

5.1.7. fastText:
The fastText library [59] can handle Out-Of-Vocabulary (OOV) words by predicting their word vectors based on learned character n-gram embeddings. While it requires little training time, without sharing parameters it has poor generalization for large output spaces.

5.1.8. context2vec:
The context2vec model [60] is a contextual representation learning method for predicting a single word from both its left and right contexts, based on Bi-LSTM. It can learn generic context embeddings of wide sentential contexts, and can encode the context around a pivot word.

5.1.9. REF:
REF [61] is a word vector refinement model that refines existing semantically oriented word vectors using sentiment lexicons. It can be applied to existing pre-trained word vectors (e.g., word2vec and GloVe). Both semantic and sentiment word vectors can be obtained with this model.

5.1.10. ELMo:
ELMo [62] is a deep contextualized word representation method based on Bi-LSTM that can solve the problem of polysemy. It can pre-train on a large text corpus and can model polysemy, but it requires long training times and cannot solve long-distance dependency.

5.1.11. KUBWE:
KUBWE [63] is a word embedding algorithm that builds a symmetric co-occurrence matrix from the corpus, then calculates an adjusted form of the pointwise mutual information matrix to remove insignificant and uninformative co-occurrences. It uses a spherical representation for the latent space, in which points are located on the surface of a hypersphere.
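As a usage illustration for the static embeddings above, the following sketch trains the two word2vec variants of Section 5.1.1 with the gensim library (an assumed tooling choice; the toy corpus is purely illustrative):

from gensim.models import Word2Vec

sentences = [["i", "love", "this", "movie"],
             ["this", "film", "made", "me", "angry"]]

# sg=1 selects skip-gram (predict each context word from the current
# word); sg=0 selects CBOW (predict the current word from its context).
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
vec = model.wv["angry"]                      # a 100-dimensional static embedding
neighbors = model.wv.most_similar("angry")   # nearest words in the vector space

Note that such a model assigns one vector per word type, which is exactly why it cannot capture polysemy.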

5.1.12. Maximum A Posteriori (MAP):
MAP [64] is a probabilistic word embedding model based on MAP estimation. It is a generalized word embedding model that considers a wide range of parametrized GloVe variants and incorporates priors on those parameters. In the model, word vectors are learned by finding the parameters that maximize a posterior probability.

5.1.13. CoVe:
CoVe [65] is a contextualized representation method that uses LSTM. It can train context-based word vectors for machine translation. Due to the high training complexity and high decoding delay of LSTM, this model's training time is excessive.

5.1.14. Emo2Vec:
Emo2Vec [66] is a multi-task learning method that encodes emotional semantics into vectors by using a CNN. It is trained on six different emotion-related tasks and can encode emotional semantics into real-valued, fixed-size word vectors.

5.1.15. ULMFit:
ULMFit [67] is a transfer learning method that can be applied to any NLP task. It consists of two pre-trained models: a forward model trained from left to right, and a backward model trained from right to left.

5.1.16. NTUA-SLP:
NTUA-SLP [68] is a word embedding method based on word2vec. It consists of a two-layer Bi-LSTM network with a deep self-attention mechanism. It can overcome the problem of OOV words.

5.1.17. SVD-NS:
SVD-NS [69] is a word embedding method in the context of NLP. It not only learns word-context co-occurrences, but also learns from the abundance of unobserved or insignificant co-occurrences, improving word distributions in the latent embedded space.

5.1.18. cw2vec:
The cw2vec [70] method is used for learning Chinese word embeddings. It can use stroke n-grams to capture semantic and morphological information of Chinese words. However, it only learns Chinese word embeddings.

5.1.19. MNLM:
The unsupervised Multilingual Neural Language Model (MNLM) [71] is used for word embedding. It can jointly learn word embeddings of different languages in the same space, and can generate multilingual embeddings without any parallel data or pre-training. However, it cannot exploit character and subword information.

The specific comparison of existing word representation learning methods is listed in Table 1.

5.2. Sentence-oriented Representation Learning

In this subsection, we provide an overview of sentence-oriented representation learning. The basic understandings of the related approaches are summarized as follows.

5.2.1. Paragraph vector:
This unsupervised algorithm learns fixed-length semantic representations from variable-length texts [72]. It is a strong alternative sentence embedding model, and has been widely applied to learning representations for sequential data.

5.2.2. Skip-Thoughts:
Skip-Thoughts [73] is a method for training a sentence encoder by predicting the preceding and following sentences from the current sentence. It follows the same idea as the skip-gram model of the word2vec embedding method. It can predict the probability of a sentence appearing in a given context through the current sentence, but its model training speed is slow.

5.2.3. DeepMoji:
DeepMoji [74] is a model for detecting sentiment, emotion, and sarcasm by using an attention mechanism and a two-layer Bi-LSTM. However, the model still faces the problem of long-distance dependency.

5.2.4. BERT:
BERT [75] is a language representation model providing bidirectional encoder representations from transformers. It involves two steps: pre-training and fine-tuning. It is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right contexts in all layers. It can obtain context-sensitive bidirectional feature representations. However, there is an inconsistency between its pre-training process and its generation process, which leads to poor performance on generation tasks. It also consumes more computing resources than other existing models.

5.2.5. InferSent:
InferSent [76] is a universal representation method for learning sentence embeddings, based on a Bi-LSTM architecture with max pooling. It is the first attempt to use the Stanford natural language inference corpus to build sentence encoders. However, the problem of long-distance dependency exists for this model.

5.2.6. CCTSenEmb:
CCTSenEmb [77] is an unsupervised method for discovering hidden associations between sentences and integrating discriminative topics into the learning process. It can leverage latent associations between sentences by directly predicting a sentence given the semantic information of a neighboring sentence.
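For the pre-train/fine-tune paradigm of BERT (Section 5.2.4), a minimal fine-tuning sketch with the Hugging Face transformers library might look as follows; the checkpoint name, the six-way label set (e.g., six basic emotion classes), and the toy batch are illustrative assumptions:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=6)   # e.g., Ekman's six emotion categories

batch = tokenizer(["I am so happy today!", "This is terrifying."],
                  padding=True, return_tensors="pt")
labels = torch.tensor([0, 1])            # illustrative label ids only
outputs = model(**batch, labels=labels)  # one fine-tuning step: loss + logits
outputs.loss.backward()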

Table 1: Comparison of existing word representation learning methods

No. | Name | Method | Task type | Language | Year | Affiliation
1 | word2vec [53] | feed-forward neural network, logistic regression | unsupervised | multi-language | 2013 | Google
2 | BWEs [54] | neural network | unsupervised | Chinese and English | 2013 | Stanford University
3 | GloVe [55] | weighted least squares regression | unsupervised | multi-language | 2014 | Stanford University
4 | BiDRL [56] | logistic regression | semi-supervised | cross-language | 2016 | Peking University
5 | Emoji2Vec [57] | logistic regression | supervised | English | 2016 | Princeton University and University College London
6 | SSWE [58] | feed-forward neural network | supervised | English | 2016 | Harbin Institute of Technology, et al.
7 | fastText [59] | probability statistics | supervised | multi-language | 2016 | Facebook
8 | context2vec [60] | Bi-LSTM | unsupervised | multi-language | 2016 | Bar-Ilan University
9 | REF [61] | nearest neighbor ranking | NA | English | 2017 | Yuan Ze University, et al.
10 | ELMo [62] | Bi-LSTM, CNN | semi-supervised | multi-language | 2018 | Allen Institute for Artificial Intelligence, et al.
11 | KUBWE [63] | kernel-based | unsupervised | English | 2019 | Dalhousie University
12 | MAP [64] | weighted least squares regression | unsupervised | English | 2019 | University of Kent, et al.
13 | CoVe [65] | LSTM | supervised | English and German | 2017 | NA
14 | Emo2Vec [66] | CNN | supervised | English | 2018 | Hong Kong University of Science and Technology
15 | ULMFit [67] | LSTM | supervised, semi-supervised | multi-language | 2018 | University of San Francisco, et al.
16 | NTUA-SLP [68] | Bi-LSTM, attention | unsupervised | multi-language | 2018 | National Technical University of Athens, et al.
17 | SVD-NS [69] | singular value decomposition | unsupervised | English | 2018 | Dalhousie University
18 | cw2vec [70] | stroke n-grams | unsupervised | Chinese | 2018 | Ant Financial Services Group, et al.
19 | MNLM [71] | LSTM | unsupervised | multi-language | 2019 | Nara Institute of Science and Technology, et al.
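Among the methods in Table 1, the subword behavior of fastText (Section 5.1.7) is easy to demonstrate; the following sketch assumes the gensim implementation and a toy corpus:

from gensim.models import FastText

sentences = [["the", "movie", "was", "wonderful"],
             ["what", "a", "dreadful", "ending"]]
model = FastText(sentences, vector_size=100, window=5, min_count=1)

# fastText composes word vectors from character n-grams, so even a
# token never seen during training still receives an embedding:
oov_vec = model.wv["wonderfullest"]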

5.2.7. CAMSE:
CAMSE [78] is a multi-scale sentence embedding method for encoding sentences into an embedding tensor, based on contextual self-attention and multi-scale techniques. It is a supervised sentence embedding learning framework for answering medical questions. However, this model also suffers from long-distance dependency.

5.2.8. OpenAI GPT:
OpenAI GPT [79] is a semi-supervised method for language understanding tasks that uses a combination of unsupervised pre-training and supervised fine-tuning. It is a two-stage training procedure that starts by using a language modeling objective on unlabeled data to learn the initial parameters of a neural network model. It then adapts these parameters to a target task using the corresponding supervised objective. It is a unidirectional auto-regressive language model, and cannot obtain context-sensitive feature representations.

5.2.9. FastSent:
FastSent [80] is a model for obtaining sentence embeddings that can predict words in context sentences based on the current sentence. Its disadvantage is that it loses sentence ordering information.

5.2.10. ERNIE:
ERNIE [81] uses a multi-layer transformer as its basic encoder to capture contextual information. It is a method for learning language representations enhanced by knowledge masking strategies, which include basic-level masking, entity-level masking, and phrase-level masking.

5.2.11. GenSen:
GenSen [82] is a sentence representation method that combines the benefits of many sentence-representation learning models into a multi-task framework. It is a large-scale reusable sentence representation model obtained by combining a diverse set of training objectives: sequence prediction (e.g., Skip-Thoughts), natural language inference, machine translation, and constituency parsing.

5.2.12. Universal Sentence Encoder (USE):
A USE [83] provides sentence-level embeddings in English. It can achieve the best performance by making use of sentence-level and word-level transfer.

5.2.13. Sent2Vec:
Sent2Vec [84] is a simple unsupervised model for learning universal sentence embeddings by using word vectors along with n-gram embeddings. It can be used to train distributed representations of sentences.

5.2.14. DisSent:
DisSent [85] uses a Bi-LSTM sentence encoder to yield high-quality sentence embeddings, using global max pooling to construct the encoding for each sentence. It can serve as a supervised fine-tuning dataset for large models (e.g., BERT).

The specific comparison of existing sentence representation learning methods is listed in Table 2.

6. DL Methods For TEA

TEA can characterize the emotional attitudes of people from a multi-dimensional view. Existing TEA methods based on DL are divided into four categories according to their text structures and linguistic types: text-oriented monolingual methods, text conversation-oriented monolingual methods, text-oriented cross-linguistic methods, and emoji-oriented cross-linguistic methods.

6.1. Text-oriented Monolingual Emotion Analysis Models

DL methods have been proven effective for many NLP tasks, including sentiment and emotion analysis. The following are emotion analysis models for a single language based on DL methods.

Abdul-Mageed and Ungar [86] proposed a fine-grained emotion detection method using Gated Recurrent Neural Networks (GRNNs). Tafreshi and Diab [87] proposed a joint multi-task learning model using a GRNN, and trained it with a multigenre emotion corpus to predict emotions for four types of genres. Kulshreshtha et al. [88] proposed a neural architecture, Linguistic-featured Emoji-based Partial Combination of Deep Neural Networks (LE-PC-DNNs), for emotion intensity detection based on a CNN. LE-PC-DNNs can combine CNN layers with fully connected layers in a non-sequential or parallel fashion to improve system performance.

Mohammadi et al. [89] proposed a neural feature extraction method for contextual emotion detection. The model utilized an attention-based RNN and conducted experiments with GloVe and ELMo embeddings, alongside Part-of-Speech (POS) tags as input, LSTM and GRU as recurrent units, and a neural or a Support Vector Machine (SVM) classifier. Li et al. [90] presented a method for emotion classification of short text based on the skip-gram model and LSTM. Rathnayaka et al. [91] presented an approach for implicit emotion detection, called Sentylic, based on bidirectional GRUs and a capsule network.

Akhtar et al. [92] proposed a stacked ensemble method to predict emotion and sentiment intensity, designing three DL models based on CNN, LSTM, and GRU, respectively, and one classical supervised model based on Support Vector Regression (SVR). Batbaatar et al. [93] proposed a neural network architecture, called Semantic-emotion Neural Network (SENN), able to use both semantic/syntactic and emotion information by adopting pre-trained word representations. There are two sub-networks in SENN: the first uses Bi-LSTM to capture contextual information and focuses on semantic relationships, while

Table 2: Comparison of existing sentence representation learning methods

No. | Name | Method | Task type | Language | Year | Affiliation
1 | Paragraph Vector [72] | log-bilinear | unsupervised | English | 2014 | Google
2 | Skip-Thoughts [73] | RNN, GRU | unsupervised | English | 2015 | University of Toronto, et al.
3 | DeepMoji [74] | Bi-LSTM, attention | supervised | English | 2017 | Massachusetts Institute of Technology, et al.
4 | BERT [75] | Transformer | unsupervised | multi-language | 2018 | Google
5 | InferSent [76] | Bi-LSTM, max pooling | supervised | English | 2017 | Facebook, et al.
6 | CCTSenEmb [77] | Gaussian | unsupervised | English | 2019 | Beijing Institute of Technology
7 | CAMSE [78] | self-attention, Bi-LSTM | supervised | English | 2019 | Tsinghua University
8 | OpenAI GPT [79] | Transformer | semi-supervised | multi-language | 2018 | OpenAI
9 | FastSent [80] | log-bilinear | unsupervised | multi-language | 2016 | University of Cambridge, et al.
10 | ERNIE [81] | Transformer | unsupervised | multi-language | 2019 | Baidu
11 | GenSen [82] | GRU | supervised | multi-language | 2018 | Microsoft Research Montreal
12 | USE [83] | Transformer, DAN | unsupervised, supervised | English | 2018 | Google
13 | Sent2Vec [84] | optimization theory, n-grams | unsupervised | English | 2018 | Iprova SA, Switzerland
14 | DisSent [85] | Bi-LSTM | supervised | English | 2019 | Stanford University

the second uses a CNN to extract emotion features. Zhang et al. [94] proposed a multi-task CNN for TEA, based on emotion distribution learning.

Khanpour and Caragea [95] proposed a method for emotion detection in online health communities, called ConvLexLSTM. It combined the output of a CNN with lexicon-based features, then fed everything into an LSTM network to produce the final output via the softmax mechanism. Yang et al. [96] proposed an interpretable neural network model for relevant emotion ranking, using a multi-layer feed-forward neural network. Kratzwald et al. [97] proposed a text-based emotion recognition approach using an RNN, named sent2affect, that was a tailored form of transfer learning for affective computing. Yang et al. [98] proposed a framework called Interpretable Relevant Emotion Ranking with Event-driven Attention (IRER-EA), based on RNNs and the attention mechanism.

The specific comparison of existing text-oriented monolingual emotion analysis models is listed in Table 3.

6.2. Text Conversation-oriented Monolingual Emotion Analysis Models

There are numerous emotions in textual conversations. As people use text messaging applications (such as WeChat and Facebook) and conversation agents (such as Amazon Alexa) to communicate more frequently than ever, contextual emotion detection in text is becoming more important to emotion analysis. If we can effectively detect the emotion in a conversation, it has great commercial value (e.g., for the online customer service of an e-commerce platform).

Ghosal et al. [99] presented the Dialogue Graph Convolutional Network (DialogueGCN) for emotion recognition in conversation based on the Bi-GRU. DialogueGCN consists of three integral components: a sequential context encoder, a speaker-level context encoder, and an emotion classifier. Zhong et al. [100] proposed a Knowledge-Enriched Transformer (KET) framework for emotion detection in textual conversations. They used hierarchical self-attention to interpret contextual utterances, and a context-aware graph attention mechanism to leverage external commonsense knowledge. Zhang et al. [101] proposed a Graph-based Convolutional neural Network towards Conversations, namely ConGCN, to model both context-level and speaker-level dependence for emotion detection.

Majumder et al. [102] presented a neural architecture, called DialogueRNN, which is based on the RNN to detect emotion in a conversation, where the textual feature of each utterance is extracted by a CNN. Ishiwatari et al. [103] proposed a relational position encoding method based on Relational Graph ATtention networks (RGAT) to recognize human emotions in textual conversation. Zhang et al. [104] proposed a Knowledge Aware Incremental Transformer with Multi-task Learning (KAITML) to conduct emotion classification. In KAITML, a dual-level graph attention mechanism was designed to leverage commonsense knowledge, which augments the semantic information of the utterance; an incremental transformer was used to encode multi-turn contextual utterances. In addition, multi-task learning was used to improve the performance of emotion recognition.

Jiao et al. [105] proposed a Hierarchical Gated Recurrent Unit (HiGRU) framework with two Bi-GRUs: the lower-level Bi-GRU was used to learn the individual utterance embedding, and the upper-level Bi-GRU was used to learn the contextual utterance embedding. Li et al. [106] proposed a fully data-driven Interactive Double States Emotion Cell Model (IDS-ECM) for textual dialogue emotion prediction. In the model, the Bi-LSTM and attention mechanism were used to extract the emotion features. Li et al. [107] proposed a transformer-based context- and speaker-sensitive model for emotion detection in conversations, namely HiTrans, which consists of two hierarchical transformers. One was used to generate local utterance representations using BERT, and the other was used to obtain the global context of the conversation.

Ghosal et al. [108] proposed a method for emotion detection in conversations, named COSMIC, which modeled various aspects of commonsense knowledge by considering mental states, events, actions, and cause-effect relations. Lu et al. [109] proposed an iterative emotion interaction network for emotion recognition in conversations. The network consists of three components: the utterance encoder, the emotion-interaction-based context encoder, and the iterative improvement mechanism. Li et al. [110] proposed a Hierarchical Transformer (HiTransformer) framework to address utterance-level emotion recognition in dialog systems. It used a lower-level transformer to model word-level input, an upper-level transformer to capture the contexts of utterance-level embeddings, and BERT to obtain better individual utterance embeddings. Mundra et al. [111] proposed an Emotion Detection approach using Neural Networks driven by Emotion Vectors (ED-NNEV), based on the CNN, to predict the emotion category of each turn in a conversation.

The specific comparison of existing text conversation-oriented monolingual emotion analysis models is listed in Table 4.

In addition, SemEval-2019 Task 3 [112] introduced a task to detect contextual emotion (e.g., happiness, sadness, anger) in conversational text. Its purpose was to invite research interest to the area of emotion detection in textual conversation.

Agrawal and Suri [113] proposed a model, the Neural and Lexical Combiner (NELEC), that combined lexical and neural features for emotion classification. Basile [114] designed different architectures (such as the three-input, two-output, Universal Sentence Encoder (USE), and Bidirectional Encoder Representations from Transformers (BERT) models) based on DL for emotion classification. Huang et al. [115] proposed an ensemble approach for emotion detection comprised of two DL models: the Hierarchical LSTMs for Contextual Emotion Detection model, and the BERT model.
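A recurring design in the conversation models above (e.g., HiGRU [105] and HiTransformer [110]) is a two-level encoder: a lower network reads the words of each utterance to produce one utterance vector, and an upper network reads the sequence of utterance vectors to contextualize them across the dialogue. The following minimal PyTorch sketch is our illustration of this general pattern with two Bi-GRUs; it is not code released by any of the cited systems, and all layer sizes are toy values:

import torch
import torch.nn as nn

class HierarchicalDialogueClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hid_dim=128, num_emotions=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Lower-level Bi-GRU: words -> individual utterance embedding
        self.word_gru = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        # Upper-level Bi-GRU: utterance embeddings -> contextual utterance embeddings
        self.utt_gru = nn.GRU(2 * hid_dim, hid_dim, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hid_dim, num_emotions)

    def forward(self, dialogue):
        # dialogue: (num_utterances, max_words) token ids for one conversation
        emb = self.embedding(dialogue)                       # (U, W, emb_dim)
        word_states, _ = self.word_gru(emb)                  # (U, W, 2*hid)
        utt_vecs = word_states.max(dim=1).values             # pool words -> (U, 2*hid)
        ctx_states, _ = self.utt_gru(utt_vecs.unsqueeze(0))  # (1, U, 2*hid)
        return self.classifier(ctx_states.squeeze(0))        # one emotion logit row per utterance

model = HierarchicalDialogueClassifier(vocab_size=10000)
fake_dialogue = torch.randint(1, 10000, (4, 12))             # 4 utterances, 12 tokens each
print(model(fake_dialogue).shape)                            # torch.Size([4, 6])

In practice, the word embeddings would be initialized from pre-trained vectors (e.g., the word2vec or GloVe entries in Table 4) rather than learned from scratch.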
Table 3: Comparison of existing monolingual emotion analysis models

No. | Name | DL method | Pre-training | Year | Dataset | Accuracy
1 | Abdul-Mageed's [86] | GRNN | NA | 2017 | Twitter | 0.8758
2 | Tafreshi's [87] | GRU | fastText, word2vec, GloVe | 2018 | TweetEN, BLG+HLN, MOV | TweetEN: 0.781, BLG+HLN: 0.836, MOV: 0.91
3 | LE-PC-DNN [88] | CNN | DeepMoji, word2vec | 2018 | EmoInt-2017 | 0.791
4 | Mohammadi's [89] | GRU, LSTM, SVM, attention | GloVe, ELMo | 2019 | SemEval 2019 (EmoContext) | 0.7303
5 | Li's [90] | LSTM | word2vec | 2017 | WeChat | 0.2512
6 | Sentylic [91] | Bi-GRU, capsule networks | word2vec | 2018 | WASSA 2018 | 0.692
7 | Akhtar's [92] | CNN, LSTM, GRU | GloVe, word2vec | 2020 | EmoInt-2017, SemEval-2017 | 0.748
8 | SENN [93] | CNN, Bi-LSTM | word2vec, GloVe, fastText | 2019 | DailyDialogs, CrowdFlower, TEC, Tales-Emotions, ISEAR, EmoInt, Electoral-Tweets, Grounded-Emotions, Emotion-Cause, SSEC | 0.848, 0.511, 0.613, 0.746, 0.910, 0.563, 0.593, 0.988, 0.708
9 | Zhang's [94] | CNN | word2vec | 2018 | SemEval-2007 | 0.4141
10 | ConvLexLSTM [95] | CNN, LSTM | word2vec | 2018 | Cancer Survivors' Network | Joy: 0.932, Sad: 0.923
11 | Yang's [96] | multi-layer feed-forward neural network | NA | 2018 | Sina Social News, Ren-CECps corpus, SemEval 2007 | News: 0.7108, Blogs: 0.6187, SemEval: 0.7081
12 | sent2affect [97] | RNN | GloVe | 2018 | SemEval 2007, SemEval 2018 | SemEval 2007: 0.584, SemEval 2018: 0.586
13 | IRER-EA [98] | RNN, attention | GloVe | 2019 | SemEval 2007, Ren-CECps corpus, Sina Social News | News: 0.7379, Blogs: 0.6304, SemEval: 0.7538
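To make the pipeline of entry 10 in Table 3 concrete: ConvLexLSTM [95] combines convolutional features with lexicon-based features before an LSTM and a softmax layer. The sketch below is our rough rendering of that description; the layer sizes and the per-token lexicon features are assumptions for illustration, not the authors' configuration:

import torch
import torch.nn as nn

class ConvLexLSTMSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, n_filters=64, lex_dim=10, num_emotions=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Convolution over token embeddings yields local n-gram features
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        # LSTM consumes CNN features concatenated with lexicon features
        self.lstm = nn.LSTM(n_filters + lex_dim, 64, batch_first=True)
        self.out = nn.Linear(64, num_emotions)

    def forward(self, tokens, lex_feats):
        # tokens: (B, T) token ids; lex_feats: (B, T, lex_dim) per-token lexicon scores
        conv_out = torch.relu(self.conv(self.embedding(tokens).transpose(1, 2)))
        feats = torch.cat([conv_out.transpose(1, 2), lex_feats], dim=-1)
        h, _ = self.lstm(feats)
        # softmax over emotion categories, as described for ConvLexLSTM
        return torch.softmax(self.out(h[:, -1]), dim=-1)

m = ConvLexLSTMSketch(vocab_size=5000)
print(m(torch.randint(1, 5000, (2, 20)), torch.rand(2, 20, 10)).shape)  # torch.Size([2, 6])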
Table 4: Comparison of existing text conversation-oriented monolingual emotion analysis models

No. | Name | DL method | Pre-training | Year | Dataset | macro-F1/weighted-F1
1 | DialogueGCN [99] | GRU, GCN | GloVe | 2019 | IEMOCAP | 0.6418
2 | KET [100] | MHA | GloVe | 2019 | EC, DailyDialog, MELD, EmoryNLP, IEMOCAP | 0.7413, 0.5337, 0.5818, 0.3439, 0.5956
3 | ConGCN [101] | graph convolutional network | GloVe | 2019 | MELD | 0.574
4 | DialogueRNN [102] | RNN, CNN, attention | NA | 2019 | IEMOCAP | 0.6275
5 | RGAT [103] | relational graph attention networks | BERT | 2020 | DailyDialog, MELD, EmoryNLP, IEMOCAP | 0.5431, 0.6091, 0.3442, 0.6522
6 | KAITML [104] | graph attention mechanism, incremental transformer | GloVe | 2020 | EC, DailyDialog, MELD, EmoryNLP, IEMOCAP | 0.7539, 0.5471, 0.5897, 0.3559, 0.6143
7 | HiGRU [105] | Bi-GRU | word2vec | 2019 | Friends, EmotionPush, IEMOCAP | 0.744, 0.771, 0.821
8 | IDS-ECM [106] | Bi-LSTM | DeepMoji | 2020 | DailyDialog, EC | 0.3885, 0.3623
9 | HiTrans [107] | hierarchical transformers | BERT | 2020 | MELD, EmoryNLP, IEMOCAP | 0.6194, 0.3675, 0.6450
10 | COSMIC [108] | Bi-GRU | RoBERTa | 2020 | DailyDialog, MELD, EmoryNLP, IEMOCAP | 0.5105, 0.6521, 0.3811, 0.6528
11 | Lu's [109] | Bi-GRU | GloVe | 2020 | MELD, IEMOCAP | 0.6072, 0.6437
12 | HiTransformer [110] | Bi-LSTM, MHA | BERT | 2020 | Friends, EmotionPush, EmoryNLP | 0.6788, 0.6543, 0.3304
13 | ED-NNEV [111] | CNN | word2vec | 2017 | Contact center chat data | 0.7438
Winata et al. [116] used hierarchical attention for dialogue emotion classification based on logistic regression and XGBoost. Bae et al. [117] proposed a method to detect emotion using a Bi-LSTM encoder for higher-level representation. Liang et al. [118] proposed hierarchical ensemble classification of contextual emotion using three sets of CNN-based neural network models trained for four-emotion classification, Angry-Happy-Sad classification, and Others-or-not classification, respectively.

Xiao [119] designed an ensemble of transfer learning methods using pre-trained language models (ULMFiT, OpenAI GPT, and BERT). He also trained a DL model from scratch using pre-trained word embeddings and a Bi-LSTM architecture with the attention mechanism. The experimental results reveal that ULMFiT performs best due to its fine-tuning technique. Li et al. [120] proposed a multi-step ensemble neural network for emotion analysis in text. They used four DL models (LSTM, GRU, CapsuleNet, and Self-Attention) and obtained eight different models by combining two different word embedding models. They then used Dropout to support improved model convergence. Finally, at each model output, the four predicted probability categories were obtained. Ragheb et al. [121] presented a model to detect textual conversational emotion. They used deep transfer learning, self-attention mechanisms, and turn-based conversational modeling to classify emotion.

Lee et al. [122] proposed a multi-view turn-by-turn model. In this model, the vectors were generated from each utterance using two encoders: a word-level Bi-GRU encoder and a character-level CNN encoder. The model could predict emotion with the contextual information, which was grasped by combining the vectors. Ma et al. [123] proposed a DL architecture that combined the Bi-LSTM and the attention mechanism to extract emotion information from an utterance. Ge et al. [124] proposed an attentional LSTM-CNN model for dialog emotion classification. They used a combination of CNNs and long short-term memory networks to capture both local and long-distance contextual information in conversations. In addition, they applied the attention mechanism to recognize and attend to important words within conversations. They also used ensemble strategies by combining the variants of the proposed model with different pre-trained word embeddings via weighted voting.

The specific comparison of existing text conversation-oriented monolingual emotion analysis models from SemEval-2019 Task 3 is listed in Table 5.

6.3. Text-oriented Cross-linguistic Emotion Analysis Models

In this subsection, we provide a survey of text-oriented cross-linguistic emotion analysis models. The basic understanding of related approaches is summarized as follows.

Wang et al. [125] proposed a Bilingual Attention Network (BAN) model based on LSTM and the attention mechanism. BAN can aggregate monolingual and bilingual informative words to form vectors from document representations; it can also integrate attention vectors to conduct emotion prediction. Zhou et al. [126] proposed an attention-based cross-lingual sentiment classification model that learns the distributed semantics of documents in both source and target languages. In each language, they used LSTM to model documents and introduced a hierarchical attention mechanism for the model. Chen et al. [127] presented an Adversarial DAN (ADAN) for cross-lingual sentiment classification. ADAN could transfer knowledge learned from labeled English data to Chinese and Arabic, where little or no annotated data existed.

Zhou et al. [128] proposed a Bilingual Sentiment Word Embeddings (BSWE) method, based on DL technology, for English-Chinese cross-language sentiment classification. BSWE could use a DAE to learn bilingual embeddings for Cross-language Sentiment Classification (CLSC). Feng and Wan [129] proposed a Cross-Language In-domain Sentiment Analysis (CLIDSA) model based on LSTM. It was an end-to-end method that leveraged unlabeled data in multiple languages and multiple domains. Barnes et al. [130] proposed a Bilingual Sentiment Embeddings (BLSE) model that used a two-layer feed-forward averaging network to predict text sentiment. Ahmad et al. [131] built a DL model for emotion detection in the Hindi language. They used a CNN, Bi-LSTM, cross-lingual embeddings, and different transfer learning strategies for their purpose.

The specific comparison of existing text-oriented cross-linguistic emotion analysis models is listed in Table 6.

6.4. Emoji-oriented Cross-linguistic Emotion Analysis Models

Emoji are defined by the Oxford Dictionary [132] as "A small digital image or icon used to express an idea or emotion". As a way to enhance the visual effect and meaning of short text, emojis are becoming one of the indispensable components of any instant messaging platform or social media service. Because emojis are becoming increasingly important in emotion analysis on social networks, SemEval-2018 Task 2: Emoji Prediction in English and Spanish [133] was introduced in 2018. The aim was to attract greater NLP attention. The basic understanding of related methods is summarized as follows.

Coltekin and Rama [134] designed a supervised system consisting of an SVM classifier with bag-of-n-grams features. Baziotis et al. [135] proposed an architecture to predict emojis using Bi-LSTM and a context-aware attention mechanism. Beaulieu and Owusu [136] proposed a method to predict English and Spanish emojis using a bag-of-words model and a linear SVM. Coster et al. [137] built a linear SVM model to predict emoji in Spanish tweets using the SKLearn SGDClassifier.

Jin and Pedersen [138] built a classifier for Spanish emoji prediction using naive Bayes, logistic regression, and random forests. Basile and Lino [139] presented an approach to predict Spanish emoji based on the SVM model.
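The Bi-LSTM-plus-attention pattern used by [135], and by many systems in Table 5, can be summarized in a few lines: an attention layer scores every hidden state, and the softmax-normalized weighted sum of the states becomes the text representation. A minimal sketch of this mechanism (our illustration, assuming a toy vocabulary and the 20-emoji label set of SemEval-2018 Task 2):

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveBiLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hid_dim=128, num_classes=20):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.att_score = nn.Linear(2 * hid_dim, 1)      # one scalar score per time step
        self.out = nn.Linear(2 * hid_dim, num_classes)

    def forward(self, tokens):
        h, _ = self.lstm(self.embedding(tokens))        # (B, T, 2*hid)
        weights = F.softmax(self.att_score(h), dim=1)   # (B, T, 1), sums to 1 over T
        pooled = (weights * h).sum(dim=1)               # attention-weighted average
        return self.out(pooled)

model = AttentiveBiLSTM(vocab_size=5000)
batch = torch.randint(1, 5000, (8, 30))                 # 8 tweets, 30 tokens each
print(model(batch).shape)                               # torch.Size([8, 20])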
Table 5: Comparison of existing text conversation-oriented monolingual emotion analysis models from SemEval-2019 Task 3

No. | System | Ranking | DL method | Pre-training | macro-F1 | Country
1 | NELEC [113] | 3 | GRU, LSTM, attention | Emoji2Vec, GloVe | 0.7765 | U.S.
2 | SymantoResearch [114] | 4 | Bi-LSTM, attention | BERT | 0.7731 | Germany
3 | ANA [115] | 5 | multi-head self-attention, LSTM | BERT, GloVe, ELMo, DeepMoji | 0.7709 | Canada
4 | CAiRE HKUST [116] | 6 | LSTM, hierarchical attention | BERT, GloVe, ELMo, DeepMoji | 0.7677 | China
5 | SNU IDS [117] | 7 | Bi-LSTM, multi-dimensional attention | word2vec, ELMo, Emoji2Vec | 0.7661 | Korea
6 | THU-HCSI [118] | 8 | Bi-LSTM, LSTM, CNN, attention | word2vec, NTUA-SLP | 0.7616 | China
7 | Figure Eight [119] | 9 | Bi-LSTM, attention | ULMFiT, BERT, NTUA-SLP, DeepMoji, OpenAI GPT | 0.7608 | U.S.
8 | YUN-HPCC [120] | 10 | Bi-LSTM, GRU, Capsule-Net, attention | ELMo, GloVe | 0.7588 | China
9 | LIRMM-Advanse [121] | 11 | Bi-LSTM, AWD-LSTM, attention | ULMFiT | 0.7582 | France
10 | MILAB [122] | 12 | CNN, Bi-GRU | GloVe | 0.7581 | Korea
11 | PKUSE [123] | 14 | Bi-LSTM, attention | GloVe | 0.7557 | China
12 | THU NGN [124] | 15 | CNN, LSTM, attention | GloVe, word2vec, ekphrasis | 0.7542 | China
Table 6: Comparison of existing text-oriented cross-linguistic emotion analysis models

No. | Name | DL method | Pre-training | Year | Dataset | Accuracy
1 | BAN [125] | LSTM, attention | Skip-gram | 2016 | Weibo | 0.672
2 | Zhou's [126] | LSTM, Bi-LSTM, attention | NA | 2016 | NLPCC 2013 | 0.824
3 | ADAN [127] | DAN | BWE | 2018 | Yelp reviews, Chinese hotel reviews | 0.4249, 0.5454
4 | BSWE-CLSC [128] | denoising autoencoder | BSWE | 2015 | NLPCC 2013 | 0.8068
5 | CLIDSA [129] | LSTM | Unsupervised CLCA | 2019 | Amazon review | 0.8483
6 | BLSE [130] | DAN | word2vec | 2018 | OpeNER English and Spanish datasets, MultiBooked Catalan and Basque | ES: 0.803, CA: 0.85, EU: 0.735
7 | Ahmad's [131] | CNN, Bi-LSTM | fastText, alignment matrices | 2020 | SemEval-2018, Emo-Crowd-EN, Hindi review | Emo-Dis-HI: 0.477, Emo-SemEval-EN: 0.863
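As an illustration of the projection idea behind several Table 6 entries, the sketch below loosely follows BLSE [130]: pre-trained source- and target-language embeddings are mapped into one shared space, an alignment loss ties translation pairs together, and a sentiment classifier trained only on source-language text can then score target-language text. This is our simplified reading, not the released implementation, and all data here are random stand-ins:

import torch
import torch.nn as nn
import torch.nn.functional as F

src_emb = nn.Embedding(1000, 300)            # stand-ins for pre-trained monolingual vectors
tgt_emb = nn.Embedding(1000, 300)
proj_src = nn.Linear(300, 128, bias=False)   # learned projections into a shared space
proj_tgt = nn.Linear(300, 128, bias=False)
clf = nn.Linear(128, 2)                      # sentiment head, trained on the source side only

def encode(ids, emb, proj):
    # project word vectors of a batch of sentences (B, T), then average them
    return proj(emb(ids)).mean(dim=1)

# toy batch: labeled source sentences plus a tiny translation lexicon
src_sents = torch.randint(0, 1000, (8, 10)); labels = torch.randint(0, 2, (8,))
lex_src = torch.randint(0, 1000, (32,)); lex_tgt = torch.randint(0, 1000, (32,))

task_loss = F.cross_entropy(clf(encode(src_sents, src_emb, proj_src)), labels)
align_loss = F.mse_loss(proj_src(src_emb(lex_src)), proj_tgt(tgt_emb(lex_tgt)))
loss = task_loss + align_loss                # joint objective; at test time, target-language
print(loss.item())                           # text is encoded with proj_tgt and scored by clf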
Liu [140] presented a model for English emoji prediction using a gradient boosting regression tree method. Lu et al. [141] proposed a method to address Twitter emoji prediction based on Bi-LSTM and the attention mechanism.

The specific comparison of existing emoji-oriented cross-linguistic emotion analysis models is listed in Table 7.

7. Challenges of Emotion Analysis

Due to the increasing development of social networks and DL technology, unprecedented challenges to emotion analysis have been posed. Though many researchers have proposed potential solutions for some of the discussed issues, there are still many other open issues requiring further exploration and deep study [142]. In this section, we summarize the challenges of emotion analysis and point out the future trends in this field.

7.1. Emotion Description

At present, there is no unified definition for emotion and no unified standard to classify emotions effectively and scientifically, which may affect emotional feature extraction performance involving texts. Moreover, because of the three unique components of human emotion (physiological arousal, subjective experience, and external expression), different fields possess different understandings. For example, social psychology, developmental psychology, and neuroscience deem it impossible for researchers to have the same understanding of emotion [142]. Thus, there is difficulty in determining a unified standard to accurately characterize human emotions.

7.2. Data Imbalance

Emotion classification has made great progress in NLP. However, most existing works assume there are as many positive samples as negative samples, while positive and negative samples are often distributed unevenly in practice. Emotion analysis is a more fine-grained classification based on sentiment analysis, yet most of that work also assumes balanced sample sizes for each emotion category, which is not consistent with reality [143]. Thus, when methods suitable for balanced classification are used to deal with unbalanced data, analysis results often fail to achieve their intended effects, directly affecting the performance of emotion classification.

7.3. Language Imbalance

Most existing emotion analysis methods are aimed at English texts. Some recent methods have focused on Chinese texts, but these methods are based on an emotion dictionary or a semantic knowledge base that relies on external resources of a specific language [144]. It is difficult to transfer English-based emotion analysis methods to other languages (e.g., Japanese and French). In addition, training and test sets for non-English emotion analysis are relatively scarce, particularly for uncommon languages, which has a serious negative impact on the research and performance of non-English emotion analysis methods.

7.4. Domain Relevance

Descriptive words and phrases, such as "a long time", can express different emotions depending on their domain. For instance, food and beverage reviews often express negative emotion in relation to long waiting times, while smartphone reviews express positive emotion in relation to long battery standby times [145]. Thus, the domain relevance of words must be considered by emotion analysis. Cross-domain emotion analysis presents numerous pressing problems for resolution.

7.5. Understanding Short Texts

Social networks limit the length of their commentary, making short text (with its sparseness, non-standard use of words, and massive data) common instead of traditional long text. At the same time, insufficient contextual semantic information, single-word polysemy, and multi-word synonymy make topic information extraction difficult to perform accurately, affecting final emotion analysis performance. Thus, understanding short texts is a very challenging task in NLP.

7.6. Emotion Cause Extraction (ECE)

ECE aims to identify important potential causes or stimuli for observed emotions during in-depth emotion analysis [146]. However, most existing works focus on annotating emotions before cause extraction, which greatly limits the latter's application in real-world scenarios and ignores the mutual indications between emotions and causes. In addition, due to the inherent subtlety and ambiguity of emotional expression, ECE has become a very challenging task.

7.7. DL Model Training

The repetitious process of DL adjusts model parameters; during DL model training, training speed presents the biggest problem due to slow convergence and long training times. Thus, model training efficiency deserves consideration. Improving model convergence speed requires reduced iterations and consistent training times, while improving training speed requires a reduced number of training passes and, consequently, reduced opportunity to attempt different hyperparameters. Both improvements will affect model accuracy. In addition, trained model performance relates to training dataset size, with larger datasets producing better results [147]. However, larger datasets increase training times and can require larger amounts of computing resources (e.g., GPUs).
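For the data imbalance issue discussed in Section 7.2, one common remedy (an illustrative example on our part, not a method from the surveyed papers) is to weight the classification loss inversely to class frequency, so that rare emotion categories are not drowned out by frequent ones:

import torch
import torch.nn as nn

labels = torch.tensor([0, 0, 0, 0, 0, 1, 1, 2])     # skewed toy label set
counts = torch.bincount(labels, minlength=3).float()
weights = counts.sum() / (len(counts) * counts)      # inverse-frequency class weights
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 3)                           # stand-in model outputs
print(criterion(logits, labels))                     # errors on rare classes now count more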
Table 7: Comparison of existing emoji-oriented cross-linguistic emotion analysis models

No. | System | DL method | Pre-training | English Ranking | Spanish Ranking | Country
1 | Tubingen-Oslo [134] | SVM | NA | 1 | 1 | Germany, Norway
2 | NTUA-SLP [135] | Bi-LSTM, attention | word2vec | 2 | NA | Greece
3 | EmoNLP [140] | gradient boosting regression tree | NA | 4 | NA | NULL
4 | ECNU [141] | Bi-LSTM, attention | POS embedding | 5 | 7 | China
5 | UMDuluth-CS8761 [136] | SVM | NA | 6 | 3 | U.S.
6 | Hatching Chick [137] | SVM, gradient descent optimization | NA | 29 | 2 | Holland
7 | TAJJEB [139] | SVM | POS embedding | 8 | 4 | Malta, Spain
8 | Duluth UROP [138] | naive Bayes, logistic regression, random forests | NA | 18 | 5 | U.S.
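Table 7's top-ranked system [134] is deliberately simple: a linear SVM over bag-of-n-grams features. A minimal scikit-learn sketch of that general recipe, on a made-up toy corpus with string stand-ins for the emoji labels:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["love this so much", "ugh rainy monday again", "happy birthday friend"]
emojis = ["red_heart", "crying_face", "birthday_cake"]   # hypothetical label names

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4)),  # character n-gram features
    LinearSVC(),
)
model.fit(texts, emojis)
print(model.predict(["so much love for this"]))

That such a pipeline outranked the neural entries in both languages is a useful reminder that strong lexical baselines still matter for short, informal text.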
8. Future Research Trends

As Internet+, AI+, 5G, and other opportunities arise, many new applications (e.g., multi-language, multi-modal, cross-domain, and big data applications) have emerged and provided fresh opportunities for emotion analysis development. As emotion analysis plays an important role in grasping public sentiment trends quickly, predicting public opinions of relevant development trends, and satisfying human daily needs, emotion analysis will change qualitatively by integrating different media, forms, scales, and domains of emotional information. In addition, the rapid development of social network analysis and DL technology offers many new research directions for emotion analysis. Some researchers are gradually changing the focus of research in emotion analysis, from single language, single media, single domain, and small-scale data samples to multi-language, multi-modal, cross-domain, and big data [148, 149]. According to existing technology development trends, future emotion analysis research will include the following aspects.

8.1. Multi-language Emotional Analysis

Due to increasing cultural exchanges, multi-language network information affects and merges with itself. Existing work has focused on a single language, and corpus resources collected for a single-language emotion analysis model cannot be applied to multi-language emotion analysis. In addition, corpus resources for the emotion analysis of different languages are also unbalanced, making their application to multi-language environments difficult.

8.2. Multi-modal Emotional Analysis

While traditional emotion analysis focuses on single forms of media, multi-modal information (e.g., audio, video, and image) [150, 151] can often express emotional effects with greater description and vividness than text. In addition, as a main carrier of emotional information expression, voice can accurately reflect current user emotions. Thus, allowing for the combined study of various social media big data types (e.g., image, audio, text, video), improved application prospects will become available for researching multi-modal user emotion analysis.

8.3. Cross-domain Emotional Analysis

The main idea of the cross-domain emotion analysis method is that emotions present in current comment information can be identified accurately and quickly, provided said information contains words expressing various emotions from different domains. However, traditional emotion analysis methods often ignore the domain-dependency characteristics of emotional words, and may even deliberately choose domain-independent features (such as emoji). With increasing demand for practical application and the emergence of emotional corpus resources in different domains, cross-domain emotion analysis will draw greater research attention and focus.

8.4. Emotion Analysis based on Social Network Analysis

With the rapid development of social networks, a large amount of user interaction data has been generated. These data not only reflect static user characteristics (e.g., number of friends, activity, frequency of surfing the Internet), but also reflect dynamic user characteristics (e.g., thoughts, social relations, social influence). Through the analysis of social networks, we can understand how different individuals and social groups express their emotions and how group emotional tendencies relate to popular events.
Therefore, research on emotion analysis technology based on social network analysis can better describe public opinion trends while providing technical support for applications involving big search, public opinion analysis, personalized recommendation, and so on.

8.5. Emotion Analysis based on Big Data Analysis

With the increasing scale of social networks, massive data are produced every day. Mining these data can produce substantially valuable information for products and services, but a significant portion of that data is stored in an unstructured form after being collected by crawlers. When emotion analysis is carried out on text data, traditional methods of probabilistic latent semantic analysis will have difficulty meeting the needs set by large-scale data training. This will drive method proposals for emotion analysis based on big data.

8.6. In-depth Emotion Analysis

The purpose of extracting emotion causes is to recognize the potential cause or stimulus of an observed emotion. Existing methods of emotion analysis focus on the shallow tasks of emotion recognition and classification. However, emotion cause identification requires in-depth emotion analysis that focuses on emotional keywords in text to identify causes automatically. Although current mainstream methods are based on linguistic rules and statistics, the wide application of DL will continue to attract increasing attention in ECE research.

8.7. Automatic Recognition of Negative Emotions in Short Texts

At present, there are a large number of comments on WeChat, Twitter, Taobao, and other social networks. Much of this massive amount of short text information contains negative emotion, making the automatic identification of negative emotion from such information an urgent need, along with mastering the intelligent extraction of negative emotion features. Such needs will play important roles in national cyberspace security, driving greater numbers of scholars to study automatic negative emotion recognition in short texts.

8.8. Negative Emotion Evolution Analysis

Due to expression methods, popular events drive negative emotions, with the highest concentration of these emotions being expressed on social networks. Such networks (e.g., Microblog) often have the propagation characteristic of "weak information and strong emotion", which has caused negative emotions to be propagated widely across Microblog [142]. Once information with negative emotions is released, it may propagate by means of nuclear fission. If it is amplified by users with great social influence (e.g., opinion leaders) [152, 153], it will gain influence and possibly become public opinion. Thus, characterizing the internal evolution of negative emotions for popular events will become a notable focus of NLP research and cyberspace security.

9. Conclusions

In this survey, our purpose was to review existing studies on DL-based TEA solutions and provide a comprehensive understanding for new researchers. We introduced the background of TEA, a brief on sentiment analysis approaches, the current state of the art, challenges, and future research trends. We began by introducing preliminaries, such as emotion definition and classification, a summary of emotion analysis applications, basic DL methods, and pre-training methods. We then reviewed the literature based on various DL methods and compared these studies according to our understanding. Finally, we presented readers with challenges and future directions for emotion analysis. We hope that this survey can provide a good reference for designing DL-based emotion analysis models with improved performance.

10. Declaration of competing interest

The authors declare that they have no conflicts of interest regarding this work.

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61876205 and 61877013, the Ministry of Education Humanities and Social Science project under Grant Nos. 19YJAZH128 and 20YJAZH118, the Science and Technology Plan Project of Guangzhou under Grant No. 201804010433, and the Bidding Project of the Laboratory of Language Engineering and Computing under Grant No. LEC2017ZBKT001.

References

[1] S. Bharti, B. Vachha, R. Pradhan, K. Babu, S. Jena, Sarcastic sentiment detection in tweets streamed in real time: a big data approach, Digital Communications and Networks 2 (2016) 108–121.
[2] Y. Hao, Q. Zheng, Y. Chen, C. Yan, Recognition of abnormal behavior based on data of public opinion on the web, Journal of Computer Research and Development 53 (3) (2016) 611–620.
[3] B. Fang, Y. Jia, A. Li, L. Yin, Research progress and trend of cyberspace big search, Journal on Communications 36 (12) (2015) 1–8.
[4] D. Paul, F. Li, M. K. Teja, X. Yu, R. Frost, Compass: Spatio temporal sentiment analysis of US Election what Twitter says!, in: Proceedings of the 23rd ACM International Conference on Knowledge Discovery and Data Mining, Halifax, Canada, 2017, pp. 1585–1594.
[5] L. Zhang, C. Xu, Y. Gao, Y. Han, X. Du, Z. Tian, Improved dota2 lineup recommendation model based on a bidirectional LSTM, Tsinghua Science and Technology 25 (6) (2020) 712–720.
[6] M. D. Choudhury, S. Counts, E. J. Horvitz, A. Hoff, Characterizing and predicting postpartum depression from shared Facebook data, in: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, Baltimore, MD, USA, 2014, pp. 626–638.
[7] C. Tucker, B. Pursel, A. Divinsky, Mining student-generated textual data in MOOCs and quantifying their effects on student performance and learning outcomes, The ASEE Computers in Education (CoED) Journal 5 (4) (2014) 84.
[8] Development Report of China New Media: Propagation of Microblog, forum and other we-media increase the threat on modern society, China Reading Newspaper (November 28, 2012). URL http://epaper.gmw.cn/zhdsb/html/2012-11/28/nw.D110000zhdsb_20121128_3-18.htm
[9] H. Zhu, X. Shan, J. Hu, 2011 China Internet public opinion analysis report (full text) (July 2012). URL http://yuqing.people.com.cn/n/2012/0727/c209170-18615551.html
[10] S. Peng, G. Wang, Y. Zhou, C. Wan, C. Wang, S. Yu, J. Niu, An immunization framework for social networks through big data based influence modeling, IEEE Transactions on Dependable and Secure Computing 16 (6) (2019) 984–995.
[11] Z. Zhang, X. Li, C. Gan, Identifying influential nodes in social networks via community structure and influence distribution difference, Digital Communications and Networks 7 (1) (2021) 131–139.
[12] D. Camachoa, A. Panizo-LLedot, G. Bello-Orgaz, A. Gonzalez-Pardo, E. Cambria, The four dimensions of social network analysis: An overview of research methods, applications, and software tools, Information Fusion 63 (2020) 88–120.
[13] S. Peng, Y. Zhou, L. Cao, S. Yu, J. Niu, W. Jia, Influence analysis in social networks: a survey, Journal of Network and Computer Applications 106 (2018) 17–32.
[14] The 38th statistical report on the development of China's Internet (August 2016). URL http://www.cnnic.net.cn/hlwfzyj/hlwxzbg/hlwtjbg/201608/t20160803_54392.htm
[15] S. Poria, E. Cambria, R. Bajpai, A. Hussain, A review of affective computing: From unimodal analysis to multimodal fusion, Information Fusion 37 (2017) 98–125.
[16] R. Li, Z. Lin, H. Lin, W. Wang, D. Meng, Text emotion analysis: a survey, Journal of Computer Research and Development 55 (1) (2018) 30–52.
[17] M. Bouazizi, T. Ohtsuki, Multi-class sentiment analysis on twitter: classification performance and challenges, Big Data Mining and Analytics 2 (3) (2019) 181–194.
[18] M. Usama, B. Ahmad, E. Song, M. S. Hossain, M. Alrashoud, G. Muhammad, Attention-based sentiment analysis using convolutional and recurrent neural network, Future Generation Computer Systems 113 (2020) 571–578.
[19] E. Cambria, S. Poria, A. Gelbukh, M. Thelwall, Sentiment analysis is a big suitcase, IEEE Intelligent Systems 32 (6) (2017) 74–80.
[20] M. Zhou, N. Duan, S. Liu, H.-Y. Shum, Progress in neural NLP: modeling, learning, and reasoning, Engineering (2020), https://doi.org/10.1016/j.eng.2019.12.014.
[21] T. S. and M. A. Chishti, Deep learning for the internet of things: potential benefits and use-cases, Digital Communications and Networks, https://doi.org/10.1016/j.dcan.2020.12.002.
[22] J. Deng, W. Dong, R. Socher, L. Li, K. Li, F. Li, ImageNet: a large-scale hierarchical image database, in: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 2009, pp. 248–255.
[23] M. Iyyer, V. Manjunatha, J. Boyd-Graber, H. D. III, Deep unordered composition rivals syntactic methods for text classification, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 2015, pp. 1681–1691.
[24] P. Vincent, H. Larochelle, Y. Bengio, P. A. Manzagol, Extracting and composing robust features with denoising autoencoders, in: Proceedings of the 25th International Conference on Machine Learning, USA, 2008, pp. 1096–1103.
[25] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.
[26] C. L. Giles, G. M. Kuhn, R. J. Williams, Dynamic recurrent neural networks: Theory and applications, IEEE Transactions on Neural Networks 5 (1994) 153–156.
[27] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780.
[28] M. Schuster, K. K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing 45 (11) (1997) 2673–2681.
[29] K. Cho, B. Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 2014, pp. 1724–1734.
[30] L. Itti, C. Koch, Computational modelling of visual attention, Nature Reviews Neuroscience 2 (3) (2001) 194–203.
[31] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, California, USA, 2017, pp. 5999–6009.
[32] Y. Lv, F. Wei, L. Cao, S. Peng, J. Niu, S. Yu, C. Wang, Aspect-level sentiment analysis using context and aspect memory network, Neurocomputing 428 (2021) 195–205.
[33] Dictionary by Merriam-Webster, emotion. URL https://www.merriam-webster.com/dictionary/emotion
[34] X. Huang, Introduction to Psychology, Beijing: People's Education Press, 1991.
[35] E. Hudlicka, Guidelines for designing computational models of emotions, International Journal of Synthetic Emotions 2 (1) (2011) 26–79.
[36] M. Munezero, C. Montero, E. Sutinen, J. Pajunen, Are they different? Affect, feeling, emotion, sentiment, and opinion detection in text, IEEE Transactions on Affective Computing 5 (2) (2014) 101–111.
[37] B. Liu, Sentiment Analysis: Mining Opinions, Sentiments, and Emotions, Cambridge University Press, 2015.
[38] S. Poria, N. Majumder, R. Mihalcea, E. Hovy, Emotion recognition in conversation: research challenges, datasets, and recent advances, IEEE Access 7 (2019) 100943–100953.
[39] P. Ekman, An argument for basic emotions, Cognition and Emotion 6 (3/4) (1992) 169–200.
[40] W. Parrott, Emotions in social psychology: essential readings, Oxford, UK: Psychology Press, 2001.
[41] R. Plutchik, The nature of emotions, Philosophical Studies 89 (4) (2001) 393–409.
[42] C. Lin, Emotional problems in socialist psychology, Science of Social Psychology 21 (83) (2006) 37–62.
[43] A. Ceron, L. Curini, S. M. Iacus, G. Porro, Every tweet counts? How sentiment analysis of social media can improve our knowledge of citizens' political preferences with an application to Italy and France, New Media and Society 16 (2) (2014) 340–358.
[44] B. Alkouz, Z. Aghbari, J. Abawajy, Tweetluenza: predicting flu trends from twitter data, Big Data Mining and Analytics 2 (4) (2019) 273–287.
[45] J. Zhang, Y. Wang, Z. Yuan, Q. Jin, Personalized real-time movie recommendation system: practical prototype and evaluation, Tsinghua Science and Technology 25 (2) (2020) 180–191.
[46] P. Zhang, X. Huang, L. Zhang, Information mining and similarity computation for semi-/unstructured sentences from the social data, Digital Communications and Networks, https://doi.org/10.1016/j.dcan.2020.08.001.
[47] H. Chen, C. Yin, W. R. R. Li, Z. Xiong, B. David, Enhanced learning resource recommendation based on online learning style model, Tsinghua Science and Technology 25 (3) (2020) 348–356.
[48] C. Yang, X. Lai, Z. Hu, Y. Liu, P. Shen, Depression tendency screening use text based emotional analysis technique, Journal of Physics: Conference Series 1237 (2019) 1–10.
[49] Z. Xie, Modelling the dropout patterns of MOOC learners, Tsinghua Science and Technology 25 (3) (2020) 313–324.
[50] J. Liao, J. Tang, X. Zhao, Course drop-out prediction on MOOC platform via clustering and tensor completion, Tsinghua Science and Technology 24 (4) (2019) 412–422.
[51] R. Salakhutdinov, G. Hinton, Deep Boltzmann machines, in: Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, Florida, USA, 2009, pp. 448–455.
[52] X. Xi, G. Zhou, A survey on deep learning for natural language processing, Acta Automatica Sinica 42 (10) (2016) 1445–1465.
[53] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, in: Proceedings of the 1st International Conference on Learning Representations (ICLR 2013), Scottsdale, Arizona, USA, 2013.
[54] W. Y. Zou, R. Socher, D. Cer, C. D. Manning, Bilingual word embeddings for phrase-based machine translation, in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 2013, pp. 1393–1398.
[55] J. Pennington, R. Socher, C. Manning, GloVe: Global Vectors for Word Representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), Doha, Qatar, 2014, pp. 1532–1543.
[56] X. Zhou, X. Wan, J. Xiao, Cross-lingual sentiment classification with bilingual document representation learning, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 2016, pp. 1403–1412.
[57] B. Eisner, T. Rocktaschel, I. Augenstein, M. Bosnjak, S. Riedel, emoji2vec: Learning Emoji Representations from their Description, in: Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media, Austin, TX, 2016, pp. 48–54.
[58] D. Tang, F. Wei, B. Qin, N. Yang, T. Liu, M. Zhou, Sentiment embeddings with applications to sentiment analysis, IEEE Transactions on Knowledge and Data Engineering 28 (2) (2016) 496–509.
[59] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of tricks for efficient text classification, in: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, 2017, pp. 427–431.
[60] O. Melamud, J. Goldberger, I. Dagan, context2vec: Learning generic context embedding with bidirectional LSTM, in: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL 2016), Berlin, Germany, 2016, pp. 51–61.
[61] L. Yu, J. Wang, K. Lai, X. Zhang, Refining word embeddings for sentiment analysis, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017, pp. 534–539.
[62] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep contextualized word representations, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, Louisiana, USA, 2018, pp. 2227–2237.
[63] B. H. Soleimani, S. Matwin, Fast PMI-Based Word Embedding with Efficient Use of Unobserved Patterns, in: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019), 2019, pp. 7031–7038.
[64] S. Jamee, Z. Fu, B. Shi, W. Lam, S. Schockaert, Word embedding as maximum a posteriori estimation, in: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019), 2019, pp. 6562–6569.
[65] B. McCann, J. Bradbury, C. Xiong, R. Socher, Learned in translation: Contextualized word vectors, in: Proceedings of the Thirty-first Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 2017, pp. 1–12.
[66] P. Xu, A. Madotto, C. Wu, J. Park, P. Fung, Emo2vec: Learning generalized emotion representation by multitask training, in: Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Brussels, Belgium, 2018, pp. 292–298.
[67] J. Howard, S. Ruder, Universal language model fine-tuning for text classification, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 2019, pp. 328–339.
[68] C. Baziotis, N. Athanasiou, A. Chronopoulou, A. Kolovou, G. Paraskevopoulos, N. Ellinas, S. Narayanan, A. Potamianos, NTUA-SLP at SemEval-2018 Task 1: Predicting Affective Content in Tweets with Deep Attentive RNNs and Transfer Learning, in: Proceedings of the 12th International Workshop on Semantic Evaluation, New Orleans, Louisiana, 2018, pp. 245–255.
[69] B. H. Soleimani, S. Matwin, Spectral word embedding with negative sampling, in: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, 2018, pp. 5481–5487.
[70] S. Cao, W. Lu, J. Zhou, X. Li, cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information, in: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, 2018, pp. 5053–5061.
[71] T. Wada, T. Iwata, Y. Matsumoto, Unsupervised multilingual word embedding with limited resources using neural language models, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019, pp. 3113–3124.
[72] Q. Le, T. Mikolov, Distributed representations of sentences and documents, in: Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014, pp. 1188–1196.
[73] R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, S. Fidler, Skip-thought vectors, Advances in Neural Information Processing Systems (2015) 3294–3302.
[74] B. Felbo, A. Mislove, A. Sogaard, I. Rahwan, S. Lehmann, Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 2017, pp. 1616–1626.
[75] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minnesota, 2019, pp. 4171–4186.
[76] A. Conneau, D. Kiela, H. Schwenk, L. Barrault, A. Bordes, Supervised learning of universal sentence representations from natural language inference data, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 2017, pp. 670–680.
[77] Y. Gao, Y. Xu, H. Huang, Q. Liu, L. Wei, L. Liu, Jointly learning topics in sentence embedding for document summarization, IEEE Transactions on Knowledge and Data Engineering 32 (4) (2020) 688–699.
[78] Y. Hao, X. Liu, J. Wu, P. Lv, Exploiting Sentence Embedding for Medical Question Answering, in: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019, pp. 938–945.
[79] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, Improving language understanding with unsupervised learning, technical report (2018). URL https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
[80] F. Hill, K. Cho, A. Korhonen, Learning distributed representations of sentences from unlabelled data, in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, 2016, pp. 1367–1377.
[81] Y. Sun, S. Wang, Y. Li, S. Feng, H. Tian, H. Wu, H. Wang, ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding, in: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020), 2020, pp. 8968–8975.
[82] S. Subramanian, A. Trischler, Y. Bengio, C. J. Pal, Learning general purpose distributed sentence representations via large scale multi-task learning, in: Proceedings of the Sixth International Conference on Learning Representations, Vancouver, Canada, 2018, pp. 1–16.
[83] D. Cer, Y. Yang, S. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, Y. Sung, B. Strope, R. Kurzweil, Universal sentence encoder for English, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (System Demonstrations), Brussels, Belgium, 2018, pp. 169–174.
[84] M. Pagliardini, P. Gupta, M. Jaggi, Unsupervised learning of sentence embeddings using compositional n-gram features, in: Proceedings of NAACL-HLT 2018, New Orleans, Louisiana, 2019, pp. 528–540.
[85] A. Nie, E. D. Bennett, N. D. Goodman, DisSent: Learning sentence representations from explicit discourse relations, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019, pp. 4497–4510.
[86] M. Abdul-Mageed, L. Ungar, Emonet: Fine-grained emotion detection with gated recurrent neural network, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 718–728.
[87] S. Tafreshi, M. Diab, Emotion detection and classification in a multigenre corpus with joint multi-task deep learning, in: Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, 2018, pp. 2905–2913.
[88] D. Kulshreshtha, P. Goel, A. Singh, How emotional are you? Neural architectures for emotion intensity prediction in microblogs, in: Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, 2018, pp. 2914–2926.
[89] E. Mohammadi, H. Amini, L. Kosseim, Neural feature extraction for contextual emotion detection, in: Proceedings of Recent Advances in Natural Language Processing, Varna, Bulgaria, 2019, pp. 785–794.
[90] P. Li, J. Li, F. Sun, P. Wang, Short text emotion analysis based on recurrent neural network, in: Proceedings of the 6th International Conference on Information Engineering, Dalian, Liaoning, China, 2017, pp. 1–5.
[91] P. Rathnayaka, S. Abeysinghe, C. Samarajeewa, I. Manchanayake, M. Walpola, Sentylic at IEST 2018: Gated recurrent neural network and capsule network based approach for implicit emotion detection, in: Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Brussels, Belgium, 2018, pp. 254–259.
[92] M. Akhtar, A. Ekbal, E. Cambria, How intense are you? Predicting intensities of emotions and sentiments using stacked ensemble, IEEE Computational Intelligence Magazine (2020) 64–75.
[93] E. Batbaatar, M. Li, K. Ryu, Semantic-emotion neural network for emotion recognition from text, IEEE Access 7 (2019) 111866–111878.
[94] Y. Zhang, J. Fu, D. She, Y. Zhang, S. Wang, J. Yang, Text emotion distribution learning via multi-task convolutional neural network, in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018, pp. 4595–4601.
[95] H. Khanpour, C. Caragea, Fine-grained emotion detection in health-related online posts, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2018, pp. 1160–1166.
[96] Y. Yang, D. Zhou, Y. He, An interpretable neural network with topical information for relevant emotion ranking, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2018, pp. 3423–3432.
[97] B. Kratzwald, S. Ilic, M. Kraus, S. Feuerriegel, H. Prendinger, Deep learning for affective computing: Text-based emotion recognition in decision support, Decision Support Systems 115 (2018) 24–35.
[98] Y. Yang, D. Zhou, Y. He, M. Zhang, Interpretable relevant emotion ranking with event-driven attention, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 2019, pp. 177–187.
[99] D. Ghosal, N. Majumder, S. Poria, N. Chhaya, A. Gelbukh, DialogueGCN: A graph convolutional neural network for emotion recognition in conversation, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 2019, pp. 154–164.
[100] P. Zhong, D. Wang, C. Miao, Knowledge-enriched transformer for emotion detection in textual conversations, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 2019, pp. 165–176.
[101] D. Zhang, L. Wu, C. Sun, S. Li, Q. Zhu, G. Zhou, Modeling both context- and speaker-sensitive dependence for emotion detection in multi-speaker conversations, in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 2019, pp. 5415–5421.
[102] N. Majumder, S. Poria, D. Hazarika, R. Mihalcea, A. Gelbukh, E. Cambria, DialogueRNN: An attentive RNN for emotion detection in conversations, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2019, pp. 6818–6825.
[103] T. Ishiwatari, Y. Yasuda, T. Miyazaki, J. Goto, Relation-aware graph attention networks with relational position encodings for emotion recognition in conversations, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 7360–7370.
[104] D. Zhang, X. Chen, S. Xu, B. Xu, Knowledge aware emotion recognition in textual conversations via multi-task incremental transformer, in: Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 2020, pp. 4429–4440.
[105] W. Jiao, H. Yang, I. King, M. R. Lyu, HiGRU: Hierarchical gated recurrent units for utterance-level emotion recognition, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, 2019, pp. 397–406.
[106] D. Li, Y. Li, S. Wang, Interactive double states emotion cell model for textual dialogue emotion prediction, Knowledge-Based Systems 189 (2020) 1–11.
[107] J. Li, D. Ji, F. Li, M. Zhang, Y. Liu, HiTrans: A transformer-based context- and speaker-sensitive model for emotion detection in conversations, in: Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 4190–4200.
[108] D. Ghosal, N. Majumder, A. Gelbukh, R. Mihalcea, S. Poria, COSMIC: Commonsense knowledge for emotion identification in conversations, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 2470–2481.
[109] X. Lu, Y. Zhao, Y. Wu, Y. Tian, H. Chen, B. Qin, An iterative emotion interaction network for emotion recognition in conversations, in: Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 4078–4088.
[110] Q. Li, C. Wu, K. Zheng, Z. Wang, Hierarchical transformer network for utterance-level emotion recognition, Applied Sciences 10 (13) (2020) 4447.
[111] S. Mundra, A. Sen, M. Sinha, S. Mannarswamy, S. Dandapat, S. Roy, Fine-grained emotion detection in contact center chat utterances, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2017, pp. 337–349.
[112] A. Chatterjee, K. N. Narahari, M. Joshi, P. Agrawal, SemEval-2019 task 3: EmoContext: contextual emotion detection in text, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 39–48.
[113] P. Agrawal, A. Suri, NELEC at SemEval-2019 task 3: Think twice before going deep, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 266–271.
[114] A. Basile, M. Franco-Salvador, N. Pawar, S. Stajner, M. C. Rios, Y. Benajiba, SymantoResearch at SemEval-2019 task 3: Combined neural models for emotion classification in human-chatbot conversations, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 330–334.
[115] C. Huang, A. Trabelsi, O. R. Zaiane, ANA at SemEval-2019 task 3: Contextual emotion detection in conversations through hierarchical LSTMs and BERT, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 49–53.
[116] G. I. Winata, A. Madotto, Z. Lin, J. Shin, Y. Xu, P. Xu, P. Fung, CAiRE HKUST at SemEval-2019 task 3: Hierarchical attention for dialogue emotion classification, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 142–147.
[117] S. Bae, J. Choi, S. Lee, SNU IDS at SemEval-2019 task 3: Addressing training-test class distribution mismatch in conversational classification, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 312–317.
[118] X. Liang, Y. Ma, M. Xu, THU-HCSI at SemEval-2019 task 3: Hierarchical ensemble classification of contextual emotion in conversation, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 345–349.
[119] J. Xiao, Figure Eight at SemEval-2019 task 3: Ensemble of transfer learning methods for contextual emotion detection, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 220–224.
[120] D. Li, J. Wang, X. Zhang, YUN-HPCC at SemEval-2019 task 3: Multi-step ensemble neural network for sentiment analysis in textual conversation, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 360–364.
[121] W. Ragheb, J. Aze, S. Bringay, M. Servajean, LIRMM-Advanse at SemEval-2019 task 3: Attentive conversation modeling for emotion detection and classification, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 251–255.
[122] Y. Lee, Y. Kim, K. Jung, MILAB at SemEval-2019 task 3: Multi-view turn-by-turn model for context-aware sentiment analysis, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 256–260.
[123] L. Ma, L. Zhang, W. Ye, W. Hu, PKUSE at SemEval-2019 task 3: Emotion detection with emotion-oriented neural attention network, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 287–291.
[124] S. Ge, T. Qi, C. Wu, Y. Huang, THU NGN at SemEval-2019 task 3: Dialog emotion classification using attentional LSTM-CNN, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 340–344.
[125] Z. Wang, Y. Zhang, S. Lee, S. Li, G. Zhou, A bilingual attention network for code-switched emotion prediction, in: Proceedings of the 26th International Conference on Computational Linguistics, Osaka, Japan, 2016, pp. 1624–1634.
[126] X. Zhou, X. Wan, J. Xiao, Attention-based LSTM network for cross-lingual sentiment classification, in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, 2016, pp. 247–256.
[127] X. Chen, Y. Sun, B. Athiwaratkun, C. Cardie, K. Weinberger, Adversarial deep averaging networks for cross-lingual sentiment classification, Transactions of the Association for Computational Linguistics 6 (2018) 557–570.
[128] H. Zhou, L. Chen, F. Shi, D. Huang, Learning bilingual sentiment word embeddings for cross-language sentiment classification, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 2015, pp. 430–440.
[129] Y. Feng, X. Wan, Towards a unified end-to-end approach for fully unsupervised cross-lingual sentiment analysis, in: Proceedings of the 23rd Conference on Computational Natural Language Learning, Hong Kong, China, 2019, pp. 1035–1044.
[130] J. Barnes, R. Klinger, S. Walde, Bilingual sentiment embeddings: joint projection of sentiment across languages, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers), Melbourne, Australia, 2018, pp.
[134] C. Coltekin, T. Rama, Tubingen-Oslo at SemEval-2018 task 2: SVMs perform better than RNNs at emoji prediction, in: Proceedings of the 12th International Workshop on Semantic Evaluation, New Orleans, Louisiana, 2018, pp. 32–36.
[135] C. Baziotis, A. Nikolaos, A. Kolovou, G. Paraskevopoulos, N. Ellinas, A. Potamianos, NTUA-SLP at SemEval-2018 task 2: Predicting Emojis using RNNs with context-aware attention, in: Proceedings of the 12th International Workshop on Semantic Evaluation, New Orleans, Louisiana, 2018, pp. 438–444.
[136] J. Beaulieu, D. A. Owusu, UMDuluth-CS8761 at SemEval-2018 task 2: Emojis: Too many choices?, in: Proceedings of the 12th International Workshop on Semantic Evaluation, New Orleans, Louisiana, 2018, pp. 397–401.
[137] J. Coster, R. G. van Dalen, N. A. J. Stierman, Hatching Chick at SemEval-2018 task 2: Multilingual emoji prediction, in: Proceedings of the 12th International Workshop on Semantic Evaluation, New Orleans, Louisiana, 2018, pp. 442–445.
[138] S. Jin, T. Pedersen, Duluth UROP at SemEval-2018 task 2: Multilingual emoji prediction with ensemble learning and oversampling, in: Proceedings of the 12th International Workshop on Semantic Evaluation, New Orleans, Louisiana, 2018, pp. 479–482.
[139] A. Basile, K. W. Lino, TAJJEB at SemEval-2018 task 2: Traditional approaches just do the job with emoji prediction, in: Proceedings of the 12th International Workshop on Semantic Evaluation, New Orleans, Louisiana, 2018, pp. 467–473.
[140] M. Liu, EmoNLP at SemEval-2018 task 2: English emoji prediction with gradient boosting regression tree method and bidirectional LSTM, in: Proceedings of the 12th International Workshop on Semantic Evaluation, New Orleans, Louisiana, 2018, pp. 387–391.
[141] X. Lu, X. Mao, M. Lan, Y. Wu, ECNU at SemEval-2018 task 2: Leverage traditional NLP features and neural networks methods to address Twitter emoji prediction task, in: Proceedings of the 12th International Workshop on Semantic Evaluation, New Orleans, Louisiana, 2018, pp. 430–434.
[142] L. Cao, S. Peng, P. Yin, Y. Zhou, A. Yang, X. Li, A Survey of Emotion Analysis in Text Based on Deep Learning, in: Proceedings of the IEEE 8th International Conference on Smart City and Informatization (iSCI 2020), Guangzhou, China, 2020, pp. 81–88.
[143] R. Xu, T. Chen, Y. Xia, Q. Lu, B. Liu, X. Wang, Word embedding composition for data imbalances in sentiment and emotion classification, Cognitive Computation 7 (2015) 226–240.
[144] S. F. Yilmaz, E. B. Kaynak, A. Koc, H. Dibeklioglu, S. S. Kozat, Multi-label sentiment analysis on 100 languages with dynamic weighting for label imbalance, IEEE Transactions on Neural Networks and Learning Systems (2021), DOI: 10.1109/TNNLS.2021.3094304.
[145] Z. Cao, Y. Zhou, A. Yang, S. Peng, Deep transfer learning mechanism for fine-grained cross-domain sentiment classification, Connection Science 33 (4) (2021) 911–928.
[146] R. Xia, M. Zhang, Z. Ding, RTHN: a RNN-transformer hierarchical network for emotion cause extraction, in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), 2019, pp. 5285–5291.
[147] S. Peng, L. Cao, Y. Zhou, J. Xie, P. Yin, J. Mo, Challenges and Trends of Android Malware Detection in the Era of Deep Learning, in: Proceedings of the IEEE 8th International Conference on Smart City and Informatization (iSCI 2020), Baltimore, MD, USA, 2020, pp. 37–43.
[148] S. Peng, G. Wang, D. Xie, Social influence analysis in social net-
2483–2493. working big data: opportunities and challenges, IEEE Network
[131] Z. Ahmad, R. Jindal, A. Ekbal, P. Bhattachharyya, Borrow 31 (1) (2017) 11–17.
from rich cousin: transfer learning for emotion detection us- [149] M. Mahmud, J. Huang, S. Salloum, T. Emara, K. Sadatdiynov,
ing cross lingual embedding, Expert Systems With Applications A survey of data partitioning and sampling methods to support
139 (112851) (2020) 1–12. big data analysis, Big Data Mining and Analytics 3 (2) (2020)
[132] Oxford english and spanish dictionary, emoji. 85–101.
URL https://www.lexico.com/definition/emoji [150] B. Liu, S. Tang, X. Sun, Q. Chen, J. Cao, J. Luo, S. Zhao,
[133] F. Barbieri, J. Camacho-Collados, F. Ronzano, L. Espinosa-Anke, Context-aware social media user sentiment analysis, Tsinghua
M. Ballesteros, V. Basile, V. Patti, H. Saggion, Semeval 2018 Science and Technology 25 (4) (2020) 528–541.
task 2: Multilingual emoji prediction, in: Proceedings of the 12th [151] W. Peng, X. Hong, G. Zhao, Adaptive modality distillation for
International Workshop on Semantic Evaluation, New Orleans, separable multimodal sentiment analysis, IEEE Intelligent Sys-
Louisiana, 2018, pp. 24–33. tems (2021) DOI: 10.1109/MIS.2021.3057757.
26 S. Peng, et al.

[152] S. Peng, A. Yang, L. Cao, S. Yu, D. Xie, Social influence model-


ing using information theory in mobile social networks, Informa-
tion Sciences 379 (2017) 147–159.
[153] J. Wu, N. Wang, Approximating special social influence max-
imization problems, Tsinghua Science and Technology 25 (6)
(2020) 703–711.

of
p ro
e-
Pr
n al
ur
Jo
Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


The authors declare that they have no conflicts of interest related to this work.

