Buying or Browsing
software on mobile devices, we take advantage of the sensors and accelerometers of mobile devices to automatically glean the real-time context of user interactions, such as swipe and tap actions. Compared with the browse-interactive actions, the touch-interactive actions occur far more frequently. As shown in Table 1, the numbers of swipe actions and tap actions generated per user per day are 37.7 times and 9.3 times that of the browse-interactive actions, respectively. As a result, the touch-interactive behavior contains richer information about user behavior patterns. For example, we find that some customers browse the product comments for a long time before they place an order. Such typical patterns can easily be captured using the touch-interactive behavior. By combining the traditional browse-interactive behavior with the new touch-interactive behavior, we are able to model user behavior patterns more comprehensively.

However, several challenges arise in predicting users' real-time purchasing intent. First, the touch-interactive behavior contains less semantic information than the browse-interactive behavior, so it is challenging to extract features from these data that improve prediction performance. Second, an effective fusion mechanism is needed to combine the browse-interactive behavior and the touch-interactive behavior so as to bring their advantages into full play. Third, because browsing behavior is complex and customers with different purchasing intent can appear very similar, it is essential to capture both common features that depict the customers well and unique features that lead to different purchasing behavior.

In this paper, we propose a novel end-to-end deep network, named Deep Intent Prediction Network (DIPN), for real-time purchasing intent prediction. In DIPN, the user behavior features are learned automatically from the raw data without extensive feature engineering. In particular, we propose a hierarchical attention mechanism to fuse the views extracted from different interactive behavior sources. In the bottom attention layer, we design an intra-view attention mechanism that focuses on the inner parts of each behavior sequence. In the top attention layer, we propose an inter-view attention mechanism that learns the relations between different behavior sequences. In addition, we propose to train the real-time and long-term purchasing intent simultaneously with the same model. With this multi-task learning, DIPN can capture common features that depict the customers well and unique features that lead to different purchasing behavior.

The contributions of this paper can be summarized as follows:
• We collect a new type of user behavior, the touch-interactive behavior, which contains rich information about user behavior patterns. Together with the traditional browse-interactive behavior, it allows us to depict a user from different views for better purchasing intent prediction.
• We propose a deep network, DIPN, for real-time purchasing intent prediction. A novel hierarchical attention mechanism is proposed to fuse multiple views extracted from different interactive behavior sources. In addition, multi-task learning is introduced to better distinguish user behavior patterns.
• We conduct extensive experiments to evaluate the performance of DIPN in both offline and online settings. Experimental results on a large-scale industrial dataset show the superiority of DIPN in predicting purchasing intent. In particular, DIPN has been deployed in the operational system of Taobao and adopted in the coupon allocation task at a shopping festival. Online A/B testing shows the benefits of knowing users' real-time purchasing intent.

The rest of the paper is organized as follows. We discuss related work in Section 2, followed by the data description in Section 3. We describe the design of the DIPN model in Section 4 and give an overview of the deployment of DIPN in Section 5. We present experiments in Section 6 and conclude the paper in Section 7.

2 RELATED WORK
2.1 Purchasing Intent Prediction
The problem of purchasing intent prediction has been heavily studied, with a variety of classic machine learning and deep learning modelling techniques employed. The earliest work comes from the RecSys 2015 challenge [2], which provides a public dataset consisting of 9.2 million user-item click sessions. Given a session, the goal of the challenge is to predict whether the user is going to buy something within this session. Romov et al. [15] won the competition using GBM with extensive feature engineering on session summaries. Other feature-based work includes the ensemble of a neural network and GBM used by [23] and the deep belief networks and stacked denoising auto-encoders of [22]. To reduce the feature engineering work, several works [19, 20, 25] adopt recurrent neural networks (RNNs) to model the sequential nature of sessions, where a bi-directional LSTM is used in [19, 25] and a mixture of LSTMs is used in [20].

Our work is distinguished from previous work in the following aspects. First, given a history session, our goal is to predict a user's subsequent purchasing behavior within a given time window, while the goal of previous work is to predict the purchasing behavior within the session. Our setting is more realistic because, in practice, we must predict future behavior from the current, incomplete session. Second, a key difference of our work is that we collect touch-interactive actions to capture real-time user behavior patterns. As a result, we need to handle several data sources in our model, while previous work deals with only a single source.

2.2 Sequence Classification
The task of purchasing intent prediction is closely related to sequence classification. A brief survey by [26] categorizes sequence classification methods into three groups: feature-based methods [1, 13, 29], sequence-distance-based methods [10, 17, 24], and model-based methods [4, 27, 31]. Our work is related to the model-based approach: we use an end-to-end deep network to model the sequences and save extensive feature engineering work. Our work is also related to sentence classification in natural language processing [9, 11, 30]. Text sentences and time series data are similar in that both are ordered sequences. However, the semantic information contained in these two kinds of sequences is quite different. Our work differs from previous work in that we need to handle several data sources with different formats, while in traditional sequence classification the data usually comes from a single source.
2.3 Multi-task Learning
Multi-task learning has been used successfully across various applications of machine learning, from natural language processing [3, 5] and speech recognition [6] to computer vision [8] and recommender systems [14]. By sharing representations between related tasks, multi-task learning can enable a model to capture more underlying factors and generalize better on its original task. Ruder [16] presents an overview of multi-task learning in deep learning, where it is typically done with either hard or soft parameter sharing of hidden layers. Hard parameter sharing is the most commonly used approach: the hidden layers are shared between all tasks while several task-specific output layers are kept. Collobert et al. [5] simultaneously learn several NLP tasks using a language model with a shared embedding lookup table. In [8], multi-task learning is adopted to improve the performance of classifying object proposals using deep convolutional networks. Ni et al. [14] use deep multi-task representation learning to generate user representations for personalization in an e-commerce portal. In soft parameter sharing, each task has its own model with its own parameters, and the distance between the parameters is regularized. Duong et al. [7] use the l2 distance for regularization, while Yang et al. [28] use the trace norm. Our model is related to the hard parameter sharing method. We propose a novel approach that partitions a user's purchasing intent into three different phases and uses multi-task learning to learn the unique behavior that leads to different purchasing intent.

To the best of our knowledge, our work is the first study that uses an attention-based deep network with multi-task learning on multiple user behavior sequences for real-time purchasing intent prediction.
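To make the hard/soft distinction concrete, the following is a minimal hard-parameter-sharing sketch in Keras. It is illustrative only, not the deployed DIPN: the input width, layer sizes, and head names are assumptions, with a generic shared encoder feeding two task-specific sigmoid heads in the spirit of the real-time and long-term intent tasks trained jointly later in the paper.

```python
import tensorflow as tf

# Hard parameter sharing: hidden layers shared by all tasks,
# followed by one small task-specific output layer per task.
inputs = tf.keras.Input(shape=(64,))  # hypothetical fused feature vector
x = tf.keras.layers.Dense(128, activation="relu")(inputs)
x = tf.keras.layers.Dense(64, activation="relu")(x)  # shared representation

# Task-specific heads, e.g., real-time vs. long-term purchasing intent.
real_time = tf.keras.layers.Dense(1, activation="sigmoid", name="real_time")(x)
long_term = tf.keras.layers.Dense(1, activation="sigmoid", name="long_term")(x)

model = tf.keras.Model(inputs, [real_time, long_term])
model.compile(
    optimizer="adam",
    loss={"real_time": "binary_crossentropy",
          "long_term": "binary_crossentropy"},
)
```

In soft parameter sharing, by contrast, each task would keep its own copy of the hidden layers, with a regularizer pulling the two parameter sets toward each other.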
3 DATASET
We build two types of user interactive behavior datasets, i.e., the new touch-interactive behavior and the traditional browse-interactive behavior. In the following, we describe each dataset in detail.

3.1 Touch-interactive Behavior
The touch-interactive behavior dataset contains normal users' daily touch-interactive information from the Taobao app, and is composed of the swipe-interactive and the tap-interactive behavior.

The swipe-interactive behavior. This behavior includes four types of basic actions, i.e., Open Page, Leave Page, Swipe and Tap. Table 2a shows an example of raw swipe-interactive data. A user's swipe-interactive track is a time sequence of these four basic types of actions. Each action has a timestamp and a page index to identify when and where the action occurs. In addition, the positional coordinates of the action on the touch screen are recorded, and the duration records how long the action lasts. As shown in Table 3a, we extract 14 raw features for each action. The time duration of a swipe, the time gap between two actions, and the positional coordinates of actions are continuous variables; page indices, action indices and swipe directions (i.e., left/right and up/down) are categorical variables. We discretize all the raw features to ensure unified inputs for DIPN. The discretization of the continuous variables is as follows:
• Position. The positional coordinates of actions are continuous values, and are discretized according to the resolution of the touch screen. We divide the width of the screen into 17 uniform segments and the height into 25 segments for one-hot encoding.
• Swipe Length. The length of a swipe is encoded into a one-hot vector twice as long as the position one-hot vectors. The doubled length is used because, for a swipe track, we also encode the direction of the swipe.
• Time Gap and Duration. We apply a step function to encode the time gaps between actions and the swipe durations as follows:

        ⌊x/fs⌋,        x < fb
  y =   ⌊x/fb + 9⌋,    fb ≤ x < 10·fb
        19,            x ≥ 10·fb

  where {fs = 100, fb = 1000} are used for the time gap and {fs = 25, fb = 250} for the time duration, yielding 20 buckets in total.
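The step function translates directly into code. Below is a minimal sketch (ours, not the production pipeline); note that ⌊x/fb + 9⌋ = ⌊x/fb⌋ + 9 since 9 is an integer, and the time unit is our assumption, since the paper does not state it here.

```python
def bucketize(x, f_s, f_b):
    """Encode a time value x into one of 20 buckets per the step function above."""
    if x < f_b:
        return x // f_s        # fine-grained buckets 0..9
    if x < 10 * f_b:
        return x // f_b + 9    # coarse buckets 10..18
    return 19                  # overflow bucket

# Parameters from the paper (units assumed to be milliseconds):
gap_bucket = bucketize(1350, f_s=100, f_b=1000)  # time gap  -> bucket 10
dur_bucket = bucketize(120, f_s=25, f_b=250)     # duration  -> bucket 4
```

The resulting bucket id can then be one-hot encoded like the other categorical features.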
The tap-interactive behavior. This behavior records the information associated with tap actions, as shown in Table 2b. A user's tap-interactive track is a time sequence of tap actions. Each action has a timestamp and a page index to identify when and where the action occurs. There is also an event id that identifies whether the user tapped on a page or on a button; if a button was tapped, the button name is also recorded. As shown in Table 3b, we extract 3 raw features, all of which are categorical variables.
Figure 1: The model architecture of DIPN.
feature can then be obtained as E_button · B_button ∈ R^{n_e}, where B_button ∈ R^{n_b} is the one-hot vector of the Button Index feature. The length of every embedded feature is shown in the Embedding Dim column of Table 3. Finally, for each feature group, all the embedded features are concatenated into a vector and fed into a fully-connected layer for reshaping. The embedding layer is trained together with the rest of the model.
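As an illustration of the lookup (ours, not the production code), multiplying the embedding matrix by a one-hot vector simply selects one of its columns; deep learning frameworks implement exactly this as an indexed embedding lookup.

```python
import numpy as np

n_b, n_e = 50, 8                      # hypothetical button vocabulary / embedding sizes
E_button = np.random.randn(n_e, n_b)  # embedding matrix, learned during training

button_index = 3
B_button = np.eye(n_b)[button_index]  # one-hot vector in R^{n_b}

embedded = E_button @ B_button        # E_button . B_button in R^{n_e}
assert np.allclose(embedded, E_button[:, button_index])  # same as a column lookup
```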
4.2 RNN Layer
The user interactive behaviors used in DIPN are all time sequences of actions. Therefore, we use an RNN to model the long-term dependencies between actions. Adopting an RNN eliminates the need for extensive feature engineering, which is very helpful because it is difficult to extract features from the touch-interactive behavior, composed of swipe and tap actions with little semantic information. To avoid the vanishing gradient problem suffered by the standard RNN, the LSTM and the GRU were proposed to control the update of information via gates. We choose the GRU to model the dependencies because it is faster than the LSTM and more suitable for an e-commerce system. The formulation of the GRU is as follows:

  r_t = σ(W_er e_t + W_hr h_{t-1} + b_r)
  z_t = σ(W_ez e_t + W_hz h_{t-1} + b_z)                        (1)
  h̃_t = tanh(W_eh e_t + W_hh (r_t ⊙ h_{t-1}) + b_h)
  h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ h̃_t,

where e_t is the embedding vector of the t-th action, h_t is the t-th hidden state, σ is the sigmoid function and ⊙ is the element-wise product operator. To better capture the global information of the behavior sequences, we adopt a bidirectional recurrent layer composed of two GRU layers working in opposite directions. We obtain the representation of the t-th action by concatenating the forward hidden state →h_t and the backward hidden state ←h_t, i.e., h_t = [→h_t, ←h_t]. In this way, a behavior sequence is represented as h = {h_1, h_2, ..., h_n} ∈ R^{n×2d}, where d is the dimension of the hidden state.

In DIPN, we need to handle three types of behavior sequences, i.e., the swipe-interactive sequence, the tap-interactive sequence and the browse-interactive sequence. There are two ways to fuse these sequences: early fusion and late fusion. Early fusion aligns the three sequences by timestamp before feeding them into a single GRU model, while late fusion first feeds each sequence to a separate GRU model and then concatenates the output hidden features. One disadvantage of early fusion is that the behavior sequences usually have very different densities, as shown in Table 1. When the sequences are aligned by timestamp, a dense sequence can dominate the concatenated feature space and override the effect of a sparse but important sequence. In addition, since the length of the GRU model is limited, early fusion loses information from the dense sequence when sessions are truncated. Therefore, we use late fusion and feed the three behavior sequences to separate Bi-GRU models, as shown in Figure 1. After the RNN layer, we obtain three hidden outputs, h^s = {h^s_1, h^s_2, ..., h^s_n} ∈ R^{n×2d}, h^t = {h^t_1, h^t_2, ..., h^t_n} ∈ R^{n×2d} and h^b = {h^b_1, h^b_2, ..., h^b_n} ∈ R^{n×2d}, corresponding to the swipe-interactive, tap-interactive and browse-interactive sequences, respectively.
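The late-fusion layout can be sketched in a few lines of Keras. This is our illustrative sketch under assumed shapes (sequence length n, hidden size d, per-view embedding widths), not the deployed model:

```python
import tensorflow as tf

n, d = 128, 32  # assumed sequence length and GRU hidden size
embed_dims = {"swipe": 24, "tap": 8, "browse": 16}  # assumed embedding widths

inputs, views = [], []
for name, e_dim in embed_dims.items():
    seq = tf.keras.Input(shape=(n, e_dim), name=name)
    # Late fusion: one separate Bi-GRU per behavior sequence,
    # producing hidden outputs in R^{n x 2d}.
    h = tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(d, return_sequences=True))(seq)
    inputs.append(seq)
    views.append(h)

h_swipe, h_tap, h_browse = views  # handed to the hierarchical attention layer
```

Keeping the three views separate here is the point of late fusion: each view retains its own density and length, and the fusion decision is deferred to the attention layer below.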
4.3 Hierarchical Attention Layer
To better fuse the views extracted from the different behavior sequences, we propose a hierarchical attention mechanism in which the bottom attention layer focuses on the inner parts of each behavior sequence, while the top attention layer learns the inter-view relations between different behavior sequences, as shown in Figure 1. In the following, we introduce the hierarchical attention mechanism in detail.
  A_b(v_s, v_b, v_b) = softmax(v_s v_b^T / √(2d)) v_b,

where D is the training set with size N, x is the input of the network, y is the label, and p_s(x) and p_l(x) represent the predicted
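The inter-view formula above is the standard scaled dot-product attention of [21] with one view attending over another. A minimal NumPy sketch, under the assumption that v_s and v_b are the n×2d swipe and browse view representations (the function name is ours):

```python
import numpy as np

def inter_view_attention(v_s, v_b, two_d):
    """A_b(v_s, v_b, v_b) = softmax(v_s v_b^T / sqrt(2d)) v_b."""
    scores = v_s @ v_b.T / np.sqrt(two_d)            # pairwise cross-view similarities
    scores -= scores.max(axis=-1, keepdims=True)     # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v_b                             # v_s attending over v_b
```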
protect user privacy, because only the prediction scores, rather than the features capturing behavior patterns, are sent to the cloud.

6 EXPERIMENTS
In this section, we present a comprehensive evaluation of the performance of DIPN. We first introduce the experimental setup, then present the experimental results under various settings, and finally share a case study of online serving.
Table 4: Comparison of different models.

  Model                          AUC
  GBDT                           0.7871
  RNN+DNN                        0.7902
  DIPN-early-fusion              0.7708
  DIPN-no-attention              0.8345
  DIPN-no-inter-view-attention   0.8367
  DIPN-no-intra-view-attention   0.8401
  DIPN-no-multi-task             0.8371
  DIPN                           0.8429

Table 5: Impact of multi-task learning.

                       AUC (real-time)   AUC (long-term)
  DIPN-no-multi-task   0.8371            0.8204
  DIPN                 0.8429            0.8276

Table 6: Impact of different sources.

  Task 1             AUC      Task 2            AUC
  DIPN w/o profile   0.8381   DIPN w/ profile   0.5419
  DIPN w/o history   0.7862   DIPN w/ history   0.7335
  DIPN w/o browse    0.8303   DIPN w/ browse    0.6533
  DIPN w/o swipe     0.8287   DIPN w/ swipe     0.6742
  DIPN w/o tap       0.7978   DIPN w/ tap       0.7418

TensorFlow with 1 parameter server and 100 workers. The metric used in our experiments is the Area Under the Curve (AUC), which is insensitive to class imbalance and therefore suitable for our experiments.

6.2 Experimental Results
Results of different models. Table 4 shows the performance of the evaluated models. We have the following observations. (1) DIPN outperforms the baseline methods GBDT and RNN+DNN by significant margins of about 5.6% and 5.3% in terms of AUC, respectively. The improvement of DIPN over GBDT and RNN+DNN reveals the value of adopting the touch-interactive behavior to depict users from different views. (2) The early fusion manner is not appropriate for fusing views from different data sources: DIPN-early-fusion performs worst among the compared models, because early fusion can lead to an imbalance between views and to information loss. (3) The hierarchical attention mechanism plays an important role in DIPN. As shown, DIPN-no-inter-view-attention and DIPN-no-intra-view-attention are superior to DIPN-no-attention but inferior to DIPN. This shows that the intra-view and inter-view attention mechanisms are effective in identifying important actions within a view and in discovering useful asynchronous interactions between views, respectively. (4) Prediction performance can be further improved by multi-task learning. Table 5 shows the results of DIPN with and without multi-task learning: the AUC of real-time and long-term purchasing intent prediction improves by 0.6% and 0.7%, respectively.

Impact of different sources. DIPN predicts real-time purchasing intent by utilizing multiple data sources simultaneously. To better understand the role each data source plays, we conduct two types of tasks with DIPN: the first task predicts the purchasing intent with one data source removed at a time, while the second task uses only a single data source for prediction. The results are shown in Table 6. We can see that each data source has a positive impact on the performance of DIPN. The user profile feature performs worst in the second task, which is expected because it only provides basic information about a user; however, it increases AUC by about 0.5% when used together with the other data sources, because it improves personalization in DIPN. The tap-interactive behavior plays a more significant role than the other behaviors: it captures more real-time behavior patterns than the browse-interactive behavior and contains richer semantic information than the swipe-interactive behavior. It should be noted that the user history feature also contributes substantially to the performance of DIPN, demonstrating that user activeness has a great impact on purchasing behavior. By utilizing all the data sources listed in this paper, DIPN gains an AUC improvement of about 18.96% over the baseline that uses only the traditional user behavior sequences.

6.3 Online A/B Testing
Coupon allocation is an important strategy for improving the Gross Merchandise Volume (GMV) on e-commerce platforms. In this section, we introduce a new coupon allocation strategy based on the real-time purchasing intent predicted by DIPN in the online traffic of Taobao. The online A/B testing was conducted at "Double 11" in 2018, a shopping festival in China similar to "Black Friday" in America.

We choose a coupon with a 10 RMB nominal value for our testing and set up three coupon allocation strategies to compare performance, defined as follows:
• All-allocation Strategy, where everyone in the bucket is selected to get this coupon.
• Non-allocation Strategy, where no one in the bucket is selected to get this coupon.
• Model-allocation Strategy, which uses the score predicted by DIPN and fixed thresholds to decide the allocation.
The users selected to get this coupon are shown a popup in Taobao's mobile application.

We use the coupon usage rate R_c and the GMV improvement per coupon I_gmv as evaluation metrics, defined as follows:

  R_c = N_wb / N_b,    (7)

where N_wb is the number of users who have used this coupon to buy something in a bucket b, and N_b is the total number of users who have got this coupon in b;

  I_gmv = (G_b − (N_b / N_non) · G_non) / N_wb = (N_non · G_b − N_b · G_non) / (N_non · N_wb),    (8)

where G_b is the total GMV of users in a bucket b, N_b is the number of users in b, and G_non and N_non are the total GMV and the number of users in the non-allocation strategy bucket, respectively.
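For clarity, Equations (7) and (8) in code; this is a direct transcription, with the function name being ours:

```python
def coupon_metrics(n_wb, n_b, g_b, g_non, n_non):
    """Coupon usage rate R_c (Eq. 7) and GMV improvement per coupon I_gmv (Eq. 8)."""
    r_c = n_wb / n_b                                      # redeemers / coupon receivers
    i_gmv = (n_non * g_b - n_b * g_non) / (n_non * n_wb)  # GMV lift vs. non-allocation, per used coupon
    return r_c, i_gmv
```

Equation (8) first scales the non-allocation bucket's GMV to the size of bucket b (N_b/N_non · G_non), so that I_gmv measures the extra GMV attributable to each redeemed coupon.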
Table 7: The results of different coupon allocation strategies.²

                     Num. of Users   R_c                I_gmv
  Non-allocation     1.38M           /                  0
  All-allocation     10.35M          40.4%              I_a
  Model-allocation   1.22M           57.0% (+41.1%)     I_m (+39.8%)

² Due to the sensitive-data policy, the I_gmv values of the all-allocation strategy and the model-allocation strategy have been replaced with I_a and I_m.

We hypothesize that users with a very low purchasing intent are unlikely to change their minds because of this coupon, while users with a high purchasing intent do not need the promotion. Therefore, we set the lower threshold t_l = 0.2 and the upper threshold t_u = 0.4. In the model-allocation strategy bucket, the users whose real-time purchasing intent score given by DIPN falls between t_l and t_u are selected to get the coupon.
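The model-allocation rule thus reduces to a band filter on the DIPN score; a sketch follows, where the boundary handling is our assumption, since the paper only says "between t_l and t_u":

```python
T_LOW, T_HIGH = 0.2, 0.4  # thresholds from the paper

def select_for_coupon(intent_score):
    """Allocate only in the middle band: very low intent is unlikely to
    convert, and high intent would buy without the promotion."""
    return T_LOW <= intent_score <= T_HIGH
```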
As shown in Table 7, more than 12.95 million users took part in this online A/B testing. It is notable that the model-allocation strategy delivers up to a 41.1% improvement in R_c and a 39.8% improvement in I_gmv over the all-allocation strategy in this large-scale online traffic. The reason is that DIPN helps the allocation system understand a user's real-time purchasing intent and allocate the coupon to the right person at the right time. Compared with the all-allocation strategy, a reasonable allocation strategy relying on DIPN thus results in a significant GMV improvement.
7 CONCLUSION
In this paper, we propose DIPN, a novel attention-based deep network with multi-task learning for real-time purchasing intent prediction. Different from previous work, we collect a new type of user interactive behavior, the touch-interactive behavior, to capture comprehensive user behavior patterns. To fuse the multiple user interactive behaviors effectively, we propose a hierarchical attention mechanism comprising intra-view attention and inter-view attention. In addition, we use multi-task learning to train DIPN to better distinguish user behavior patterns. We conduct extensive experiments on a large-scale industrial dataset to evaluate the performance of DIPN, and the results show its superiority under various settings. In particular, online A/B testing results reveal the potential of knowing users' real-time purchasing intent, which can lead to a significant GMV improvement on e-commerce platforms.

8 ACKNOWLEDGMENTS
This work is supported in part by the National Natural Science Foundation of China under Grants No. 61702016, 61832001 and 61572039, and by the National Key Research and Development Program of China (No. 2018YFB1004403).

REFERENCES
[1] Charu C. Aggarwal. 2002. On Effective Classification of Strings with Wavelets. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 163–172.
[2] David Ben-Shimon, Alexander Tsikinovsky, Michael Friedmann, Bracha Shapira, Lior Rokach, and Johannes Hoerle. 2015. RecSys Challenge 2015 and the YOOCHOOSE Dataset. In Proceedings of the 9th ACM Conference on Recommender Systems. 357–358.
[3] Yangbin Chen, Yun Ma, Xudong Mao, and Qing Li. 2019. Multi-Task Learning for Abstractive and Extractive Summarization. Data Science and Engineering 4, 1 (Mar 2019), 14–23.
[4] Betty Yee Man Cheng, Jaime G. Carbonell, and Judith Klein-Seetharaman. 2005. Protein classification based on text document classification techniques. Proteins: Structure, Function, and Bioinformatics 58, 4 (2005), 955–970.
[5] Ronan Collobert and Jason Weston. 2008. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. In Proceedings of the 25th International Conference on Machine Learning. 160–167.
[6] L. Deng, G. Hinton, and B. Kingsbury. 2013. New types of deep neural network learning for speech recognition and related applications: an overview. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[7] Long Duong, Trevor Cohn, Steven Bird, and Paul Cook. 2015. Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser. In ACL-IJCNLP. 845–850.
[8] Ross Girshick. 2015. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). 1440–1448.
[9] Long Guo, Dongxiang Zhang, Lei Wang, Han Wang, and Bin Cui. 2018. CRAN: A Hybrid CNN-RNN Attention-Based Model for Text Classification. In 37th International Conference on Conceptual Modeling. 571–585.
[10] Eamonn J. Keogh and Michael J. Pazzani. 2000. Scaling Up Dynamic Time Warping for Datamining Applications. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 285–289.
[11] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. CoRR abs/1408.5882 (2014).
[12] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014).
[13] Neal Lesh, Mohammed J. Zaki, and Mitsunori Ogihara. 1999. Mining Features for Sequence Classification. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 342–346.
[14] Yabo Ni, Dan Ou, Shichen Liu, Xiang Li, Wenwu Ou, Anxiang Zeng, and Luo Si. 2018. Perceive Your Users in Depth: Learning Universal User Representations from Multiple E-commerce Tasks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 596–605.
[15] Peter Romov and Evgeny Sokolov. 2015. RecSys Challenge 2015: Ensemble Learning with Categorical Features. In RecSys ’15 Challenge. Article 1, 4 pages.
[16] Sebastian Ruder. 2017. An Overview of Multi-Task Learning in Deep Neural Networks. CoRR abs/1706.05098 (2017).
[17] Rong She, Fei Chen, Ke Wang, Martin Ester, Jennifer L. Gardy, and Fiona S. L. Brinkman. 2003. Frequent-subsequence-based Prediction of Outer Membrane Proteins. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 436–445.
[18] Humphrey Sheil and Omer Rana. 2018. Classifying and Recommending Using Gradient Boosted Machines and Vector Space Models. In Advances in Computational Intelligence Systems. 214–221.
[19] Humphrey Sheil, Omer Rana, and Ronan Reilly. 2018. Predicting purchasing intent: Automatic Feature Learning using Recurrent Neural Networks. CoRR abs/1807.08207 (2018).
[20] Arthur Toth, Louis Tan, Giuseppe Di Fabbrizio, and Ankur Datta. 2017. Predicting Shopping Behavior with Mixture of RNNs. In ACM SIGIR Forum.
[21] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30. 5998–6008.
[22] Armando Vieira. 2015. Predicting online user behaviour using deep learning algorithms. CoRR abs/1511.06247 (2015).
[23] Maksims Volkovs. 2015. Two-Stage Approach to Item Recommendation from User Sessions. In RecSys ’15 Challenge. Article 3, 4 pages.
[24] Li Wei and Eamonn Keogh. 2006. Semi-supervised Time Series Classification. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 748–753.
[25] Zhenzhou Wu, Bao Hong Tan, Rubing Duan, Yong Liu, and Rick Siow Mong Goh. 2015. Neural Modeling of Buying Behaviour for E-Commerce from Clicking Patterns. In RecSys ’15 Challenge. Article 12, 4 pages.
[26] Zhengzheng Xing, Jian Pei, and Eamonn Keogh. 2010. A Brief Survey on Sequence Classification. SIGKDD Explor. Newsl. 12, 1 (Nov. 2010), 40–48.
[27] Oksana Yakhnenko, Adrian Silvescu, and Vasant Honavar. 2005. Discriminatively Trained Markov Model for Sequence Classification. In Proceedings of the Fifth IEEE International Conference on Data Mining. 498–505.
[28] Yongxin Yang and Timothy M. Hospedales. 2016. Trace Norm Regularised Deep Multi-Task Learning. CoRR abs/1606.04038 (2016).
[29] Lexiang Ye and Eamonn Keogh. 2009. Time Series Shapelets: A New Primitive for Data Mining. In KDD. 947–956.
[30] Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level Convolutional Networks for Text Classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1. 649–657.
[31] Yi Zhao, Yanyan Shen, and Yong Huang. 2019. DMDP: A Dynamic Multi-source Default Probability Prediction Framework. Data Science and Engineering 4, 1 (Mar 2019), 3–13.