Barbosa - Predicting Portuguese Steam Review Helpfulness Using Artificial Neural Networks (2016)
digital music, computer software and digital movies. Steam was chosen because of
its innovative review system and community model.

The main contributions of this work are a model for Brazilian Portuguese reviews
and an investigation of the impact that social features, present in social
networks like Steam, have on the perception of review helpfulness, alongside
features already studied in previous works, such as linguistic features.

The rest of this paper is organized as follows: Section 2 presents a literature
review. Section 3 presents the proposed model. Section 4 shows the results of the
experiments. Section 5 presents the conclusion and final considerations.

2. LITERATURE REVIEW

2.1 The perception of review helpfulness

In a voting system such as the one proposed by Amazon, review helpfulness can be
defined as

    H = \frac{n_p}{n_p + n_n}    (1)

where n_p represents the number of positive votes and n_n represents the number
of negative votes [11]. While this approach is simple and good enough, it has
some drawbacks, such as the lack of votes for new reviews [16] and the fact that
not everyone who reads reviews actually votes on them [11]. For that reason, the
most-voted reviews are not necessarily an accurate representation of the most
helpful ones.

Another problem with that approach is that it does not take into account the
number of votes received by each review. For example, it is impossible to
distinguish two reviews with helpfulness 0.8 (or 80%), even if one of them has
10000 votes and the other has just 10. Mathematically, this problem was solved
in 1927 by Edwin B. Wilson, using the lower bound of the Wilson score confidence
interval for a binomial parameter [1]:

    H = \frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2} \sqrt{\left[\hat{p}(1 - \hat{p}) + \frac{z_{\alpha/2}^2}{4n}\right] / n}}{1 + z_{\alpha/2}^2 / n}    (2)

where \hat{p} represents the proportion of observed positive votes, n is the
total number of received votes and z_{\alpha/2} is the quantile of the standard
normal distribution. The lower bound corresponds to the minus sign.

This equation treats the votes of a review as a sample from a hypothetical
population in which all reviews receive the same number of votes. Given a fixed
confidence level (95% in this work), it yields a new success probability for the
analyzed distribution. Some websites such as Reddit (footnote 3) and Yelp
(footnote 4) already use this equation to order their topics, comments and
reviews.

In this paper, review helpfulness is the percentage of positive votes of a
review based on Equation (2). Formally, the review helpfulness measure can be
described as the extent to which consumers perceive a review as capable of
facilitating judgment or purchase decisions [16].

2.2 Formative features of review helpfulness

Evidence supports the theoretical conceptualization of review helpfulness as a
formative construct [16]. Thus, a number of features are understood to influence
the perception of review helpfulness.

Some past works identified a direct influence of the review content on its
perceived helpfulness. Using supervised machine learning, Kim et al. (2006) [11]
found that text size, unigrams and product type are important features for
measuring the perception of review helpfulness. Danescu-Niculescu-Mizil et al.
(2009) [6] conducted a study on the Amazon corpus and found that a review is
perceived as more helpful when its rating is close to the average rating for the
same product. Otterbacher (2009) [18] analyzed data from Amazon and determined
that the perception of review helpfulness can be manifested by the relevance of
the topic, readability, and the author’s credibility and objectivity.
Furthermore, a strong relationship was found between the chronological order of
the reviews and their helpfulness. Schindler and Bickart (2012) [19] examined
the content of reviews and found that the text size and the number of statements
positively influence the perception of helpful reviews, but only to a certain
extent. Using neural networks, Lee and Choeh (2014) [15] found that the text
size and the number of one-letter words are good indicators of perceived review
helpfulness.

Some scholars also explored the influence of authorship disclosure on the
perception of review helpfulness. Connors et al. (2011) [4] studied some basic
factors associated with review helpfulness. They found that the author’s
expertise, i.e. the author’s knowledge of the product domain, has a positive
influence on the perception of review helpfulness. Forman et al. (2008) [7]
discovered that the presence of descriptive information about the identity of
the author of online reviews has a positive impact on the perception of review
helpfulness.

Another point of interest is the type of product being reviewed and its impact
on the perception of review helpfulness. Sen and Lerman (2007) [20] investigated
how consumers evaluate the helpfulness of positive and negative reviews and
found that the product type moderates the effect of review valence. For
utilitarian products, readers are more likely to attribute the negative opinion
expressed in the review to reasons external to the author. For hedonic products,
negative assessments are attributed to internal reasons. On the same topic,
Mudambi and Schuff (2010) [17] found that, for experience goods, reviews with
extreme ratings are less useful than reviews with moderate ratings. The depth of
the review also has a positive influence on the perception of usefulness, and
this effect is stronger for search goods.

This paper takes a different approach by analyzing the helpfulness of reviews
written in Brazilian Portuguese and by exploiting review features not covered in
previous works, such as the consumer’s engagement in the online community and
past reviews published by the same consumer. These features can be easily
extracted thanks to the virtual community model proposed by Steam.
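
As an illustration of the helpfulness measure of Equation (2) in Section 2.1, the sketch below computes the lower bound of the Wilson score interval for two reviews that share the same 80% positive ratio but differ in vote count. The function name, the default z = 1.96 (an approximation of the 95% quantile) and the choice to return 0 for a review without votes are assumptions of this sketch, not details taken from the paper.

```python
import math


def wilson_lower_bound(positive_votes: int, negative_votes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the positive-vote ratio.

    z = 1.96 approximates the standard normal quantile for a 95% confidence level.
    Returning 0.0 for a review without votes is an assumption of this sketch.
    """
    n = positive_votes + negative_votes
    if n == 0:
        return 0.0
    p_hat = positive_votes / n
    z2 = z * z
    centre = p_hat + z2 / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


few_votes = wilson_lower_bound(8, 2)         # 10 votes, 80% positive
many_votes = wilson_lower_bound(8000, 2000)  # 10000 votes, 80% positive
print(f"{few_votes:.3f} {many_votes:.3f}")
```

Both reviews have H = 0.8 under Equation (1), yet the review with only 10 votes receives a clearly lower score than the one with 10000 votes, which is the distinction the plain ratio cannot make.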
Footnote 3: http://www.redditblog.com/2009/10/reddits-new-comment-sorting-system.html
Footnote 4: http://officialblog.yelp.com/2011/02/the-most-romantic-city-on-yelp-is.html

3. PROPOSED MODEL

For a given review, the objective of this work is to find the measure of its
helpfulness, called H, as defined by Equation (2), where H is a real number
between 0 and 1. To develop a model that predicts this measure, a set of
formative features of review helpfulness was defined. Based on this study, it
was hypothesized that three important groups of review features model review
helpfulness: features based on authorship, textual features and features based
on review metadata.

Figure 1: Example of Steam user profile

The proposed model is a regression function that takes as input a vector X
containing those features and outputs a scalar H, the measure of perceived
review helpfulness. An artificial neural network (ANN) of the multilayer
perceptron (MLP) type was chosen to approximate this function because MLPs are
universal function approximators. An ANN for function approximation can be
described as follows:

    y(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n_1} \lambda_i \, g_j(u_i)    (3)

    u_i = \sum_{j=1}^{n} W_{ij} x_j - \theta_i    (4)
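
To make Equations (3) and (4) concrete, the following sketch evaluates a one-hidden-layer network in plain Python. It uses the logistic function as the hidden-layer activation g, which is the choice the paper reports for its hidden neurons; the function names and all numeric weights are hypothetical values chosen only for illustration.

```python
import math


def logistic(u: float) -> float:
    """Logistic activation, used here as the hidden-layer function g."""
    return 1.0 / (1.0 + math.exp(-u))


def mlp_output(x, hidden_weights, thresholds, output_weights):
    """Evaluate y(x) = sum_i lambda_i * g(u_i), with u_i = sum_j W_ij * x_j - theta_i."""
    y = 0.0
    for w_row, theta, lam in zip(hidden_weights, thresholds, output_weights):
        u = sum(w * xj for w, xj in zip(w_row, x)) - theta  # Equation (4)
        y += lam * logistic(u)                              # one term of Equation (3)
    return y


# Hypothetical toy network: 3 inputs, 2 hidden neurons (values are made up).
x = [0.5, 1.0, 0.0]
W = [[0.2, -0.4, 0.1],
     [0.7, 0.3, -0.5]]
theta = [0.1, -0.2]
lam = [0.6, 0.4]
y = mlp_output(x, W, theta, lam)
print(y)
```

Because the output is a linear combination of logistic terms, it stays between 0 and the sum of the output weights when those weights are non-negative; in general nothing forces it into [0, 1], so a trained model relies on the targets H to keep predictions in that range.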
Table 1: Linguistic patterns

    NOUN ADV? V? (ADV? ADJ | ADV V)*
    ADJ NOUN
    ADV ADJ

University of Sao Paulo (footnote 8). After data cleaning, a final sample of
5,823 reviews was obtained, published between November 2013 and February 2016.

Footnote 8: http://www.ime.usp.br/~ueda/br.ispell/

3.2 Review features

3.2.1 Features concerning authorship

There are two important features regarding authors: reputation and expertise.
Studies suggest that reviews submitted by authors with a positive history are
seen as more helpful [25]. Reputation incorporates aspects of the author’s
credibility, and expertise indicates the author’s level of knowledge about the
discussed topic.

The author’s reputation is modeled from three variables: the average number of
votes per review (total number of votes on past reviews divided by the total
number of reviews); the ratio of positive votes (total positive votes on past
reviews divided by the total number of votes on past reviews); and the number of
friends the user has in the Steam community. Expertise is modeled by the number
of hours the review author played the reviewed game. For example, an author who
played only two hours of a particular game may have less proficiency in the
topic than one who played 20 hours.

3.2.2 Features regarding the textual content of reviews

Fundamentally, previous research focused on two types of textual analysis of
online reviews: analysis of semantic features and analysis of the stylistic
features of the text [3, 11]. Exploring the textual content of a review makes it
possible to attend to the opinions and sentiment expressed in the text.
According to the Oxford Dictionary, an opinion is a view or judgment formed
about something, not necessarily based on fact or knowledge.

It is hypothesized that online reviews with a large number of opinions are more
informative. In this approach, opinions should be modeled numerically, so only
the number of opinions expressed in each review is of interest. To this end,
opinions are extracted following an adaptation of the model proposed by Sousa et
al. (2015) [22].

In the extraction of opinions, the texts are analyzed according to the phrasal
structure of sentences. The standard adopted was “subject + verb + predicative
of subject”, where the core of the subject is the qualified feature and the core
of the predicate is the qualifying word. For example, “esse jogo é divertido”
(this game is fun) is extracted as (jogo, divertido) or (game, fun). The
linguistic patterns were defined by a manual analysis performed on another
sample of 385 reviews collected on the same website (confidence level 95% and
error 5%). The result is shown in Table 1.

For the stylistic elements, text readability and text size were considered.
Readability is the ease with which a text can be understood. The readability of
a text in English can be analyzed using the Flesch-Kincaid readability test, a
mathematical method that evaluates how readable a text is from its average
number of words per sentence and average number of syllables per word [13]. An
adaptation to Portuguese was published by Squarisi and Salvador (2005) [23]:

    \frac{n_w}{n_s} + n_p \ast 0.4    (6)

where n_w is the total number of words in the text, n_s is the total number of
sentences and n_p is the total number of polysyllables. The constant 0.4 is the
average number of letters of the word in the phrase in Portuguese. The higher
the score, the less readable the text. A text with a score of 1 can be easily
read by anyone.

Based on the findings of Kim et al. (2006) [11] and Lee and Choeh (2014) [15],
text size was modeled by the number of words, the number of sentences and the
number of monosyllabic words in the text, since Portuguese does not have a large
number of semantically relevant one-letter words.

3.2.3 Review Metadata

The final evaluation and the posting date of the review are considered metadata.
Final evaluation refers to the binary evaluation of the product present in each
review, which may be “recommended” or “not recommended”. It is important to note
that readers may be influenced by the average rating of the product. This idea
was explored by Danescu-Niculescu-Mizil et al. (2009) [6].

Considering a review’s evaluation as 1 for “recommended” (or positive) and 0 for
“not recommended” (or negative), the evaluation is modeled by the expression
below, where p is the fraction of reviews with positive evaluations (in decimal
representation) for a product and x is the individual review’s evaluation of the
same product.

    x p + (1 - x)(1 - p)    (7)

For instance, a positive review (x = 1) of a product with p = 0.9 scores 0.9,
while a negative review (x = 0) of the same product scores 0.1.

Building on the findings of Otterbacher (2009) [18], the review’s posting time
was modeled as the difference in days between the release date of the product
and the posting date of the review.

3.2.4 Validation

The ANN MLP for function approximation is composed of three layers: an input
layer, a single hidden layer and an output layer. The input layer maps each of
the input variables studied. The output layer has only one neuron, which maps
the output variable H, the review helpfulness. Cross-validation was used to set
the most appropriate number of neurons in the hidden layer, and the logistic
function was used as the activation of the hidden-layer neurons (Equation (5)).
For the output layer, a linear activation function was used (Equation (3)),
since the output neuron performs only a linear combination of the logistic
activation functions implemented in the hidden-layer neurons [5]. Therefore,
after the MLP training process, the weight matrix of the output neuron
corresponds to the \lambda_i parameters of Equation (3), i.e.
\lambda_i = W_{1,i}. This network is illustrated in Figure 3.

The input variables defined in the hypotheses specification are: i) the number
of review votes divided by the total number of reviews of a user; ii) the ratio
of positive votes to the total
number of votes of a user; iii) the number of the user’s friends in the Steam
community; iv) the amount of time (in hours) that the author dedicated to the
reviewed game; v) the number of opinions expressed in the text; vi) the
readability of the text; vii) the size of the text in words; viii) the number of
sentences in the text; ix) the number of monosyllabic words in the text; x) the
difference between the average evaluation of the product and the user’s
evaluation; xi) the difference in days between the release date of the product
and the publication date of the review. The expected output variable is the
review helpfulness.

Figure 3: MLP for functional approximation

The relationship between each input variable and the output variable can be
calculated using the Relative Strength (RS) measure [15] [26]:

    RS_{ji} = \frac{\sum_{k=0}^{n} (W_{ki} W_{jk})}{\sum_{i=0}^{m} \sum_{k=0}^{n} |W_{ki} W_{jk}|}    (8)

where W_{ki} denotes the weight between the k-th hidden unit and the i-th input
unit, and W_{jk} denotes the weight between the j-th output unit and the k-th
hidden unit. RS_{ji} gives the relative weight between the i-th input variable
and the j-th output variable. The numerator of RS_{ji} captures the relationship
between the i-th input variable and the j-th output variable and can be either
positive or negative, depending on the weights, whereas the denominator gives
the total over all input and output variables.

To validate the chosen features, Correlation-based Feature Selection (CFS) [9]
was used. CFS takes the subset evaluation approach: the most suitable features
are selected taking into account the correlation between all the features in the
model. A good model has features that are not correlated with each other but are
correlated with the class. Features with high correlation with the class
variable are highly relevant. CFS’s feature subset evaluation function is
described as follows:

    M_s = \frac{k \, r_{cf}}{\sqrt{k + k(k - 1) \, r_{ff}}}    (9)

where M_s is the heuristic “merit” of a feature subset S containing k features,
r_{cf} is the mean feature-class correlation (f ∈ S) and r_{ff} is the average
feature-feature intercorrelation [9]. The numerator of Equation (9) can be
thought of as indicating how predictive of the class a set of features is; the
denominator, how much redundancy there is among the features [9].

4. RESULTS

The most appropriate topology for an MLP to map a specific problem is determined
empirically [5]. The topology selection process usually proceeds by trial and
error [15]. To select the best topology among the candidates, k-fold
cross-validation was used. In that process, the total sample is divided into k
folds, (k − 1) of which compose the training subset while the remaining fold is
the test subset. In this paper, k was set to 10. The overall performance of each
candidate topology is the average of the individual performances observed over
the k partitions.

The ANN implemented in this study has random values as initial weights, a
learning rate of 0.1 and a momentum of 0.3. We opted for small values to ensure
that convergence occurs, even if the process requires more time. To avoid
overfitting, each topology was trained until a local minimum of the root mean
square error (RMSE) was found or until 104 epochs were reached, whichever
occurred first.

Figure 4 presents a plot of the RMSE for each of the tested topologies. We begin
with a topology of one neuron in the hidden layer and add one neuron to the
layer for each new topology. We note that the error increases with every new
neuron included in the hidden layer, but begins to decrease as soon as it
reaches five neurons. The topology chosen was the one with the lowest number of
neurons in the hidden layer and the lowest RMSE. The final ANN
MLP model chosen is a network of three layers consisting of 11 input nodes, five
nodes in the hidden layer and one output node. Given the complexity of
performing predictions based on human behavior, the RMSE of 0.1929 was
considered acceptable.

The expertise of the author also seems to be a relevant feature. In the case of
Steam, the number of hours played by an author at the time of publication of the
review has a positive impact on the perception of usefulness, but is correlated
with another feature. The number of friends a user has in the community also has
a positive impact on the perception of review helpfulness.
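
Signed feature impacts of this kind can be read off a trained network with the Relative Strength measure of Equation (8). The sketch below is a minimal illustration for a single output unit; the function name and all weights are hypothetical, bias terms are left out for simplicity, and it assumes at least one non-zero weight product so that the denominator is not zero.

```python
def relative_strength(input_hidden, hidden_output):
    """Relative Strength of every input for one output unit, as in Equation (8).

    input_hidden[k][i] is the weight W_ki between hidden unit k and input i.
    hidden_output[k] is the weight W_jk between the output unit and hidden unit k.
    """
    n_inputs = len(input_hidden[0])
    # Signed contribution of each input: sum over hidden units of W_ki * W_jk.
    signed = [
        sum(row[i] * w_out for row, w_out in zip(input_hidden, hidden_output))
        for i in range(n_inputs)
    ]
    # Normaliser: sum over all inputs and hidden units of |W_ki * W_jk|.
    total = sum(
        abs(row[i] * w_out)
        for row, w_out in zip(input_hidden, hidden_output)
        for i in range(n_inputs)
    )
    return [s / total for s in signed]


# Hypothetical weights: 2 hidden units, 3 inputs (values are made up).
W_input_hidden = [[0.5, -1.0, 0.25],
                  [1.5, 0.5, -0.25]]
W_hidden_output = [2.0, -1.0]
rs = relative_strength(W_input_hidden, W_hidden_output)
print([round(v, 3) for v in rs])
```

A positive value suggests that increasing the input raises the predicted helpfulness and a negative value suggests the opposite; the absolute values sum to at most 1, so they can be compared across inputs.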
[5] I. N. da Silva, D. H. Spatti, and R. A. Flauzino. Redes Neurais Artificiais: para engenharia e ciências aplicadas, chapter Redes Perceptron Multicamadas. Artliber, 2010.
[6] C. Danescu-Niculescu-Mizil, G. Kossinets, J. Kleinberg, and L. Lee. How opinions are received by online communities: A case study on amazon.com helpfulness votes. 18th International Conference on World Wide Web, 2009.
[7] C. Forman, A. Ghose, and B. Wiesenfeld. Examining the Relationship Between Reviews and Sales: The Role of Reviewer Identity Disclosure in Electronic Markets. INFORMS, 2008.
[8] J. Goldenberg, B. Libai, and E. Muller. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, Volume 12, Issue 3, pages 211–223, 2001.
[9] M. A. Hall. Correlation-based Feature Subset Selection for Machine Learning. PhD thesis, University of Waikato, Hamilton, New Zealand, 1998.
[10] N. Hu, L. Liu, and J. Zhang. Do online reviews affect product sales? the role of reviewer characteristics and temporal effects. Information Technology and Management Vol. 9 No. 3, pages 201–214, 2008.
[11] S.-M. Kim, P. Pantel, T. Chklovski, and M. Pennacchiotti. Automatically assessing review helpfulness. In D. Jurafsky and É. Gaussier, editors, EMNLP, pages 423–430. ACL, 2006.
[12] Y. A. Kim and J. Srivastava. Impact of social influence in e-commerce decision making. In Proceedings of the Ninth International Conference on Electronic Commerce, ICEC ’07, pages 293–302, New York, NY, USA, 2007. ACM.
[13] J. P. Kincaid, R. P. Fishburne, R. L. Rogers, and B. S. Chissom. Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Technical report, Feb. 1975.
[14] T. Landauer, P. Foltz, and D. Laham. An introduction to latent semantic analysis. Discourse processes, 25:259–284, 1998.
[15] S. Lee and J. Y. Choeh. Predicting the helpfulness of online reviews using multilayer perceptron neural networks. Expert Syst. Appl., 41(6):3041–3046, 2014.
[16] M. Li, L. Huang, C.-H. Tan, and K.-K. Wei. Helpfulness of online product reviews as seen by consumers: Source and content features. International Journal of Electronic Commerce, 17:101–136, 2013.
[17] S. M. Mudambi and D. Schuff. What makes a helpful online review? a study of customer reviews on amazon.com. MIS Quarterly, 34(1):185–200, 2010.
[18] J. Otterbacher. ’helpfulness’ in online communities: A measure of message quality. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, pages 955–964. ACM, 2009.
[19] R. Schindler and B. Bickart. Perceived helpfulness of online consumer reviews: The role of message content and style. Journal of Consumer Behaviour, 11:234–243, 2012.
[20] S. Sen and D. Lerman. Why are you telling me this? an examination into negative consumer reviews on the web. Journal of Interactive Marketing, 21:76–96, 2007.
[21] R. R. Sinha and K. Swearingen. Comparing recommendations made by online systems and friends. In DELOS Workshop: Personalisation and Recommender Systems in Digital Libraries, 2001.
[22] R. F. d. Sousa, R. A. L. Rabelo, and R. S. Moura. A fuzzy system-based approach to estimate the importance of online customer reviews. IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2015.
[23] D. Squarisi and A. Salvador. A arte de escrever bem: um guia para jornalistas e profissionais do texto. pages 50–52. Editora Contexto, 2005.
[24] K. D. Vohs, R. F. Baumeister, B. J. Schmeichel, J. M. Twenge, N. M. Nelson, and D. M. Tice. Making choices impairs subsequent self-control: A limited-resource account of decision making, self-regulation, and active initiative. Motivation Science, Vol 1(S), pages 19–42, 2014.
[25] C. N. Wathen and J. Burkell. Believe it or not: Factors influencing credibility on the web. Journal of the American Society for Information Science and Technology, 53(2):134–144, 2002.
[26] Y. Yoon, T. Guimaraes, and G. Swales. Integrating artificial neural networks with rule-based expert systems. Decis. Support Syst., 11(5):497–507, June 1994.