
Accepted Manuscript

Document Representation and Feature Combination for Deceptive Spam Review Detection

Luyang Li, Bing Qin, Wenjing Ren, Ting Liu

PII: S0925-2312(17)30398-3
DOI: 10.1016/j.neucom.2016.10.080
Reference: NEUCOM 18148

To appear in: Neurocomputing

Received date: 1 February 2016


Revised date: 21 October 2016
Accepted date: 27 October 2016

Please cite this article as: Luyang Li, Bing Qin, Wenjing Ren, Ting Liu, Document Representation
and Feature Combination for Deceptive Spam Review Detection, Neurocomputing (2017), doi:
10.1016/j.neucom.2016.10.080

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.

Document Representation and Feature Combination for Deceptive Spam Review Detection
Luyang Li, Bing Qin, Wenjing Ren, Ting Liu

Research Center for Social Computing and Information Retrieval,
Harbin Institute of Technology, Harbin, China

Abstract

Deceptive spam reviews of products or services are harmful to customers making decisions. Existing approaches to detecting deceptive spam reviews concentrate on feature design. Hand-crafted features can capture some linguistic phenomena, but they can hardly reveal the latent semantic meaning of a review. We present a neural network based model to learn the representation of reviews. The model applies a hard attention when composing sentence representations into a document representation. Specifically, we compute an importance weight for each sentence and incorporate the weights into the composition of the document representation. In the mixed-domain detection experiment, the results verify the effectiveness of our model in comparison with other neural network based methods. Because feature selection is very important in this direction, we also combine features to enhance performance, obtaining an F1 value of 86.1%, which outperforms the state-of-the-art method. In the cross-domain detection experiment, our method is more robust.

Keywords:
Spam review detection, Opinion spam, Representation learning

Preprint submitted to Nuclear Physics B, March 2, 2017

1. Introduction
Deceptive opinion spam detection is an urgent and meaningful task in natural language processing. With the continuous growth of user-generated reviews, the appearance of deceptive opinion spam has drawn increasing attention [1, 2, 3, 4]. Deceptive opinion spam consists of reviews with fictitious opinions that are deliberately written to sound authentic [5]. For commercial motives, some businesses hire people to write undeservedly positive reviews to promote an object, or unjustly negative reviews to damage its reputation [6]. It is very difficult for people to identify deceptive spam. In the test of Ott et al. [5], the average accuracy of three human judges was only 57.33%. Hence, research on detecting deceptive opinion spam is necessary and meaningful.
The reviews are commonly short documents. The objective of the task is to decide whether a document is spam or truthful. The task can be cast as a two-category classification problem. The majority of existing approaches follow Ott et al. [5] and utilize machine learning algorithms to build classifiers. In this direction, most studies focus on designing effective features to enhance classification performance. Feature engineering is important; however, hand-crafted features can hardly capture the inherent regularities of the data from a semantic perspective. Given the good performance of neural network based models in current natural language processing tasks, a document-level representation can be learnt by such models and used as features of the review.
In this work, we compare and analyze representation learning algorithms and conventional features on this problem. We present a novel method, the sentence weighted neural network (SWNN) model, to learn the document-level representation of a review and detect spam reviews. Learning the representation of the document can capture global features and take word order and sentence order into consideration. We also combine features with SWNN, which is the first time these features are used jointly in spam review detection. We verify the effectiveness of SWNN and the feature combination in two types of experiments. One verifies the capability of domain migration on a cross-domain dataset, and the other verifies the capability of domain-independent spam review detection on a mixed-domain dataset. The experiments run on public data sets [7]. The domain migration experiment verifies that the feature combination with SWNN has the best robustness. The domain-independent experiment verifies that the feature combination with unigrams performs better than the feature combination with SWNN. The final result outperforms other strong baseline methods with the highest F1 value of 86.1%.


The major contributions of the work presented in this paper are as follows.
• We present a sentence weighted neural network to learn the representation of document-level reviews. To learn the semantics of the document better, the proposed model takes the importance of different sentences into account when composing sentence representations into the document representation.
• We use multiple syntactic features and make a feature combination to further improve the performance of our method.
• We verify the effectiveness of SWNN and the feature combination in domain migration and domain-independent experiments for spam review detection.

It should be noted that this work extends our previous work on learning document representation for deceptive opinion spam detection [8]. In the previous work, we presented the SWNN model to learn document representation and detect deceptive spam reviews. In this work, we make two improvements. First, we introduce some new syntactic features and use the feature combination to address the problem. Second, we incorporate syntactic features into SWNN to jointly detect spam reviews. The experimental result outperforms the original one.

2. Related work

We present a brief review of the related work from two perspectives. One is deceptive opinion spam detection, and the other is neural networks for task-specific representation learning.

2.1. Deceptive Opinion Spam Detection
On the Internet, various kinds of spam cause trouble for people. Over the years, many studies have focused on spam detection. Web spam has been extensively studied [9, 10, 11, 12, 13, 14, 15]. The objective of web spam is to gain a high page rank and attract clicks by fooling search engines. Email spam, which pushes unsolicited advertisements to users, is another related research topic [16, 17]. Social media spam is a type of spam that spreads rumors on social media [18]. Web spam and email spam share the characteristic that they contain irrelevant words. Opinion spam is quite different and more crafty: it contains users' opinions about products and services. With the explosive growth of user-generated content, the amount of opinion spam in reviews increases continuously. This phenomenon has attracted the attention of researchers. Opinion spam was first investigated by Liu et al. [6], who also categorize opinion spam into different types. In terms of the different harm done to users, we can further group opinion spam into two types: deceptive opinion spam and product-irrelevant spam. In the former, spammers give undeservedly positive or unjustly negative reviews to an object to mislead customers. The latter contains no comments about the object. Obviously, deceptive opinion spam is more difficult to detect.
The approaches to detecting deceptive opinion spam can be divided into unsupervised and supervised methods. Liu et al. [19] take a Bayesian approach and formulate opinion spam detection as a clustering problem. There are also many unsupervised methods for spammer detection [20, 21, 22, 23] and for mining reviewing patterns [24]. Due to the lack of gold standard data, most methods work on pseudo-labeled data. Liu et al. [6] assume duplicate and near-duplicate reviews to be deceptive spam. They also apply features of review texts, reviewers and products. Yoo et al. [25] first collect a small amount of deceptive spam and truthful reviews and perform a linguistic analysis on them. Using Amazon Mechanical Turk, Ott et al. [5, 26, 27] gather a gold standard labeled data set. A few follow-up spam detection methods have been presented on this data set. Ott et al. estimate the prevalence of deceptive opinion spam in reviews [26] and identify negative spam [27]. Li et al. [28] identify manipulated offerings on review portals. Feng et al. [29] apply context-free grammar parse trees to extract syntactic features that improve the performance of the model. Feng and Hirst [30] take the group of reference reviews for the same product into account. Although Ott's data sets contain deceptive opinion spam, they still cannot reflect real conditions, because they lack cross-domain data and the Turkers lack professional knowledge. Li et al. [7] create a cross-domain data set (i.e. hotel, restaurant, and doctor) with part of the reviews written by domain experts. On this labeled data set, they use n-gram features as well as POS and LIWC features in classification and show that POS features are more robust on cross-domain data.

2.2. Neural Networks for Representation Learning


Representation learning with neural network based methods has been proven to be effective in place of task-specific feature engineering [31]. Compared with feature engineering, representation learning does not need much prior knowledge. As continuous real-valued vectors, representations can be incorporated as features in a variety of natural language processing tasks [32, 33, 34], such as POS tagging, chunking, named entity recognition [32, 35], semantic role labeling, parsing [36], language modeling [37, 38], sentiment analysis [39, 40] and text classification [41]. Representation learning aims to learn continuous representations of text at different granularities, such as word, phrase, sentence and document.
As for representing a document, the existing deep learning methods consist of two processing stages. First, word embeddings are learnt from a massive text corpus. Some work utilizes the global context of the document and multiple word prototypes [42], or global word-word co-occurrence, to improve word embeddings [43]. There is also work on task-specific word embeddings [40]. After obtaining word representations, many studies focus on semantic composition methods. Yessenalina et al. use matrices to model each word and apply iterative matrix multiplication to combine words [44]. Glorot et al. develop Stacked Denoising Autoencoders for domain adaptation [45]. Socher et al. propose the Recursive Neural Network (RNN) [46], the matrix-vector RNN [47] and the Recursive Neural Tensor Network (RNTN) [39] to learn the semantics of variable-length phrases. Hermann et al. [48] learn the semantics of sentences with the Combinatory Categorial Autoencoder method, which combines Combinatory Categorial Grammar and the Recursive Autoencoder. Li et al. [49] use feature weight tuning to control the effect one specific unit has on the higher-level representation in a Recursive Neural Network. Le et al. [50] learn the representation of paragraphs.

[Figure 1 (layers, bottom to top: Lookup, Convolutional, Pooling, Non-linear; example input: "The Chicago Hilton is very great")]
Figure 1: The traditional neural network for learning sentence representation.

3. Methodology
In this section, we present the details of the neural network based models that learn document representations for deceptive spam review detection. We develop two convolutional neural network models to learn document representation. In the following subsections, we first introduce the conventional model and then present the details of our proposed models.

3.1. Basic Convolutional Neural Network


Collobert et al. [32] introduce a neural network approach to learn the representation of a sentence. The architecture is shown in Fig. 1. It is a multilayer neural network which consists of four layers. Given a sentence "The Chicago Hilton is very great", the model applies the lookup layer to map the words into their corresponding word embeddings, which are continuous real-valued vectors. The convolutional layer extracts local features by representing the semantic meaning of the words within a window. The size of the output of the convolutional layer depends on the number of words in the sentence fed to the network. The pooling layer obtains a global feature vector by combining the local feature vectors from the previous layer. Common choices are average or max operations over the corresponding vectors. The average operation captures the influence of all words on the task. The max operation captures the most useful local features produced by the convolutional layer. The non-linear layer is necessary to extract high-level features.

[Figure 2 (layers, bottom to top: input sentences s1, s2, ..., sn; Sentence Convolution; Document Convolution; Tanh; Linear)]
Figure 2: SCNN model for learning sentence representation.
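
The four layers described in Section 3.1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy embeddings and the convolution weights are hypothetical, the non-linear layer is reduced to an element-wise tanh, and sentences shorter than the window are not padded.

```python
import math

def lookup(words, embeddings):
    """Lookup layer: map each word to its embedding vector."""
    return [embeddings[w] for w in words]

def convolve(vectors, weights, bias, window):
    """Convolutional layer: one local feature vector per window of words."""
    features = []
    for start in range(len(vectors) - window + 1):
        # Concatenate the embeddings of the words inside the window.
        concat = [x for vec in vectors[start:start + window] for x in vec]
        features.append([
            sum(w * x for w, x in zip(row, concat)) + b
            for row, b in zip(weights, bias)
        ])
    return features

def pool(features, mode="average"):
    """Pooling layer: combine the local feature vectors into one global vector."""
    columns = list(zip(*features))
    if mode == "max":
        return [max(col) for col in columns]
    return [sum(col) / len(col) for col in columns]

def encode_sentence(words, embeddings, weights, bias, window=2, mode="average"):
    """Lookup, convolution, pooling, then an element-wise tanh (simplified non-linear layer)."""
    local = convolve(lookup(words, embeddings), weights, bias, window)
    return [math.tanh(x) for x in pool(local, mode)]

# Hypothetical toy values: 2-dimensional embeddings, window of 2 words, 3 output features.
toy_embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
toy_weights = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
toy_bias = [0, 0, 0]
sentence_vector = encode_sentence(["a", "b", "c"], toy_embeddings, toy_weights, toy_bias)
```

The number of local feature vectors is the number of words minus the window size plus one, which is why the output size of the convolutional layer depends on the sentence length while the pooled vector has a fixed size.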

3.2. The document representation learning model


[Figure 3 (layers, bottom to top: input sentences s1, s2, ..., sn; Lookup; Convolution; Sentence weight generation producing α1, α2, ..., αn; Weighted Pooling; Tanh; Linear)]
Figure 3: SWNN model for learning sentence representation.

Basic model. We apply the traditional convolutional neural network model to represent sentences. To compose the document representation, we use an average operation on the pooling layer to capture the features of all sentences. This is the basic model, which is modified below to suit the deceptive opinion spam detection task.
SCNN model. As shown in Fig. 2, the SCNN model consists of two convolutional layers that perform the composition. The sentence convolution composes each sentence using a fixed-length window. The document convolution transforms the sentence vectors into a document vector. The output consists of the scores of the corresponding categories. The training objective is that, when predicting the label of a review, the gold label receives a higher score than the other label. This objective can be optimized with a hinge loss. Compared with the softmax function, which imposes a strict constraint, the hinge loss is a relaxed constraint that is more suitable for our objective. The hinge loss function is defined in Eq. 1, where t is the gold label of the review r, t* stands for the other label, and m_δ is the margin in the experiment.

Loss(r) = \max(0, m_\delta - f(r_t) + f(r_{t^*}))    (1)
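
As a small worked illustration of Eq. 1, the sketch below computes the hinge loss from the two category scores. The function name and the default margin of 1.0 are assumptions made for the example; the margin value used in the experiments is not stated in this section.

```python
def hinge_loss(score_gold, score_other, margin=1.0):
    """Eq. 1: zero once the gold label outscores the other label by at least the margin."""
    # margin is a hypothetical default, not a value reported by the paper.
    return max(0.0, margin - score_gold + score_other)

# The gold label wins by more than the margin, so no loss is incurred.
no_loss = hinge_loss(score_gold=2.0, score_other=0.5)
# The other label scores higher, so the loss is positive.
some_loss = hinge_loss(score_gold=0.2, score_other=0.5)
```

Unlike a softmax objective, the loss stops pushing the scores apart once the margin is satisfied, which is the relaxed constraint the text refers to.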


SWNN model. The sentence-weighted neural network model is a modification of the basic document representation learning model. The words in a review play different roles in the semantic representation; some words are more important than others in distinguishing spam from truthful reviews. Hence, each sentence also has its own importance weight, determined by the words in it. We compute the importance weight of a sentence from the importance weights of the words in the sentence. We use KL-divergence as the importance weight of a word. The KL-divergence value, used here as a feature selection criterion, measures the capacity of a feature to separate documents. We also tried tf-idf as a candidate weighting method; however, it did not perform as well as KL-divergence in the experiment. We assume that U = {U_1, ..., U_i, ..., U_n} is the universal set of words in the review, where U_i is the word set of the ith sentence, and W_j stands for the weight of the jth word. The sentence weight is a normalized value, given by the following formula.

\alpha_i = \frac{\sum_{j \in U_i} W_j}{\sum_{k \in U} W_k}    (2)
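
Eq. 2 can be sketched as below. The word weights are passed in as a plain dictionary with made-up values; in the paper they come from KL-divergence, which is not reproduced here. Treating unknown words as weight 0 and falling back to uniform weights when every weight is 0 are assumptions of this sketch.

```python
def sentence_weights(sentences, word_weight):
    """Eq. 2: weight of sentence i = sum of its word weights / sum over all words in the review."""
    if not sentences:
        return []
    # U_i: the set of distinct words in sentence i.
    word_sets = [set(sentence) for sentence in sentences]
    # U: the set of all distinct words in the review.
    universe = set().union(*word_sets)
    denominator = sum(word_weight.get(w, 0.0) for w in universe)
    if denominator == 0:
        # Assumption: use uniform weights when no word carries any weight.
        return [1.0 / len(sentences)] * len(sentences)
    return [
        sum(word_weight.get(w, 0.0) for w in words) / denominator
        for words in word_sets
    ]

# Hypothetical word weights for a two-sentence review.
toy_weights = {"great": 0.5, "hotel": 0.5, "my": 1.0, "husband": 1.0, "loved": 0.5, "it": 0.5}
alphas = sentence_weights([["great", "hotel"], ["my", "husband", "loved", "it"]], toy_weights)
```

Because the denominator runs over the distinct words of the whole review, the weights sum to 1 when the sentences share no words; a word that appears in several sentences is counted once in the denominator but in each of those numerators.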

Fig. 3 gives the architecture of the SWNN model. Each sentence of the input review is transformed into a fixed-length vector through the convolutional layer. The sentence weight generation step produces the normalized weight α_i corresponding to the ith sentence. Through the pooling layer, the sentence vectors are transformed into a document vector by a weighted-average operation. More important sentences have more influence on the document vector. The vector then passes through the tanh layer to extract high-level features. The linear layer produces the scores of the categories.
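
The upper layers of SWNN (weighted pooling, tanh, linear) can be sketched as follows, assuming the sentence vectors and the normalized weights α_i have already been computed. The function names and the toy parameters are hypothetical, and the tanh layer is simplified to an element-wise tanh without its own weight matrix.

```python
import math

def weighted_pool(sentence_vectors, alphas):
    """Weighted pooling: document vector = sum over sentences of alpha_i * sentence_vector_i."""
    dim = len(sentence_vectors[0])
    return [
        sum(a * vec[k] for a, vec in zip(alphas, sentence_vectors))
        for k in range(dim)
    ]

def score_document(sentence_vectors, alphas, linear_weights, linear_bias):
    """Weighted pooling, element-wise tanh, then a linear layer giving one score per category."""
    document = weighted_pool(sentence_vectors, alphas)
    hidden = [math.tanh(x) for x in document]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(linear_weights, linear_bias)
    ]

# Hypothetical toy input: two sentence vectors, the second weighted three times as much.
toy_scores = score_document(
    sentence_vectors=[[1.0, 0.0], [0.0, 1.0]],
    alphas=[0.25, 0.75],
    linear_weights=[[1.0, 0.0], [0.0, 1.0]],
    linear_bias=[0.0, 0.0],
)
```

With all α_i equal to 1/n this reduces to the average pooling of the basic model, which makes the role of the sentence weights easy to isolate.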

3.3. Features
We add two types of features to the proposed model: POS features, which capture syntax, and first-person pronoun features. By incorporating these features, the SWNN model can capture both semantic and syntactic features.
POS. In Li's analysis [7] of spam reviews and truthful reviews, the observed POS distributions agree with earlier findings in the literature [51, 52]. The findings are that truthful reviews contain more nouns (N), adjectives (JJ), prepositions (IN) and determiners (DT), while spam reviews contain more verbs (V), adverbs (RB), pronouns (PRP) and pre-determiners (PDT). Thus, POS features are meaningful for distinguishing spam reviews.


First-Person Pronouns. Psychologically, the frequency of first-person pronouns in a review is related to whether the review is spam or not [53, 54, 55, 52]. The literature finds that spam reviews contain fewer first-person pronouns.
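
One possible way to turn these two observations into numeric features is sketched below. The paper does not state how the features are encoded, so everything here is an assumption: the input is taken to be already tokenized and POS-tagged (the tags in the example are written by hand), the features are relative frequencies, and the pronoun list is a hypothetical choice.

```python
from collections import Counter

# Hypothetical list of English first-person pronouns; not taken from the paper.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours", "ourselves"}

def pos_distribution(tagged_tokens, tagset):
    """Relative frequency of each tag in tagset among (word, tag) pairs."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = len(tagged_tokens)
    return [counts[tag] / total if total else 0.0 for tag in tagset]

def first_person_ratio(tokens):
    """Fraction of tokens that are first-person pronouns (case-insensitive)."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t.lower() in FIRST_PERSON) / len(tokens)

# Hand-tagged toy review; a real pipeline would obtain the tags from a POS tagger.
tagged = [("I", "PRP"), ("loved", "VBD"), ("the", "DT"), ("hotel", "NN")]
features = pos_distribution(tagged, ["NN", "JJ", "IN", "DT", "VBD", "RB", "PRP", "PDT"])
features.append(first_person_ratio([word for word, _ in tagged]))
```

The resulting vector can then be concatenated with a learned document representation or with n-gram features before classification.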

3.4. Complexity
We give a theoretical analysis of the time complexity of the proposed methods. When training neural network based models, most of the time is spent updating each weight matrix W. Suppose the dimension of W is d; then each update of the matrix costs on the order of d^2 operations. We adopt stochastic gradient descent (SGD) to optimize the parameters, which include the weight matrices. In other words, the matrices are updated each time a new sample arrives. Suppose the number of samples in the dataset is n; then the time complexity of the proposed methods is O(n * d^2).

4. Experiments
We conduct experiments to empirically evaluate our document representation learning model by applying it to spam review detection. We run two types of experiments: cross-domain classification and mixed-domain classification. We also compare neural network based methods with SVM using feature combination.

4.1. Data and Evaluation Criteria
We use the public dataset released by Jiwei Li [7], which is a gold standard spam review dataset. The dataset contains three domains (hotel, restaurant and doctor). The distribution of the dataset is shown in Table 1. There are three types of data in each domain, "Turker", "Expert" and "Customer", which stand for different data sources. The spam reviews are written by Turkers and experts. Specifically, Li [7] and Ott [5, 27] use Amazon Mechanical Turk to collect deceptive reviews from online workers (Turkers). Experts are employees in each domain who have expert-level domain knowledge. The truthful reviews are from customers who have real consumption experience.
In the cross-domain classification, we want to compare with Li's method. According to Li's paper, he uses only 200 of the 356 spam reviews in the Doctor domain and does not use the "Expert" data in his experiment. Hence, we do our best to use data with the same distribution in the cross-domain experiment. We apply the same treatment to positive-sentiment and negative-sentiment samples. Thus, the statistics in the table are the totals for each domain.
In the mixed-domain spam review classification, all the spam review samples from Turkers and experts in Table 1 and the truthful reviews from customers are used. This gives 1,636 spam reviews and 1,200 truthful reviews. We use five-fold cross validation: the data is split into five equal folds, four folds are used as training data, and the remaining fold is used as test data.
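
The five-fold protocol described above can be sketched as follows. Shuffling with a fixed seed and the absence of stratification by label are assumptions of this sketch; the paper does not say how the folds were drawn.

```python
import random

def k_fold_splits(samples, k=5, seed=0):
    """Yield (train, test) pairs: each fold is the test set once, the other k-1 folds train."""
    indices = list(range(len(samples)))
    # Assumption: shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = [samples[i] for i in folds[held_out]]
        train = [
            samples[i]
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for i in fold
        ]
        yield train, test

# 1,636 spam reviews plus 1,200 truthful reviews, represented here by their indices only.
splits = list(k_fold_splits(list(range(1636 + 1200))))
```

With 2,836 reviews the folds cannot be exactly equal: one fold holds 568 reviews and the other four hold 567.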

Domain       Turker   Expert   Customer
Hotel        800      280      800
Restaurant   200      0        200
Doctor       356      0        200

Table 1: Statistics of the three domain dataset.

We use accuracy (A), precision (P), recall (R) and F1 score to evaluate the effectiveness of the methods. Accuracy reflects the prediction capability on both spam and non-spam samples. Precision reflects the correctness of the samples predicted as spam. Recall reflects the fraction of true spam samples that are correctly predicted. The F1 score reflects the trade-off between precision and recall.
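
The four criteria can be computed from gold and predicted labels as in the sketch below, with spam treated as the positive class. The function names and the label strings are illustrative choices, and returning 0.0 for an undefined ratio is an assumption of this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def evaluate(gold, predicted, positive="spam"):
    """Return (accuracy, precision, recall, F1) with `positive` as the spam class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f1_score(precision, recall)

# Toy labels: 3 spam and 2 truthful reviews, with one error of each kind.
gold = ["spam", "spam", "spam", "truth", "truth"]
predicted = ["spam", "spam", "truth", "spam", "truth"]
accuracy, precision, recall, f1 = evaluate(gold, predicted)
```

As a consistency check on the reported numbers, `f1_score(0.839, 0.885)` rounds to 0.861, the best F1 value in Table 3.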

4.2. Cross-domain Classification


To frame the problem as a domain adaptation task, we want to find features that are more robust on the cross-domain dataset. On the latest public data, only Li reports experimental results. Hence, we compare with his method. Our methods include paragraph-average, basic CNN, SWNN and combinations of the aforementioned methods and features. Specifically, paragraph-average is the average vector of all word embeddings in the paragraph, which is used as the feature vector in the SVM classifier.
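
The paragraph-average representation can be sketched as below. The embedding table is a hypothetical toy dictionary, and skipping words that have no embedding is an assumption of this sketch rather than something stated in the paper.

```python
def paragraph_average(words, embeddings):
    """Average the embeddings of all words in the paragraph, dimension by dimension."""
    # Assumption: words without an embedding are skipped.
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(vec[k] for vec in vectors) / len(vectors) for k in range(dim)]

# Hypothetical 2-dimensional embeddings; "xyz" is out of vocabulary.
toy_embeddings = {"good": [1.0, 0.0], "hotel": [0.0, 1.0]}
feature_vector = paragraph_average(["good", "hotel", "xyz"], toy_embeddings)
```

The resulting fixed-length vector is what would be handed to the SVM classifier as the feature vector of the review.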
Baseline Method. Li applies Unigram, LIWC and POS features respectively in SVM and SAGE classifiers to explore a more general classifier for the task. SAGE is the sparse additive generative model, which can be viewed as a combination of topic models and generalized additive models. However, SAGE does not outperform SVM. We use SVM¹ as the classifier in the comparison experiment. In Li's experiment, the method achieves its best results with Unigram and POS features when trained on hotel domain data and tested on the restaurant and doctor domains. Hence, we list only the best results from his paper.
Results and Analysis. Table 2 shows the results of the baseline method as well as our methods. Unigram gets the best result on the restaurant domain, but it is not robust on the doctor domain. The paragraph-average method performs comparably with the best result on the restaurant domain and is robust across domains. Basic CNN with features performs best on the doctor domain. Generally, the neural network based methods are more robust across domains. We can see that SWNN with features does not perform as well as basic CNN with features. The reason is that the sentence weights from SWNN are domain-specific.

¹ We use LIBSVM as the software tool to run the SVM classifier.

                     Restaurant                    Doctor
Features             A      P      R      F1       A      P      R      F1
Unigram              0.785  0.813  0.742  0.778    0.550  0.573  0.725  0.617
POS                  0.735  0.697  0.815  0.751    0.540  0.521  0.975  0.679
Paragraph-average    0.733  0.684  0.865  0.764    0.588  0.555  0.885  0.682
Basic CNN+POS+I      0.725  0.679  0.855  0.757    0.583  0.548  0.950  0.695
SWNN                 0.690  0.644  0.850  0.733    0.610  0.573  0.860  0.688
SWNN+POS+I           0.668  0.612  0.915  0.733    0.615  0.576  0.870  0.693

Table 2: Classifier performance on cross-domain test data.

4.3. Mixed-domain Classification
We gather the data of all domains into a mixed-domain dataset. We verify the effectiveness of the proposed neural network method as well as SVM with feature combination. Some comparison experiments are also made among different neural network methods. The experimental results are based on five-fold cross validation.

Model            A      P      R      F1
unigram          0.825  0.828  0.880  0.853
bigram+          0.804  0.778  0.925  0.845
POS              0.637  0.658  0.776  0.712
POS + bigram+    0.808  0.783  0.924  0.848
SWNN             0.801  0.800  0.873  0.834
SWNN+POS         0.797  0.791  0.886  0.835
SWNN+POS+I       0.822  0.844  0.847  0.845
unigram+I        0.831  0.830  0.890  0.859
unigram+POS      0.830  0.830  0.887  0.858
unigram+POS+I    0.835  0.839  0.885  0.861

Table 3: Spam review classification on mixed-domain data.

We adopt previous methods as baselines [5, 7], which use SVM as the classifier with unigram, bigram and POS as traditional features.
Results and Analysis. The results are shown in Table 3, in which unigram with POS and first-person features gets the best results in accuracy and F1. In the spam review detection task, unigram is a strong feature. Even used alone, unigram has a higher F1 value than SWNN. However, SWNN with features achieves the highest precision, which is useful in applications of spam review detection.

4.3.1. Comparisons among Neural Network based Methods
We apply various neural network based methods to learn the document representation and perform spam review classification. The results are obtained on the mixed-domain dataset by five-fold cross validation.

Model               A      P      R      F1
Paragraph-average   0.729  0.704  0.915  0.795
Weight-average      0.680  0.652  0.955  0.775
Basic LSTM          0.550  0.590  0.720  0.720
Hier-LSTM           0.618  0.608  0.949  0.741
Basic CNN           0.708  0.694  0.883  0.776
SCNN                0.702  0.698  0.851  0.766
SWNN                0.801  0.800  0.873  0.834

Table 4: Performance of Neural Network based Method on mixed-domain data.

The Weight-average method computes a weighted average vector of the word embeddings in the document, which serves as the feature vector in the SVM classifier. The weight of each word is computed by information gain. The Basic LSTM method uses LSTM to represent sentences and an average operation to represent documents. Hier-LSTM uses a hierarchical LSTM to represent the review documents. Specifically, it uses LSTM to compose the sentence representations into the document representation. Basic CNN is the basic convolutional neural network model. The sentences are represented through the convolutional layer and transformed into a document vector by an average-pooling operation. SCNN uses a convolutional layer in place of the average operation. SWNN modifies the Basic CNN model by using sentence weights.
Results and Analysis. We compare the various document representations. Table 4 shows that our SWNN model achieves the best result in deceptive spam classification. Its accuracy and F1 scores are both well above those of the other neural network based methods. The results show the effectiveness of incorporating sentence weights in representing the document. We also find that more complex models such as SCNN and Hier-LSTM do not perform as well as simple models such as the Paragraph-average model and the Basic CNN model. Overfitting is a primary reason. Hier-LSTM reaches an F1 value of 97% on the training data but achieves a low result on the test data. For a small dataset, neural network based models with many parameters are not necessarily a good choice.

Meanwhile, we analyze the spam review detection capacity of SWNN in each domain. From Table 5, we find that the proposed method performs best on the restaurant domain and worst on the doctor domain. The reviews in the hotel and restaurant domains share more linguistic phenomena [7]. Hence, the model generalizes better on restaurant reviews than on doctor reviews.

Domain       P      R      F1
Hotel        0.841  0.833  0.837
Restaurant   0.870  0.882  0.876
Doctor       0.850  0.810  0.829

Table 5: SWNN performance in each domain in spam review classification.

4.3.2. Parameter Settings


We experimentally study the effect of three parameters in our deceptive spam review detection experiment: window size, hidden layer length and learning rate. We run the comparison experiments with five-fold cross validation. The averaged accuracy and F1 values are shown in Fig. 4, from which we can see that both peak when the window size is 2, the hidden layer length is 50 and the learning rate is 0.3. Thus, we use these settings in our experiments.

Figure 4: The effect of three parameters in the experiment. Panels: (a) effect
of window size (2 to 10), (b) effect of hidden layer length (50 to 200),
(c) effect of learning rate (0.0003 to 0.3); each panel plots accuracy and F1.

We use LIBSVM to implement the SVM classifier. Although there are optimization
approaches [56, 57] for machine learning methods, we tune the parameters of
each kernel function by an equal-distance adjusting method. LIBSVM offers four
kernel functions: linear, polynomial, Gaussian (also called radial basis
function) and sigmoid. Different kernel functions take different parameters.
We tune parameter c with the linear kernel function; d, g, r and c with the
polynomial kernel function; g and c with the Gaussian kernel function; and g,
r and c with the sigmoid kernel function. We tune each parameter for each
kernel function and find that only some of the parameters affect the
classification results for a given kernel function. Specifically, fine tuning
of c improves the detection capacity of the models with the linear, Gaussian
or sigmoid kernel function, while g and r improve the model with the
polynomial kernel function. The effect of these parameters with each kernel
function is tested by five-fold cross validation, and the detailed results are
shown in the appendix. The model achieves the best results when we adopt the
Gaussian kernel function and set c to 400. The results of each kernel function
with its most suitable parameters are shown in Table 6.
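For reference, the four kernel functions can be written out as below, using the forms given in the LIBSVM documentation, where g, r and d correspond to LIBSVM's gamma, coef0 and degree. The parameter c does not appear in any kernel: it is the penalty on training errors in the SVM objective. The function names are chosen for this sketch only.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    # K(u, v) = u'v
    return dot(u, v)


def polynomial_kernel(u, v, g, r, d):
    # K(u, v) = (g * u'v + r) ** d
    return (g * dot(u, v) + r) ** d


def gaussian_kernel(u, v, g):
    # K(u, v) = exp(-g * ||u - v||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-g * squared_distance)


def sigmoid_kernel(u, v, g, r):
    # K(u, v) = tanh(g * u'v + r)
    return math.tanh(g * dot(u, v) + r)
```

Written this way, it is easy to see which parameters each kernel exposes: none for the linear kernel, g, r and d for the polynomial kernel, g for the Gaussian kernel, and g and r for the sigmoid kernel.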

Kernel function   A       P       R       F1      Parameter setting
Linear            0.827   0.838   0.870   0.853   c=1
Polynomial        0.828   0.826   0.891   0.857   c=1, g=0.5, r=10, d=3
Gaussian          0.835   0.839   0.885   0.861   c=400, g=0.5
Sigmoid           0.832   0.836   0.883   0.859   c=400, g=0.5, r=0

Table 6: The classification results of SVM with different kernel functions and
suitable parameters (A: accuracy, P: precision, R: recall).
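As a quick consistency check on Table 6, F1 can be recomputed as the harmonic mean of the reported precision and recall. The sketch below uses only the values in the table; small differences in the third decimal are to be expected if the reported figures are averages over the five folds (an assumption, since F1 of averaged P and R need not equal the averaged F1).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 6
table6 = {
    "Linear": (0.838, 0.870, 0.853),
    "Polynomial": (0.826, 0.891, 0.857),
    "Gaussian": (0.839, 0.885, 0.861),
    "Sigmoid": (0.836, 0.883, 0.859),
}

for kernel, (p, r, reported) in table6.items():
    computed = f1_score(p, r)
    print(f"{kernel:<10} computed F1 = {computed:.4f}, reported F1 = {reported:.3f}")
```

For every kernel function the recomputed value lies within 0.001 of the reported one.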

Finally, we adopt the Gaussian kernel function and set c to 400 and g to 0.5,
the settings that proved most effective in our experiments.

5. Conclusion

We introduce a novel convolutional neural network to learn document
representations for deceptive spam review detection. Sentences play different
roles in a document; in other words, they differ in importance. We model the
semantic representation of reviews by incorporating sentence weights into
document-level representation learning. We conduct experiments on the latest
public data sets and compare with multiple baseline methods. The results show
that the sentence-weighted neural network is more effective than other neural
network based models in deceptive spam review detection. We also find that
neural network based methods are more robust than hand-crafted features on the
cross-domain data set. Additionally, we compare the neural network based
method with traditional features. By combining features, we raise the F1 value
to 86.1%.

Our sentence-weighted neural network has an attention mechanism that takes the
different importance of sentences in the review document into consideration.
However, the attention weights are computed by a hard alignment and a fixed
mode, which could be improved by a soft alignment and a more flexible mode.
Computing an importance weight for each sentence is intuitively useful for
representing the document, so a memory network based model may be an effective
way to address this limitation. We will verify these ideas in future work.
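To make the notion of a soft alignment concrete, the sketch below shows a generic softmax weighting of sentence vectors. It is not the weighting scheme of the sentence-weighted neural network and not a proposed replacement for it; the relevance `scores` are hypothetical inputs that a learned scoring function would have to supply.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into positive weights that sum to 1."""
    largest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_document_vector(sentence_vectors, scores):
    """Compose a document vector as the softmax-weighted sum of sentence vectors."""
    weights = softmax(scores)
    dim = len(sentence_vectors[0])
    document = [0.0] * dim
    for weight, vector in zip(weights, sentence_vectors):
        for i, x in enumerate(vector):
            document[i] += weight * x
    return weights, document
```

Because every sentence receives a non-zero, differentiable weight, such a scheme can be trained jointly with the rest of a network, which is the usual motivation for preferring soft over hard alignment.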

6. Acknowledgments

This work was supported by the National High Technology Development 863
Program of China via grant 2015AA015407 and by the National Natural Science
Foundation of China (NSFC) via grants 61133012 and 61273321.

References

[1] C. Miller, Company settles case of reviews it faked, New York Times.

[2] D. Meyer, Fake reviews prompt Belkin apology, CNet News.

[3] D. Streitfeld, For $2 a star, an online retailer gets 5 star product
reviews, New York Times 26.

[4] A. Topping, Historian Orlando Figes agrees to pay damages for fake
reviews, The Guardian 16.

[5] M. Ott, Y. Choi, C. Cardie, J. T. Hancock, Finding deceptive opinion spam
by any stretch of the imagination, in: Proceedings of the 49th Annual Meeting
of the Association for Computational Linguistics: Human Language
Technologies-Volume 1, Association for Computational Linguistics, 2011,
pp. 309–319.

[6] N. Jindal, B. Liu, Opinion spam and analysis, in: Proceedings of the 2008
International Conference on Web Search and Data Mining, ACM, 2008,
pp. 219–230.

[7] J. Li, M. Ott, C. Cardie, E. Hovy, Towards a general rule for identifying
deceptive opinion spam, Proceedings of the 52nd Annual Meeting of the
Association for Computational Linguistics (2014) 1566–1576.

[8] L. Li, W. Ren, B. Qin, T. Liu, Learning Document Representation for
Deceptive Opinion Spam Detection, Springer International Publishing, 2015.

[9] Z. Gyöngyi, H. Garcia-Molina, J. Pedersen, Combating web spam with
trustrank, in: Proceedings of the Thirtieth international conference on Very
large data bases-Volume 30, VLDB Endowment, 2004, pp. 576–587.

[10] A. Ntoulas, M. Najork, M. Manasse, D. Fetterly, Detecting spam web pages
through content analysis, in: Proceedings of the 15th international conference
on World Wide Web, ACM, 2006, pp. 83–92.

[11] Z. Gyöngyi, H. Garcia-Molina, Link spam alliances, in: Proceedings of the
31st international conference on Very large data bases, VLDB Endowment, 2005,
pp. 517–528.

[12] P. T. Metaxas, J. DeStefano, Web spam, propaganda and trust, in: AIRWeb,
2005, pp. 70–78.

[13] B. Wu, B. D. Davison, Identifying link farm spam pages, in: Special
interest tracks and posters of the 14th international conference on World Wide
Web, ACM, 2005, pp. 820–829.

[14] D. Fetterly, M. Manasse, M. Najork, Detecting phrase-level duplication on
the world wide web, in: Proceedings of the 28th annual international ACM SIGIR
conference on Research and development in information retrieval, ACM, 2005,
pp. 170–177.

[15] C. Castillo, D. Donato, A. Gionis, V. Murdock, F. Silvestri, Know your
neighbors: Web spam detection using the web topology, in: Proceedings of the
30th annual international ACM SIGIR conference on Research and development in
information retrieval, ACM, 2007, pp. 423–430.

[16] P.-A. Chirita, J. Diederich, W. Nejdl, Mailrank: using ranking for spam
detection, in: Proceedings of the 14th ACM international conference on
Information and knowledge management, ACM, 2005, pp. 373–380.

[17] H. Drucker, D. Wu, V. N. Vapnik, Support vector machines for spam
categorization, Neural Networks, IEEE Transactions on 10 (5) (1999) 1048–1054.

[18] F. Wu, J. Shu, Y. Huang, Z. Yuan, Co-detecting social spammers and spam
messages in microblogging via exploiting social contexts, Neurocomputing 201
(2016) 51–65.

[19] A. Mukherjee, A. Kumar, B. Liu, J. Wang, M. Hsu, M. Castellanos,
R. Ghosh, Spotting opinion spammers using behavioral footprints, in:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge
discovery and data mining, ACM, 2013, pp. 632–640.

[20] G. Wang, S. Xie, B. Liu, P. S. Yu, Review graph based online store review
spammer detection, in: Data Mining (ICDM), 2011 IEEE 11th International
Conference on, IEEE, 2011, pp. 1242–1247.

[21] A. Mukherjee, B. Liu, J. Wang, N. Glance, N. Jindal, Detecting group
review spam, in: Proceedings of the 20th international conference companion on
World wide web, ACM, 2011, pp. 93–94.

[22] E.-P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, H. W. Lauw, Detecting product
review spammers using rating behaviors, in: Proceedings of the 19th ACM
international conference on Information and knowledge management, ACM, 2010,
pp. 939–948.

[23] A. Mukherjee, B. Liu, N. Glance, Spotting fake reviewer groups in
consumer reviews, in: Proceedings of the 21st international conference on
World Wide Web, ACM, 2012, pp. 191–200.

[24] N. Jindal, B. Liu, E.-P. Lim, Finding unusual review patterns using
unexpected rules, in: Proceedings of the 19th ACM international conference on
Information and knowledge management, ACM, 2010, pp. 1549–1552.

[25] K.-H. Yoo, U. Gretzel, Comparison of deceptive and truthful travel
reviews, Information and communication technologies in tourism 2009 (2009)
37–47.

[26] M. Ott, C. Cardie, J. Hancock, Estimating the prevalence of deception in
online review communities, in: Proceedings of the 21st international
conference on World Wide Web, ACM, 2012, pp. 201–210.

[27] M. Ott, C. Cardie, J. T. Hancock, Negative deceptive opinion spam, in:
HLT-NAACL, 2013, pp. 497–501.

[28] J. Li, M. Ott, C. Cardie, Identifying manipulated offerings on review
portals, in: EMNLP, 2013, pp. 1933–1942.

[29] S. Feng, R. Banerjee, Y. Choi, Syntactic stylometry for deception
detection, in: Proceedings of the 50th Annual Meeting of the Association for
Computational Linguistics: Short Papers-Volume 2, Association for
Computational Linguistics, 2012, pp. 171–175.

[30] V. W. Feng, G. Hirst, Detecting deceptive opinions with profile
compatibility, in: Proceedings of the 6th International Joint Conference on
Natural Language Processing, Nagoya, Japan, 2013, pp. 14–18.

[31] A. Prieto, B. Prieto, E. M. Ortigosa, E. Ros, F. Pelayo, J. Ortega,
I. Rojas, Neural networks: An overview of early research, current frameworks
and new challenges, Neurocomputing.

[32] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa,
Natural language processing (almost) from scratch, The Journal of Machine
Learning Research 12 (2011) 2493–2537.

[33] N. Kalchbrenner, E. Grefenstette, P. Blunsom, A convolutional neural
network for modelling sentences, Proceedings of the 52nd Annual Meeting of the
Association for Computational Linguistics (2014) 655–665.

[34] J. Li, J. Dan, E. Hovy, When are tree structures necessary for deep
learning of representations?, Proceedings of the 2015 Conference on Empirical
Methods in Natural Language Processing (2015) 2304–2314.

[35] J. Turian, L. Ratinov, Y. Bengio, Word representations: a simple and
general method for semi-supervised learning, in: Proceedings of the 48th
annual meeting of the association for computational linguistics, Association
for Computational Linguistics, 2010, pp. 384–394.

[36] R. Socher, J. Bauer, C. D. Manning, A. Y. Ng, Parsing with compositional
vector grammars, in: Proceedings of the ACL conference, Citeseer, 2013.

[37] A. Mnih, G. E. Hinton, A scalable hierarchical distributed language
model, in: Advances in neural information processing systems, 2009,
pp. 1081–1088.

[38] Y. Bengio, R. Ducharme, P. Vincent, C. Janvin, A neural probabilistic
language model, The Journal of Machine Learning Research 3 (2003) 1137–1155.

[39] R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng,
C. Potts, Recursive deep models for semantic compositionality over a sentiment
treebank, in: Proceedings of the conference on empirical methods in natural
language processing (EMNLP), Citeseer, 2013, pp. 1631–1642.

[40] D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, B. Qin, Learning
sentiment-specific word embedding for Twitter sentiment classification, in:
Proceedings of the 52nd Annual Meeting of the Association for Computational
Linguistics, Vol. 1, 2014, pp. 1555–1565.

[41] P. Wang, B. Xu, J. Xu, G. Tian, C. L. Liu, H. Hao, Semantic expansion
using word embedding clustering and convolutional neural network for improving
short text classification, Neurocomputing 174 (PB) (2016) 806–814.

[42] E. H. Huang, R. Socher, C. D. Manning, A. Y. Ng, Improving word
representations via global context and multiple word prototypes, in:
Proceedings of the 50th Annual Meeting of the Association for Computational
Linguistics: Long Papers-Volume 1, Association for Computational Linguistics,
2012, pp. 873–882.

[43] J. Pennington, R. Socher, C. D. Manning, Glove: Global vectors for word
representation, Proceedings of the Empirical Methods in Natural Language
Processing (EMNLP 2014) 12 (2014) 1532–1543.

[44] A. Yessenalina, C. Cardie, Compositional matrix-space models for
sentiment analysis, in: Proceedings of the Conference on Empirical Methods in
Natural Language Processing, Association for Computational Linguistics, 2011,
pp. 172–182.

[45] X. Glorot, A. Bordes, Y. Bengio, Domain adaptation for large-scale
sentiment classification: A deep learning approach, in: Proceedings of the
28th International Conference on Machine Learning (ICML-11), 2011,
pp. 513–520.

[46] R. Socher, C. C. Lin, C. Manning, A. Y. Ng, Parsing natural scenes and
natural language with recursive neural networks, in: Proceedings of the 28th
international conference on machine learning (ICML-11), 2011, pp. 129–136.

[47] R. Socher, B. Huval, C. D. Manning, A. Y. Ng, Semantic compositionality
through recursive matrix-vector spaces, in: Proceedings of the 2012 Joint
Conference on Empirical Methods in Natural Language Processing and
Computational Natural Language Learning, Association for Computational
Linguistics, 2012, pp. 1201–1211.

[48] K. M. Hermann, P. Blunsom, The role of syntax in vector space models of
compositional semantics, in: ACL (1), 2013, pp. 894–904.

[49] J. Li, Feature weight tuning for recursive neural networks, Eprint Arxiv.

[50] Q. V. Le, T. Mikolov, Distributed representations of sentences and
documents, Computer Science 4 (2014) 1188–1196.

[51] B. M. Depaulo, M. E. Ansfield, K. L. Bell, Interpersonal deception
theory, Communication Theory 6 (3) (1996) 297–310.

[52] P. Rayson, A. Wilson, G. Leech, Grammatical word class variation within
the British National Corpus sampler, Language & Computers (2001) 295–306.

[53] M. L. Newman, J. W. Pennebaker, D. S. Berry, J. M. Richards, Lying words:
Predicting deception from linguistic style, Personality & Social Psychology
Bulletin 29 (5) (2003) 665–675.

[54] L. Zhou, J. K. Burgoon, D. P. Twitchell, T. Qin, J. F. Nunamaker, A
comparison of classification methods for predicting deception in
computer-mediated communication, Journal of Management Information Systems 20
(4) (2004) 139–165.

[55] M. L. Knapp, M. E. Comaden, Telling it like it isn't: A review of theory
and research on deceptive communications, Human Communication Research 5 (3)
(1979) 270–285.

[56] O. A. Arqub, Z. Abo-Hammour, Numerical solution of systems of
second-order boundary value problems using continuous genetic algorithm,
Information Sciences 279 (2014) 396–415.

[57] O. A. Arqub, Adaptation of reproducing kernel algorithm for solving fuzzy
Fredholm–Volterra integrodifferential equations, Neural Computing &
Applications (2015) 1–20.

Appendix A. The Effect of Parameters in SVM

We tune c of the SVM separately with the linear, Gaussian and sigmoid kernel
functions. The effect of c on spam detection classification with each of these
kernel functions is shown in Fig. A.5, Fig. A.6 and Fig. A.7.

Figure A.5: The effect of c in SVM with linear kernel function. (Accuracy and
F1 plotted against c from 1 to 400.)

Figure A.6: The effect of c in SVM with gaussian kernel function. (Accuracy
and F1 plotted against c from 1 to 500.)

We tune d, g, r and c with the polynomial kernel function. We find that only g
and r affect the results; d and c have no influence. When we tune one
parameter, the other parameters are set to their defaults. We also test some
combinations of values of g and r. The results show that the SVM with the
polynomial kernel function achieves good results when r is set to 10 and the
other parameters are set to their defaults. The effect of g and r on spam
detection classification is shown in Fig. A.8.
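One way to organise such a sweep is to generate one LIBSVM option string per candidate value. The sketch below uses LIBSVM's documented `svm-train` options (`-t` kernel type, `-c` cost, `-g` gamma, `-r` coef0, `-d` degree, `-v` n-fold cross validation); the helper name and the way the runs are assembled are assumptions for illustration, and the candidate values of c are the tick values recoverable from Fig. A.5.

```python
# LIBSVM kernel_type codes for the -t option.
KERNEL_TYPE = {"linear": 0, "polynomial": 1, "gaussian": 2, "sigmoid": 3}


def libsvm_options(kernel, c=None, g=None, r=None, d=None, folds=5):
    """Build an option string; parameters left as None keep LIBSVM's defaults."""
    parts = [f"-t {KERNEL_TYPE[kernel]}"]
    for flag, value in (("-c", c), ("-g", g), ("-r", r), ("-d", d)):
        if value is not None:
            parts.append(f"{flag} {value}")
    parts.append(f"-v {folds}")  # report n-fold cross-validation accuracy
    return " ".join(parts)


# Sweep c for the linear kernel while every other parameter stays at its default.
c_values = [1, 50, 100, 150, 200, 250, 300, 350, 400]
linear_runs = [libsvm_options("linear", c=c) for c in c_values]

# The setting finally adopted in the paper: Gaussian kernel, c = 400, g = 0.5.
adopted = libsvm_options("gaussian", c=400, g=0.5)
```

Each string can then be passed to `svm-train` or to the option argument of a LIBSVM language binding; which interface was used in the experiments is not stated here.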

22
ACCEPTED MANUSCRIPT

0.9

0.85

0.8

0.75

T
0.7

0.65

IP
0.6
Accuracy
0.55
F1

CR
0.5
1 100 200 300 400 500 C

Figure A.7: The effect of c in SVM with sigmoid kernel function.

0.9

0.85
US
AN
0.8

0.75

0.7
M

0.65
Accuracy
0.6
F1
0.55
ED

0.5 1 5 10 15 20 g

(a) Effect of g
0.9
PT

0.85

0.8
CE

0.75

0.7

0.65
AC

Accuracy
0.6
F1
0.55
0 5 10 15 20 r

(b) Effect of r

Figure A.8: The effect of g and r in SVM with polynomial kernel function.

23
ACCEPTED MANUSCRIPT

Luyang Li received the Master's degree in July 2011 from the Department of
Computer Science, Harbin Institute of Technology, Harbin, China. Since 2011,
she has been a Ph.D. candidate at the Department of Computer Science, Harbin
Institute of Technology. Her current research interests include natural
language processing, contradiction detection, deceptive spam review detection
and representation learning.

Wenjing Ren received the bachelor's degree in July 2015 from the Department of
Computer Science, Harbin Institute of Technology, Harbin, China. Since 2015,
she has been a master's candidate at the Department of Computer Science,
Harbin Institute of Technology. Her current research interests include natural
language processing, contradiction detection, deceptive spam review detection
and representation learning.

Bing Qin received her Ph.D. degree in 2005 from the Department of Computer
Science, Harbin Institute of Technology, Harbin, China. She is a Full
Professor in the Department of Computer Science and the Deputy Director of the
Research Center for Social Computing and Information Retrieval (HIT-SCIR) at
Harbin Institute of Technology. Her research interests include natural
language processing, information extraction, document-level discourse
analysis, and sentiment analysis.

Ting Liu received his Ph.D. degree in 1998 from the Department of Computer
Science, Harbin Institute of Technology, Harbin, China. He is a Full Professor
in the Department of Computer Science and the Director of the Research Center
for Social Computing and Information Retrieval (HIT-SCIR) at Harbin Institute
of Technology. His research interests include information retrieval, natural
language processing, and social media analysis.
