discusses related work. Section 3 introduces our proposed method. Section 4 presents experiments on intention detection on four different domains. Finally, Section 5 concludes the paper and discusses future work.

2. RELATED WORK
A considerable amount of work has been done on intention detection and domain adaptation, which we review in this section.

Intention detection. Because of its importance in advertisement and targeted marketing, the task of identifying intention from social media such as tweets and forum posts has attracted substantial research interest in recent years. The most straightforward approach is to formulate the task as a text categorization problem, which can be solved with supervised learning methods. Hollerit et al. [14] classify tweets into containing and not containing intents by training support vector machines (SVMs) on word and part-of-speech n-grams extracted from tweets. Luong et al. [20] use maximum entropy with n-gram features to classify forum and Facebook posts, written in Vietnamese, into intention-containing and normal ones. While simple and straightforward, supervised learning approaches require laborious annotation of documents to create training data, which limits their applicability across application domains. To alleviate this problem, techniques that make use of unlabeled data from the same domain or labeled data from other domains have been applied to identify and classify intents. Wang et al. [25] use both labeled and unlabeled data to detect intent-containing tweets and classify them into six categories with a graph-based semi-supervised learning method. Informally, their method classifies a tweet based on the proximities of its content to words that are specific to intent categories. Chen et al. [8] leverage labeled data from other domains to train a classifier for the target domain by using domain adaptation techniques. They propose a co-training-like algorithm that alternates between two classifiers trained on the source and target domains to boost the final performance. Ding et al. [10] also report on the successful application of domain adaptation by using convolutional neural networks with shared middle layers, which were trained on labeled data from other domains. A different approach has been proposed by Li et al. [16], in which they use Wikipedia as an external source of knowledge and map microblog posts to Wikipedia concepts to classify them, thus reducing the need for large training sets.

Domain adaptation and transfer learning. Domain adaptation and the closely related transfer learning have long been studied in settings where we have labeled data from one or more domains and want to use the data to train classifiers for another domain, but the distributions of features and/or labels are different from domain to domain [2, 4, 23]. Blitzer et al. [4] were among the first to apply domain adaptation to sentiment classification, a document categorization problem by nature. They train a classifier to predict the presence of domain-dependent features based on domain-independent features, thus reducing the mismatch of features between different domains. Other approaches use different types of transformations to project source and/or target features into a new feature space so that they have similar distributions in the new space (an overview is given in [23]). Pan et al. [22] propose a so-called transfer component analysis to optimize an objective function based on the Maximum Mean Discrepancy principle. They reported good transfer effects over several applications. Bach et al. [2]
propose a method that finds a new feature space by combining canonical correlation analysis with word embeddings. They report improved performance on the cross-domain sentiment classification task. Liu and colleagues use the name "lifelong learning" for a group of methods that utilize data from other domains to support supervised or unsupervised learning in the target domain [7]. Among the methods from that group is one that uses topics learnt from other domains to improve topic modeling in the target domain [6]. Chen et al. [9] also propose a lifelong learning method that uses stochastic gradient descent to adjust the importance of features from multiple source and target domains, so as to gain the maximum positive effect of feature transfer while reducing the negative effect (if any). They report improved performance for sentiment classification using labeled data from multiple source domains.
For document categorization in a new domain, we may have some labeled data and a lot of unlabeled data. This is a typical semi-supervised learning setting. A popular semi-supervised learning method is co-training [5], which is used by Chen et al. [8] in their intention detection method to combine labeled and unlabeled data from different domains. Graph-based learning is another popular semi-supervised learning approach, which has also been applied to intent detection and classification, as reported by Wang et al. [25]. Besides the semi-supervised setting, the setting in which training data contain noisy labels is known as weakly supervised learning; an example is reported by Bach et al. [3], who use noisy training data derived from ratings to augment labeled data for predicting the sentiment polarity of new review posts.
Our method here is based on domain adaptation of external data sources for improved intention detection in forum posts.
3. PROPOSED METHOD
In this section, we present our method for cross-domain intention detection in discussion forums. We consider the case where we have labeled data in both the source domains and the target domain. The goal is to leverage labeled data from multiple source domains to improve the performance of intention detection in the target domain.
3.1 Method Overview
As illustrated in Figure 2, our method consists of three modules: data aggregation, optimization, and classification.

• Data aggregation: This module extracts knowledge from multi-source domains and stores it in a knowledge base.

• Optimization: This module utilizes knowledge from the knowledge base to optimize key parameters, which will be used in the classification model.

• Classification: This module uses the optimized parameters to build the Naive Bayesian classification model.

Figure 2: A method for cross-domain intention detection.

In the following, we describe these modules in detail. Classification with Naive Bayes will be presented before the optimization section for clarity and readability purposes.
3.2 Aggregation
For each source domain $\hat{s}$, we count the number of times a word $w$ appears in the positive or negative class, $N_{+,w}^{\hat{s}}$ and $N_{-,w}^{\hat{s}}$. We then compute the number of occurrences of $w$ in the documents of the positive (and negative) class in all the source domains and store them in a knowledge base:

$$N_{+,w}^{KB} = \sum_{\hat{s}} N_{+,w}^{\hat{s}}$$

$$N_{-,w}^{KB} = \sum_{\hat{s}} N_{-,w}^{\hat{s}}$$
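To make the aggregation step concrete, the following is a minimal Python sketch (our illustration, not the authors' code; names such as build_knowledge_base are ours):

```python
from collections import Counter

def count_words(documents):
    """Count word occurrences over a list of tokenized documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc)
    return counts

def build_knowledge_base(source_domains):
    """Aggregate per-class word counts over all source domains.

    source_domains: list of (positive_docs, negative_docs) pairs, one
    pair per source domain, where each element is a list of tokenized
    posts. Returns (N_pos_KB, N_neg_KB), the knowledge-base counts.
    """
    n_pos_kb, n_neg_kb = Counter(), Counter()
    for pos_docs, neg_docs in source_domains:
        n_pos_kb.update(count_words(pos_docs))  # sum over domains of N^s_{+,w}
        n_neg_kb.update(count_words(neg_docs))  # sum over domains of N^s_{-,w}
    return n_pos_kb, n_neg_kb
```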
3.3 Classification with Naive Bayes
In the classification step, the goal is to find a class label $c_j$ given a sample $d$. Here, $d$ is a document, i.e. a post in a discussion forum, and $c_j$ is a class label indicating whether the post expresses an intention ($+$) or not ($-$). Naive Bayesian classification finds the class label that maximizes the conditional probability $P(c_j|d)$.

By using Bayes' theorem and the independence assumption, we have:

$$P(c_j|d) = \frac{P(d|c_j)P(c_j)}{P(d)} \approx \frac{\prod_w P(w|c_j) \cdot P(c_j)}{P(d)} \quad (1)$$

where the product is computed over all words $w$ in $d$. Because the denominator of Equation (1) is independent of the class label, it can be ignored in computation. Furthermore, $P(c_j)$ can be estimated from the frequency of label $c_j$ in the training data, so we focus on the key parameters $P(w|c_j)$, which are computed as follows:

$$P(w|c_j) = \frac{\lambda + N_{c_j,w}}{\lambda|V| + \sum_{v \in V} N_{c_j,v}}$$

where $N_{c_j,w}$ is the frequency of word $w$ in documents of class $c_j$, $|V|$ is the size of the vocabulary $V$, and $\lambda$ ($0 \leq \lambda \leq 1$) is used for smoothing.
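As an illustration, the smoothed estimates and the decision rule of Equation (1) can be implemented as follows. This is a generic multinomial Naive Bayes sketch in log space (to avoid floating-point underflow), not the authors' implementation:

```python
import math
from collections import Counter

def train_nb(docs_by_class, lam=1.0):
    """Estimate log P(c) and log P(w|c) with lambda smoothing.

    docs_by_class: dict mapping a class label ('+' or '-') to the list
    of tokenized documents of that class.
    """
    vocab = {w for docs in docs_by_class.values() for d in docs for w in d}
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    log_prior, log_like = {}, {}
    for c, docs in docs_by_class.items():
        counts = Counter(w for d in docs for w in d)     # N_{c,w}
        denom = lam * len(vocab) + sum(counts.values())  # lambda*|V| + sum_v N_{c,v}
        log_prior[c] = math.log(len(docs) / total_docs)  # estimate of P(c)
        log_like[c] = {w: math.log((lam + counts[w]) / denom) for w in vocab}
    return log_prior, log_like, vocab

def classify(doc, log_prior, log_like, vocab):
    """Return the class maximizing log P(c) + sum_w log P(w|c), as in Eq. (1)."""
    return max(
        log_prior,
        key=lambda c: log_prior[c]
        + sum(log_like[c][w] for w in doc if w in vocab),
    )
```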
Recall that we consider the task in a cross-domain setting, so $N_{c_j,w}$ will be counted over the whole dataset, consisting of labeled data in both the source and target domains. A simple method to compute $N_{c_j,w}$ is to sum up the counts in the multi-source domains, i.e. in the knowledge base, and the empirical counts in the target domain as follows:

$$N_{+,w} = N_{+,w}^{KB} + N_{+,w}^{t}$$

$$N_{-,w} = N_{-,w}^{KB} + N_{-,w}^{t}$$

where $t$ denotes the target domain and $KB$ denotes the source domains in the knowledge base. This method, however, has two weaknesses.

• Past domains contain much more data than the target domain. Merged results, therefore, may be dominated by the counts from the source domains.

• The method does not consider domain-dependent words. A word may be an indicator of intention ($+$) in the target domain but not ($-$) in the source domains.

To deal with these problems, we introduce a method that revises the counts by optimizing two variables, $X_{+,w}$ and $X_{-,w}$, the numbers of times that a word $w$ appears in the positive and negative class, respectively. In classification, we will use those virtual counts instead of the empirical counts $N_{+,w}$ and $N_{-,w}$.
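A minimal sketch of the naive count merging described above (our illustration; in the full method, the optimized virtual counts $X_{+,w}$ and $X_{-,w}$ would replace these merged counts, but that optimization step is not shown in this excerpt):

```python
from collections import Counter

def merge_counts(n_kb, n_target):
    """Naive merge: N_{c,w} = N^KB_{c,w} + N^t_{c,w} for one class c."""
    merged = n_kb.copy()
    merged.update(n_target)  # Counter.update adds counts element-wise
    return merged

# Toy example: the knowledge-base counts dominate the much smaller
# target-domain counts, illustrating the first weakness noted above.
n_pos = merge_counts(Counter({"buy": 500}), Counter({"buy": 3}))
print(n_pos["buy"])  # 503
```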
based on both classifiers using a bootstrapping technique. An important point is that both classifiers use the same feature set selected from the target data.
• Combined: This model used labeled data in the three source domains and 9/10 of the data in the target domain to train a Naive Bayesian classification model. The purpose of this experiment is to investigate the performance of the detection system when we combine the source and target domains without domain adaptation.
• Our method: Our method is similar to the Combined model but uses an optimization technique for domain adaptation. In effect, the method combines source and target domain labeled data based on their contribution to the final classification accuracy.
We summarize the experimented methods in Table 2. Note that the first three methods, i.e. Baseline1, Baseline2, and Co-Class, were run with the same settings as described by Chen et al. [8]. For Combined and Our method, we selected 2500 features on the target domain and 1500 features on each source domain. We conducted 10-fold cross-validation for the models that used part of the target domain data in the training process, i.e. Baseline1, Combined, and Our method.
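For concreteness, the per-domain evaluation protocol could be approximated as in the sketch below, using scikit-learn. The 2500-feature budget follows the setting above, but the feature-selection criterion (chi-square here) and the other details are our assumptions, as the text does not specify them:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import StratifiedKFold, cross_val_score

def evaluate(posts, labels, k_features=2500):
    """10-fold cross-validated F1 score for a Naive Bayes pipeline.

    posts: list of raw post strings; labels: 1 for intent, 0 otherwise.
    """
    model = Pipeline([
        ("vectorize", CountVectorizer()),             # bag-of-words counts
        ("select", SelectKBest(chi2, k=k_features)),  # keep the top-k features
        ("classify", MultinomialNB()),                # smoothed Naive Bayes
    ])
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return cross_val_score(model, posts, labels, cv=cv, scoring="f1").mean()
```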
4.3 Results
Tables 3, 4, 5, 6 summarize the experimental results on the four domains. In each table, we show Precision, Recall, and F1 scores, averaged over 10 folds for one target domain. Note that the F1 score is the most important metric as it balances Precision and Recall. As can be seen, there is no clear winner between Baseline1 and Baseline2. Each method achieved higher F1 scores in two domains and lower scores in the others. These results suggest that using lots of training data from other domains may be more or less useful than using less training data from the same domain, depending on the specific case.

Co-Class is comparable to or slightly better than either of the two baselines. Specifically, Co-Class achieved F1 scores that are comparable with those of Baseline1 in Cellphone and TV, and higher scores in Electronics and Camera, beating both Baseline1 and Baseline2. A somewhat surprising observation is that a simple combination of labeled data from the source and target domains as training data (the Combined method) achieved better results than the more sophisticated Co-Class in all four cases. A possible explanation for the superiority of Combined over Co-Class is that the former uses some labeled data of the target domain, and that data is important for classification. The Combined method is also consistently better than both Baseline1 and Baseline2. These results render simple Combined a very competitive method.

Of the four domains, Camera is the least sensitive to the methods used. For this domain, the last three methods achieved nearly the same F1 scores, while the worst performing method is behind by only 2%.

In all cases except the insensitive Camera domain, our proposed method consistently outperformed the other experimented methods, achieving the most accurate results in terms of F1 scores. The differences in F1 score between our method and the second best method, namely Combined, are almost 2% for Cellphone and TV and 0.8% for Electronics. All differences are statistically significant according to a t-test with a threshold of 0.05. Since both Combined and our method use similar training and test data, we believe the improvement of the latter comes from the optimization we have performed to calculate the posterior probability of the Naive Bayes classifier, which is our main contribution in this paper.
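The significance check can be reproduced with a t-test over the per-fold F1 scores. Below is a sketch with placeholder numbers; we assume a paired test over the 10 shared folds, which the text does not spell out:

```python
from scipy import stats

# Placeholder per-fold F1 scores, NOT the paper's actual numbers.
f1_ours     = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.81, 0.80, 0.82, 0.79]
f1_combined = [0.79, 0.78, 0.81, 0.78, 0.80, 0.77, 0.79, 0.78, 0.80, 0.78]

# Paired t-test: both methods are evaluated on the same folds,
# so the scores form paired samples.
t_stat, p_value = stats.ttest_rel(f1_ours, f1_combined)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # significant if p < 0.05
```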
Error Analysis. We now analyze some cases in which our system made a mistake. We divide errors into two main types: false positives (a non-intent post was predicted as intent) and false negatives (an intent post was predicted as non-intent). For each type, we list several typical examples.

1. False positives. These posts usually contain intention-describing words, but their meaning does not include a purchase intention. Another case is when the post shares an experience: the author had already bought the product when he posted. Here are some examples.

• Who is looking to buy a camera as a semi proffessional camera mix of SLR and normal digital camera and easy to learn I advice to buy Canon SX1 IS as HD viedo, 2.8 LCD Rotate, 20X zoom and
Table 2: Methods to compare

Model      | Training data          | Test data   | Exp. method      | Learning algorithm
-----------|------------------------|-------------|------------------|-------------------
Baseline1  | 9/10 target            | 1/10 target | cross-validation | NB
Baseline2  | 3 sources              | target      | one time         | NB
Co-Class   | 3 sources              | target      | one time         | NB, bootstrapping
Combined   | 3 sources, 9/10 target | 1/10 target | cross-validation | NB
Our method | 3 sources, 9/10 target | 1/10 target | cross-validation | NB, optimization