Integrated Syntactic and Semantic Tree For Targeted Sentiment Classification Using Dual-Channel Graph Convolutional Network
to the noise interference of words in sentences. In light of this, Fan et al. [13] introduced a multi-grained attention network to capture the semantic interaction information of different granularities to correct the sentiment polarity of the target mention. Zeng et al. [14] explored the local context focus mechanism, leveraging the local context words that were close to the target to model the semantic relations and eliminate the interference caused by distant irrelevant opinion words. Lin et al. [15] designed a selective attention mechanism to make full use of the inter-target information.

Similar to the attention mechanisms, memory networks are also employed to model the interaction between the target mention and opinion words. Chen et al. [32] combined memory networks and attention mechanisms to obtain long-term contextual information. Wang et al. [33] proposed target-sensitive memory networks to preserve the contextual information that was relevant to the target. However, the contextual information of the sentence could be forgotten with the stacking of the memory networks.

In addition, BERT [34], ALBERT [35], and other large-scale pre-trained language models were published to further improve the performance of NLP tasks and targeted sentiment classification tasks. Since the knowledge provided by BERT is beneficial for downstream tasks [36], we combine it with the contextual encoder in our model to embed the sentence.

B. Syntactic-Based Methods

Methods based on syntactic structures leverage the syntactic knowledge of sentences to model and obtain the feature representations of targets. Early studies [37], [38] exploited manually extracted syntactic rules. Subsequently, external parsers and neural network-based approaches were introduced into the targeted sentiment classification task. Dong et al. [39] used an adaptive recursive neural network to encode targets into syntactic dependency trees. Furthermore, Nguyen et al. [40] developed phrase-level recursive neural networks, employing both the dependency tree and the combination tree as the model input to encode the syntactic knowledge of the target mention and contexts simultaneously. He et al. [41] established the connections between the target and contexts based on the syntactic relative distance between them in the syntactic dependency tree to distinguish the importance of contexts.

To further improve the ability to process graph-structured data (e.g., dependency trees), recent studies introduce graph neural networks [20], [21] to model the relations between targets and contexts in dependency trees as adjacency matrices. Sun et al. [18] proposed to leverage GCN for encoding syntactic features of sentences to exchange contextual information in the model. Wang et al. [42] reconstructed the target-specific dependency tree of the sentence to emphasize the importance of the contextual information adjacent to the target mention. Furthermore, recent studies combine several different types of graph neural networks to perform targeted sentiment analysis tasks. For example, Hou et al. [53] designed an ensemble learning method to fuse syntactic dependency trees obtained by different parsers to enhance the syntactic representation of the target. However, it not only takes a lot of time to parse the sentence with multiple parsers but also causes the problem of error accumulation across multiple parsers. Furthermore, Li et al. [54] proposed a DualGCN structure to obtain the feature representation of the target by designing two GCNs that extract syntactic and semantic information, respectively. However, it still achieved limited performance in reducing the parsing errors of the dependency tree, which accordingly brought more redundancy and complexity to the model. Similarly, Dai et al. [55] employed the syntax-guided relationship provided by the pre-trained BERT and RoBERTa to build a dependency tree, whereas the pre-trained model would receive interference from the initial learning data and provide limited semantic information. Sun et al. [56] built complex three-graph structures to capture the interaction between documents for document-level relation extraction, which is not suitable for sentiment analysis. In contrast, we only use the concise IS2 tree to capture the structural, semantic, and syntactic information of the sentence at the same time; it is suitable for sentiment analysis and can capture the interaction between the target and the context words. Lan et al. [57] proposed to use syntactic information and dependency information between target mentions. However, in syntax-insensitive scenarios, the lack of semantic information in the syntactic dependency tree would lead to wrong word interactions, which could cause error propagation when the sentence contains multiple target mentions. Dai et al. [58] used GCN to learn the semantic information in the syntactic structure of the sentence, but it was based on the original syntactic dependency tree without the semantic relationship, which not only introduced two redundant GCNs to obtain the semantic and syntactic information respectively but also increased the parameters of the adjacency matrix. Moreover, Chen et al. [43] combined syntactic dependency trees with latent graph structures to obtain the target representation. Zhang et al. [44] designed a hierarchical syntactic and lexical graph based on the syntactic dependency tree, leveraging the co-occurrence characteristics of words in a sentence to improve the classification performance. However, the syntactic dependency tree provided by external parsers may negatively affect the model performance because of inaccurate dependency parsing and the informal expressions in syntax-insensitive scenarios.

Recent research found that semantic relations between target mentions and contexts could be exploited to enhance the encoding of syntax-insensitive and informal reviews [23], [24], [25]. Zhang et al. [19] proposed an aspect-specific GCN to introduce semantic attention based on target mentions, allocating the attention weights of opinion words and obtaining the representations of syntactic structures. Similarly, Tang et al. [26] developed a graph-dependent enhanced double transformer network to combine the semantic representations of contexts and the syntactic structures extracted by GCN. Bai et al. [22] designed an attention mechanism using syntactic dependency labels and proved the importance of fine-grained dependency labels in improving classification performance. However, existing methods based on attention mechanisms fail to address the interference caused by irrelevant opinion words with the opposite sentiment polarity.
Fig. 2. (a) A syntactic dependency tree of an example sentence with two targets of opposite sentiment polarities. The color of each word represents the attention
weight. (b) A syntactic dependency tree of an example sentence with one target mention.
In contrast to existing targeted sentiment classification methods, we propose a novel IS2 tree to integrate semantic dependency labels onto the syntactic dependency tree provided by parsers. In this way, the IS2 tree can address inaccurate dependency parsing by encoding the semantic dependency labels. Meanwhile, the dependency relations between the target mention and contexts can be enhanced by the target-specific reshaping of IS2 trees. Furthermore, we propose a DCGCN to couple the syntactic and semantic relations in the IS2 tree in parallel while implementing network updates at each layer based on the dual-channel mechanism. In this way, only a single GCN network is employed to couple syntactic and semantic information in the case of insensitive syntax, which greatly reduces the redundancy of the model. Considering the impact of some important fine-grained dependency labels on classification performance [22], a dynamic pruning mechanism is proposed to obtain the contextual information that is relevant to the target.

III. INTEGRATED SYNTACTIC AND SEMANTIC TREE

A. Limitations of Syntactic Dependency Trees

Generally, the syntactic structure of sentences is obtained from an external parser. The relationship between words is represented as a directed edge between a pair of nodes and a specific dependency label. Existing studies can leverage the syntactic information (e.g., syntactic dependency labels [22], syntactic relative distance [19], [42]) in the dependency tree provided by a parser to design attention mechanisms. However, due to informal and complex reviews, these attention-based methods may not work well and can even lead to classification mistakes. We illustrate two possible defects of the syntactic dependency tree and methods based on attention mechanisms.

As shown in Fig. 2(a), the score under each word represents the attention weight obtained by the attention-based LSTM [11]. For target "chicken", the model mistakenly assigns high attention weights to "but" and "dried", because they are close to the target. Even given the syntactic knowledge, the syntactic dependency labels for "fine" and "dried" are both "conj", and they have the identical syntactic relative distance (both 2-hops) with target "falafel", but only "dried" is related to "falafel". Fortunately, the same dependency label "conj" can be distinguished according to the semantic similarity with the different targets "falafel" and "chicken". In addition, important opinion words are often neglected by the syntactic relative distance in the dependency tree when they are not directly associated with the target. As shown in Fig. 2(b), although "hypes" is not rooted in target "average cake", it plays a decisive role in predicting the sentiment polarity. These results indicate that the noise interference of words in a sentence cannot be overcome by the syntactic dependency tree with attention mechanisms, and that it is necessary to enhance the dependency relations between the target and relevant opinion words.
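For readers unfamiliar with the syntactic relative distance used above, it is simply the hop count between two words in the (undirected view of the) dependency tree. The following minimal Python sketch shows one way to compute it; the toy sentence, edge list, and function name are illustrative stand-ins for a parser's output, not the trees of Fig. 2.

```python
import networkx as nx

# Toy dependency edges (head, dependent) for the sentence "battery life is great";
# in practice these come from an external parser such as a Biaffine or Stanford parser.
edges = [("is", "life"), ("life", "battery"), ("is", "great")]

# The syntactic relative distance between two words is the hop count
# on the undirected dependency tree.
tree = nx.Graph(edges)

def syntactic_relative_distance(word_a: str, word_b: str) -> int:
    return nx.shortest_path_length(tree, word_a, word_b)

print(syntactic_relative_distance("battery", "great"))  # 3 hops: battery -> life -> is -> great
```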
B. Labels of Semantic Dependency Relations

In contrast to previous research [42], [43], we propose to integrate semantic information with the syntactic structures using the IS2 tree to avoid inaccurate dependency parsing and enhance the dependency relations between the target and contexts. For each input sentence, a parser is employed to obtain its syntactic dependency tree T, and r_ij represents the syntactic dependency label between word i and word j. As shown in Algorithm 1, there are two main steps to establish an IS2 tree: reshaping (lines 4-6) and labeling (lines 7-9).

Reshaping: The target mention in a sentence serves as the root of an IS2 tree. Specifically, a multiple-word target mention should be regarded as a whole, whose internal dependency labels can be ignored [42]. In addition, the other nodes in the syntactic dependency tree are reshaped as leaf nodes of the IS2 tree, and their original syntactic dependency labels r_ij are preserved. In this way, the dependency relations between the target mention and contexts in the original syntactic dependency tree can be enhanced.

Labeling: The semantic dependency relations between the other context words and the target mention in sentence s are labeled as Sim:w_j^s, which represents the semantic similarity between the syntactic dependency label r_ij of context word w_j^s and the target mention i.
The purpose of introducing a new dependency label Sim:w_j^s is to avoid the incorrect parsing of syntactic dependency trees in syntax-insensitive scenarios.

Algorithm 1: Integrated Syntactic and Semantic Tree.
Input: Target mention a = {w_i^a, w_{i+1}^a, ..., w_{i+m}^a}, sentence s = {w_1^s, w_2^s, ..., w_n^s}, syntactic dependency tree T, and syntactic dependency labels r.
Output: Integrated syntactic and semantic tree T̃.
1: Construct the center of the target a as the root R̃ for T̃;
2: for i → m do
3:   for j = 1 → n do
4:     if w_j^s →_{r_ji} w_i^a or w_j^s ←_{r_ij} w_i^a then
5:       preserve the original syntactic structures and dependency labels r_ij by w_j^s ←_{r_ij} R̃
6:     else
7:       label the semantic relations as z_ij = Sim:w_j^s
8:       embed semantic dependency labels into the IS2 tree by w_j^s ←_{z_ij} R̃
9:     end if
10:   end for
11: end for
12: return T̃

Fig. 3. Structure of an IS2 tree reshaped and labeled from a syntactic dependency tree.

The structure of an IS2 tree reshaped from a syntactic dependency tree is illustrated in Fig. 3. The IS2 tree contains not only the syntactic dependency relations but also the semantic information between the target mention and context words using the semantic dependency labels. If the sentence contains more than one target mention, we construct a unique tree for each target mention. In summary, our proposed IS2 structure has at least two advantages. First, the semantic information is integrated into the IS2 tree to overcome inaccurate dependency parsing in syntax-insensitive scenarios. Second, the dependency between target mentions and opinion words in the IS2 tree can be enhanced by using the semantic dependency labels to represent the semantic similarity.
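To make the two steps of Algorithm 1 concrete, the minimal Python sketch below builds a target-rooted tree from a parser's edge list. The data structures, the toy parse, and the injected similarity callable are illustrative stand-ins (the actual score is the learned form of Eq. (13) introduced later in Section IV), not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class IS2Tree:
    """Target-rooted tree: every context word hangs off the target root
    with either its original syntactic label r_ij or a semantic label Sim:w_j."""
    root: tuple                                   # the (possibly multi-word) target mention
    edges: dict = field(default_factory=dict)     # context word -> dependency label

def build_is2_tree(target_words, sentence_words, dep_edges, similarity):
    """Sketch of Algorithm 1.
    dep_edges: {(head, dependent): syntactic label} from an external parser.
    similarity: callable standing in for the learned score of Eq. (13)."""
    tree = IS2Tree(root=tuple(target_words))
    target = set(target_words)
    for w in sentence_words:
        if w in target:                            # internal target words form a single root node
            continue
        # Reshaping: words directly attached to the target keep their syntactic label.
        label = next((lab for (h, d), lab in dep_edges.items()
                      if (h in target and d == w) or (d in target and h == w)), None)
        if label is None:
            # Labeling: all other words receive a semantic dependency label Sim:w_j.
            label = f"Sim:{similarity(target_words, w):.2f}"
        tree.edges[w] = label
    return tree

# Toy usage with a hypothetical parse and a dummy similarity score.
dep = {("spot", "hot"): "amod", ("is", "spot"): "attr", ("comes", "food"): "pobj"}
tree = build_is2_tree(["food"], ["this", "is", "a", "hot", "spot", "comes", "food"],
                      dep, similarity=lambda t, w: 0.5)
print(tree.root, tree.edges)
```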
IV. METHODOLOGY

An IS2 tree contains both syntactic and semantic relations, and therefore we leverage DCGCN to couple the syntactic and semantic dependency relations of the IS2 tree and dynamic pruning to select the contextual information relevant to the target mention. Formally, we denote each sample in the dataset as a triplet ⟨T, S, G⟩, where T = {w_i, w_{i+1}, ..., w_{i+k-1}} is a target mention sequence and S = {w_1, w_2, ..., w_i, ..., w_{i+k}, ..., w_n} is a sentence sequence. The lengths of T and S are k and n, respectively. G = (V, A, R) represents a graph, i.e., the IS2 tree of sentence S, where V represents the set of all nodes (words) in the graph, A ∈ R^{n×n} is an adjacency matrix, and R_{ij} denotes the dependency label (i.e., r_ij or Sim:w_j^s) between word i and word j in the IS2 tree when A_{ij} = 1; otherwise R_{ij} = None.

The overall architecture of our proposed DCGCN model is shown in Fig. 4, which is mainly composed of four components: the contextual encoder, the DCGCN encoder, the gate unit, and the classifier. Specifically, the contextual encoder (e.g., GloVe [59], BERT) is responsible for vectorizing the input sentences and obtaining the contextual information. The DCGCN encoder processes both the syntactic and semantic dependency relations in the IS2 tree and selects the contextual information using a dynamic pruning mechanism (DPM) based on the semantic dependency labels. Finally, the output of the contextual encoder and the output of the DCGCN encoder are dynamically aggregated by the gate unit and sent to the classifier to predict the sentiment polarity (i.e., positive, neutral, negative). The four components are detailed as follows.

A. Contextual Encoder

In this paper, we consider two sentence encoding structures: the first is the bidirectional long short-term memory (Bi-LSTM) network, which is widely used for contextual encoding of input sentences in sentiment analysis tasks [14], [19]; the second is a large-scale pre-trained model named BERT [34]. Previous studies [36] have proved that BERT can significantly improve the classification performance of sentiment analysis tasks. To compare our proposed DCGCN model with baseline models, we also employ BERT as the contextual encoding layer.

Bi-LSTM Encoder models the bi-directional contextual information of sentences. We employ GloVe [59] embeddings v_i ∈ R^{d_v}, lexical tags t_i ∈ R^{d_t}, and position embeddings p_i ∈ R^{d_p}, where d_v, d_t, and d_p denote the dimensions of the word, lexical tag, and position embeddings, respectively. Therefore, the representation of word w_i in sentence S can be denoted by a concatenation of v_i, t_i, and p_i as e_i = [v_i; t_i; p_i].

Given a word embedding sequence e = {e_1, e_2, ..., e_n}, the forward LSTM is leveraged to generate the hidden states \overrightarrow{h} = \{\overrightarrow{h}_1, \overrightarrow{h}_2, ..., \overrightarrow{h}_n\} and the backward LSTM to generate the hidden states \overleftarrow{h} = \{\overleftarrow{h}_1, \overleftarrow{h}_2, ..., \overleftarrow{h}_n\}. Finally, the output hidden state vector H can be obtained as:

\overrightarrow{h}_c = \overrightarrow{LSTM}(e_c), c ∈ [1, n]   (1)

\overleftarrow{h}_c = \overleftarrow{LSTM}(e_c), c ∈ [1, n]   (2)
H = [\overrightarrow{h}; \overleftarrow{h}]   (3)

Fig. 4. Flowchart of our proposed DCGCN for targeted sentiment classification. The Bi-LSTM encoder can be replaced by BERT. The right half shows the details of the DPM, where the top K = 2 context words with the largest attention weights are selected for aggregation in each iteration based on semantic dependency labels Sim:w_j^s.
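A minimal PyTorch sketch of the Bi-LSTM contextual encoder of Eqs. (1)-(3) is given below. The vocabulary sizes and embedding dimensions are illustrative, and in practice the word embedding table would be initialized with GloVe vectors; this is not the authors' code.

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Sketch of Eqs. (1)-(3): e_i = [v_i; t_i; p_i] fed to a Bi-LSTM,
    H concatenating forward and backward states. Sizes are illustrative."""
    def __init__(self, n_words=5000, n_tags=50, max_len=100,
                 d_v=300, d_t=30, d_p=30, d_hidden=128):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, d_v)   # GloVe vectors would be loaded here
        self.tag_emb = nn.Embedding(n_tags, d_t)     # lexical (POS) tag embeddings
        self.pos_emb = nn.Embedding(max_len, d_p)    # position embeddings
        self.bilstm = nn.LSTM(d_v + d_t + d_p, d_hidden,
                              batch_first=True, bidirectional=True)

    def forward(self, word_ids, tag_ids, pos_ids):
        e = torch.cat([self.word_emb(word_ids),
                       self.tag_emb(tag_ids),
                       self.pos_emb(pos_ids)], dim=-1)   # e_i = [v_i; t_i; p_i]
        H, _ = self.bilstm(e)      # forward and backward hidden states, concatenated
        return H                   # shape (batch, n, 2 * d_hidden)

enc = BiLSTMEncoder()
H = enc(torch.randint(0, 5000, (2, 9)), torch.randint(0, 50, (2, 9)),
        torch.arange(9).repeat(2, 1))
print(H.shape)   # torch.Size([2, 9, 256])
```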
BERT Encoder is a pre-trained language model based on the transformer mechanism. The introduction of BERT in targeted sentiment analysis tasks can significantly improve the classification performance [30]. To compare the proposed DCGCN model with baselines, we also employ BERT to generate the contextual embeddings of words. To fine-tune the BERT model, each sentence sequence is reconstructed into "[CLS] + target mention + [SEP] + sentence + [SEP]" as:

e = \{e_0, e_1, ..., e_k, e_{k+1}, e_{k+2}, ..., e_{k+1+n}, e_{k+2+n}\}   (4)

where e_0 and e_{k+1} are the vector representations of "[CLS]" and "[SEP]", respectively. The output of BERT is a sequence of contextualized word representations with the same length as the input:

h = \{h_0, h_1, ..., h_k, h_{k+1}, h_{k+2}, ..., h_{k+1+n}, h_{k+2+n}\}   (5)
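As a sketch of how the "[CLS] + target mention + [SEP] + sentence + [SEP]" input of Eq. (4) could be assembled, the snippet below uses the HuggingFace tokenizer's sentence-pair mode. The checkpoint name is an assumption; the paper only states that BERT is used as the contextual encoder.

```python
import torch
from transformers import BertModel, BertTokenizer

# Assumed checkpoint; any BERT-style encoder with the same API would do.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

target = "external mics"
sentence = "I never tried any external mics with that iMac."

# text_pair produces the "[CLS] target [SEP] sentence [SEP]" layout of Eq. (4).
inputs = tokenizer(target, text_pair=sentence, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))

with torch.no_grad():
    h = bert(**inputs).last_hidden_state   # Eq. (5): one vector per input token
print(h.shape)                              # (1, sequence length, 768)
```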
B. DCGCN Encoder

GCN is a neural network that operates on graph data structures [13]. GCN can directly perform convolutional operations on adjacent nodes to encode information, which is then passed through a multi-layer neural network so that each node in the graph can learn more contextual information. Given a graph with n nodes, its adjacency matrix A ∈ R^{n×n} as the discretized output of a dependency parser, and each word with an adjacent node j ∈ N(i), GCN updates the representations of words with the multi-head attention mechanism [31] at the l-th layer by:

H_i^l = \Vert_{m=1}^{M} \sigma\Big( \sum_{j \in N(i)} A_{ij}^{lm} W^{lm} H_j^{l-1} \Big)   (6)

A_{ij}^{lm} = \begin{cases} 1, & \text{if } j \in N(i) \\ 0, & \text{if } j \notin N(i) \end{cases}   (7)

where \Vert represents the concatenation of the M head outputs, W^{lm} ∈ R^{(d/M)×d} is the parameter matrix of head m in layer l, and N(i) refers to the set of nodes adjacent to node i.
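A minimal PyTorch sketch of the multi-head graph convolution in Eqs. (6)-(7) is shown below: each head applies its own (d/M × d) linear map, aggregates over the 0/1 adjacency, and the head outputs are concatenated. Dimensions and the choice of sigma (here a sigmoid) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiHeadGCNLayer(nn.Module):
    """Sketch of Eqs. (6)-(7): H_i^l = ||_m sigma( sum_{j in N(i)} A_ij W^{lm} H_j^{l-1} )."""
    def __init__(self, d_model=256, heads=4):
        super().__init__()
        assert d_model % heads == 0
        # One (d/M x d) parameter matrix per head, as in W^{lm}.
        self.W = nn.ModuleList([nn.Linear(d_model, d_model // heads, bias=False)
                                for _ in range(heads)])

    def forward(self, H, A):
        # H: (batch, n, d); A: (batch, n, n) with A_ij = 1 iff j is adjacent to i.
        outs = [torch.sigmoid(A @ w(H)) for w in self.W]   # per-head neighbour aggregation
        return torch.cat(outs, dim=-1)                     # concatenate the M head outputs

layer = MultiHeadGCNLayer()
H = torch.randn(2, 7, 256)
A = (torch.rand(2, 7, 7) > 0.6).float()
print(layer(H, A).shape)   # torch.Size([2, 7, 256])
```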
We design a DPM in the semantic channels of DCGCN to acquire the opinion words relevant to the target mention based on the semantic dependency labels in the IS2 tree, as shown in Fig. 4. In this way, only the contextual information related to the target mention is encoded.

Syntactic Channel: Specifically, syntactic dependency relations r_ij are directly mapped into vectors γ_ij ∈ R^{d_r}. The syntactic information is extracted by a two-layer neural network, and the contexts located in the neighborhood of the target are aggregated to update the syntactic representation h_{Syn_i}^{l+1} of the target mention i by

h_{Syn_i}^{l+1} = \Vert_{m=1}^{M} \sum_{j \in N(i)} \beta_{ij}^{lm} W_z^l h_j^l   (8)

g_{ij}^{lm} = \sigma\big( \mathrm{relu}(\gamma_{ij} W_{r1} + b_{r1}) W_{r2} + b_{r2} \big)   (9)

\beta_{ij}^{lm} = \frac{\exp(g_{ij}^{lm})}{\sum_{j=1}^{N(i)} \exp(g_{ij}^{lm})}   (10)

where W_{r1} and W_{r2} are trainable parameter matrices, b_{r1} and b_{r2} denote the bias vectors, σ represents the sigmoid activation function, N(i) refers to the set of nodes rooted in the target mention in the original syntactic dependency tree, and β_{ij}^{lm} represents the m-th syntactic attention weight at layer l.
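A minimal single-head PyTorch sketch of this syntactic channel (Eqs. (8)-(10)) follows: each syntactic label embedding γ_ij is scored by a two-layer network, normalized over the neighbourhood N(i), and used to weight the neighbour states. Label vocabulary size and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SyntacticChannel(nn.Module):
    """Sketch of Eqs. (8)-(10) for a single head; dimensions are illustrative."""
    def __init__(self, d_model=256, d_label=50, n_labels=45):
        super().__init__()
        self.label_emb = nn.Embedding(n_labels, d_label)    # r_ij -> gamma_ij
        self.score = nn.Sequential(                          # Eq. (9): two-layer scorer
            nn.Linear(d_label, d_label), nn.ReLU(),
            nn.Linear(d_label, 1), nn.Sigmoid())
        self.Wz = nn.Linear(d_model, d_model, bias=False)

    def forward(self, H, label_ids, neighbour_mask):
        # H: (n, d); label_ids, neighbour_mask: (n, n); mask is 1 for j in N(i).
        g = self.score(self.label_emb(label_ids)).squeeze(-1)          # g_ij
        g = g.masked_fill(neighbour_mask == 0, float("-inf"))
        beta = torch.nan_to_num(F.softmax(g, dim=-1))                  # Eq. (10); empty rows -> 0
        return beta @ self.Wz(H)                                       # Eq. (8), one head

n = 6
channel = SyntacticChannel()
out = channel(torch.randn(n, 256), torch.randint(0, 45, (n, n)),
              (torch.rand(n, n) > 0.5).long())
print(out.shape)   # torch.Size([6, 256])
```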
Semantic Channel with DPM: To couple the semantic dependency relations in the IS2 tree, we extract semantic information between the target mention and contexts based on the semantic dependency label Sim:w_j^s. Furthermore, the DPM can eliminate the interference of semantic information irrelevant to the target in the IS2 tree. The semantic representation h_{Sem_i}^{l+1} of the target mention i can be updated as:

h_{Sem_i}^{l+1} = \Vert_{m=1}^{M} \sum_{j \in D(i)} \varphi_{ij}^{lm} W_m^l h_j^l   (11)

\varphi_{ij}^{lm} = \frac{Sim{:}w_j^s}{\sum_{\hat{j} \notin N(i)} Sim{:}w_{\hat{j}}^s}   (12)

Sim{:}w_j^s = \mathrm{Similarity}(target, r_{ij}) = \sum_{c=1}^{k} h_c W_s \cdot \gamma_{ij}   (13)

where W_s represents the trainable parameter matrix, h_c is the inner word vector of a multi-word target mention, γ_ij refers to the vector mapped from the original syntactic dependency label of word w_j^s, and φ_{ij}^{lm} represents the attention weight provided by the semantic dependency label Sim:w_j^s. In this way, the semantic information relevant to the target mention h_c contained in γ_ij can be obtained. In addition, the contextual information relevant to the target mention is encoded by a DPM based on the semantic dependency labels. Specifically, the M attention weights φ_{ij}^{lm} of the context words not in N(i) are sorted, and the first K attention weights are selected and recorded in D(i):

\varphi_{ij}^l = \sum_{m=1}^{M} \varphi_{ij}^{lm}   (14)

\psi_{ij}^l = DPM_K\big(\{\varphi_{ij}^l \mid j \notin N(i)\}\big), \quad j \in D(i)   (15)
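The dynamic pruning step in Eqs. (14)-(15) is essentially a masked top-K selection. The short sketch below, with illustrative inputs, mirrors the worked DPM_2 example given with Algorithm 2 below; it is not the authors' implementation.

```python
import torch

def dynamic_pruning(phi, outside_mask, k=2):
    """Sketch of Eqs. (14)-(15): phi holds per-head semantic attention weights
    (M, n) for one target; only context words outside N(i) compete, and the
    top-K of the head-summed scores define D(i)."""
    scores = phi.sum(dim=0)                                  # Eq. (14): sum over the M heads
    scores = scores.masked_fill(outside_mask == 0, float("-inf"))
    topk = torch.topk(scores, k)                             # Eq. (15): keep the K largest
    return topk.indices.tolist(), topk.values.tolist()

# Mirrors DPM_2({0.6, 0.1, 0.3, 0.2}) = {0.6, 0.3}: with K = 2 the words with
# weights ~0.6 and ~0.3 enter D(i).
phi = torch.tensor([[0.6, 0.1, 0.3, 0.2]])                   # single head for brevity
outside = torch.tensor([1, 1, 1, 1])                         # all four words lie outside N(i)
print(dynamic_pruning(phi, outside, k=2))                    # indices [0, 2], weights ~0.6, ~0.3
```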
Algorithm 2: DCGCN with Dynamic Pruning Mechanism.
1: Encode the input sentence H
2: for l = 1 → L do
3:   for j ∈ N(i) do
4:     map syntactic dependency label r_ij into vector γ_ij
5:     calculate syntactic attention weight β_ij^{lm} by (9)-(10)
6:     obtain syntactic representation h_{Syn_i}^{l+1} by (8)
7:   end for
8:   for ĵ ∈ D(i) do
9:     calculate semantic attention weight φ_ij^{lm} by (12)-(13)
10:    sum semantic attention φ_ij^l by (14) and select the top K attention weights to update the set D(i) by (15)
11:    obtain semantic representation h_{Sem_i}^{l+1} by (11)
12:   end for
13:   calculate output feature H^T by concatenating syntactic representation h_{Syn_i}^{l+1} and semantic representation h_{Sem_i}^{l+1}
14: end for
15: return D(i) and H^T

where D(i) denotes the set of opinion words related to the target mention i in the sentence processed by the DPM. For example, when K = 2, the sequence output by DPM_2({0.6, 0.1, 0.3, 0.2}) is {0.6, 0.3}, and the corresponding opinion words are added to D(i). After each iteration of the DPM, the relevant contexts in D(i) are updated according to the adjacency relation between the target mention and contexts.

Finally, the DCGCN encoder concatenates the syntactic representation h_{Syn_i}^{l+1} and the semantic representation h_{Sem_i}^{l+1} filtered by the DPM into:

\tilde{H}^{l+1} = h_{Syn_i}^{l+1} \Vert h_{Sem_i}^{l+1}   (16)

H^{l+1} = \mathrm{ReLU}(\tilde{H}^{l+1} W + b)   (17)

where H^{l+1} is the output of the DCGCN encoder at layer l + 1.

C. Gate Unit

To learn a composite representation containing both contextual and target-specific information and to control the information fusion ratio, we introduce a fine-grained feature fusion mechanism. This mechanism leverages a gate unit to aggregate the target representation H^L output by the last DCGCN encoder and the global contextual representation H^T of the target mention output by a mask mechanism [16]. Because the target mention is viewed as a single whole term in the IS2 tree, the average pooling operation is unnecessary. The feature fusion process is:

H_{full} = g \circ H^L + (1 - g) \circ H^T   (18)

where ◦ refers to the element-wise product operation and g denotes the fusion rate:

g = \mathrm{sigmoid}(W_g [H^L \Vert H^T] + b_g)   (19)
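A minimal PyTorch sketch of this gated fusion (Eqs. (18)-(19)) is given below; the hidden size is an illustrative assumption.

```python
import torch
import torch.nn as nn

class GateUnit(nn.Module):
    """Sketch of Eqs. (18)-(19); the hidden size is illustrative."""
    def __init__(self, d_model=256):
        super().__init__()
        self.Wg = nn.Linear(2 * d_model, d_model)   # W_g and b_g of Eq. (19)

    def forward(self, h_dcgcn, h_context):
        # Eq. (19): g = sigmoid(W_g [H^L || H^T] + b_g)
        g = torch.sigmoid(self.Wg(torch.cat([h_dcgcn, h_context], dim=-1)))
        # Eq. (18): element-wise gated mixture of the two representations
        return g * h_dcgcn + (1 - g) * h_context

gate = GateUnit()
h_full = gate(torch.randn(4, 256), torch.randn(4, 256))
print(h_full.shape)   # torch.Size([4, 256])
```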
TABLE I
STATISTICS OF DATASETS

TABLE II
INPUT/OUTPUT DIMENSION OF MODEL SUB-MODULES
E. Training

The training process consists of two stages in each iteration, as in Algorithm 2. During the first step, DCGCN encodes the syntactic dependency labels for the syntactic representation of the target mention (lines 3-7). In the second step, the DPM selects semantic dependency labels from the IS2 tree to obtain the semantic representation of the target mention (lines 8-12). Finally, we concatenate these two contextual representations to determine the sentiment polarity (lines 13-14). l ∈ [1, 2, ..., L] layers of DCGCN encoders can be stacked to form deeper networks.

V. EXPERIMENTS

In this paper, experiments are conducted on four public datasets including SemEval 2014 Restaurant, SemEval 2014 Laptop [45], ACL14 Twitter [23], and MAMS [46]. Each sentence in each dataset contains the target mention and its corresponding sentiment polarity label, including positive, neutral, and negative [20]. The statistics of the samples in each dataset are shown in Table I.

A. Settings

Two types of contextual encoders are employed: Bi-LSTM encoders and BERT encoders. Given a Bi-LSTM encoder, there

B. Baselines

Our proposed DCGCN is compared with state-of-the-art targeted sentiment classification systems, which are divided into two categories: baselines based on semantic information and baselines based on syntactic information.

1) Semantic Baselines:
• IAN [10] uses two LSTM networks and attention mechanisms for the fine-grained interaction between targets and context words.
• MGAN [13] employs the Bi-LSTM to capture contextual information and a multi-grained attention mechanism to obtain the relationships between targets and context words.
• AOA [49] models the interaction between target mentions and context words by introducing the Attention-over-Attention module.
• DSMN [15] designs a selective attention mechanism to make full use of the inter-target information.
• AEN [50] uses an attentional encoder network to enhance the feature representation of target mentions.
• KGCapsAN [51] utilizes syntactical and n-gram structures to guide the capsule attention network.
• DC-GCN [57] considers both syntactic structure information and multi-aspect sentiment dependencies in sentences and employs GCNs to learn its node information representation.
TABLE III
PERFORMANCE COMPARISONS OF DIFFERENT MODELS ON BENCHMARK DATASETS
on semantic dependency labels allows DCGCN to focus on semantic information while reducing redundancy and complexity in syntax-insensitive scenarios. In contrast, DC-GCN does not consider syntax-insensitive scenarios. By calculating the model complexity, we found that the training time cost and total model parameters of DC-GCN on the Restaurant dataset are higher than those of DCGCN by 3.43 s and 0.21 M, respectively. Although DC-GCN performs well on the Restaurant dataset, using two different GCNs to encode syntactic dependency trees and multi-aspect sentiment graphs increases model complexity and leads to severe overfitting on syntax-insensitive datasets.

For syntactic models, R-GAT calculates the contribution of contextual words based on the syntactic relative distance between them and the target in the original dependency tree. However, the information provided by syntactic relative distances is limited due to parsing errors in the case of syntactic insensitivity. Therefore, it is necessary to label the semantic relations between words and embed them into the IS2 tree as a supplement to the syntactic information. RoBERTa+ASGCN/RGAT utilizes RoBERTa as a guide to build the dependency tree. However, since RoBERTa is a pre-trained model, it will receive interference from the initial learning data and can only provide limited semantic information. DCGCN achieves an accuracy improvement of 2.58 on the syntax-insensitive Twitter dataset with the help of the IS2 tree, which directly embeds the explicit semantic relations. DualGCN employs two GCN structures to extract syntactic and semantic information, respectively. However, the original dependency tree is utilized as the input to the model, which can introduce parsing errors and makes the model more redundant and complex. DCGCN encodes the IS2 tree by syntactic and semantic channels in parallel, rather than constructing dual GCN structures; it is lightweight because it integrates the dual-channel mechanism in a single GCN. In addition, dynamic pruning of the IS2 tree can further reduce the complexity of DCGCN. Compared with DualGCN, the accuracy and macro F1 scores of DCGCN are improved by up to 1.41 and 2.81 on all datasets. GraphMerge fuses the parsing results provided by different parsers into a syntactic dependency tree to supplement syntactic information. However, it not only requires a lot of preparation but also raises an issue of error accumulation across multiple parsers. DCGCN only employs one parser to obtain the original dependency tree and then performs simple pruning and semantic embedding as an effective supplement to syntactic information. DSS-GCN establishes a dual-channel semantic GCN to obtain word semantics from general semantic information and structural semantic information. However, in syntax-insensitive scenarios, the information obtained by GCN is limited, resulting in poorer performance of DSS-GCN on the Twitter dataset. On the other hand, DCGCN overcomes parsing errors not only on the Restaurant and Laptop datasets but also on the syntax-insensitive Twitter dataset by introducing semantic dependency labels and target-specific reconstruction to enhance dependency relationships. Moreover, these results also demonstrate the necessity of integrating semantics with syntactic dependency trees. With the help of the IS2 tree, DCGCN achieves better results than all the baselines, giving macro-F1 scores of 83.97, 80.96, 77.55, and 84.01 on Restaurant, Laptop, Twitter, and MAMS, respectively.

VI. ANALYSIS

A. Ablation Study

To verify the effectiveness of the IS2 tree and DCGCN, we conduct ablation experiments on all datasets. As shown in Table IV, we consider five baselines for comparison purposes: 1) DCGCN-Attention, using the traditional attention mechanisms in place of the semantic channels; 2) DCGCN w/o IS2 tree, substituting the syntactic dependency tree provided by an external parser for the IS2 tree, i.e., replacing semantic relations Sim:w_j^s with syntactic relations r_ij; 3) DCGCN w/o DPM, removing the DPM and directly introducing the semantic dependency relations of all context words in the IS2 tree; 4) DCGCN w/o Syn-channel, removing the syntactic representation h_{Syn_i}^{l+1} but reserving the semantic representation h_{Sem_i}^{l+1}; 5) DCGCN w/o Sem-channel, retaining only the syntactic representation h_{Syn_i}^{l+1}.

Impacts of Attention Mechanisms: Because the attention mechanisms cannot appropriately handle the noise interference of irrelevant opinion words in sentences, the accuracy of DCGCN-Attention and DCGCN-BERT-Attention decreases by 1.68 and 1.75 percent on average, respectively, compared with DCGCN. The average decrease in accuracy is 1.47 and 1.75 percent on the syntax-insensitive Twitter and MAMS datasets, which further demonstrates that the integration of semantic information with syntactic information in the IS2 tree is beneficial in encoding informal reviews.

Effectiveness of the IS2 Tree: The purpose of constructing the IS2 tree is to overcome the parsing errors in the original dependency tree and introduce semantic labels to supplement the syntactic relations in the dependency tree. As shown in Table IV, when replacing the semantic relations with the syntactic relations, i.e., comparing DCGCN w/o IS2 tree to DCGCN, the accuracy and macro F1 scores decrease by 2.26 and 2.55 percent on average on all datasets, respectively. Furthermore, the accuracy and macro F1 scores of DCGCN-BERT w/o IS2 tree compared to DCGCN-BERT decrease by 2.35 and 2.65 percent, respectively, which further indicates that the semantic relations benefit targeted sentiment classification and the proposed IS2 tree is effective.

Effectiveness of the DPM: To exclude the interference of contextual information irrelevant to the target, DCGCN adopts the DPM when encoding the IS2 tree, which calculates the contribution of each context word to the sentiment analysis by the semantic similarity between the context word and the target. At the same time, with the help of the DPM, our model can avoid invalid connections in the dependency tree and reduce the complexity while ensuring the classification accuracy. DCGCN w/o DPM lacks a pruning mechanism for the IS2 tree, which exposes the classification performance to contextual information irrelevant to the target mention and significantly decreases the accuracy and macro F1 scores by 1.34 and 1.37 percent compared to DCGCN.
TABLE IV
EXPERIMENTAL RESULTS OF ABLATION STUDY
The performance of BERT-based models is also lower than that of DCGCN, which demonstrates the importance of introducing the DPM.

Syn-Channel vs. Sem-Channel: We design a dual-channel structure of syntax and semantics within DCGCN for two reasons: on the one hand, DCGCN needs to deal with the syntactic and semantic relations in the IS2 tree, respectively; on the other hand, the dual-channel structure can help avoid redundant and complex network structures. In this way, DCGCN can encode these two relations in parallel while updating each layer of the network. It is also the reason why our model is more lightweight than DualGCN. Comparing DCGCN to DCGCN w/o Syn-channel and DCGCN w/o Sem-channel, we find that these two ablation models have essentially the same degree of performance degradation on all datasets. For example, the accuracy and macro F1 scores of DCGCN are higher than those of DCGCN w/o Syn-channel by up to 0.96 and 1.07, and higher than those of DCGCN w/o Sem-channel by up to 1.09 and 1.13 percent, respectively. These results show that the syntactic information of sentences is favorable, but the semantic information related to the target mention cannot be neglected either. Specifically, the syntactic dependency relations preserved in the IS2 tree are as important as the semantic dependency relations.

However, the proposed model cannot beat DualGCN (the most similar model) without any of these modules. First, it is reasonable that the variant w/o IS2 tree cannot beat DualGCN, because once DCGCN uses the original syntactic dependency tree without labeling semantic information, the performance decreases in syntax-insensitive scenarios. Second, without the DPM, DCGCN cannot accurately identify the semantic dependency labels in the IS2 tree that are related to the target, which results in a decline in sentiment classification performance. Third, the Syn-channel and Sem-channel in DCGCN are respectively responsible for processing the syntactic dependency labels and semantic dependency labels in the IS2 tree; if we remove either channel, the model misses some important information provided by the IS2 tree. Similarly, the DCGCN-Attention variant does not use the IS2 tree to encode the semantic relationship between words offline; it encodes the semantic information between words online during training based on the attention mechanism. Therefore, the above benchmark variants are missing all or part of the information in the IS2 tree, so they cannot beat DualGCN. It is the collaboration of several new modules that makes our model achieve the best performance.

B. Qualitative Case Study

As shown in Table V, we extract one example sentence from each dataset and compare the attention weights offered by the strong baselines (DGEDT, ASGCN, RGAT) and our model DCGCN. We also give the original syntactic dependency tree and the ground truth label for each example. The shades of the words represent the corresponding attention weights. These cases again indicate the necessity of adding semantic dependency relations about the target mention into the targeted sentiment classification task.

The first example is "This is literally a hot spot when it comes to the food." with the target "food". In the original syntactic dependency tree, the dependency between the opinion term "hot spot" and the target "food" is weakened by the syntactic relative distance, which leads ASGCN and RGAT, which use syntactic structures, to make a wrong prediction. However, our DCGCN focuses on the opinion term through the semantic channels combined with the semantic dependency labels in the IS2 tree. DGEDT may correctly make a prediction with the help of dual attention mechanisms.

The second one is "If you are a Tequila fan you will not be disappointed." with the double negative "not be disappointed". It is difficult to model such a structure using conventional methods. For example, the negative expression "not" is easily recognized by traditional methods, while the implicit opinion word "disappointed" is often ignored. DCGCN can make the positive prediction because the dependency labels in the IS2 tree represent the semantic similarity to the target mention, so the model considers both "not" and "disappointed".

In the third example, "I never tried any external mics with that iMac.", both ASGCN and RGAT are affected by "never tried" adjacent to the target mention "external mics" in the syntactic dependency tree, leading to a negative prediction. The dual attention mechanisms in DGEDT also fail to eliminate the noise interference of "never", which leads to the incorrect prediction.
TABLE V
CASE VISUALIZATION FOR ATTENTION WEIGHTS OF DGEDT, ASGCN, RGAT, AND DCGCN
However, DCGCN utilizes the DPM in the semantic channels to exclude the opinion word "never" that is not related to the target mention.

Finally, we give a negative example: "Last Kiss by Taylor Swift if like the saddest song I have heard.". There is a long distance between the opinion expression and the target mention, which makes it hard for traditional methods to detect the implicit semantics. For example, ASGCN and RGAT focus on the irrelevant term "Last Kiss" for the target mention "Taylor Swift". DCGCN can handle such a sample: it utilizes the semantic dependency structures in the IS2 tree to enhance the relation between the target mention and the related opinion term "saddest song", and employs the DPM to exclude the irrelevant "Last Kiss".

C. Impacts of Different Parsers

The classification accuracy shows the impact of different external parsers on the classification performance. Specifically, the following baselines are set: 1) Random, using the dependency tree provided by the Biaffine parser [48] but randomly adjusting the syntactic dependency labels corresponding to the words; 2) Stanford, employing the syntactic dependency labels constructed by the Stanford Transition-based Parser [52]; 3) Biaffine, applying a Deep Biaffine Parser [48] to obtain the syntactic dependency labels. The performances of the Stanford Parser and the Deep Biaffine Parser on the Treebank are given in Table VI.

The impact of syntactic dependency trees provided by different external parsers on the classification accuracy is shown in
TABLE VI
CLASSIFICATION ACCURACY BASED ON DIFFERENT EXTERNAL PARSERS
TABLE VII
ACCURACY WHEN USING DIFFERENT PARSERS AS THE INPUT
Fig. 6. Impacts of iteration number L on all datasets. (a) Accuracy-L curves.
(b) F1-L curves.
TABLE VIII
COMPARISON OF MODEL PARAMETERS AND TRAINING TIME COST, 1 M = 1e6
in the IS2 tree while implementing network updates at each layer based on the dual-channel mechanism. In addition, DCGCN adopts the DPM to prune irrelevant context words in the IS2 tree, which avoids invalid connections in the dependency tree and reduces the complexity of the model. The above results show that our algorithm makes DCGCN lightweight while fusing syntactic and semantic information and ensuring the classification accuracy.

Fig. 7. Transfer Study of IS2 Tree. (a) Twitter. (b) Restaurant. (c) Laptop. (d) MAMS.

G. Transfer Study

To verify the transferability of the IS2 tree, we decouple the IS2 tree from DCGCN and migrate it to DualGCN [54]. As shown in Fig. 7, we can draw the following conclusions: (1) As shown in Fig. 7(a), on the syntax-insensitive dataset, the IS2 tree has a significant effect on enhancing the sentiment classification performance of the model. In this case, the model needs the semantic dependency information contained in the IS2 tree. (2) As shown in Fig. 7(b), compared with the models with GloVe embeddings, the IS2 tree can give the models with BERT embeddings a higher performance gain. BERT can provide the IS2 tree with a large amount of sentence structure information and semantic information of words, which can help the model

H. Error Analysis

To analyze the limitations of the proposed model, we further summarize the prediction errors of DCGCN. There are several reasons explaining the findings. First, DCGCN may not always generate an accurate prediction that reflects what people really think in ironic sentences. For example, the correct label for the sentence "The scene hunky waiters dub dinners darling and it sounds like they mean it." is "negative". Our method tends to classify the sentence as "positive", since DCGCN cannot understand that the literally positive words "hunky" and "darling" are actually ironic. Second, our method fails to recognize the potential implications of sentences such as "Premium price for the OS more than anything else". The correct label of the sentence is "positive", but DCGCN may classify it as "negative". This is because the model cannot realize that a more expensive price often means better OS performance, and the model only cares about "Premium price" and "more".

VII. CONCLUSION

In this paper, we propose a novel IS2 tree to integrate syntactic and semantic information. By integrating semantic dependency labels into the syntactic dependency trees, our proposed IS2 tree can overcome the inaccurate parsing results introduced by the external parser in syntax-insensitive scenarios. Furthermore, we design DCGCN to encode both syntactic and semantic dependency relations in the IS2 tree and select the task-related contextual information using the DPM. The experimental results demonstrate the necessity of integrating semantic relations with the syntactic structure and the effectiveness of DCGCN in encoding both syntactic and semantic information. It is worth noting that the IS2 tree has an excellent generalization capability to cooperate with different parsers, and DCGCN improves the accuracy of targeted sentiment classification tasks by large margins.
[50] Y. Song et al., "Attentional encoder network for targeted sentiment classification," 2019, arXiv:1902.09314.
[51] B. Zhang, X. Li, X. Xu, K.-C. Leung, Z. Chen, and Y. Ye, "Knowledge guided capsule attention network for aspect-based sentiment analysis," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 28, pp. 2538-2551, 2020.
[52] D. Chen and C. Manning, "A fast and accurate dependency parser using neural networks," in Proc. Conf. Empirical Methods Natural Lang. Process., 2014, pp. 740-750.
[53] X. Hou et al., "Graph ensemble learning over multiple dependency trees for aspect-level sentiment classification," in Proc. Assoc. Comput. Linguistics, 2021, pp. 2884-2894.
[54] R. Li et al., "Dual graph convolutional networks for aspect-based sentiment analysis," in Proc. Assoc. Comput. Linguistics, 2021, pp. 6319-6329.
[55] J. Dai et al., "Does syntax matter? A strong baseline for aspect-based sentiment analysis with RoBERTa," in Proc. Assoc. Comput. Linguistics, 2021, pp. 1816-1829.
[56] Q. Sun et al., "Dual-channel and hierarchical graph convolutional networks for document-level relation extraction," Expert Syst. Appl., vol. 205, pp. 117678-117688, 2022.
[57] Z. Lan et al., "Dual-channel interactive graph convolutional networks for aspect-level sentiment analysis," Mathematics, vol. 10, no. 18, pp. 3317-3331, 2022.
[58] A. Dai et al., "Learning from word semantics to sentence syntax by graph convolutional networks for aspect-based sentiment analysis," Mathematics, vol. 14, no. 1, pp. 17-26, 2022.
[59] J. Pennington et al., "GloVe: Global vectors for word representation," in Proc. Conf. Empirical Methods Natural Lang. Process., 2014, pp. 1532-1543.

Boran Yang (Member, IEEE) received the Ph.D. degree from the Chongqing University of Posts and Telecommunications and joined the Chongqing University of Technology, Chongqing, China, as a Faculty Member and Research Secretary with the School of Artificial Intelligence. His research interests include edge computing, edge resource sharing, and network security. He has authored or coauthored more than 20 technical papers in top journals such as IEEE TRANSACTIONS ON GREEN COMMUNICATIONS AND NETWORKING, IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, IEEE TRANSACTIONS ON MULTIMEDIA, IEEE INTERNET OF THINGS JOURNAL, and GLOBECOM. He was the co-recipient of the Best Paper Awards from IEEE MSN 2020 and IEEE GreenCom 2019. He was the Technical Editor/Guest Editor of Digital Communications and Networks, Sensors, and World Electric Vehicle Journal, and a reviewer for IEEE TRANSACTIONS ON COMMUNICATIONS and IEEE INTERNET OF THINGS JOURNAL. He was also a TPC Member of IEEE Healthcom 2023.

Yuexian Li is currently working toward the master's degree with the Chongqing University of Posts and Telecommunications, Chongqing, China. Her research interests include sentiment analysis, machine translation, and natural language processing.