License: arXiv.org perpetual non-exclusive license
arXiv:2403.01063v1 [cs.CL] 02 Mar 2024

FaiMA: Feature-aware In-context Learning for Multi-domain Aspect-based Sentiment Analysis

Abstract

Multi-domain aspect-based sentiment analysis (ABSA) seeks to capture fine-grained sentiment across diverse domains. While existing research narrowly focuses on single-domain applications constrained by methodological limitations and data scarcity, the reality is that sentiment naturally traverses multiple domains. Although large language models (LLMs) offer a promising solution for ABSA, they are difficult to integrate effectively with established techniques, including graph-based models and linguistics, because modifying their internal architecture is not easy. To alleviate this problem, we propose a novel framework, Feature-aware In-context Learning for Multi-domain ABSA (FaiMA). The core insight of FaiMA is to utilize in-context learning (ICL) as a feature-aware mechanism that facilitates adaptive learning in multi-domain ABSA tasks. Specifically, we employ a multi-head graph attention network as a text encoder optimized by heuristic rules for linguistic, domain, and sentiment features. Through contrastive learning, we optimize sentence representations by focusing on these diverse features. Additionally, we construct an efficient indexing mechanism, allowing FaiMA to stably retrieve highly relevant examples across multiple dimensions for any given input. To evaluate the efficacy of FaiMA, we build the first multi-domain ABSA benchmark dataset. Extensive experimental results demonstrate that FaiMA achieves significant performance improvements in multiple domains compared to baselines, increasing F1 by 2.07% on average. Source code and datasets are available at https://github.com/SupritYoung/FaiMA.

Keywords: Multi-domain Aspect-based Sentiment Analysis, Graph Neural Networks, Large Language Model, In-Context Learning, Linguistics


Songhua Yang¹*, Xinke Jiang²*, Hanjie Zhao¹, Wenxuan Zeng², Hongde Liu¹, Yuxiang Jia¹†
¹ Zhengzhou University, Henan, China
² Peking University, Beijing, China
* Songhua Yang and Xinke Jiang contributed equally to this research.
† Yuxiang Jia is the corresponding author.
{suprit,thinkerjiang}@foxmail.com, [email protected]
[email protected], [email protected], [email protected]


1.   Introduction

In the highly interconnected digital era, a myriad of social media platforms are continually emerging (Roccabruna et al., 2022). These platforms generate a vast corpus of user reviews across various domains, providing a rich reservoir of sentiment-related information. Aspect-based sentiment analysis (ABSA) has long been a standard approach to mining this information (Pang et al., 2008; Zhang and Liu, 2012; Schouten and Frasincar, 2016). ABSA is a fine-grained sentiment analysis task that extracts the sentiment polarity of users towards specific aspects. However, the majority of existing ABSA methods are confined to single-domain applications and struggle to capture the multifaceted sentiment information prevalent in the real world. Traditional approaches often encounter generalization challenges across multiple domains, limiting the practical and broad-scale applicability of ABSA (Luo et al., 2022). Customizing models and annotating data for each domain is inefficient and costly, especially in resource-limited settings.

Figure 1: An example of feature-aware in-context learning for ABSA. By selecting one relevant example on each of the three features, sufficient reference is provided for LLM.

Fortunately, the advent of Large Language Models (LLMs) brings renewed optimism to multi-domain ABSA, owing to their remarkable generalization and cross-domain capabilities (Wang et al., 2023; Zhang et al., 2023). Trained on extensive, multi-domain corpora, LLMs assimilate a broad spectrum of common sense and domain-agnostic knowledge, equipping them with the ability to discern nuanced differences and linguistic subtleties across various domains (Zhao et al., 2023; Dillion et al., 2023; Yang et al., 2023b; Luo et al., 2024a). Moreover, emerging in-context learning (ICL) techniques demonstrate that task-specific performance can be significantly improved by simply incorporating concise, task-relevant instructions, demonstrations, and examples into the prompts (Ye et al., 2023; Jiang et al., 2023). Although initial research has begun to probe the potential of LLMs and associated techniques in ABSA (Fei et al., 2023; Scaria et al., 2023; Varia et al., 2022), empirical investigations explicitly focusing on multi-domain ABSA are still notably scarce.

Another line of ABSA research focuses on graph neural networks (GNNs) and linguistic features (Chen et al., 2022). Linguistic knowledge, epitomized by syntax and part-of-speech information, is widely regarded as essential for solving ABSA tasks, as it is intricately connected to the relationships between sentiment elements (Zhang et al., 2022a; Nazir et al., 2020). Numerous studies have demonstrated that using these linguistic features to construct relationships between words, combined with the message-passing mechanism of GNNs, can effectively capture complex and latent relationships among sentiment elements (Wu et al., 2021; Chen et al., 2021; Yang et al., 2023a; Shi et al., 2023; Zhong et al., 2023). Features such as domain and sentiment structure are also valuable in multi-domain ABSA (Wu et al., 2020; Gong et al., 2020). In the complex and diverse landscape of multi-domain ABSA, general linguistic features can provide substantive support, while specific domain information can serve as unique augmentative features. On the other hand, LLMs are often perceived as inscrutable "black boxes", making it challenging to directly modify their internal architecture or incorporate additional features (Luo et al., 2023; Zhao et al., 2023). Solely fine-tuning LLMs for ABSA fails to integrate the wealth of domain-specific expertise and the intrinsic relationships between parts of speech and syntax. Seamlessly integrating these well-established traditional methods with cutting-edge LLMs to fully unleash their collective potential remains a pivotal challenge in current research.

Incorporating semantically similar examples into the instructions can significantly enhance the performance of LLMs on specific tasks (Liu et al., 2022). Compared with unsupervised strategies, supervised example retrieval methods have proven to be more effective (Rubin et al., 2022; Zhang et al., 2022b). In light of this, we propose the following critical hypothesis: ICL is not only a tool to guide the model but also an efficient feature-aware mechanism. We further hypothesize that the stable retrieval of representative examples for various features, followed by their precise incorporation into fine-tuning instructions, can give the model a structured and enriched feature context, as shown in Figure 1. This, in turn, substantially enhances its performance on the target task. By undergoing supervised fine-tuning (SFT) on extensive data, LLMs with strong comprehension capabilities can fully grasp, understand, and apply these features, achieving marked performance improvements in ABSA tasks.

In light of the above, we introduce a novel Feature-aware In-context Learning for Multi-Domain ABSA (FaiMA) framework. FaiMA combines traditional techniques with cutting-edge LLMs, using ICL as the linchpin that coherently integrates these components. Specifically, we design a Multi-head Graph Attention Network Encoder (MGATE) to function as the sentence encoder. Employing a multi-head Graph Attention Network (GAT) architecture, MGATE attends to linguistic, domain, and sentiment features, thereby producing a distinctive sentence encoding. The essence of MGATE is its ability to select examples that are highly aligned with any given input across a variety of feature dimensions. To achieve this, the ICL technique is combined with SFT in the training stage to give LLMs a nuanced, feature-aware understanding and learning capacity. To make it easier to retrieve the most relevant examples, we craft a set of heuristic rules that quantify sentence similarity across various feature dimensions. This approach generates a balanced mix of positive and negative samples for the subsequent contrastive training of MGATE. After training and optimizing sentence representations, MGATE can achieve a refined understanding of the features critical for multi-domain ABSA tasks, producing high-quality sentence representations. Building on this, we select the most similar examples across features and insert them into instruction prompts during both the training and inference stages, further enhancing performance for multi-domain ABSA tasks.

In response to the lack of specialized multi-domain ABSA datasets, we also construct a benchmark dataset named MD-ASPE, which combines 16,000 sentences across nine diverse domains. Extensive experiments show that FaiMA performs well across these domains and increases average F1 by 2.07% compared to baseline models.

Our contributions can be summarized as follows:

  • We introduce FaiMA, a novel framework based on LLMs for multi-domain ABSA tasks, demonstrating that ICL can be an effective feature-aware tool.

  • We propose a sentence encoding model, MGATE, which combines multi-head GAT and contrastive learning. It fully integrates linguistic, domain, and sentiment features, allowing the robust retrieval of highly relevant examples in multiple dimensions.

  • We present MD-ASPE, the first benchmark dataset for multi-domain ABSA. Extensive experiments demonstrate that our method achieves state-of-the-art performance across nearly all domains and on average.

2.   Related Work

Historically, extensive research has demonstrated the universal applicability of specific features for ABSA tasks (Zhang et al., 2022a). For example, dependency parse trees and part-of-speech tagging naturally capture relationships between words and are considered crucial linguistic features for tackling ABSA, as they are closely related to the underlying sentiment elements (Zhang et al., 2019; Wu et al., 2021; Chen et al., 2021; Shi et al., 2023). Furthermore, Wu et al. (2020) and Chen et al. (2022) introduced a grid tagging scheme, formalizing ABSA as a task to predict the types of edge relations between words. Given that ABSA spans multiple domains, domain-specific information is often considered a crucial feature (Gong et al., 2020; jie Tian et al., 2021). Since ABSA can be considered an edge-sensitive task, GNN-based models have demonstrated remarkable performance (Zhang et al., 2019; Wang et al., 2022; Zhang et al., 2022c). In particular, the multi-head GAT model, which can flexibly focus on multiple features, has achieved superior performance (Wang et al., 2020; Liang et al., 2022; Yang et al., 2023a).

Recently, LLMs like ChatGPT or LLaMA have achieved groundbreaking success (Touvron et al., 2023a, b). With the increasing scale of LLMs, novel techniques such as ICL (Ye et al., 2023) and Chain of Thought (CoT) (Wei et al., 2022) emerged. ICL demonstrates that adding detailed instructions and examples to task prompts can significantly enhance task performance, whether in zero-shot inference or supervised training. Current research has begun to investigate optimal example selection to further augment ICL’s capabilities. Studies (Liu et al., 2022; Min et al., 2022) found that choosing examples semantically and label-wise closer to the actual input is more effective. Moreover, Rubin et al. (2022); Zhang et al. (2022b) revealed that training a retriever in a supervised way to find more relevant examples is a more practical approach.

Figure 2: The overall architecture of FaiMA: MGATE training part and example retrieval part. MGATE training involves three steps: heuristic rules for positive/negative pairs generation, multi-head graph attention network to embed sentences upon three features, and contrastive learning. The diagram to the far right illustrates an ICL process that reliably fetches three domain-relevant and global average samples for any input sentence.

Traditional methods based on Small Language Models (SLMs) for multi-domain ABSA showed limited performance (Hu et al., 2019a; Ji et al., 2020; Luo et al., 2022). Previous approaches usually trained a separate model for every domain, incurring high computational and resource costs. Recent work has begun to explore the use of LLMs for ABSA. For example, Varia et al. (2022) demonstrated few-shot generalizability across various ABSA subtasks using SFT and multi-task learning. Scaria et al. (2023) employed ICL with fixed examples to achieve marked performance, while Fei et al. (2023) designed a multi-turn CoT for understanding implicit sentiments and opinions. These studies provide initial evidence of the substantial research potential of LLMs in ABSA tasks.

3.   Methodology

In this section, we introduce our proposed FaiMA framework, depicted in Figure 2. In §3.2, we describe MGATE and elaborate on the heuristic rules and contrastive learning. In §3.3, we discuss how to perform feature-aware example retrieval, along with the specific implementation of the ICL strategy.

3.1.   Problem Definition

As a fine-grained sentiment analysis task, ABSA can be formalized as a hybrid task of extraction and classification. Given a sentence $L=\{t_1,t_2,\dots,t_n\}$, where $t_i$ represents the $i$-th word in the sentence, multi-domain ABSA refers to the extraction of all sentiment pairs $P=\{(A_1,S_1),\dots,(A_m,S_m)\}$ in $L$ jointly for different domains. (This task is also referred to as aspect sentiment pair extraction (ASPE) in some literature.) Formally, $A$ indicates an entity or phrase in the sentence $L$ that is related to sentiment, defined as $A=\{a_1,a_2,\dots,a_p\}\subseteq L$, and the sentiment polarity is $\mathcal{S}\in\{\text{positive},\text{negative},\text{neutral}\}$.
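The input and output of the task defined above can be sketched as a small data structure. This is only an illustration: the class name `SentimentPair`, the helper `is_valid`, and the example sentence are hypothetical and are not taken from the paper or from MD-ASPE.

```python
from dataclasses import dataclass

POLARITIES = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class SentimentPair:
    """One (A, S) pair: aspect tokens plus a sentiment polarity."""
    aspect: tuple[str, ...]  # aspect tokens A, a subset of the sentence tokens
    polarity: str            # one of POLARITIES

    def __post_init__(self):
        if self.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {self.polarity}")


def is_valid(sentence, pairs):
    """Check that every aspect token occurs in the sentence (A is a subset of L)."""
    return all(tok in sentence for pair in pairs for tok in pair.aspect)


# Hypothetical example sentence L and its sentiment pairs P.
sentence = "The battery life is great but the screen is dim".split()
pairs = [
    SentimentPair(("battery", "life"), "positive"),
    SentimentPair(("screen",), "negative"),
]
print(is_valid(sentence, pairs))
```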

3.2.   Multi-head Graph Attention Network Encoder

Linguistic, domain, and sentiment features, together with GNN models, have long been crucial components for ABSA (Zhang et al., 2022a; Chen et al., 2022). In light of this, we propose MGATE, a submodel designed to investigate and understand the intricate interplay of these three features between words within sentences. To enhance the training process, we develop a set of heuristic rules to generate positive and negative training sentence pairs for each feature, and then employ contrastive learning to train the graph neural encoder, optimizing sentence representations from these three perspectives. The detailed implementation is as follows.

3.2.1.   Feature Selection and Heuristic Rules

To accommodate the properties of the three different features, we devise unique processing rules for each. For linguistic and sentiment features, direct conversion to trainable positive and negative sample pairs presents challenges. Therefore, we design a set of heuristic algorithms to precisely calculate the similarity between two sentences given in the ABSA task.

Linguistic Similarity   Linguistic knowledge has always been considered an essential resource for solving the ABSA task (Chen et al., 2021; Shi et al., 2023). We select the most representative part-of-speech combinations and syntactic dependency types and define refined feature modeling methods. First, for a sentence $L$, we use a parser to build a part-of-speech combination matrix $R^{pos}\in\mathbb{R}^{n\times n}$ and a syntactic dependency matrix $R^{dep}\in\mathbb{R}^{n\times n}$, where each type of relationship corresponds to a unique numerical ID.

In the ABSA task, the aspect is considered the key element and is closely related to the other elements, so we redefine the aspects and opinions in the sentence as central words $C=A=\{c_1,c_2,\dots,c_k\}$. For a multi-token phrase, the central word is the token with the highest number of relationships with other words. For each central word $c_k$, we assign weights to the other words in the sentence using a Gaussian function, ensuring that words closer to the central word receive larger weights. It is defined as:

$$W(c_k)=\left[e^{-\frac{(1-k)^2}{2\sigma^2}},\dots,e^{-\frac{(n-k)^2}{2\sigma^2}}\right]\in\mathbb{R}^{n}. \tag{1}$$
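As a minimal sketch of Eq. (1), the weight vector can be computed as follows. The function name `gaussian_weights` and the default `sigma=1.0` are assumptions made for illustration; the code also assumes that $k$ is the 1-indexed position of the central word in the sentence.

```python
import math


def gaussian_weights(n: int, k: int, sigma: float = 1.0) -> list[float]:
    """Eq. (1): weight of each position 1..n relative to a central word at position k."""
    return [math.exp(-((i - k) ** 2) / (2 * sigma ** 2)) for i in range(1, n + 1)]


# Five-token sentence with the central word in the middle (position 3).
weights = gaussian_weights(n=5, k=3, sigma=1.0)
print([round(w, 3) for w in weights])  # [0.135, 0.607, 1.0, 0.607, 0.135]
```

The central position always receives weight 1, and a larger `sigma` flattens the curve so that distant words contribute more.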

The similarity calculation uses the weighted Hamming distance $\texttt{HM}(\cdot)$, which can effectively capture minor structural changes in the sentence and amplify the influence of words near the central words, considering comprehensive linguistic structures beyond direct word-to-word connections. It is defined as:

$$H(i,j)=W(c_i)\circ\texttt{HM}\left([R^{dep}(c_i),R^{pos}(c_i)],[R^{dep}(c_j),R^{pos}(c_j)]\right), \tag{2}$$

where $\circ$ denotes the dot product operation, and $c_i\in C_1$ and $c_j\in C_2$ are central words drawn from the central word sets of the two sentences $L_1$ and $L_2$, respectively. The overall similarity distance is calculated as:

$$D(L_1,L_2)=\frac{1}{|C_1||C_2|}\sum_{i=1}^{|C_1|}\sum_{j=i}^{|C_2|}H(i,j). \tag{3}$$

Finally, the linguistic similarity score between the two sentences is obtained through the $\texttt{Sigmoid}(\cdot)$ function:

$$S_{\texttt{Lig}}(L_1,L_2)=\texttt{Mean}(\texttt{Sigmoid}(D(L_1,L_2))) \tag{4}$$

where $\texttt{Mean}$ refers to the averaging operation, $D(L_1,L_2)\in\mathbb{R}^{\min(n,m)^2}$, and the final output is a scalar. This strategy combines part-of-speech, syntactic dependencies, and core word concepts, providing an effective quantitative measure of sentence linguistic similarity for the ABSA task.
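To make the weighted Hamming distance in Eq. (2) more concrete, the sketch below shows the generic idea: positions where two relation-ID sequences differ contribute their weight to the distance. It is not the authors' implementation. The function name `weighted_hamming`, the toy ID rows, and the weights are hypothetical, and the code assumes both rows have already been aligned to the same length, which the text does not specify.

```python
def weighted_hamming(ids_a, ids_b, weights):
    """Sum the weights of all positions where the two ID sequences disagree."""
    if not (len(ids_a) == len(ids_b) == len(weights)):
        raise ValueError("all inputs must have the same length")
    return sum(w for a, b, w in zip(ids_a, ids_b, weights) if a != b)


# Hypothetical relation-ID rows for the central words of two sentences.
row_a = [3, 7, 0, 2, 5]
row_b = [3, 1, 0, 2, 4]
# Hypothetical position weights, e.g. rounded Gaussian weights around position 3.
weights = [0.1, 0.6, 1.0, 0.6, 0.1]

distance = weighted_hamming(row_a, row_b, weights)
print(round(distance, 3))  # mismatches at positions 2 and 5 (1-indexed)
```

Under this reading, a mismatch next to the central word costs more than one at the edge of the sentence, which matches the stated purpose of the Gaussian weights.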

Domain Similarity   In multi-domain ABSA, texts from different domains may possess entirely different features and styles, while texts from the same domain share similar background knowledge and emotional objects. Therefore, taking into account domain similarity becomes a critical factor. We define a simple binary metric to measure this. Given two sentences $L_1$ and $L_2$ that belong to domains $D_1$ and $D_2$ respectively, the domain similarity is $S_{\texttt{Dom}}=\mathbbm{1}_{D_1=D_2}$, where $\mathbbm{1}(\cdot)$ is the indicator function, taking the value of 1 if the condition is met and 0 otherwise.

Sentiment Similarity   Sentiment similarity in ABSA is not directly measurable. Review text often contains different sentiment polarities across multiple aspects, especially in long or complex sentences. To capture these nuanced variations, we introduce a sentiment vector representation. For each sentence $L$, we define a sentiment vector $\mathbf{v}=[n_{pos},n_{neu},n_{neg}]$, where $n_{pos},n_{neu},n_{neg}$ represent the count of positive, neutral, and negative sentiments in the text, respectively. For two sentences $L_1$ and $L_2$ and their corresponding sentiment vectors $\mathbf{v_1}$ and $\mathbf{v_2}$, their sentiment similarity is calculated as follows:

$$S_{\texttt{Sen}}(L_1,L_2)=\frac{1}{2}\cdot\frac{\mathbf{v_1}\circ\mathbf{v_2}}{\|\mathbf{v_1}\|\,\|\mathbf{v_2}\|}+\frac{1}{2}. \tag{5}$$

Here, $\circ$ denotes the dot product between the two vectors, and $\|\mathbf{v}\|$ represents the Euclidean norm of the vector.
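Eq. (5) is a cosine similarity rescaled by one half and shifted by one half, and a minimal sketch is given below. The function name `sentiment_similarity` is an assumption, and so is the handling of an all-zero sentiment vector, which the text does not define. Here the cosine is treated as 0 in that case.

```python
import math


def sentiment_similarity(v1, v2):
    """Eq. (5): 0.5 * cosine(v1, v2) + 0.5 for vectors [n_pos, n_neu, n_neg]."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.5  # assumption: a sentence without sentiment pairs gets cosine 0
    return 0.5 * dot / (norm1 * norm2) + 0.5


same_mix = sentiment_similarity([2, 0, 1], [4, 0, 2])   # same proportions
disjoint = sentiment_similarity([1, 0, 0], [0, 0, 1])   # positive-only vs negative-only
print(round(same_mix, 3), round(disjoint, 3))
```

Because the counts are non-negative, the cosine term cannot be negative, so the score lies between 0.5 and 1. Any threshold on this score therefore has to sit in that range to be informative.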

Through the aforementioned method, we obtain a quantified inter-sentence similarity measure as follows:

$$S(L_1,L_2)=\left[S_{\texttt{Lig}}(L_1,L_2),S_{\texttt{Dom}}(L_1,L_2),S_{\texttt{Sen}}(L_1,L_2)\right] \tag{6}$$

which integrates the three feature dimensions of linguistics, domain, and sentiment. By further setting three thresholds $\theta_{\texttt{Lig}},\theta_{\texttt{Dom}},\theta_{\texttt{Sen}}$, these continuous similarity values can be mapped into a three-dimensional 0-1 tensor $T$, where $T_{ijk}$ is the value for the sentence pair $(i,j)$ in feature dimension $k$. The tensor $T=[\mathbbm{1}_{S_{\texttt{Lig}}\geq\theta_{\texttt{Lig}}},\mathbbm{1}_{S_{\texttt{Dom}}\geq\theta_{\texttt{Dom}}},\mathbbm{1}_{S_{\texttt{Sen}}\geq\theta_{\texttt{Sen}}}]$ serves as a multi-dimensional matrix of positive and negative samples, and will be subsequently used for contrastive learning to optimize the training of the graph encoder.
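The thresholding step can be sketched for a single sentence pair as follows. The function names, the threshold values, and the example similarity scores are all hypothetical; the paper does not report them in this section.

```python
def domain_similarity(domain_a, domain_b):
    """Binary domain similarity: 1 if both sentences share a domain, else 0."""
    return 1 if domain_a == domain_b else 0


def binarize_similarity(similarity, thresholds):
    """Map [S_Lig, S_Dom, S_Sen] to one 0/1 label per feature (1 = positive pair)."""
    return [1 if s >= t else 0 for s, t in zip(similarity, thresholds)]


# Hypothetical thresholds (theta_Lig, theta_Dom, theta_Sen).
thresholds = (0.6, 1.0, 0.9)
# Hypothetical scores for one pair: linguistic, domain, sentiment.
similarity = [0.72, domain_similarity("laptop", "hotel"), 0.95]

labels = binarize_similarity(similarity, thresholds)
print(labels)  # [1, 0, 1]
```

In this example the pair is a positive sample for the linguistic and sentiment features but a negative sample for the domain feature, so the same pair can play different roles in the contrastive objective of each feature.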

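Table 1: Statistics of the training and test sets. #S and #P denote the number of sentences and sentiment pairs in the dataset; #Pos, #Neg, and #Neu denote the number of pairs with the corresponding sentiment polarity.

| Dataset    | Train #S | Train #P | Train #Pos | Train #Neg | Train #Neu | Test #S | Test #P | Test #Pos | Test #Neg | Test #Neu |
|------------|----------|----------|------------|------------|------------|---------|---------|-----------|-----------|-----------|
| laptop     | 1148     | 1384     | 745        | 518        | 121        | 339     | 418     | 279       | 93        | 46        |
| restaurant | 1500     | 2125     | 1525       | 452        | 148        | 496     | 726     | 555       | 128       | 43        |
| twitter    | 1500     | 1500     | 353        | 390        | 757        | 500     | 500     | 134       | 112       | 254       |
| books      | 1411     | 1780     | 1282       | 445        | 53         | 421     | 538     | 394       | 127       | 17        |
| clothing   | 1303     | 1567     | 1158       | 381        | 28         | 318     | 369     | 274       | 88        | 7         |
| device     | 948      | 1405     | 905        | 500        | 0          | 482     | 696     | 480       | 216       | 0         |
| finance    | 1500     | 2139     | 675        | 608        | 856        | 500     | 593     | 291       | 220       | 82        |
| hotel      | 1468     | 1963     | 1856       | 100        | 7          | 500     | 678     | 636       | 42        | 0         |
| service    | 1432     | 1842     | 1032       | 698        | 112        | 500     | 618     | 350       | 229       | 39        |
| Overall    | 12210    | 15705    | 9531       | 4092       | 2082       | 4056    | 5136    | 3393      | 1255      | 488       |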

3.2.2.   Multi-head Graph Attention Network

The multi-head GAT is designed to discern intricate interrelations among linguistic, domain, and sentiment features at the token level. To this end, we deploy three distinct sub-linear layers that serve as encoders for token adjacency matrices, and then use multi-head graph attention networks (Zhang et al., 2024; Luo et al., 2024b) to aggregate features. The aggregated features are globally pooled to produce a graph-level (or sentence-level) representation.

Given a sentence L={w1,w2,,wn}𝐿subscript𝑤1subscript𝑤2subscript𝑤𝑛L=\{w_{1},w_{2},\ldots,w_{n}\}italic_L = { italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_w start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_w start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT }, we apply a pre-trained BERT model to encode word token:

(h1,h2,,hn)=𝙱𝙴𝚁𝚃(w1,w2,,wn),subscript1subscript2subscript𝑛𝙱𝙴𝚁𝚃subscript𝑤1subscript𝑤2subscript𝑤𝑛(h_{1},h_{2},\ldots,h_{n})=\texttt{BERT}(w_{1},w_{2},\ldots,w_{n}),( italic_h start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_h start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = BERT ( italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_w start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_w start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) , (7)

and denote the sentence's token vectors by $H=(h_{1},h_{2},\ldots,h_{n})$. An Adaptive Adjacency Matrix is then employed as the feature propagation matrix for these token vectors. For instance, the Adaptive Adjacency Matrix corresponding to linguistic features, denoted $A^{(\text{Lig})}$, is computed as:

$$A^{(\text{Lig})}=\texttt{Sigmoid}\bigl(HW^{(\text{Lig})}H^{\mathsf{T}}\bigr), \tag{8}$$

where $W^{(\text{Lig})}$ is a learnable weight matrix for the linguistic feature. We also compute $A^{(\text{Dom})}$ and $A^{(\text{Sen})}$ with learnable weights $W^{(\text{Dom})}$ and $W^{(\text{Sen})}$ for domain and sentiment features, respectively.
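Eq. (8) can be illustrated with a minimal pure-Python sketch. The matrix sizes, the random toy values, and the helper names (`matmul`, `adaptive_adjacency`) are assumptions made for illustration; the actual model would use BERT hidden states and trained weights.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adaptive_adjacency(H, W):
    """A = Sigmoid(H W H^T): H is n x d, W is d x d, the result is n x n."""
    scores = matmul(matmul(H, W), transpose(H))
    return [[sigmoid(s) for s in row] for row in scores]

# Toy stand-ins for BERT token vectors and a learned weight matrix.
random.seed(0)
n, d = 4, 3
H = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
W_lig = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
A_lig = adaptive_adjacency(H, W_lig)

# With W equal to the identity, the matrix reduces to Sigmoid(H H^T).
identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
A_sym = adaptive_adjacency(H, identity)
```

Every entry lies strictly between 0 and 1 because of the sigmoid, so the matrix acts as a soft edge weight between token pairs. It is symmetric only when $W$ itself is symmetric, as in the identity case above.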

The attention coefficients between the $i^{th}$ and $j^{th}$ tokens are then calculated as:

$$\alpha_{ij}=\frac{\exp\bigl(\texttt{LeakyReLU}(\vec{a}^{\mathsf{T}}[W_{a}h_{i}\,\|\,W_{a}h_{j}])\bigr)}{\sum_{A_{ik}^{(\text{Lig})}>\delta}\exp\bigl(\texttt{LeakyReLU}(\vec{a}^{\mathsf{T}}[W_{a}h_{i}\,\|\,W_{a}h_{k}])\bigr)}, \tag{9}$$

where $\delta$ serves as a threshold to filter out noise in the adjacency matrix, and $\vec{a}$ and $W_{a}$ are learnable weights.

Then the token-level representation of the $i^{th}$ token, $E_{i}$, is computed via the attention mechanism:

$$E_{i}=\texttt{LayerNorm}\Bigl(\sum_{A^{(\text{Lig})}_{ij}>\delta}\alpha_{ij}A^{(\text{Lig})}_{ij}W_{a}h_{j}\Bigr). \tag{10}$$

To synthesize the graph-level representation, an average pooling operation is applied across all token-level features:

$$E^{(\text{Lig})}=\texttt{AVG}(E_{1},\ldots,E_{n}). \tag{11}$$

Lastly, we compute the graph-level representations for linguistic, domain-specific, and sentiment features, denoted $E^{(\text{Lig})}$, $E^{(\text{Dom})}$, and $E^{(\text{Sen})}$; their average $E^{(\text{Avg})}$ serves as the global feature representation.

In the multi-head attention computation, each head attends to one of the three levels of knowledge: “Lig”, “Dom”, and “Sen”. Unlike the conventional approach, which would apply a separate multi-head attention to each of “Lig”, “Dom”, and “Sen”, we treat “Lig”, “Dom”, and “Sen” as the three heads themselves, since FaiMA mainly focuses on the knowledge at these three levels.
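The thresholded attention, token aggregation, and pooling steps of Eqs. (9) to (11), with one head per feature, can be sketched in plain Python as follows. Several details are assumptions rather than the paper's implementation: `LayerNorm` is used without learnable gain and bias, a token with no neighbour above $\delta$ falls back to attending to itself, each head receives its own `(W_a, a)` pair, and all numeric values are toy inputs.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def layer_norm(v, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

def gat_head(H, A, W_a, a, delta):
    """One attention head: Eq. (9)-(11) for a single adjacency matrix A."""
    # Project every token: z_i = W_a h_i
    Z = [[sum(w * x for w, x in zip(row, h)) for row in W_a] for h in H]
    n, d_out = len(Z), len(W_a)
    token_reps = []
    for i in range(n):
        nbrs = [j for j in range(n) if A[i][j] > delta]
        if not nbrs:  # assumption: an isolated token attends to itself
            nbrs = [i]
        # Eq. (9): softmax over LeakyReLU(a^T [z_i || z_j]) for kept neighbours
        logits = [leaky_relu(sum(p * q for p, q in zip(a, Z[i] + Z[j]))) for j in nbrs]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        # Eq. (10): sum of alpha_ij * A_ij * z_j, followed by LayerNorm
        agg = [0.0] * d_out
        for j, e in zip(nbrs, exps):
            coef = (e / total) * A[i][j]
            for t in range(d_out):
                agg[t] += coef * Z[j][t]
        token_reps.append(layer_norm(agg))
    # Eq. (11): average pooling over tokens
    return [sum(col) / n for col in zip(*token_reps)]

def encode(H, adjs, params, delta=0.2):
    """Run one head per feature and add their average as the global vector."""
    heads = {name: gat_head(H, adjs[name], params[name][0], params[name][1], delta)
             for name in adjs}
    avg = [sum(vals) / len(vals) for vals in zip(*heads.values())]
    heads["Avg"] = avg
    return heads

# Toy inputs: 3 tokens, 2-dimensional features, one shared adjacency matrix.
H = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
A = [[0.9, 0.6, 0.1], [0.6, 0.9, 0.3], [0.1, 0.3, 0.9]]
W_a = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 1.0, 1.0]
adjs = {"Lig": A, "Dom": A, "Sen": A}
params = {name: (W_a, a) for name in adjs}
reps = encode(H, adjs, params, delta=0.2)
```

With $\delta=0.2$, the edge between the first and third tokens (weight 0.1) is dropped before the softmax, which is the noise-filtering role the threshold plays in Eq. (9).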

3.2.3.   Contrastive Learning

Next, we introduce the graph-level contrastive learning loss Li et al. (2022), which optimizes the three feature representations produced by the multi-head graph attention network.

In Section 3.2.1, we obtain positive and negative sample pairs from the linguistic, domain, and sentiment feature perspectives through heuristic rules. Taking the linguistic perspective as an example again, for any sentence $L_{i}$, we denote its set of positive sample sentences by $\mathcal{P}_{i}$ and its set of negative sample sentences by $\mathcal{N}_{i}$. We take the global representation (sentence representation) obtained in Section 3.2.2 as input, aiming to maximize the similarity between positive sample pairs and minimize the similarity between negative sample pairs. To this end, we define the contrastive learning loss as follows:

$$\mathcal{L}_{CL}^{(\text{Lig})}=\frac{1}{B}\sum_{L_{i}}\left[-\log\frac{\sum_{L_{j}\in\mathcal{P}_{i}}\exp(\Gamma(E_{i},E_{j})/\tau)}{\sum_{L_{k}\in\mathcal{N}_{i}}\exp(\Gamma(E_{i},E_{k})/\tau)}\right], \tag{12}$$

where $(L_{i},L_{j})$ with $L_{j}\in\mathcal{P}_{i}$ is a positive pair and $(L_{i},L_{k})$ with $L_{k}\in\mathcal{N}_{i}$ is a negative pair for sentence $L_{i}$. Moreover, we define the critic function as $\Gamma(u,v)=\cos(\text{Linear}(u),\text{Linear}(v))$, where $\text{Linear}(\cdot)$ is the projection function, implemented as a two-layer perceptron, and $\cos(\cdot)$ denotes cosine similarity, which we compute by first normalizing the embeddings and then taking their dot product. $\tau\in(0,1)$ is the annealing (temperature) coefficient, which keeps the exp function from being too smooth around 0 and thereby speeds up model convergence.
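A minimal sketch of the loss in Eq. (12) follows. It assumes that every anchor has at least one positive and one negative, and it replaces the two-layer perceptron $\text{Linear}(\cdot)$ with an identity projection by default; the function names and the toy embeddings are illustrative only.

```python
import math

def cosine(u, v):
    """Cosine similarity: normalize both vectors, then take the dot product."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)

def contrastive_loss(E, positives, negatives, tau=0.1, project=lambda x: x):
    """Eq. (12): mean over anchors of -log(sum_pos exp(G/tau) / sum_neg exp(G/tau)).

    E          -- list of sentence embeddings, one per sentence in the batch
    positives  -- positives[i] lists the indices in P_i
    negatives  -- negatives[i] lists the indices in N_i
    project    -- stand-in for the projection head (identity here, an assumption)
    """
    Z = [project(e) for e in E]
    total = 0.0
    for i in range(len(Z)):
        pos = sum(math.exp(cosine(Z[i], Z[j]) / tau) for j in positives[i])
        neg = sum(math.exp(cosine(Z[i], Z[k]) / tau) for k in negatives[i])
        total += -math.log(pos / neg)
    return total / len(Z)

# Toy batch: sentences 0 and 1 are similar, as are sentences 2 and 3.
E = [[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0], [-1.0, -0.1]]
pos = [[1], [0], [3], [2]]
neg = [[2, 3], [2, 3], [0, 1], [0, 1]]

loss_good = contrastive_loss(E, pos, neg, tau=0.1)
loss_bad = contrastive_loss(E, neg, pos, tau=0.1)  # labels deliberately swapped
```

Because the denominator in Eq. (12) sums over negatives only, the loss is not bounded below by zero: it becomes negative once positives are much closer to the anchor than negatives, as in `loss_good`, and positive when the pairing is wrong, as in `loss_bad`.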

For linguistic, domain, sentiment, and global features, we calculate the respective contrastive learning losses $\mathcal{L}_{CL}^{(\text{Lig})}$, $\mathcal{L}_{CL}^{(\text{Dom})}$, $\mathcal{L}_{CL}^{(\text{Sen})}$, and $\mathcal{L}_{CL}^{(\text{Avg})}$, and combine them in a weighted sum as the model’s contrastive learning loss:

$$\mathcal{L}_{CL}=\beta_{1}\mathcal{L}_{CL}^{(\text{Lig})}+\beta_{2}\mathcal{L}_{CL}^{(\text{Dom})}+\beta_{3}\mathcal{L}_{CL}^{(\text{Sen})}+\mathcal{L}_{CL}^{(\text{Avg})}, \tag{13}$$

where $\beta_{1},\beta_{2},\beta_{3}$ are the reweighting coefficients. The encoder is trained by minimizing $\mathcal{L}_{CL}$.

3.3.   Feature-aware Example Retrieval

After completing the MGATE training, the last phase of FaiMA involves example retrieval across different feature dimensions. Through the ICL method, the LLM can perceive how the different feature dimensions of the ABSA task influence the output, thereby making more accurate predictions. Given an input sentence, the trained graph encoder $\mathcal{G}$ yields three feature vectors $h_{\texttt{Lig}}, h_{\texttt{Dom}}, h_{\texttt{Sen}}$ and a pooled average vector $h_{\texttt{Avg}}$. We use only the training set as the retrieval library and adopt the efficient FAISS (Facebook AI Similarity Search) library (github.com/facebookresearch/faiss) Johnson et al. (2019) for approximate nearest neighbor search:

$$\mathcal{N}_{k}(h)=\texttt{argmin}_{x_{1},\ldots,x_{k}\in S}\|h-x_{i}\|_{2}^{2} \tag{14}$$

Here, $k$ is the number of instances to be retrieved. For each feature dimension, we retrieve at least one nearest-neighbour instance (i.e., $k\geq 3$), and the retrieved instances are strictly de-duplicated.
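The retrieval and de-duplication step can be sketched with an exact brute-force search standing in for FAISS. The per-view quota of 2 average, 1 linguistic, 1 domain, and 1 sentiment example follows the setting reported in Section 4.3; the order in which views are processed, the rule of skipping to the next-nearest neighbour on a duplicate, and the toy vectors are assumptions.

```python
def squared_l2(u, v):
    """Squared Euclidean distance, the quantity minimized in Eq. (14)."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def rank_by_distance(query, library):
    """Indices of all library vectors, nearest first (exact brute-force search)."""
    return sorted(range(len(library)), key=lambda i: squared_l2(query, library[i]))

def retrieve_examples(query_vecs, libraries, quota):
    """Pick quota[name] training examples per feature view without duplicates.

    query_vecs -- feature name -> query vector for the input sentence
    libraries  -- feature name -> list of vectors, one per training sentence
    quota      -- feature name -> number of examples to take from that view
    """
    chosen = []
    for name, need in quota.items():
        picked = 0
        for idx in rank_by_distance(query_vecs[name], libraries[name]):
            if picked == need:
                break
            if idx not in chosen:  # strict de-duplication across views
                chosen.append(idx)
                picked += 1
    return chosen

# Toy library: six training sentences embedded identically in every view.
vectors = [[float(i), 0.0] for i in range(6)]
quota = {"Avg": 2, "Lig": 1, "Dom": 1, "Sen": 1}
libraries = {name: vectors for name in quota}
query_vecs = {name: [0.0, 0.0] for name in quota}

selected = retrieve_examples(query_vecs, libraries, quota)
```

In this deliberately degenerate case every view ranks the training sentences identically, so de-duplication forces each later view to move on to its next-nearest unused neighbour, giving five distinct examples.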

For multi-domain ABSA tasks, we carefully design multiple different templates. The retrieved examples are inserted directly into these templates, which are then used to further fine-tune the LLM in an ICL manner.
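As an illustration of how retrieved examples might be placed into a template, the sketch below assembles a prompt in the `Input:` / `Output:` style used by the case study in Table 4. The instruction wording and the function names are hypothetical; the paper's actual templates are not reproduced here.

```python
def format_pairs(pairs):
    """Render (aspect, polarity) pairs as '[aspect, polarity], ...'."""
    return ", ".join(f"[{aspect}, {polarity}]" for aspect, polarity in pairs)

def build_prompt(instruction, examples, sentence):
    """Place retrieved examples before the query sentence.

    examples -- list of (text, pairs) tuples, already ordered by similarity
    """
    lines = [instruction, ""]
    for text, pairs in examples:
        lines += [f"Input: {text}", f"Output: {format_pairs(pairs)}", ""]
    lines += [f"Input: {sentence}", "Output:"]
    return "\n".join(lines)

# Hypothetical instruction text; the example and query come from Table 4.
instruction = "Extract all [aspect, sentiment] pairs from the sentence."
examples = [
    ("The food was very expensive (we spent $160 for lunch for two) "
     "but extremely tasty.", [("food", "positive")]),
]
query = "The food was great - sushi was good, but the cooked food amazed us."
prompt = build_prompt(instruction, examples, query)
```

The prompt ends with an open `Output:` line so that the model's continuation is the predicted list of pairs.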

Table 2: Performances over five different runs with Macro-F1 score (%) metric. The best performance is in bold and the second best results are underlined.
Model laptop restaurant twitter books clothing device finance hotel service Overall
BERT-CRF 55.32 68.15 57.85 42.15 61.41 55.78 55.37 61.44 54.77 58.03
SpanABSA-joint 59.12 72.65 61.05 45.55 65.61 59.38 59.47 65.74 58.47 61.89
BART-Index 65.57 76.71 66.89 60.93 72.96 67.78 69.29 80.21 67.52 69.57
T5-Index 68.05 79.44 67.85 64.12 78.59 71.13 75.81 82.05 70.96 73.18
T5-Paraphrase 68.29 80.77 69.52 64.25 79.13 70.82 75.77 82.17 71.47 73.55
LLaMA-SFT 68.50 76.26 62.80 60.29 73.71 71.21 73.74 82.87 67.13 70.54
LLaMA-Random 69.54 78.23 63.60 64.07 74.07 66.82 76.56 82.45 71.29 72.40
LLaMA-SBERT 68.41 76.80 60.90 62.32 73.64 67.58 74.29 81.04 68.89 70.99
LLaMA-FaiMA 70.58 81.39 68.00 66.33 77.08 70.85 77.60 83.45 72.24 75.62

4.   Experiments

4.1.   Dataset

To address the absence of multi-domain benchmark datasets, we combine nine high-quality datasets from various domains into a comprehensive dataset, named MD-ASPE, including 14Restaurant Pontiki et al. (2014), 14Laptop Pontiki et al. (2014), Device Hu and Liu (2004), Service Toprak et al. (2010), Books, Clothing, Hotel Luo et al. (2022), Twitter Dong et al. (2014), and Financial News Headlines Sinha et al. (2022). MD-ASPE incorporates annotations from diverse teams and draws from rich data sources, effectively mimicking real-world multi-domain scenarios. We ensure data balance by employing random sampling strategies, and we standardize the selected data by rectifying punctuation errors and addressing whitespace inconsistencies. Statistics of the training and test sets are summarized in Table 1.

4.2.   Baselines

To rigorously and comprehensively evaluate our proposed approach, we choose a range of baseline models, from SLMs and generative SLMs to LLMs. 1) Within the SLM category, we employ two cross-domain-capable models based on the BERT framework (Devlin et al., 2018): SpanABSA-joint (a span-level model) (Hu et al., 2019b) and BERT-CRF (a BERT-based model augmented with a CRF layer). 2) Generative SLMs include BART-Index based on BART (Raffel et al., 2020), its T5 variant T5-Index, and the variant model T5-Paraphrase (labels transduced into sequences using text templates) (Zhang et al., 2021). 3) We also incorporate three LLM-based methods. The first conducts SFT directly on LLaMA, keeping the instruction unchanged and only removing the examples (LLaMA-SFT). The other two are ICL methods: randomly selecting an equal number $k$ of examples (LLaMA-Random), and using Sentence-BERT (huggingface.co/sentence-transformers/all-MiniLM-L6-v2) as the sentence encoder to index and retrieve the most similar instances by Euclidean distance (LLaMA-SBERT).

4.3.   Experimental Settings

Our FaiMA comprises multiple stages, including the generation of pairs using heuristic rules, MGATE training, and SFT with ICL. For the heuristic rules, we set $\theta_{Lig}=0.43$, $\theta_{Dom}=0.5$, and $\theta_{Sen}=0.8$ to differentiate feature similarity. Within the MGATE training phase, we employ BERT-base-uncased (huggingface.co/bert-base-uncased) as the token encoder, with an initial learning rate of $2\times 10^{-4}$, and train for 10 epochs. In the ICL and SFT stage, we insert the 5 ($k=5$) most relevant examples into the prompt, ordered by similarity Liu et al. (2022): 2 average, 1 linguistic, 1 domain, and 1 sentiment example. We use LLaMA2-7b (huggingface.co/meta-llama/Llama-2-7b) as the backbone model and leverage low-rank adaptation (LoRA) Hu et al. (2021) for parameter-efficient tuning, coupled with gradient accumulation and mixed-precision training. The learning rate and number of epochs are set to $8\times 10^{-5}$ and 7, respectively. All methods use the AdamW optimizer Loshchilov and Hutter (2017) with weight decay, a dynamic learning rate, and gradient clipping. The batch size $B$ is set to 128, and $\tau=0.1$, $\delta=0.2$, $\beta_{1}=\beta_{2}=\beta_{3}=1$.
All experiments are conducted on an Ubuntu 18.04.5 LTS server with an A800-80G GPU. We randomly hold out 10% of the training set as a validation set, select the model that performs best on it, and use the Macro-F1 score as the principal evaluation metric. We repeat each experiment five times with different random seeds and report the average results.
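For reference, the hyperparameters stated in this section can be gathered into a single configuration object. The class and field names below are hypothetical and chosen only for readability; the values are those reported above, and settings the text does not specify (for example LoRA rank or weight-decay strength) are intentionally left out.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FaiMAConfig:
    # Heuristic-rule thresholds for building positive/negative pairs
    theta_lig: float = 0.43
    theta_dom: float = 0.5
    theta_sen: float = 0.8
    # MGATE (graph encoder) training
    token_encoder: str = "bert-base-uncased"
    mgate_lr: float = 2e-4
    mgate_epochs: int = 10
    tau: float = 0.1        # coefficient in the contrastive loss
    delta: float = 0.2      # adjacency threshold
    betas: tuple = (1.0, 1.0, 1.0)
    batch_size: int = 128
    # SFT with in-context examples
    backbone: str = "meta-llama/Llama-2-7b"
    sft_lr: float = 8e-5
    sft_epochs: int = 7
    k: int = 5
    examples_per_view: dict = field(
        default_factory=lambda: {"Avg": 2, "Lig": 1, "Dom": 1, "Sen": 1}
    )
    # Evaluation protocol
    validation_fraction: float = 0.10
    num_seeds: int = 5

cfg = FaiMAConfig()
```

Keeping the per-view allocation next to `k` makes it easy to check that the two stay consistent when either is changed.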

Table 3: Macro-F1 score of ablation experiment results on different datasets. Values in green indicate the drop in performance after removing a feature.
Model laptop restaurant twitter books clothing avg.
All 70.58 81.39 68.00 66.33 77.08 73.94
w/o Lig. (-1.35) (-2.10) (-0.95) (-1.80) (-1.22) (-1.75)
w/o Dom. (-1.78) (-1.87) (-1.52) (-1.09) (-1.67) (-1.68)
w/o Sen. (-0.37) (-0.92) (-0.68) (-0.53) (-0.86) (-0.71)

4.4.   Main Results

Table 2 shows the main experimental results. Our proposed LLaMA-FaiMA outperforms all baseline models in most domains, with an overall gain of 2.07% over the best previous method (75.62 vs. 73.55 for T5-Paraphrase). This underscores the efficacy of the feature-aware ICL strategy in multi-domain ABSA scenarios. Among the compared models, generative models such as BART, T5, and LLaMA clearly outperform BERT-based models. This superiority may stem from the generative models’ more efficient pretraining methodology, which enables large-scale unsupervised training on massive corpora and thereby the acquisition of richer and more diverse domain knowledge. Intriguingly, although the other LLaMA-based methods (LLaMA-SFT, LLaMA-Random, and LLaMA-SBERT) have larger model sizes than T5, they perform worse than the T5 variants overall. We speculate that this could be due to the excessive size of LLMs, which makes transfer and adaptation difficult, and that parameter-efficient fine-tuning alone may not be sufficient for optimal training Wang et al. (2023). Despite employing a more advanced sentence encoder for example retrieval, LLaMA-SBERT performs worse than LLaMA-Random (70.99 vs. 72.40 overall), indicating that conventional sentence encoding models struggle to adapt to the complexities of multi-domain ABSA tasks. In contrast, FaiMA provides stable examples from similar tasks, allowing the model to grasp the essence of the task at hand more rapidly. This demonstrates the effectiveness of our proposed approach and provides a robust new framework for the multi-domain ABSA task.

4.5.   Ablation Study

To investigate the impact of different features on performance across domains, we remove the three features (linguistic, domain, and sentiment) one at a time and report the resulting changes in Macro-F1 score on the top five domains. The results are shown in Table 3. In terms of the average drop, removing linguistic features causes the largest reduction in performance ($-1.75$), followed by domain features ($-1.68$) and sentiment features ($-0.71$), substantiating the crucial role of linguistic features in ABSA tasks. In some domains, such as Twitter, the impact of domain features is especially notable compared to linguistic features, owing to the domain's unique characteristics. In contrast, linguistic features have the most significant impact in the restaurant and books domains. Text in these domains is generally more structured, making it more susceptible to the influence of linguistic features. Meanwhile, the clothing and restaurant domains show a more pronounced dependency on sentiment features due to the high diversity of aspects and sentiments. The variation in the impact of linguistic features across domains reflects the language usage and contextual factors unique to each domain. In every case reported in Table 3, removing a feature leads to a decrease in performance compared to the complete model.
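The per-domain observations can be checked directly against the numbers in Table 3. The short script below copies the reported drops (as positive magnitudes) and finds, for each domain, the feature whose removal hurts most, along with the two domains most affected by removing sentiment features. Only the variable names are introduced here.

```python
# Macro-F1 drops from Table 3, stored as positive magnitudes.
drops = {
    "Lig": {"laptop": 1.35, "restaurant": 2.10, "twitter": 0.95,
            "books": 1.80, "clothing": 1.22},
    "Dom": {"laptop": 1.78, "restaurant": 1.87, "twitter": 1.52,
            "books": 1.09, "clothing": 1.67},
    "Sen": {"laptop": 0.37, "restaurant": 0.92, "twitter": 0.68,
            "books": 0.53, "clothing": 0.86},
}
domains = list(drops["Lig"])

# For each domain, the feature whose removal causes the largest drop.
most_harmful = {d: max(drops, key=lambda f: drops[f][d]) for d in domains}

# Domains ranked by how much they lose without sentiment features.
sen_ranking = sorted(domains, key=lambda d: drops["Sen"][d], reverse=True)

for d in domains:
    print(f"{d:<10} largest drop when removing {most_harmful[d]}")
print("most sentiment-dependent:", sen_ranking[:2])
```

Linguistic features dominate for restaurant and books, domain features for laptop, twitter, and clothing, and the restaurant and clothing domains lose the most when sentiment features are removed.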

4.6.   Effectiveness Analysis of MGATE

Figure 3: The retrieval success rate of the three relevant feature examples retrieved for each domain.
Table 4: Case study reports two representative samples, including the retrieved most relevant examples on three features using MGATE and the prediction of FaiMA.
Sample (Case #1)
Input: The food was great - sushi was good, but the cooked food amazed us.
Output: [food, positive], [sushi, positive], [cooked food, positive]
Predict: [food, positive], [sushi, positive], [cooked food, positive]

Sample (Case #2)
Input: Well, it happened because of a graceless manager and a rude bartender who had we waiting 20 minutes for drinks, and then tells us to chill out.
Output: [manager, negative], [bartender, negative], [drinks, neutral]
Predict: [manager, negative], [bartender, negative]

Lig. (Case #1)
The service was excellent, the food was excellent, but the entire experience was very cool.
Output: [service, positive], [food, positive], [experience, positive]

Lig. (Case #2)
The whole setup is truly unprofessional and I wish Cafe Noir would get some good staff, because despite the current one this is a great place.
Output: [staff, negative]

Dom. (Case #1)
The food was very expensive (we spent $160 for lunch for two) but extremely tasty.
Output: [food, positive]

Dom. (Case #2)
One would think we’d get an apology or complimentary drinks - instead, we got a snobby waiter who wouldn’t even take our order for 15 minutes and gave us a lip when we asked him to do so.
Output: [waiter, positive]

Sen. (Case #1)
The spicy tuna roll was unusually good and the rock shrimp tempura was awesome, great appetizer to share!
Output: [spicy tuna roll, positive], [rock shrimp tempura, positive], [appetizer, positive]

Sen. (Case #2)
We actually gave 10% tip (which we have never done despite mediocre food and service), because we felt totally ripped off.
Output: [food, neutral]

To validate the effect of MGATE (cf. Section 3.2) as the feature-aware ICL component, we employ gpt-3.5-turbo (openai/api/openai/chat-completion) as an adjudicator to determine whether the examples retrieved by MGATE on the validation set are similar to the input. (Through testing, we found that GPT can achieve human-like judgment owing to its excellent understanding ability.) The retrieval success rates of the three features in the various domains are illustrated in Figure 3. All three features achieve a relatively high success rate (over 50%), supporting the effectiveness of MGATE for multi-domain ABSA sentence encoding. Domain features exhibit the highest retrieval success rate. Sentiment features are inferior to linguistic features, which we attribute to sentiment features being more multi-component and complex, leading to relatively low retrieval rates.

4.7.   Case Study

To provide an insightful understanding of the efficacy of MGATE, we conduct case studies to detail retrieval and predictive results. 1) For the correctly predicted Case #1, we observe that all three examples show a high similarity to the input sentence in the corresponding feature dimensions. They share very similar syntactic structures linguistically, while also being from the same domain and possessing consistent sentiment polarity and quantity. These well-matched examples enable the model to fully apprehend each feature’s role and effectiveness. 2) On the other hand, for Case #2, the sentence composition and sentiment are somewhat complex, and only the domain feature is successfully matched. The linguistic example captures only partial similarity (“bartender,” “manager,” and “staff”), while the sentiment example, possibly due to limited sample size, only offers support for the “neutral” label. Despite these limitations, the model's predictions are still largely accurate, missing only the less frequent “neutral” label.
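To make the effect of the missed pair in Case #2 concrete, the sketch below scores the prediction against the gold output using exact-match precision, recall, and F1 over (aspect, polarity) pairs. This single-sentence, pair-level scoring is an assumption for illustration; the paper reports Macro-F1 and does not spell out its matching rule in this section.

```python
def pair_f1(gold, pred):
    """Exact-match precision, recall, and F1 over (aspect, polarity) pairs."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Case #2 from Table 4: the neutral pair for "drinks" is missed.
gold = [("manager", "negative"), ("bartender", "negative"), ("drinks", "neutral")]
pred = [("manager", "negative"), ("bartender", "negative")]

precision, recall, f1 = pair_f1(gold, pred)
```

Under this scoring, both predicted pairs are correct (precision 1.0), while one of three gold pairs is missed (recall 2/3), giving an F1 of 0.8 for this sentence.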

5.   Conclusion and Future Direction

In this paper, we introduce FaiMA, a novel framework tailored to address the challenges of multi-domain Aspect-Based Sentiment Analysis (ABSA). The core insight of FaiMA is to utilize in-context learning (ICL) as a feature-aware tool for LLMs. Moreover, FaiMA leverages GNNs and proposes MGATE, which captures the intricate interplay between linguistic, domain-specific, and sentiment features. Together with contrastive learning, MGATE empowers the model to retrieve highly analogous examples for any given input. Comprehensive experiments carried out in several domains demonstrate the effectiveness of FaiMA.

In summary, our research reveals the potential of LLMs in advancing ABSA studies, especially in handling multi-domain and cross-domain intricacies. It provides a new insight and solution for integrating traditional GNN-based methods with LLMs, and holds promise for broader sentiment analysis applications, e.g., Aspect Sentiment Triplet Extraction (ASTE) and Aspect Sentiment Quad Prediction (ASQP). Despite these successes, FaiMA, as an LLM-based model, incurs higher training and deployment costs than previous methods. Another limitation of our model is its current focus on extracting binary sentiment elements; in the future, we plan to explore triplet and quadruple extraction and to continue building appropriate datasets.

6.   Ethics Statement

There are no ethics-related issues in this paper. We conduct experiments on publicly available datasets. These datasets do not share personal information and do not contain sensitive content that can be harmful to any individual or community.

7.   Acknowledgments

The authors thank anonymous reviewers for their insightful comments. This work is mainly supported by the Key Program of the Natural Science Foundation of China (NSFC) (Grant No. U23A20316) and Key R&D Project of Hubei Province (Grant No.2021BAA029).

8.   Bibliographical References


  • Chen et al. (2022) Hao Chen, Zepeng Zhai, Fangxiang Feng, Ruifan Li, and Xiaojie Wang. 2022. Enhanced multi-channel graph convolutional network for aspect sentiment triplet extraction. In Annual Meeting of the Association for Computational Linguistics.
  • Chen et al. (2021) Zhexue Chen, Hong Huang, Bang Liu, Xuanhua Shi, and Hai Jin. 2021. Semantic and syntactic enhanced aspect sentiment triplet extraction. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1474–1483.
  • Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Dillion et al. (2023) Danica Dillion, Niket Tandon, Yuling Gu, and Kurt Gray. 2023. Can ai language models replace human participants? Trends in Cognitive Sciences.
  • Fei et al. (2023) Hao Fei, Bobo Li, Qianchu Liu, Lidong Bing, Fei Li, and Tat-Seng Chua. 2023. Reasoning implicit sentiment with chain-of-thought prompting. In Annual Meeting of the Association for Computational Linguistics.
  • Gong et al. (2020) Chenggong Gong, Jianfei Yu, and Rui Xia. 2020. Unified feature and instance based domain adaptation for end-to-end aspect-based sentiment analysis. In Conference on Empirical Methods in Natural Language Processing.
  • Hu et al. (2021) Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models.
  • Hu et al. (2019a) Minghao Hu, Yuxing Peng, Zhen Huang, Dongsheng Li, and Yiwei Lv. 2019a. Open-domain targeted sentiment analysis via span-based extraction and classification. arXiv preprint arXiv:1906.03820.
  • Hu et al. (2019b) Minghao Hu, Yuxing Peng, Zhen Huang, Dongsheng Li, and Yiwei Lv. 2019b. Open-domain targeted sentiment analysis via span-based extraction and classification. arXiv preprint arXiv:1906.03820.
  • Ji et al. (2020) Qian Ji, Xiang Lin, Yinghua Ma, Gongshen Liu, and Shilin Wang. 2020. A unified labeling model for open-domain aspect-based sentiment analysis. 2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC), pages 186–189.
  • Jiang et al. (2023) Xinke Jiang, Ruizhe Zhang, Yongxin Xu, Rihong Qiu, Yue Fang, Zhiyuan Wang, Jinyi Tang, Hongxin Ding, Xu Chu, Junfeng Zhao, et al. 2023. Think and retrieval: A hypothesis knowledge graph enhanced medical large language models. arXiv preprint arXiv:2312.15883.
  • Tian et al. (2021) Yingjie Tian, Linrui Yang, Yunchuan Sun, and Dalian Liu. 2021. Cross-domain end-to-end aspect-based sentiment analysis with domain-dependent embeddings. Complexity, 2021:5529312:1–5529312:11.
  • Johnson et al. (2019) Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with gpus. IEEE Transactions on Big Data, 7(3):535–547.
  • Li et al. (2022) Rongfan Li, Ting Zhong, Xinke Jiang, Goce Trajcevski, Jin Wu, and Fan Zhou. 2022. Mining spatio-temporal relations via self-paced graph contrastive learning. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 936–944.
  • Liang et al. (2022) Shuo Liang, Wei Wei, Xian-Ling Mao, Fei Wang, and Zhiyong He. 2022. Bisyn-gat+: Bi-syntax aware graph attention network for aspect-based sentiment analysis. In Findings of the Association for Computational Linguistics: ACL 2022, pages 799–810.
  • Liu et al. (2022) Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, and Weizhu Chen. 2022. What makes good in-context examples for GPT-3? In Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pages 100–114, Dublin, Ireland and Online. Association for Computational Linguistics.
  • Loshchilov and Hutter (2017) Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.
  • Luo et al. (2024a) Jiayuan Luo, Songhua Yang, Xiaoling Qiu, Panyu Chen, Yufei Nai, Wenxuan Zeng, Wentao Zhang, and Xinke Jiang. 2024a. Kuaiji: the first chinese accounting large language model. arXiv preprint arXiv:2402.13866.
  • Luo et al. (2024b) Jiayuan Luo, Wentao Zhang, Yuchen Fang, Xiaowei Gao, Dingyi Zhuang, Hao Chen, and Xinke Jiang. 2024b. Timeseries suppliers allocation risk optimization via deep black litterman model. arXiv preprint arXiv:2401.17350.
  • Luo et al. (2022) Yun Luo, Hongjie Cai, Linyi Yang, Yanxia Qin, Rui Xia, and Yue Zhang. 2022. Challenges for open-domain targeted sentiment analysis. arXiv preprint arXiv:2204.06893.
  • Luo et al. (2023) Ziyang Luo, Can Xu, Pu Zhao, Xiubo Geng, Chongyang Tao, Jing Ma, Qingwei Lin, and Daxin Jiang. 2023. Augmented large language models with parametric knowledge guiding. arXiv preprint arXiv:2305.04757.
  • Min et al. (2022) Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2022. Rethinking the role of demonstrations: What makes in-context learning work? arXiv preprint arXiv:2202.12837.
  • Nazir et al. (2020) Ambreen Nazir, Yuan Rao, Lianwei Wu, and Ling Sun. 2020. Issues and challenges of aspect-based sentiment analysis: A comprehensive survey. IEEE Transactions on Affective Computing, 13(2):845–863.
  • Pang et al. (2008) Bo Pang, Lillian Lee, et al. 2008. Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval, 2(1–2):1–135.
  • Raffel et al. (2020) Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551.
  • Roccabruna et al. (2022) Gabriel Roccabruna, Steve Azzolin, and Giuseppe Riccardi. 2022. Multi-source multi-domain sentiment analysis with bert-based models. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 581–589.
  • Rubin et al. (2022) Ohad Rubin, Jonathan Herzig, and Jonathan Berant. 2022. Learning to retrieve prompts for in-context learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2655–2671.
  • Scaria et al. (2023) Kevin Scaria, Himanshu Gupta, Siddharth Goyal, Saurabh Arjun Sawant, Swaroop Mishra, and Chitta Baral. 2023. Instructabsa: Instruction learning for aspect based sentiment analysis. arXiv preprint arXiv:2302.08624.
  • Schouten and Frasincar (2016) Kim Schouten and Flavius Frasincar. 2016. Survey on aspect-level sentiment analysis. IEEE Transactions on Knowledge and Data Engineering, pages 813–830.
  • Shi et al. (2023) Jingli Shi, Weihua Li, Quan Bai, Yi Yang, and Jianhua Jiang. 2023. Syntax-enhanced aspect-based sentiment analysis with multi-layer attention. Neurocomputing, 557:126730.
  • Touvron et al. (2023a) Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023a. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
  • Touvron et al. (2023b) Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023b. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
  • Varia et al. (2022) Siddharth Varia, Shuai Wang, Kishaloy Halder, Robert Vacareanu, Miguel Ballesteros, Yassine Benajiba, Neha Ann John, Rishita Anubhai, Smaranda Muresan, and Dan Roth. 2022. Instruction tuning for few-shot aspect-based sentiment analysis. In Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis.
  • Wang et al. (2020) Kai Wang, Weizhou Shen, Yunyi Yang, Xiaojun Quan, and Rui Wang. 2020. Relational graph attention network for aspect-based sentiment analysis. In Annual Meeting of the Association for Computational Linguistics.
  • Wang et al. (2022) Yadong Wang, Chen Liu, Jinge Xie, Songhua Yang, Yuxiang Jia, and Hongying Zan. 2022. Aspect-based sentiment analysis with dependency relation graph convolutional network. 2022 International Conference on Asian Language Processing (IALP), pages 63–68.
  • Wang et al. (2023) Zengzhi Wang, Qiming Xie, Zixiang Ding, Yi Feng, and Rui Xia. 2023. Is chatgpt a good sentiment analyzer? a preliminary study. arXiv preprint arXiv:2304.04339.
  • Wei et al. (2022) Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837.
  • Wu et al. (2021) Shengqiong Wu, Hao Fei, Yafeng Ren, Donghong Ji, and Jingye Li. 2021. Learn from syntax: Improving pair-wise aspect and opinion terms extraction with rich syntactic knowledge. In International Joint Conference on Artificial Intelligence.
  • Wu et al. (2020) Zhen Wu, Chengcan Ying, Fei Zhao, Zhifang Fan, Xinyu Dai, and Rui Xia. 2020. Grid tagging scheme for aspect-oriented fine-grained opinion extraction. arXiv preprint arXiv:2010.04640.
  • Yang et al. (2023a) Songhua Yang, Tengxun Zhang, Hongfei Xu, and Yuxiang Jia. 2023a. Improving aspect sentiment triplet extraction with perturbed masking and edge-enhanced sentiment graph attention network. In 2023 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE.
  • Yang et al. (2023b) Songhua Yang, Hanjia Zhao, Senbin Zhu, Guangyu Zhou, Hongfei Xu, Yuxiang Jia, and Hongying Zan. 2023b. Zhongjing: Enhancing the chinese medical capabilities of large language model through expert feedback and real-world multi-turn dialogue. arXiv preprint arXiv:2308.03549.
  • Ye et al. (2023) Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, and Minjoon Seo. 2023. In-context instruction learning. arXiv preprint arXiv:2302.14691.
  • Zhang et al. (2019) Chen Zhang, Qiuchi Li, and Dawei Song. 2019. Syntax-aware aspect-level sentiment classification with proximity-weighted convolution network. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval.
  • Zhang and Liu (2012) Lei Zhang and Bing Liu. 2012. Sentiment analysis and opinion mining. In Encyclopedia of Machine Learning and Data Mining.
  • Zhang et al. (2024) Ruizhe Zhang, Xinke Jiang, Yuchen Fang, Jiayuan Luo, Yongxin Xu, Yichen Zhu, Xu Chu, Junfeng Zhao, and Yasha Zhao. 2024. Infinite-horizon graph filters: Leveraging power series to enhance sparse information aggregation. arXiv preprint arXiv:2401.09943.
  • Zhang et al. (2021) Wenxuan Zhang, Yang Deng, Xin Li, Yifei Yuan, Lidong Bing, and Wai Lam. 2021. Aspect sentiment quad prediction as paraphrase generation. arXiv preprint arXiv:2110.00796.
  • Zhang et al. (2023) Wenxuan Zhang, Yue Deng, Bing Liu, Sinno Jialin Pan, and Lidong Bing. 2023. Sentiment analysis in the era of large language models: A reality check. arXiv preprint arXiv:2305.15005.
  • Zhang et al. (2022a) Wenxuan Zhang, Xin Li, Yang Deng, Lidong Bing, and Wai Lam. 2022a. A survey on aspect-based sentiment analysis: Tasks, methods, and challenges. IEEE Transactions on Knowledge and Data Engineering.
  • Zhang et al. (2022b) Yiming Zhang, Shi Feng, and Chenhao Tan. 2022b. Active example selection for in-context learning. arXiv preprint arXiv:2211.04486.
  • Zhang et al. (2022c) Zheng Zhang, Zili Zhou, and Yanna Wang. 2022c. Ssegcn: Syntactic and semantic enhanced graph convolutional network for aspect-based sentiment analysis. In North American Chapter of the Association for Computational Linguistics.
  • Zhao et al. (2023) Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223.
  • Zhong et al. (2023) Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Hua Jin, and Dacheng Tao. 2023. Knowledge graph augmented network towards multiview representation learning for aspect-based sentiment analysis. IEEE Transactions on Knowledge and Data Engineering, pages 1–14.

9.   Language Resource References


  • Dong et al. (2014) Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. 2014. Adaptive recursive neural network for target-dependent twitter sentiment classification. In Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 2: Short papers), pages 49–54.
  • Hu and Liu (2004) Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168–177.
  • Luo et al. (2022) Yun Luo, Hongjie Cai, Linyi Yang, Yanxia Qin, Rui Xia, and Yue Zhang. 2022. Challenges for open-domain targeted sentiment analysis. arXiv preprint arXiv:2204.06893.
  • Pontiki et al. (2014) Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. 2014. SemEval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 27–35, Dublin, Ireland. Association for Computational Linguistics.
  • Sinha et al. (2022) Ankur Sinha, Satishwar Kedas, Rishu Kumar, and Pekka Malo. 2022. Sentfin 1.0: Entity-aware sentiment analysis for financial news. Journal of the Association for Information Science and Technology, 73(9):1314–1335.
  • Toprak et al. (2010) Cigdem Toprak, Niklas Jakob, and Iryna Gurevych. 2010. Sentence and expression level annotation of opinions in user-generated discourse. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 575–584.