
BertGCN: Transductive Text Classification by Combining GCN and BERT


Yuxiao Lin♠ , Yuxian Meng♣ , Xiaofei Sun♣
Qinghong Han♣ , Kun Kuang♠ , Jiwei Li♠♣ and Fei Wu♠

♠ Computer Science Department, Zhejiang University
♣ ShannonAI
{yuxiaolinling, kunkuang, jiwei_li, wufei}@zju.edu.cn
{yuxian_meng, xiaofei_sun, qinghong_han}@shannonai.com

Abstract

In this work, we propose BertGCN, a model that combines large-scale pretraining and transductive learning for text classification. BertGCN constructs a heterogeneous graph over the dataset and represents documents as nodes using BERT representations. By jointly training the BERT and GCN modules within BertGCN, the proposed model is able to leverage the advantages of both worlds: large-scale pretraining, which takes advantage of the massive amount of raw data, and transductive learning, which jointly learns representations for both training data and unlabeled test data by propagating label influence through graph convolution. Experiments show that BertGCN achieves SOTA performances on a wide range of text classification datasets.¹

¹ Code available at https://github.com/ZeroRin/BertGCN.

1 Introduction

Text classification is a core task in natural language processing (NLP) and has been used in many real-world applications such as spam detection (Wang, 2010) and opinion mining (Bakshi et al., 2016). Transductive learning (Vapnik, 1998) is a particular approach to text classification which makes use of both labeled and unlabeled examples in the training process. Graph neural networks (GNNs) serve as an effective approach for transductive learning (Yao et al., 2019; Liu et al., 2020). In these works, a graph is constructed to model the relationships between documents. Nodes in the graph represent text units such as words and documents, while edges are constructed based on the semantic similarity between nodes. GNNs are then applied to the graph to perform node classification. The merits of GNNs and transductive learning are as follows: (1) the decision for an instance (both training and test) does not depend merely on itself, but also on its neighbors, which makes the model more immune to data outliers; (2) at training time, since the model propagates influence from supervised labels across both training and test instances through graph edges, unlabeled data also contributes to representation learning, and consequently to higher performances.

Large-scale pretraining has recently demonstrated its effectiveness on a variety of NLP tasks (Devlin et al., 2018; Liu et al., 2019). Trained on large-scale unlabeled corpora in an unsupervised manner, large-scale pretrained models are able to learn implicit but rich text semantics at scale. Intuitively, large-scale pretrained models have the potential to benefit transductive learning. However, existing models for transductive text classification (Yao et al., 2019; Liu et al., 2020) did not take large-scale pretraining into consideration, and its effectiveness in this setting remains unclear.

In this work, we propose BertGCN, a model that combines the advantages of both large-scale pretraining and transductive learning for text classification. BertGCN constructs a heterogeneous graph for the corpus whose nodes are words and documents, initializes node embeddings with pretrained BERT representations, and uses graph convolutional networks (GCN) for classification. By jointly training the BERT and GCN modules, the proposed model is able to leverage the advantages of both worlds: large-scale pretraining, which takes advantage of the massive amount of raw data, and transductive learning, which jointly learns representations for both training data and unlabeled test data by propagating label influence through graph edges. The proposed BertGCN model successfully combines the powers of large-scale pretraining and graph networks, and achieves new state-of-the-art performances on a wide range of text classification datasets.
2 Related Work

Graph neural networks (GNNs) are connectionist models that capture dependencies and relations between graph nodes via message passing through the edges that connect them (Scarselli et al., 2008; Hamilton et al., 2017; Xu et al., 2018). GNNs are practically categorized into (Wu et al., 2020): graph convolutional networks (Kipf and Welling, 2016a; Wu et al., 2019), graph attention networks (Veličković et al., 2017; Zhang et al., 2018a), graph auto-encoders (Cao et al., 2016; Kipf and Welling, 2016b), graph generative networks (De Cao and Kipf, 2018; Li et al., 2018b) and graph spatial-temporal networks (Li et al., 2017; Yu et al., 2017). GNNs serve as powerful tools to utilize the relationships between different objects, and have been applied to various domains such as traffic prediction (Yu et al., 2018; Zhang et al., 2018a) and recommendation (Zhang et al., 2020; Monti et al., 2017). In the context of NLP, GNNs have achieved remarkable successes across a wide range of end tasks such as relation extraction (Zhang et al., 2018b), semantic role labeling (Marcheggiani and Titov, 2017), data-to-text generation (Marcheggiani and Perez-Beltrachini, 2018), machine translation (Bastings et al., 2017) and question answering (Song et al., 2018; De Cao et al., 2018).

The prevalence of neural networks has motivated a diverse array of works on developing neural models for text classification. Different neural architectures (Kim, 2014; Zhou et al., 2015; Radford et al., 2018; Chai et al., 2020) have demonstrated their effectiveness against traditional statistical feature-based methods (Wallach, 2006). Other works leverage label embeddings and jointly train them along with the input texts (Wang et al., 2018; Pappas and Henderson, 2019). More recently, the success achieved by large-scale pretrained models has spurred great interest in adapting the large-scale pretraining framework (Devlin et al., 2018) to text classification (Reimers and Gurevych, 2019), leading to remarkable progress on few-shot (Mukherjee and Awadallah, 2020) and zero-shot (Ye et al., 2020) learning.

Our work is inspired by work using graph neural networks for text classification (Yao et al., 2019; Huang et al., 2019; Zhang and Zhang, 2020). But different from these works, we focus on combining large-scale pretrained models and GNNs, and show that GNNs can significantly benefit from large-scale pretraining. Existing works that combine BERT and GNNs use a graph to model relationships between tokens within a single document (Lu et al., 2020; He et al., 2020b), which falls into the category of inductive learning. Different from these works, we use a graph to model relationships between different samples from the whole corpus, so as to utilize the similarity between labeled and unlabeled documents, and use GNNs to learn their relationships.

3 Method

3.1 BertGCN

In the proposed BertGCN model, we initialize representations for document nodes in a text graph using a BERT-style model (e.g., BERT, RoBERTa). These representations are used as inputs to GCN. Document representations are then iteratively updated based on the graph structure using GCN, the outputs of which are treated as final representations for document nodes and are sent to a softmax classifier for prediction. In this way, we are able to leverage the complementary strengths of pretrained models and graph models.

Specifically, we construct a heterogeneous graph containing both word nodes and document nodes following TextGCN (Yao et al., 2019). We define word-document edges and word-word edges based on term frequency-inverse document frequency (TF-IDF) and positive point-wise mutual information (PPMI), respectively. The weight of the edge between two nodes i and j is defined as:

A_{i,j} = \begin{cases} \mathrm{PPMI}(i, j), & i, j \text{ are words and } i \neq j \\ \text{TF-IDF}(i, j), & i \text{ is a document and } j \text{ is a word} \\ 1, & i = j \\ 0, & \text{otherwise} \end{cases}   (1)

In TextGCN, an identity matrix X = I_{n_doc + n_word} is used as the initial node feature matrix, where n_doc is the number of document nodes and n_word is the number of word nodes (covering both training and test documents). In BertGCN, we instead use a BERT-style model to obtain document embeddings and treat them as the input representations of document nodes. Document node embeddings are denoted by X_doc ∈ R^{n_doc × d}, where d is the embedding dimensionality. Overall, the initial node feature matrix is given by:

X = \begin{pmatrix} X_{doc} \\ 0 \end{pmatrix}_{(n_doc + n_word) \times d}   (2)
We feed X into a GCN model (Kipf and Welling, 2016a), which iteratively propagates messages across training and test examples. Specifically, the output feature matrix L^{(i)} of the i-th GCN layer is computed as

L^{(i)} = \rho(\tilde{A} L^{(i-1)} W^{(i)})   (3)

where ρ is an activation function, \tilde{A} is the normalized adjacency matrix and W^{(i)} ∈ R^{d_{i-1} × d_i} is the weight matrix of the layer. L^{(0)} = X is the input feature matrix of the model. The outputs of GCN are treated as final representations for documents, which are then fed to a softmax layer for classification:

Z_{GCN} = \mathrm{softmax}(g(X, A))   (4)

where g represents the GCN model. We use the cross-entropy loss over labeled document nodes to jointly optimize the parameters of BERT and GCN.

3.2 Interpolating BERT and GCN Predictions

Practically, we find that optimizing BertGCN with an auxiliary classifier that directly operates on the BERT embeddings leads to faster convergence and better performances. Specifically, we construct the auxiliary classifier by directly feeding the document embeddings (denoted by X) to a dense layer with softmax activation:

Z_{BERT} = \mathrm{softmax}(W X)   (5)

The final training objective is the linear interpolation of the prediction from BertGCN and the prediction from BERT, which is given by:

Z = \lambda Z_{GCN} + (1 - \lambda) Z_{BERT}   (6)

where λ controls the tradeoff between the two objectives: λ = 1 means we use the full BertGCN model, and λ = 0 means we only use the BERT module. When λ ∈ (0, 1), we are able to balance the predictions from both models, and the BertGCN model can be better optimized.

The explanation for the better performances achieved by the interpolation is as follows: Z_{BERT} directly operates on the input of GCN, making sure that the inputs to GCN are regulated and optimized towards the objective. This helps the multi-layer GCN model overcome intrinsic drawbacks such as gradient vanishing and over-smoothing (Li et al., 2018a), and thus leads to better performances.
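The following is a minimal PyTorch sketch of Equations (3)-(6), assuming the normalized adjacency Ã and the node feature matrix are available as dense tensors; the class names (TwoLayerGCN, BertGCNSketch), the hidden size of 256, and the default λ = 0.7 are illustrative choices rather than the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerGCN(nn.Module):
    """Two propagation steps of Eq. (3): L^(i) = rho(A_hat @ L^(i-1) @ W^(i))."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.w2 = nn.Linear(hid_dim, n_classes, bias=False)

    def forward(self, a_hat, x):
        h = F.relu(a_hat @ self.w1(x))   # first layer, rho = ReLU
        return a_hat @ self.w2(h)        # second layer outputs per-node logits

class BertGCNSketch(nn.Module):
    """Interpolates the GCN and BERT predictions as in Eq. (6)."""
    def __init__(self, bert, hidden_size, n_classes, lam=0.7):
        super().__init__()
        self.bert = bert                                   # kept so a training loop can refresh embeddings
        self.gcn = TwoLayerGCN(hidden_size, 256, n_classes)
        self.aux_head = nn.Linear(hidden_size, n_classes)  # dense layer of Eq. (5)
        self.lam = lam

    def forward(self, a_hat, node_feats, doc_index):
        z_gcn = F.softmax(self.gcn(a_hat, node_feats)[doc_index], dim=-1)  # Eq. (4)
        z_bert = F.softmax(self.aux_head(node_feats[doc_index]), dim=-1)   # Eq. (5)
        return self.lam * z_gcn + (1.0 - self.lam) * z_bert                # Eq. (6)
```

Since Z is already a probability distribution, the cross-entropy over labeled documents can be computed as F.nll_loss(torch.log(Z + 1e-10), labels).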
3.3 Optimization using Memory Bank

The original GCN model uses full-batch gradient descent for training, which is intractable for the proposed BertGCN model, since the full-batch method cannot be applied to BERT due to memory limitations. Inspired by techniques in contrastive learning that decouple the dictionary size from the mini-batch size (Wu et al., 2018; He et al., 2020a), we introduce a memory bank that stores all document embeddings, decoupling the training batch size from the total number of nodes in the graph.

Specifically, during training, we maintain a memory bank M that tracks the input features of all document nodes. At the beginning of each epoch, we first compute all document embeddings using the current BERT module and store them in M. During each iteration, we sample a mini-batch from both labeled and unlabeled document nodes with index set B = {b_0, b_1, ..., b_n}, where n is the mini-batch size. We then compute their document embeddings M_B, also using the current BERT module, and update the corresponding entries in M.² Next, we use the updated M as input to derive the GCN output and compute the loss for the current mini-batch. For back-propagation, M is treated as constant except for the entries in B.

With the memory bank, we are able to efficiently train the BertGCN model, including the BERT module. However, during training, the embeddings in the memory bank are computed by the BERT module at different steps within an epoch and are thus inconsistent. To overcome this issue, we set a small learning rate for the BERT module to improve the consistency of the stored embeddings. With a low learning rate, training takes more time; to speed it up, we fine-tune a BERT model on the target dataset before training begins and use it to initialize the BERT parameters in BertGCN.

² Note that the BERT module used to compute M_B is the one that finished training in the last iteration, which is different from the BERT module used to compute the initial M.
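A sketch of one training epoch with the memory bank, under several assumptions: the BertGCNSketch module above, a Hugging Face-style encoder whose output exposes last_hidden_state, and a doc_loader that yields (node indices, input_ids, attention_mask); names such as train_epoch and labeled_mask are hypothetical, and the per-step bookkeeping of footnote 2 is simplified.

```python
import torch
import torch.nn.functional as F

def train_epoch(model, a_hat, memory_bank, doc_loader, labels, labeled_mask, optimizer):
    """memory_bank: tensor [n_doc + n_word, d]; word rows stay zero, document rows
    cache BERT [CLS] embeddings so the GCN sees the whole graph at every step."""
    model.train()
    # Refresh all document embeddings with the current BERT module at epoch start.
    with torch.no_grad():
        for batch_idx, input_ids, attention_mask in doc_loader:
            cls = model.bert(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
            memory_bank[batch_idx] = cls

    for batch_idx, input_ids, attention_mask in doc_loader:
        optimizer.zero_grad()
        # Recompute embeddings for the sampled documents so gradients reach BERT.
        cls = model.bert(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
        feats = memory_bank.detach().clone()     # everything outside the batch is constant
        feats[batch_idx] = cls                   # only rows in B carry gradients
        memory_bank[batch_idx] = cls.detach()    # keep the bank up to date

        z = model(a_hat, feats, batch_idx)       # interpolated prediction (Eq. 6)
        keep = labeled_mask[batch_idx]           # loss only over labeled documents
        if keep.any():
            loss = F.nll_loss(torch.log(z[keep] + 1e-10), labels[batch_idx][keep])
            loss.backward()
            optimizer.step()
```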
4 Experiments

4.1 Experiment Setups

We run experiments on five widely-used text classification benchmarks: 20 Newsgroups (20NG)³, R8 and R52⁴, Ohsumed⁵ and Movie Review (MR)⁶. We compare BertGCN to current state-of-the-art pretrained and GCN models: TextGCN (Yao et al., 2019), SGC (Wu et al., 2019), BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019). Details on the datasets and baselines are left to the supplementary material.

³ http://qwone.com/~jason/20Newsgroups/
⁴ https://www.cs.umb.edu/~smimarog/textmining/datasets/
⁵ http://disi.unitn.it/moschitti/corpora.htm
⁶ http://www.cs.cornell.edu/people/pabo/movie-review-data/

We follow the protocols in TextGCN to preprocess the data. For BERT and RoBERTa, we use the output feature of the [CLS] token as the document embedding, followed by a feedforward layer to derive the final prediction. We use BERT-base and a two-layer GCN to implement BertGCN. We initialize the learning rate to 1e-3 for the GCN module and 1e-5 for the fine-tuned BERT module. We also implement our model with RoBERTa and GAT (Veličković et al., 2017). The GAT variants are trained over the same graph as the GCN variants, but learn edge weights through an attention mechanism instead of using the predefined weight matrix.
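As an illustration of the optimization setup above, the sketch below builds an optimizer with separate learning rates for the GCN and BERT modules; it assumes the BertGCNSketch module from the earlier sketches, so the attribute names (bert, gcn, aux_head) and the helper build_optimizer are illustrative rather than taken from the released code.

```python
import torch

def build_optimizer(model, bert_lr=1e-5, gcn_lr=1e-3):
    """Adam with per-module learning rates: a small lr for BERT keeps the
    memory-bank embeddings consistent, while the GCN can train faster."""
    return torch.optim.Adam([
        {"params": model.bert.parameters(), "lr": bert_lr},
        {"params": model.gcn.parameters(), "lr": gcn_lr},
        {"params": model.aux_head.parameters(), "lr": gcn_lr},
    ])
```

The returned optimizer can be passed directly to the train_epoch sketch from Section 3.3.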

4.2 Main Results

Model        20NG  R8    R52   Ohsumed  MR
TextGCN      86.3  97.1  93.6  68.4     76.7
SGC          88.5  97.2  94.0  68.5     75.9
BERT         85.3  97.8  96.4  70.5     85.7
RoBERTa      83.8  97.8  96.2  70.7     89.4
BertGCN      89.3  98.1  96.6  72.8     86.0
RoBERTaGCN   89.5  98.2  96.1  72.8     89.7
BertGAT      87.4  97.8  96.5  71.2     86.5
RoBERTaGAT   86.5  98.0  96.1  71.2     89.2

Table 1: Results for different models on transductive text classification datasets. We run all models 10 times and report the mean test accuracy.

Table 1 presents the test accuracy of each model. We can see that BertGCN and RoBERTaGCN perform the best across all datasets. Only using BERT or RoBERTa generally performs better than the GCN variants except on 20NG, which is due to the great merits brought by large-scale pretraining. Compared with BERT and RoBERTa, the performance boost from BertGCN and RoBERTaGCN is significant on the 20NG and Ohsumed datasets. This is because the average document length in 20NG and Ohsumed is much longer than that in the other datasets: the graph is constructed using word-document statistics, which means that long texts may produce more document connections transited via an intermediate word node; this potentially benefits message passing through the graph, leading to better performances when combined with GCN. This may also explain why GCN models perform better than BERT models on 20NG. For datasets with shorter documents such as R52 and MR, the power of the graph structure is limited, and thus the performance boost is smaller relative to 20NG. BertGAT and RoBERTaGAT can also benefit from the graph structure, but their performances are not as good as those of the GCN variants due to the lack of edge weight information.

4.3 The Effect of λ

λ controls the trade-off between training BertGCN and BERT. The optimal value of λ can be different for different tasks. Figure 1 shows the accuracy of RoBERTaGCN with different λ. On 20NG, the accuracy is consistently higher with larger λ values. This can be explained by the high performance of graph-based methods on 20NG. The model reaches its best when λ = 0.7, performing slightly better than only using the GCN prediction (λ = 1).

Figure 1: Accuracy of RoBERTaGCN when varying λ on the 20NG development set. The dotted line indicates the corresponding RoBERTa baseline.⁷

⁷ The original training/test split of 20NG is based on post date, but the development set is randomly sampled from the original training set. The accuracy on the test set is thus much lower than that on the development set.

4.4 The Effect of Strategies in Joint Training

Strategy   w/ both  w/o finetune  w/o small lr.  w/o both
Accuracy   94.7     93.8          10.3⁸          10.3⁸

Table 2: Accuracy on the 20NG development set for different strategies. "finetune" means we use the finetuned RoBERTa as initialization, and "small lr." means we use a smaller learning rate for the RoBERTa module.

⁸ Experiments without a small lr. failed to converge.
To overcome the inconsistency of embeddings in the memory bank, we set a smaller learning rate for the BERT module and use a finetuned BERT model for initialization. We evaluate the effect of these two strategies. Table 2 shows the results of RoBERTaGCN on 20NG with and without them. With the same learning rate for RoBERTa and GCN, the model cannot be trained due to inconsistency in the memory bank, regardless of whether the fine-tuned RoBERTa is used. Models can be successfully trained when we set a smaller learning rate for the RoBERTa module, and additionally using the finetuned RoBERTa leads to the best performance.

5 Conclusion and Future Work

In this work, we propose BertGCN, which combines the advantages of large-scale pretrained models and transductive learning for text classification. We efficiently train BertGCN by using a memory bank that stores all document embeddings and updates part of them with respect to the sampled mini-batch. The framework of BertGCN can be built on top of any document encoder and any graph model. Experiments demonstrate the power of the proposed BertGCN model. However, in this work, we only use document statistics to build the graph, which might be sub-optimal compared to models that are able to automatically construct edges between nodes. We leave this to future work.

References

Rushlene Kaur Bakshi, Navneet Kaur, Ravneet Kaur, and Gurpreet Kaur. 2016. Opinion mining and sentiment analysis. In 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pages 452–455. IEEE.

Jasmijn Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, and Khalil Sima'an. 2017. Graph convolutional encoders for syntax-aware neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1957–1967, Copenhagen, Denmark. Association for Computational Linguistics.

Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2016. Deep neural networks for learning graph representations. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30.

Duo Chai, Wei Wu, Qinghong Han, Fei Wu, and Jiwei Li. 2020. Description based text classification with reinforcement learning. In International Conference on Machine Learning, pages 1371–1382. PMLR.

Nicola De Cao, Wilker Aziz, and Ivan Titov. 2018. Question answering by reasoning across documents with graph convolutional networks. arXiv preprint arXiv:1808.09920.

Nicola De Cao and Thomas Kipf. 2018. MolGAN: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034.

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020a. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738.

Qi He, Han Wang, and Yue Zhang. 2020b. Enhancing generalization in natural language inference by syntax. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 4973–4978.

Lianzhe Huang, Dehong Ma, Sujian Li, Xiaodong Zhang, and Houfeng Wang. 2019. Text level graph neural network for text classification. arXiv preprint arXiv:1910.02356.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.

Thomas N. Kipf and Max Welling. 2016a. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.

Thomas N. Kipf and Max Welling. 2016b. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308.

Qimai Li, Zhichao Han, and Xiao-Ming Wu. 2018a. Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.

Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2017. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926.

Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. 2018b. Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324.

Xien Liu, Xinxin You, Xiao Zhang, Ji Wu, and Ping Lv. 2020. Tensor graph convolutional networks for text classification.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Zhibin Lu, Pan Du, and Jian-Yun Nie. 2020. VGCN-BERT: Augmenting BERT with graph embedding for text classification. In European Conference on Information Retrieval, pages 369–382. Springer.

Diego Marcheggiani and Laura Perez-Beltrachini. 2018. Deep graph convolutional encoders for structured data to text generation. In Proceedings of the 11th International Conference on Natural Language Generation, pages 1–9, Tilburg University, The Netherlands. Association for Computational Linguistics.

Diego Marcheggiani and Ivan Titov. 2017. Encoding sentences with graph convolutional networks for semantic role labeling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1506–1515, Copenhagen, Denmark. Association for Computational Linguistics.

Federico Monti, Michael M. Bronstein, and Xavier Bresson. 2017. Geometric matrix completion with recurrent multi-graph neural networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 3700–3710.

Subhabrata Mukherjee and Ahmed Hassan Awadallah. 2020. Uncertainty-aware self-training for text classification with few labels.

Nikolaos Pappas and James Henderson. 2019. GILE: A generalized input-label embedding for text classification. Transactions of the Association for Computational Linguistics, 7:139–155.

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training.

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using siamese BERT-networks. arXiv preprint arXiv:1908.10084.

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2008. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80.

Linfeng Song, Zhiguo Wang, Mo Yu, Yue Zhang, Radu Florian, and Daniel Gildea. 2018. Exploring graph-structured passage representation for multi-hop reading comprehension with graph neural networks. arXiv preprint arXiv:1809.02040.

Vladimir N. Vapnik. 1998. Statistical Learning Theory. Wiley-Interscience.

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.

Hanna M. Wallach. 2006. Topic modeling: beyond bag-of-words. In Proceedings of the 23rd International Conference on Machine Learning, pages 977–984.

Alex Hai Wang. 2010. Don't follow me: Spam detection in Twitter. In 2010 International Conference on Security and Cryptography (SECRYPT), pages 1–10. IEEE.

Guoyin Wang, Chunyuan Li, Wenlin Wang, Yizhe Zhang, Dinghan Shen, Xinyuan Zhang, Ricardo Henao, and Lawrence Carin. 2018. Joint embedding of words and labels for text classification. arXiv preprint arXiv:1805.04174.

Felix Wu, Tianyi Zhang, Amauri Holanda de Souza Jr, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. 2019. Simplifying graph convolutional networks. arXiv preprint arXiv:1902.07153.

Zhirong Wu, Yuanjun Xiong, Stella X. Yu, and Dahua Lin. 2018. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3733–3742.

Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S. Yu Philip. 2020. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems.

Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826.

Liang Yao, Chengsheng Mao, and Yuan Luo. 2019. Graph convolutional networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7370–7377.

Zhiquan Ye, Yuxia Geng, Jiaoyan Chen, Jingmin Chen, Xiaoxiao Xu, SuHang Zheng, Feng Wang, Jun Zhang, and Huajun Chen. 2020. Zero-shot text classification via reinforced self-training. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3014–3024, Online. Association for Computational Linguistics.

Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2017. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875.

Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2018. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 3634–3640.

Haopeng Zhang and Jiawei Zhang. 2020. Text graph transformer for document classification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8322–8327.

Jiani Zhang, Xingjian Shi, Junyuan Xie, Hao Ma, Irwin King, and Dit-Yan Yeung. 2018a. GaAN: Gated attention networks for learning on large and spatiotemporal graphs. In 34th Conference on Uncertainty in Artificial Intelligence 2018, UAI 2018.

Shengyu Zhang, Ziqi Tan, Zhou Zhao, Jin Yu, Kun Kuang, Tan Jiang, Jingren Zhou, Hongxia Yang, and Fei Wu. 2020. Comprehensive information integration modeling framework for video titling. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2744–2754.

Yuhao Zhang, Peng Qi, and Christopher D. Manning. 2018b. Graph convolution over pruned dependency trees improves relation extraction. arXiv preprint arXiv:1809.10185.

Chunting Zhou, Chonglin Sun, Zhiyuan Liu, and Francis Lau. 2015. A C-LSTM neural network for text classification. arXiv preprint arXiv:1511.08630.
