Local Discriminative Graph Convolutional Networks for Text Classification
https://doi.org/10.1007/s00530-023-01112-y
REGULAR PAPER
Received: 21 February 2023 / Accepted: 14 May 2023 / Published online: 29 May 2023
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023
Abstract
Recently, graph convolutional networks (GCNs) have demonstrated great success in text classification. However, the GCN focuses only on the fit between the ground-truth labels and the predicted ones. It ignores the local intra-class diversity and local inter-class similarity that are implicitly encoded by the graph, which is an important cue in the machine learning field. In this paper, we propose a local discriminative graph convolutional network (LDGCN) to boost the performance of text classification. Different from the text GCN, which minimizes only the cross-entropy loss, our proposed LDGCN is trained by optimizing a new discriminative objective function. As a result, in the new LDGCN feature space, texts from the same class are mapped close to each other and texts from different classes are mapped as far apart as possible, ensuring that the features extracted by the GCN have good discriminative ability and that samples achieve maximum separability. Experimental results demonstrate its superiority over the baselines.
Keywords Text classification · Graph convolutional network · Discriminative information · Manifold structures
errors. For this reason, how to learn a more powerful TextGCN feature representation with smaller intra-class dispersion and larger inter-class separation at the same time is an urgent problem to be solved. Existing GNN-based text classifiers only make use of a softmax or cross-entropy objective function to learn an optimal text representation that is most similar to the ground-truth label for each document, while these classifiers fail to consider both the intra-class and inter-class manifold structures of samples in a corpus. The intra-class term describes the manifold structure of samples in the same class, while the inter-class term describes the manifold structure of samples from different classes. Additionally, considering these manifold structures can further improve performance on various classification tasks. As such, how to make full use of the manifold structures of documents has become a crucial task for text classification.

To overcome the aforementioned challenges, we propose a local discriminative graph convolutional network (LDGCN). Our approach involves constructing the local inter-class scatter matrix and the local intra-class scatter matrix of text data, which are introduced into the TextGCN as a discriminative term. In the new LDGCN feature space, texts from the same class are mapped closely together, while texts from different classes are mapped as far apart as possible. In contrast to TextGCN, which solely minimizes the cross-entropy loss, our proposed method minimizes the intra-class distance and maximizes the inter-class distance while minimizing the cross-entropy loss function. This makes the LDGCN model more discriminative and improves the effectiveness of TextGCN in text classification. Our two contributions are as follows:

1. To address the problem that the GCN ignores the local intra-class diversity and local inter-class similarity, we propose a novel GCN method, LDGCN, which is trained by optimizing a new discriminative objective function.
2. We design a discriminant regularization framework, which uses manifold learning to capture the local discriminative manifold structure of text data.

2 Related work

Deep learning has shown exceptional performance in text classification; existing commonly used approaches are based on CNNs [12], RNNs [13], and combinations of various models with other traditional techniques [14, 15]. Among these, the CNN has proven to be particularly effective. Kim et al. [4] proposed the classic Text CNN model, which takes a matrix composed of word vectors as input and obtains a vector representing the sentence through a convolution operation to handle text classification tasks. Building on this foundation, a number of improved Text CNN models have emerged. For instance, Yin et al. [16] introduced a model that incorporates both CNN and attention mechanisms, utilizing the Euclidean distance to calculate the attention matrix, which is then applied to the CNN pooling operation. Zhang et al. [12] proposed a character-level CNN, which uses characters as the basic unit of input, allowing for greater versatility, and tries to learn text representations from character sequences. Liu et al. [17] used long short-term memory (LSTM) to learn text representations for text classification. Although these methods have proven to be effective, they typically focus on local, contiguous word sequences without explicitly leveraging the global word co-occurrence information present in the corpus. Recently, pre-trained models such as BERT [18] and RoBERTa [19] have shown impressive results in different NLP tasks, even with limited training data. Nevertheless, utilizing these pre-trained models necessitates significant computation and external knowledge resources, which might not always be readily available.

Recently, graph-based neural networks have shown a superior ability to capture global information compared with sequential learning models and have been widely applied in NLP tasks [20–23]. Yao et al. [7] utilized the standard GCN [24] for text classification. Cao et al. [25] encode graphs from different perspectives, allowing the model to learn aligned embeddings that enhance its robustness to structural changes. Dai et al. [26] introduced the graph fusion network (GFN), which integrates external knowledge and multiple views of the text graph to capture sufficient structural information. Several studies have focused on combining local and global information to enhance GCN research. Zhu et al. [27] proposed a global and local dependency-guided GCN (GL-GCN). Jin et al. [28] proposed BiTe-GCN, which employs bi-directional convolution of topology and features to merge global and local information in a text-rich network for modeling.

Our method is also based on the remarkable GCN framework. However, different from prior work, our approach is primarily concerned with enhancing the GCN feature representations through local intra-class diversity and local inter-class similarity. Our method not only preserves the GCN model's capability to capture global information but also considers the local manifold information of the data. To the best of our knowledge, local manifold information has been extensively applied in the area of image processing [29–31]. However, there is currently no research on the impact of local intra-class diversity and local inter-class similarity on the performance of GCNs for text classification tasks.

3 Method

This section introduces the LDGCN model and provides the overall framework and the algorithm table.
3.1 Overview of the proposed method

Our proposed LDGCN model is illustrated in Fig. 1. We start by extracting text features, and then compute the intra-class scatter and inter-class scatter of the text features. Finally, we add these scatter matrices as discriminative regularization terms to the TextGCN loss function. This approach incorporates discriminative information into the fully connected feature layer extracted by TextGCN, and enables joint training of the TextGCN classification loss and the discriminant regularization loss to obtain the final text classification result. This joint method enhances the model's ability to distinguish associations and improves its feature expression. In Fig. 1, the blue dots in the text heterogeneous graph represent documents, while the yellow dots represent words. The solid lines represent the connections between documents and words, and the dotted lines represent the connections between words. We use the adjacency matrix calculated from the heterogeneous graph as the TextGCN model input.

3.2 Text graph convolutional networks

To construct a topological graph from a text corpus, we represent its nodes as a combination of documents and words. The graph consists of |voc| + |doc| nodes, where |doc| represents the number of documents and |voc| represents the total vocabulary. The weights of edges between document nodes and word nodes are based on the term frequency-inverse document frequency (TF-IDF), while the connections between word nodes are determined by global word co-occurrence information. We obtain this information by sliding a fixed-size window across the corpus and calculating the point-wise mutual information (PMI) between pairs of words to assign connection weights. The PMI is calculated as follows:

$$\mathrm{PMI}(i, j) = \log \frac{p(i, j)}{p(i)\,p(j)} \tag{1}$$

$$p(i, j) = \frac{N(i, j)}{N} \tag{2}$$

$$p(i) = \frac{N(i)}{N} \tag{3}$$

where $N$ is the total number of sliding windows, $N(i, j)$ is the number of sliding windows containing both nodes $i$ and $j$, $N(i)$ is the number of sliding windows containing node $i$, $p(i, j)$ is the probability that a sliding window contains both nodes $i$ and $j$, and $p(i)$ is the probability that a sliding window contains node $i$. Thus, the weight $A_{ij}$ of the edge between nodes $i$ and $j$ is obtained, which is defined as follows:

$$A_{ij} = \begin{cases} \mathrm{PMI}(i, j), & i, j \text{ are words and } \mathrm{PMI}(i, j) > 0 \\ \mathrm{TF\text{-}IDF}_{ij}, & i \text{ is a document and } j \text{ is a word} \\ 1, & i = j \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
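The following is a minimal sketch of this graph construction, assuming the standard TextGCN adjacency described above (PMI for word–word pairs with positive PMI, TF-IDF for document–word pairs, and unit self-loops). Function and variable names are illustrative, and scikit-learn's TfidfVectorizer is used only as a convenient stand-in for the TF-IDF weighting:

```python
import math
from collections import Counter

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def build_text_graph(docs, window_size=20):
    """Build the document-word heterogeneous adjacency matrix.

    Word-word weights use PMI over fixed-size sliding windows,
    document-word weights use TF-IDF, and every node gets a self-loop.
    Documents occupy node indices [0, n_doc); words follow.
    """
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_doc, n_word = len(docs), len(vocab)
    A = np.zeros((n_doc + n_word, n_doc + n_word))

    # Collect fixed-size sliding windows over every document.
    windows = [doc[k:k + window_size]
               for doc in tokenized
               for k in range(max(1, len(doc) - window_size + 1))]
    n_win = len(windows)

    # Count windows containing each word and each word pair (Eqs. 2-3).
    single, pair = Counter(), Counter()
    for win in windows:
        uniq = sorted(set(win))
        single.update(uniq)
        for a in range(len(uniq)):
            for b in range(a + 1, len(uniq)):
                pair[(uniq[a], uniq[b])] += 1

    # Word-word edges: keep only positive PMI values (Eq. 1).
    for (wi, wj), n_ij in pair.items():
        pmi = math.log(n_ij * n_win / (single[wi] * single[wj]))
        if pmi > 0:
            u, v = n_doc + word_id[wi], n_doc + word_id[wj]
            A[u, v] = A[v, u] = pmi

    # Document-word edges: TF-IDF weights.
    tfidf = TfidfVectorizer(vocabulary=vocab, lowercase=False,
                            tokenizer=str.split, token_pattern=None)
    D = tfidf.fit_transform(docs).toarray()
    A[:n_doc, n_doc:] = D
    A[n_doc:, :n_doc] = D.T

    np.fill_diagonal(A, 1.0)  # self-loops for all nodes
    return A, vocab
```

The resulting matrix serves as the TextGCN model input described above.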
Table 1 Dataset description: Dataset, # Docs, # Train, # Test, # Words, # Nodes, # Classes, # Avg Length
We thus obtain a concrete representation of the discriminant regularization term:

$$J_2 = \operatorname{tr}(S_w - \lambda S_b) \tag{15}$$

where $\lambda$ is the ratio coefficient balancing $S_w$ and $S_b$, and $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. The loss function of the proposed LDGCN is then:

$$J = -\sum_{i \in \mathcal{D}} \sum_{j=1}^{C} Y_{ij} \ln Z_{ij} + \lambda_1 \operatorname{tr}(S_w - \lambda S_b) \tag{16}$$

where the first term is the standard TextGCN cross-entropy loss over the set of labeled documents $\mathcal{D}$ with $C$ classes, ground-truth label matrix $Y$, and predicted label distribution $Z$, and $\lambda_1$ weights the discriminative regularizer against the classification loss.

We preprocessed the corpora using the NLTK library and removed words that appeared less than 5 times in the 20NG, R8, R52, and Ohsumed datasets. However, we did not remove any words from the MR dataset after cleaning and tokenizing the raw text, since the documents in this dataset are extremely short.
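As a rough illustration of this preprocessing step, the sketch below uses NLTK tokenization and a minimum-frequency filter of 5; the decision to skip filtering for MR follows the description above, while names and other details are illustrative:

```python
from collections import Counter

from nltk.tokenize import word_tokenize  # requires nltk.download("punkt")


def preprocess(raw_docs, min_count=5, filter_rare=True):
    """Clean and tokenize raw documents, optionally dropping rare words.

    `filter_rare=False` mirrors the MR setting, where no words are
    removed because the documents are extremely short.
    """
    tokenized = [[w.lower() for w in word_tokenize(doc)] for doc in raw_docs]
    if not filter_rare:
        return tokenized
    counts = Counter(w for doc in tokenized for w in doc)
    return [[w for w in doc if counts[w] >= min_count] for doc in tokenized]
```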
1 http://disi.unitn.it/moschitti/corpora.htm
2 https://www.cs.umb.edu/~smimarog/textmining/datasets/
3 http://www.cs.cornell.edu/people/pabo/movie-review-data/
4 http://qwone.com/~jason/20Newsgroups/
5 http://www.nltk.org/
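To make the objective in Eq. (16) concrete, here is a minimal PyTorch-style sketch that combines the cross-entropy term with the tr(S_w − λS_b) regularizer. It uses plain within-class and between-class scatter matrices purely for illustration; LDGCN's locality-weighted scatter definitions would replace `scatter_matrices`, and all names and hyperparameter values are placeholders:

```python
import torch
import torch.nn.functional as F


def scatter_matrices(features, labels):
    """Within-class (Sw) and between-class (Sb) scatter of the features.

    This is the plain (non-local) formulation, used here only as a
    stand-in for LDGCN's locality-weighted scatter matrices.
    """
    mean_all = features.mean(dim=0, keepdim=True)
    dim = features.size(1)
    Sw = features.new_zeros(dim, dim)
    Sb = features.new_zeros(dim, dim)
    for c in labels.unique():
        Xc = features[labels == c]
        mean_c = Xc.mean(dim=0, keepdim=True)
        Sw = Sw + (Xc - mean_c).t() @ (Xc - mean_c)
        diff = mean_c - mean_all
        Sb = Sb + Xc.size(0) * diff.t() @ diff
    return Sw, Sb


def ldgcn_loss(logits, features, labels, lam1=0.1, lam=1.0):
    """Eq. (16): cross-entropy plus lam1 * tr(Sw - lam * Sb)."""
    ce = F.cross_entropy(logits, labels)       # classification term
    Sw, Sb = scatter_matrices(features, labels)
    reg = torch.trace(Sw - lam * Sb)           # discriminative term
    return ce + lam1 * reg
```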
5 Results analysis

In Table 2, the LDGCN model shows a significant improvement over all baseline models across the five datasets. Notably, when using pre-trained word embeddings, the CNN obtains the best results on the MR dataset, which suggests that it excels at modeling short-range semantics and continuous data.

We observed that TextGCN performs significantly worse than sequential models on the MR dataset. PTE and fastText exhibit superior performance over PV-DBOW, as they adopt supervised learning approaches to generate document embeddings, allowing for more discriminative embeddings through the use of label information. Two recently introduced methods, SWEM and LEAM, show notable results, highlighting the effectiveness of pooling methods and of label descriptions and embeddings.

It is worth noting that GNN-based models tend to outperform sequential and bag-of-words models, primarily due to their ability to capture global word co-occurrence in the corpus. Compared with TextGCN and the other variants, LDGCN achieves significant improvements in accuracy on all five datasets. Specifically, on the R52, Ohsumed and MR datasets, the results obtained by LDGCN improve by about 2%, while the improvements on the 20NG and R8 datasets are nearly 1%. Such enhancements demonstrate that considering discriminative power is effective for long text classification. LDGCN incorporates the concept of local linear discriminant analysis to develop a novel loss function for training the text graph convolutional neural network. This loss function aims to minimize the feature distance within a class while simultaneously maximizing the feature distance between classes, thereby enhancing the GCN's discriminative feature capabilities.

5.1 Effects of the size of the sliding window

In this subsection, we investigate how the size of the sliding window used to build the text graph structure affects the final classification performance. We adjusted the sliding window parameter on the five datasets, and the experimental results are shown in Fig. 3. The accuracy of the proposed method is better than that of TextGCN in all cases. We report test accuracies with sliding window sizes of 5, 10, 15, 20, 25, and 30, and note that our LDGCN outperforms TextGCN consistently. For instance, LDGCN achieves a test accuracy of 0.9812 on R8 with a sliding window size of 30, which is higher than TextGCN's accuracy of 0.9671.

On the 20NG dataset, both our proposed model and TextGCN achieved their highest test accuracy with a sliding window size of 30, while the lowest accuracy was observed with a window size of 5. Notably, when the window size was set to 20, both models yielded a test accuracy of 0.8624. Based on these results, we selected a window size of 20 for our experiments on the 20NG dataset, as this value can better reflect the effectiveness of our proposed model. For the R8 dataset, our proposed model achieved the highest testing accuracy with a window size of 5, while the lowest accuracy was observed with a window size of 25. On the other hand, TextGCN achieved its highest testing accuracy with a window size of 30 and its lowest with a window size of 15. To obtain an average performance across all window sizes, we conducted experiments with a sliding window size of 20. The experiment on the R52 dataset showed that our proposed model and TextGCN both achieved the highest test accuracy with a sliding window size of 30, so we chose a sliding window size of 30 for our experiments on this dataset. Regarding the Ohsumed dataset, our proposed model and TextGCN achieved the best performance with a sliding window size of 25, while both models performed the worst with a sliding window size of 20; hence, we chose a sliding window size of 25 for our experiments. On the MR dataset, both our proposed model and TextGCN achieved the best test accuracy with a sliding window size of 15, so we selected a sliding window size of 15 for our experiments on this dataset. According to these results, a suitable sliding window size can be selected for each dataset to obtain the best classification performance.

5.2 Effects of the proportions of labeled data

We choose TextGCN and LDGCN to study the impact of the number of labelled documents. We vary the ratio of labelled documents and compare their performance on the five datasets. Figure 4 reports test accuracies with 1%, 5%, 10%, and 20% of the dataset labelled. We note that our LDGCN outperforms TextGCN consistently. For instance, LDGCN achieves a test accuracy of 0.7715 on 20NG with only 10% of the training documents, which is higher than TextGCN even with 20% of the training documents, and a test accuracy of 0.9132 on R8 with only 1% of the training documents. We can also observe that the results keep improving as the proportion of labeled data increases, while our method fluctuates less than TextGCN. This shows that our approach is better equipped to make the most of the limited amount of labeled data available for text classification, which again demonstrates that our method can capture the local discriminative manifold structure of text data and extract richer latent semantic information.
Table 3 Recall and Macro-F1 results for each dataset on TextGCN and LDGCN

Evaluation standards   Model     20NG    R8      R52     Ohsumed   MR
Macro-F1               TextGCN   85.57   92.43   65.20   59.09     76.75
                       LDGCN     87.75   97.33   92.69   59.10     78.22
Recall                 TextGCN   85.98   92.90   73.80   69.26     76.84
                       LDGCN     87.97   97.62   95.73   70.94     78.25
By combining the classification softmax loss with the discriminant regularization, LDGCN can not only learn representative text embeddings and optimize the classification error, but also leverage the underlying intra-class and inter-class manifold structures to enhance its discriminative power.
Acknowledgements This work is supported by the National Key Research and Development Program of China (No. 2022YFC3301801), the Fundamental Research Funds for the Central Universities (No. DUT22ZD205).

References

1. Phan, H.T., Nguyen, N.T., Hwang, D.: Aspect-level sentiment analysis: a survey of graph convolutional network methods. Inf. Fus. 91, 149–172 (2023)
2. Parlak, B., Uysal, A.K.: A novel filter feature selection method for text classification: extensive feature selector. J. Inf. Sci. 49(1), 59–78 (2023)
3. Rao, S., Verma, A.K., Bhatia, T.: A review on social spam detection: challenges, open issues, and future directions. Expert Syst. Appl. 186, 115742 (2021)
4. Chen, Y.: Convolutional neural network for sentence classification. Master's thesis, University of Waterloo (2015)
5. Zhou, P., Qi, Z., Zheng, S., Xu, J., Bao, H., Xu, B.: Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. arXiv preprint arXiv:1611.06639 (2016)
6. Peng, H., Li, J., He, Y., Liu, Y., Bao, M., Wang, L., Song, Y., Yang, Q.: Large-scale hierarchical text classification with recursively regularized deep graph-CNN. In: Proceedings of the 2018 World Wide Web Conference, pp. 1063–1072 (2018)
7. Yao, L., Mao, C., Luo, Y.: Graph convolutional networks for text classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 7370–7377 (2019)
8. Vashishth, S., Bhandari, M., Yadav, P., Rai, P., Bhattacharyya, C., Talukdar, P.: Incorporating syntactic and semantic information in word embeddings using graph convolutional networks. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3308–3318 (2019)
9. Liu, X., You, X., Zhang, X., Wu, J., Lv, P.: Tensor graph convolutional networks for text classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 8409–8416 (2020)
10. Ragesh, R., Sellamanickam, S., Iyer, A., Bairi, R., Lingam, V.: HeteGCN: heterogeneous graph convolutional networks for text classification. In: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 860–868 (2021)
11. Liu, Y., Guan, R., Giunchiglia, F., Liang, Y., Feng, X.: Deep attention diffusion graph neural networks for text classification. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 8142–8152 (2021)
12. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. Adv. Neural Inf. Process. Syst. 28 (2015)
13. Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1556–1566 (2015)
14. Campos Camunez, V., Jou, B., Giró Nieto, X., Torres Viñals, J., Chang, S.-F.: Skip RNN: learning to skip state updates in recurrent neural networks. In: Sixth International Conference on Learning Representations, pp. 1–17 (2018)
15. Chang, S., Zhang, Y., Han, W., Yu, M., Guo, X., Tan, W., Cui, X., Witbrock, M., Hasegawa-Johnson, M.A., Huang, T.S.: Dilated recurrent neural networks. Adv. Neural Inf. Process. Syst. 30 (2017)
16. Yin, W., Schütze, H., Xiang, B., Zhou, B.: ABCNN: attention-based convolutional neural network for modeling sentence pairs. Trans. Assoc. Comput. Linguist. 4, 259–272 (2016)
17. Liu, P., Qiu, X., Huang, X.: Recurrent neural network for text classification with multi-task learning. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 2873–2879 (2016)
18. Kenton, J.D.M.-W.C., Toutanova, L.K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp. 4171–4186 (2019)
19. Oh, S.H., Kang, M., Lee, Y.: Protected health information recognition by fine-tuning a pre-training transformer model. Healthc. Inform. Res. 28(1), 16–24 (2022)
20. Wu, L., Chen, Y., Shen, K., Guo, X., Gao, H., Li, S., Pei, J., Long, B.: Graph neural networks for natural language processing: a survey. Found. Trends Mach. Learn. 16(2), 119–328 (2023)
21. Wu, J., Zhang, C., Liu, Z., Zhang, E., Wilson, S., Zhang, C.: GraphBERT: bridging graph and text for malicious behavior detection on social media. In: 2022 IEEE International Conference on Data Mining (ICDM), pp. 548–557 (2022)
22. Yang, Y., Miao, R., Wang, Y., Wang, X.: Contrastive graph convolutional networks with adaptive augmentation for text classification. Inf. Process. Manag. 59(4), 102946 (2022)
23. Krishnaveni, P., Balasundaram, S.: Generating fuzzy graph based multi-document summary of text based learning materials. Expert Syst. Appl. 214, 119165 (2023)
24. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations
25. Cao, Y., Liu, Z., Li, C., Li, J., Chua, T.-S.: Multi-channel graph neural network for entity alignment. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1452–1461 (2019)
26. Dai, Y., Shou, L., Gong, M., Xia, X., Kang, Z., Xu, Z., Jiang, D.: Graph fusion network for text classification. Knowl. Based Syst. 236, 107659 (2022)
27. Zhu, X., Zhu, L., Guo, J., Liang, S., Dietze, S.: GL-GCN: global and local dependency guided graph convolutional networks for aspect-based sentiment classification. Expert Syst. Appl. 186, 115712 (2021)
28. Jin, D., Song, X., Yu, Z., Liu, Z., Zhang, H., Cheng, Z., Han, J.: BiTe-GCN: a new GCN architecture via bidirectional convolution of topology and features on text-rich networks. In: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 157–165 (2021)
29. Jin, T., Cao, L., Zhang, B., Sun, X., Deng, C., Ji, R.: Hypergraph induced convolutional manifold networks. In: IJCAI, pp. 2670–2676 (2019)
30. Deng, Y., Yang, J., Xiang, J., Tong, X.: GRAM: generative radiance manifolds for 3D-aware image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10673–10683 (2022)
31. Vepakomma, P., Balla, J., Raskar, R.: PrivateMail: supervised manifold learning of deep features with privacy for image retrieval. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 8503–8511 (2022)
32. Sugiyama, M.: Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. J. Mach. Learn. Res. 8(5) (2007)
33. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: International Conference on Machine Learning, pp. 1188–1196 (2014)
34. Joulin, A., Grave, É., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp. 427–431 (2017)
35. Tang, J., Qu, M., Mei, Q.: PTE: predictive text embedding through large-scale heterogeneous text networks. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1165–1174 (2015)
36. Shen, D., Wang, G., Wang, W., Min, M.R., Su, Q., Zhang, Y., Li, C., Henao, R., Carin, L.: Baseline needs more love: on simple word-embedding-based models and associated pooling mechanisms. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 440–450 (2018)
37. Wang, G., Li, C., Wang, W., Zhang, Y., Shen, D., Zhang, X., Henao, R., Carin, L.: Joint embedding of words and labels for text classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2321–2331 (2018)
38. Liu, T., Zhang, X., Zhou, W., Jia, W.: Neural relation extraction via inner-sentence noise reduction and transfer learning. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2195–2204 (2018)
39. Huang, L., Ma, D., Li, S., Zhang, X., Wang, H.: Text level graph neural network for text classification. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3444–3450 (2019)
40. Zhang, C., Zhu, H., Peng, X., Wu, J., Xu, K.: Hierarchical information matters: text classification via tree based graph neural network. In: Proceedings of the 29th International Conference on Computational Linguistics, pp. 950–959 (2022)
41. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
42. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.