An Integration Model Based on Graph Convolutional Network for Text Classification
ABSTRACT The Graph Convolutional Network (GCN) is extensively used in text classification tasks and performs well in processing non-Euclidean structured data. Usually, GCN is implemented with a spatial-based method, such as the Graph Attention Network (GAT). However, current GCN-based methods still lack a reasonable mechanism to account for the problems of contextual dependency and lexical polysemy. Therefore, an improved GCN (IGCN) is proposed to address these problems; it introduces the Bidirectional Long Short-Term Memory (BiLSTM) network, Part-of-Speech (POS) information, and the dependency relationship. From a theoretical point of view, the innovation of IGCN is generalizable and straightforward: the short-range contextual dependency and the long-range contextual dependency captured through the dependency relationship are used together to address the problem of contextual dependency, and the more comprehensive semantic information provided by BiLSTM and the POS information is used to address the problem of lexical polysemy. Notably, the dependency relationship is transplanted from relation extraction tasks to text classification tasks to provide the graph required by IGCN. Experiments on three benchmark datasets show that IGCN achieves competitive results compared with seven baseline models.
INDEX TERMS Bidirectional long short-term memory network, dependency relationship, graph
convolutional network, part-of-speech information, text classification.
FIGURE 1. The difference in capturing dependencies between the original GCN and IGCN.
The original GCN cannot capture the short-range contextual dependency and the long-range contextual dependency together. Consider the example "the movie, which is the product of an unknown French director, is wonderful.". Because GCN only aggregates the information of the direct neighbor nodes, it can only capture short-range contextual dependency information. In the original GCN, the long-range contextual dependency, such as the dependency between the words "movie" and "wonderful", can only be captured by increasing the number of GCN layers. However, current research reveals that a multi-layer GCN for text classification tasks gives rise to high spatial complexity [15]. Meanwhile, increasing the number of network layers also causes over-smoothing of the node features, which makes local features converge to similar values.

In addition to the problem of contextual dependency, the problem of lexical polysemy also exists in GCN. Lexical polysemy means that the same word may express different semantics in the same or different positions. In the sentences "I bought an apple." and "I bought an apple X.", the meaning of the word "apple" differs because of the difference in context. Meanwhile, in the sentences "Our object is to further cement trade relations." and "Their voters do not object much.", the meaning of the word "object" also differs owing to the difference in part-of-speech. Although related research has claimed that the problem of lexical polysemy can be solved without relying on syntactic and semantic information, the results unfortunately fall short of expectations [16], [17].

To overcome the problems of contextual dependency and lexical polysemy, an improved GCN (IGCN) is proposed for text classification in this paper. Based on the original GCN, IGCN introduces the Bidirectional Long Short-Term Memory (BiLSTM) [18], [19] network, the Part-of-Speech (POS) information, and the dependency relationship.¹ The text feature and the POS feature, obtained sequentially through BiLSTM, are applied to solve the problem of lexical polysemy. By constructing the dependency relationship, IGCN can take good advantage of the short-range contextual dependency and the long-range contextual dependency together. Meanwhile, the adjacency matrix based on the dependency relationship is also generated to provide syntactic constraints. Subsequently, the different attentions of the neighbor nodes to the central node can be learned during the aggregation process; that is, the weights of the features are calculated by the attention mechanism during propagation. Notably, to provide the graph required by IGCN, the dependency relationship is transplanted from relation extraction tasks to text classification tasks. The difference in capturing dependencies between the original GCN and IGCN is shown in Fig. 1.

¹ The POS information and the dependency relationship are generated with the spaCy toolkit: the POS information of each word is produced by POS tagging, and the dependency relationship between words is produced by dependency parsing.

Experiments on three benchmarking datasets demonstrate that the problems in the current GCN-based methods can be effectively addressed by IGCN, which has advantages over the other approaches. The main contributions of this paper are as follows:
• The text information and the POS information can be effectively applied to generate the initial features through BiLSTM. These features not only effectively make up for the deficiency of the content-level context, but also provide a new idea for addressing the problem of lexical polysemy.
• The dependency relationship and the attention mechanism are fully integrated into IGCN. They effectively deal with the problem of contextual dependency and partly reduce the number of GCN layers. Namely, they indirectly alleviate the high spatial complexity and over-smoothing caused by a multi-layer GCN. Such a cross-task study is useful for meeting the challenges in NLP and further demonstrates the significance of the dependency relationship.
• A large number of experimental results not only demonstrate the reasonability of integrating BiLSTM, the POS information, and the dependency relationship, but also demonstrate the efficiency of IGCN for text classification. IGCN will contribute to the continuous development of research on non-Euclidean structured data.

II. RELATED WORK
Unlike traditional classification methods based on manually extracted text features, current deep-learning-based methods can directly output the category of a text by training a neural network. For example, Tang et al. [20] adopted two LSTMs for text classification, which effectively integrates sentiment words and contextual information. Zhang et al. [21] classified texts by introducing sentiment word information into BiLSTM. Yang et al. [22] integrated common sense into deep neural networks based on BiLSTM to enhance the accuracy of text classification. Xue and Li [23] utilized CNN and the gate mechanism to achieve higher accuracy, breaking away from network structures based on RNN and the attention mechanism. Huang and Carley [24] achieved impressive results by designing parameterized filters and a gate mechanism on CNN to capture text features. Li et al. [25] proposed a feature transformation component and a context retention mechanism to learn contextual information, and combined contextual features with their transformed contextual features to obtain local salient features. Dong et al. [26] proposed a CNN with multiple non-linear transformations and obtained good results. Akhter et al. [27] achieved a great performance by introducing a single-layer CNN with multiple filters into document-level text classification.

At the same time, attention-based models had also been proposed to capture the weight of each word within a sentence. Wang et al. [28] introduced the attention mechanism into LSTM, which provided a new idea for text classification. Chen et al. [29] introduced product and user information at different semantic levels to classify texts through the attention mechanism. Liu and Zhang [30] proposed a method that introduces three attention mechanisms to determine the contribution of each word in the context for text classification. Gu et al. [31] proposed a position-aware bidirectional attention network based on the Bidirectional Gated Recurrent Unit (BiGRU) for text classification. In particular, Vaswani et al. [32] proposed the self-attention mechanism, which is good at capturing the internal relevance between features and depends less on external information. The self-attention mechanism not only overcame the inability of RNNs to compute in parallel, but also addressed the difficulty of CNNs in capturing long-range dependency information. Dong et al. [33] obtained text representations containing more comprehensive semantics for text classification by introducing the Bidirectional Encoder Representations from Transformers (BERT) [34] and a self-interaction attention mechanism.

Beyond attention-based models for text classification, many breakthroughs in node classification and edge prediction have been made with GCN in recent years. Hamilton et al. [35] proposed a variety of aggregation functions to learn the feature representation of each node and enhance the effect of GCN. Chen et al. [36] proposed a stochastic training method; by randomly selecting two neighbor nodes for the convolution operation, their method greatly reduces the time complexity. In the case of different sampling sizes, features will converge to a local optimum. Li et al. [37] proposed a method that can adaptively construct new Laplacian matrices based on tasks and generate different task-driven convolution kernels; this method is superior to GCN in processing multitask datasets. Velickovic et al. [38] used the attention mechanism to calculate the correlation between nodes dynamically and achieved good results on many public datasets. Yao et al. [39] introduced GCN for text classification and modeled the whole corpus as a heterogeneous network, which can learn word embeddings and document embeddings simultaneously. Cavallari et al. [40] introduced a new setting for graph embedding, which considers embedding communities instead of individual nodes.

Although GCN performs well in text classification, it still fails to solve the problems of contextual dependency and lexical polysemy in text classification tasks. Taking these two problems as the starting point, an improved model based on GCN is proposed accordingly.

III. IGCN
Through in-depth research on neural-network-based text classification, IGCN is proposed in this paper, which builds three components (BiLSTM, the POS information, and the dependency relationship) on top of GCN. The whole process of IGCN can be divided into the steps of extracting the text feature, concatenating the POS feature, constructing the adjacency matrix based on the dependency relationship, training the neural network, and making a final prediction. The architecture of IGCN is shown in Fig. 2. Firstly, the text features and their corresponding POS features are successively obtained by BiLSTM, which can utilize their respective contextual information effectively. Then, the two kinds of features are concatenated to form the feature required by IGCN. Meanwhile, the dependency relationship is generated to confront the problem of contextual dependency and to construct the adjacency matrix required by IGCN. After that, the feature and the adjacency matrix are input to train the neural network. With the hidden state vectors of the …
… the self-loop adjacency matrix of the graph G. The adjacency matrix of an undirected graph can be expressed as follows:

\[
A_{ij} = A_{ji} =
\begin{cases}
0, & i \nrightarrow j \\
1, & i \rightarrow j \\
1, & i = j
\end{cases}
\tag{1}
\]

… information with full consideration of the context. Such a bi-directional model can provide a deeper text feature representation to the neural network. The architecture of BiLSTM is shown in Fig. 4.
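To make the feature-construction and propagation steps described above concrete, the following is a minimal PyTorch sketch: word and POS embeddings are encoded by BiLSTMs, the two resulting features are concatenated into node features, and one attention-weighted aggregation step is performed over the dependency-based adjacency matrix of Eq. (1). This is an illustration under assumptions, not the authors' exact implementation; the embedding sizes, the use of two separate BiLSTMs, the single propagation layer, and the mean pooling are choices made here for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IGCNSketch(nn.Module):
    """Illustrative sketch: BiLSTM-encoded word + POS features, then one
    attention-weighted propagation step over a dependency adjacency matrix."""

    def __init__(self, vocab_size, pos_size, word_dim=300, pos_dim=50,
                 hidden_dim=128, num_classes=2):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.pos_emb = nn.Embedding(pos_size, pos_dim)
        # One BiLSTM for the text feature, one for the POS feature (assumed split).
        self.text_lstm = nn.LSTM(word_dim, hidden_dim, batch_first=True,
                                 bidirectional=True)
        self.pos_lstm = nn.LSTM(pos_dim, hidden_dim, batch_first=True,
                                bidirectional=True)
        feat_dim = 4 * hidden_dim                 # concatenation of both BiLSTM outputs
        self.attn = nn.Linear(2 * feat_dim, 1)    # scores a pair (h_i, h_j)
        self.proj = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, words, pos_tags, adj):
        # words, pos_tags: (batch, seq_len); adj: (batch, seq_len, seq_len)
        h_text, _ = self.text_lstm(self.word_emb(words))
        h_pos, _ = self.pos_lstm(self.pos_emb(pos_tags))
        h = torch.cat([h_text, h_pos], dim=-1)    # concatenated node features

        # Pairwise attention scores, masked by the dependency adjacency matrix
        # so that a word only attends to its syntactic neighbors and itself.
        n = h.size(1)
        h_i = h.unsqueeze(2).expand(-1, -1, n, -1)
        h_j = h.unsqueeze(1).expand(-1, n, -1, -1)
        scores = self.attn(torch.cat([h_i, h_j], dim=-1)).squeeze(-1)
        scores = scores.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)

        # Aggregate neighbor features with the learned weights.
        h = F.relu(self.proj(torch.matmul(alpha, h)))

        # Simple mean pooling over tokens for the sentence-level prediction.
        return self.classifier(h.mean(dim=1))
```

Because the adjacency matrix of Eq. (1) contains self-loops, every row has at least one unmasked entry, so the softmax over neighbors is always well defined.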
FIGURE 7. An example to explain how to construct the adjacency matrix with dependency relations.
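As a rough illustration of footnote 1 and Eq. (1), the dependency-based adjacency matrix and the POS tags could be produced with spaCy roughly as follows. This is a sketch under assumptions: the en_core_web_sm model and the one-node-per-token granularity are choices made here, and the paper's exact preprocessing may differ.

```python
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_graph(sentence):
    """Build the symmetric, self-looped adjacency matrix of Eq. (1) and the
    POS tag sequence (footnote 1) from a spaCy parse, one node per token."""
    doc = nlp(sentence)
    n = len(doc)
    adj = np.zeros((n, n), dtype=np.float32)
    for token in doc:
        adj[token.i, token.i] = 1.0            # self-loop: i = j
        if token.head.i != token.i:            # dependency arc between token and head
            adj[token.i, token.head.i] = 1.0   # i -> j
            adj[token.head.i, token.i] = 1.0   # undirected: A_ij = A_ji
    pos_tags = [token.pos_ for token in doc]   # coarse-grained POS tags
    return adj, pos_tags

adj, pos = dependency_graph(
    "the movie, which is the product of an unknown French director, is wonderful.")
```

On the example sentence from Fig. 1, "movie" and "wonderful" are connected by a short path in this graph (typically both attach to the main verb in the parse), so the long-range contextual dependency can be reached in far fewer propagation steps than the linear word distance would require.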
IV. EXPERIMENTS
In this section, the experimental setup is first described and the experimental results of the different models are analyzed. Subsequently, an ablation study is carried out to further demonstrate the contribution of each component.
… medical field but a negative tendency in other fields. This difference in tendency may cause a wrong prediction. Therefore, an in-depth study will be carried out on domain knowledge transfer.

REFERENCES
[1] W. Zhao, G. Zhang, G. Yuan, J. Liu, H. Shan, and S. Zhang, "The study on the text classification for financial news based on partial information," IEEE Access, vol. 8, pp. 100426–100437, 2020.
[2] K. Liu and L. Chen, "Medical social media text classification integrating consumer health terminology," IEEE Access, vol. 7, pp. 78185–78193, 2019.
[3] S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu, and J. Gao, "Deep learning based text classification: A comprehensive review," 2020, arXiv:2004.03705. [Online]. Available: http://arxiv.org/abs/2004.03705
[4] S. Ji, S. Pan, E. Cambria, P. Marttinen, and P. S. Yu, "A survey on knowledge graphs: Representation, acquisition and applications," 2020, arXiv:2002.00388. [Online]. Available: http://arxiv.org/abs/2002.00388
[5] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[6] Y. Kim, "Convolutional neural networks for sentence classification," in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, pp. 1746–1751.
[7] H. Zhang, L. Xiao, Y. Wang, and Y. Jin, "A generalized recurrent neural architecture for text classification with multi-task learning," in Proc. 26th Int. Joint Conf. Artif. Intell., Aug. 2017, pp. 2873–2879.
[8] W. Zhao, H. Peng, S. Eger, E. Cambria, and M. Yang, "Towards scalable and reliable capsule networks for challenging NLP applications," in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 1549–1559.
[9] M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673–2681, Nov. 1997.
[10] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[11] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder–decoder for statistical machine translation," in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, pp. 1724–1734.
[12] R. Wang, Z. Li, J. Cao, T. Chen, and L. Wang, "Recurrent convolutional neural networks for text classification," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2019, pp. 2267–2273.
[13] R. Wang, Z. Li, J. Cao, T. Chen, and L. Wang, "Convolutional recurrent neural networks for text classification," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2019, pp. 1–6.
[14] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. 5th Int. Conf. Learn. Represent. (ICLR), Apr. 2017, pp. 1–14.
[15] G. Li, M. Muller, A. Thabet, and B. Ghanem, "DeepGCNs: Can GCNs go as deep as CNNs?" in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 9266–9275.
[16] J. Liu, F. Meng, Y. Zhou, and B. Liu, "Character-level neural networks for short text classification," in Proc. Int. Smart Cities Conf. (ISC2), Sep. 2017, pp. 560–567.
[17] X. Zhang, J. J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Proc. Adv. Neural Inf. Process. Syst., Dec. 2015, pp. 649–657.
[18] P. Zhou, W. Shi, J. Tian, Z. Qi, B. Li, H. Hao, and B. Xu, "Attention-based bidirectional long short-term memory networks for relation classification," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics (Short Papers), vol. 2, 2016, pp. 207–212.
[19] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, "Deep contextualized word representations," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol. (Long Papers), vol. 1, 2018, pp. 2227–2237.
[20] D. Tang, B. Qin, X. Feng, and T. Liu, "Effective LSTMs for target-dependent sentiment classification," in Proc. 26th Int. Conf. Comput. Linguistics (COLING), Dec. 2016, pp. 3298–3307.
[21] M. Zhang, Y. Zhang, and D. Vo, "Gated neural networks for targeted sentiment analysis," in Proc. 30th AAAI Conf., Feb. 2016, pp. 3087–3093.
[22] M. Yang, Q. Qu, X. Chen, C. Guo, Y. Shen, and K. Lei, "Feature-enhanced attention network for target-dependent sentiment classification," Neurocomputing, vol. 307, pp. 91–97, Sep. 2018.
[23] W. Xue and T. Li, "Aspect based sentiment analysis with gated convolutional networks," in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics (Long Papers), vol. 1, 2018, pp. 2514–2523.
[24] B. Huang and K. Carley, "Parameterized convolutional neural networks for aspect level sentiment classification," in Proc. Conf. Empirical Methods Natural Lang. Process., 2018, pp. 1091–1096.
[25] X. Li, L. Bing, W. Lam, and B. Shi, "Transformation networks for target-oriented sentiment classification," in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics (Long Papers), vol. 1, 2018, pp. 946–956.
[26] M. Dong, Y. Li, X. Tang, J. Xu, S. Bi, and Y. Cai, "Variable convolution and pooling convolutional neural network for text sentiment classification," IEEE Access, vol. 8, pp. 16174–16186, 2020.
[27] M. P. Akhter, Z. Jiangbin, I. R. Naqvi, M. Abdelmajeed, A. Mehmood, and M. T. Sadiq, "Document-level text classification using single-layer multisize filters convolutional neural network," IEEE Access, vol. 8, pp. 42689–42707, 2020.
[28] Y. Wang, M. Huang, X. Zhu, and L. Zhao, "Attention-based LSTM for aspect-level sentiment classification," in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 606–615.
[29] H. Chen, M. Sun, C. Tu, Y. Lin, and Z. Liu, "Neural sentiment classification with user and product attention," in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 1650–1659.
[30] J. Liu and Y. Zhang, "Attention modeling for targeted sentiment," in Proc. 15th Conf. Eur. Chapter Assoc. Comput. Linguistics (Short Papers), vol. 2, Apr. 2017, pp. 572–577.
[31] S. Gu, L. Zhang, Y. Hou, and Y. Song, "A position-aware bidirectional attention network for aspect-level sentiment analysis," in Proc. 27th Int. Conf. Comput. Linguistics (COLING), Aug. 2018, pp. 774–784.
[32] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., Dec. 2017, pp. 5998–6008.
[33] Y. Dong, P. Liu, Z. Zhu, Q. Wang, and Q. Zhang, "A fusion model-based label embedding and self-interaction attention for text classification," IEEE Access, vol. 8, pp. 30548–30559, 2020.
[34] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., vol. 1, Jun. 2019, pp. 4171–4186.
[35] W. L. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Proc. Adv. Neural Inf. Process. Syst., Dec. 2017, pp. 1024–1034.
[36] J. Chen, J. Zhu, and L. Song, "Stochastic training of graph convolutional networks with variance reduction," in Proc. 35th Int. Conf. Mach. Learn. (ICML), Jul. 2018, pp. 941–949.
[37] R. Li, S. Wang, F. Zhu, and J. Huang, "Adaptive graph convolutional neural networks," in Proc. 32nd AAAI Conf., Feb. 2018, pp. 3546–3553.
[38] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks," in Proc. 6th Int. Conf. Learn. Represent. (ICLR), Apr. 2018, pp. 1–12.
[39] L. Yao, C. Mao, and Y. Luo, "Graph convolutional networks for text classification," in Proc. 33rd AAAI Conf., Jan. 2019, pp. 7370–7377.
[40] S. Cavallari, E. Cambria, H. Cai, K. C.-C. Chang, and V. W. Zheng, "Embedding both finite and infinite communities on graphs [application notes]," IEEE Comput. Intell. Mag., vol. 14, no. 3, pp. 39–50, Aug. 2019.

HENGLIANG TANG received the B.Sc. and Ph.D. degrees from the Beijing University of Technology, in 2005 and 2011, respectively. He is currently a Professor with Beijing Wuzi University. His main research interests include computer vision and IoT information technology.

YUAN MI received the B.Sc. degree from Northwest A&F University, in 2015. He is currently pursuing the M.Sc. degree with Beijing Wuzi University. His main research interests include natural language processing and IoT information technology.

YANG CAO received the B.Sc. and M.Sc. degrees from the Taiyuan University of Science and Technology, in 2011 and 2015, respectively, and the Ph.D. degree from the Beijing University of Technology, in 2019. He is currently a Lecturer with Beijing Wuzi University. His main research interests include machine learning and big data analysis.