
A Text Classification Model Based on GCN and BiGRU Fusion

Yonghao Dong, Zhenmin Yang, Hui Cao∗

Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, Gansu, China
ABSTRACT

A text classification model fusing a graph convolutional neural network (GCN) and a bi-directional gated recurrent unit (BiGRU) is designed to address the inability of simple neural networks to capture the contextual semantics of text and to extract its spatial feature information and nonlinear complex semantic relations. First, the text is preprocessed and vectorized with Word2Vec; then, the graph convolutional neural network and the bi-directional gated recurrent unit are fused into a hybrid model that can extract the complex semantic relations and spatial feature information of the text; finally, classification is performed by a softmax classifier. Experiments on a publicly available dataset demonstrate that the model can effectively improve the performance of text classification.

CCS CONCEPTS

• Computing methodologies; • Artificial intelligence; • Natural language processing; • Lexical semantics;

KEYWORDS

Text classification, Graph Convolutional Network, Bidirectional Gated Recurrent Unit

ACM Reference Format:
Yonghao Dong, Zhenmin Yang, and Hui Cao. 2022. A Text Classification Model Based on GCN and BiGRU Fusion. In 2022 8th International Conference on Computing and Artificial Intelligence (ICCAI '22), March 18–21, 2022, Tianjin, China. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3532213.3532260

∗ Corresponding author: Hui Cao, Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, Gansu, China. E-mail: [email protected]

ICCAI '22, March 18–21, 2022, Tianjin, China
© 2022 Association for Computing Machinery. ACM ISBN 978-1-4503-9611-0/22/03. https://doi.org/10.1145/3532213.3532260

1 INTRODUCTION

Due to the rapid development of the information age, the Internet has produced a large amount of textual information, and it is increasingly important to classify these texts accurately. Text classification technology is a technique for automatically classifying and labeling text sets [1] according to a certain classification system or criteria. Traditional classification algorithms are mainly based on mathematical representations from machine learning, such as naive Bayes, support vector machines and decision trees, which are widely used in text classification research. Traditional classification algorithms can only solve simple problems, and their generalization ability is limited when facing complex situations.

In recent years, deep neural networks have been widely used in the field of classification due to their excellent feature capture capability [2]. Recurrent Neural Networks (RNN) and their variants, such as Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), are widely used in text sequence modeling [3]. Niu et al. [4] proposed a recurrent neural network based on Word2Vec, improved it with long short-term memory networks, and applied it to microblogging texts. To further improve the performance of neural network models for text classification, the Attention mechanism [5] was introduced into neural networks, and Liu et al. [6] built BiGRU networks with user and product attention mechanisms for multi-category text classification. The Graph Convolutional Network (GCN) is a method that can perform deep learning on graph data. The GCN is a cleverly designed method for extracting features from graph data, with feature learning and nonlinear modeling capability, so that these features can be used for text semantic modeling. Kipf et al. [7] proposed a scalable graph convolutional network for semi-supervised classification on graph-structured data. Yao et al. [8] proposed a novel graph neural network-based approach that uses graph convolutional networks to jointly learn word and document embeddings for text classification, and experimentally demonstrated that with a small amount of labeled data it is more efficient than some other deep learning models (CNN, LSTM, etc.); the model is robust, and its performance remains outstanding whether the amount of labeled data is large or small.

To improve the accuracy of text classification, a BiGRU-GCN text classification model is proposed in this paper, drawing on the respective advantages of the above two models. The model uses distributed word embeddings to convert natural language text into a vector representation with rich semantic information, and combines the advantages of both GCN and BiGRU models: it obtains global contextual features and long-range dependencies of the text's temporal order through a stacked BiGRU network with a skip-connection structure, while using the GCN network to effectively extract spatial feature information and nonlinear complex semantic relations, thus improving the text feature extraction capability of the model. Finally, the experimental results show that the BiGRU-GCN model has better classification performance.


Figure 1: CBOW model

Figure 2: GRU structural model diagram

2 RELATED WORK

2.1 Text Vectorization
Before a neural network can learn text features, the words in the text must first be converted into word vectors. Traditional text vectorization methods mainly include one-hot coding [9] and the bag-of-words model [10]. One-hot coding ignores the relationships between words and easily causes a dimensional disaster, while the bag-of-words model cannot learn text context and cannot characterize word order, semantics, syntax or other features. In 2013, Mikolov et al. [11] proposed the Word2Vec model to build neural network word embedding representations, which map high-dimensional sparse vectors to low-dimensional dense word vectors, effectively reducing word dimensionality and alleviating problems such as the dimensional catastrophe. The Word2Vec model is implemented with two approaches, the Skip-Gram model and the CBOW model. In this paper, the CBOW model of Word2Vec [11] is chosen to assist in constructing the text vector representation; the structure of the CBOW model is shown in Figure 1. The input layer takes the context word vectors of the current word $w_t$, the mapping layer performs the mapping operation, and the output layer outputs the vector representation of the current word.

2.2 Gated Recurrent Unit
The GRU (Gated Recurrent Unit) is a type of Recurrent Neural Network (RNN). It is an improved variant of LSTM (Long Short-Term Memory) proposed by Chung et al. [12] in 2014, and it compensates for the long training time of LSTM. The GRU merges the cell state and the hidden state of LSTM and combines the forget gate and the input gate into a single update gate, which simplifies the structure and largely improves training efficiency, making it simpler and more efficient than the LSTM network. The GRU model, shown in Figure 2, merges the cell state and the output into a single state $s$. Here $\sigma$ is the sigmoid activation function; $x_t$ is the input at moment t, representing the word vector of the word at moment t; $s_t$ is the hidden state at moment t; $s_{t-1}$ is the hidden state at moment t-1; $r_t$ denotes the reset gate; $z_t$ denotes the update gate; and $\tilde{s}_t$ is the candidate memory unit at moment t.
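The paper does not spell out the internal GRU update equations. For reference, one common formulation, written with the notation above, is:

$$z_t = \sigma(W_z x_t + U_z s_{t-1})$$
$$r_t = \sigma(W_r x_t + U_r s_{t-1})$$
$$\tilde{s}_t = \tanh(W x_t + U(r_t \odot s_{t-1}))$$
$$s_t = (1 - z_t) \odot s_{t-1} + z_t \odot \tilde{s}_t$$

where $W_z, U_z, W_r, U_r, W, U$ are learned weight matrices (bias terms omitted) and $\odot$ is element-wise multiplication; some implementations swap the roles of $z_t$ and $1-z_t$ in the last equation.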
2.3 Graph Convolutional Neural Network
To solve strictly graph-theoretic problems, Gori et al. [13] introduced the concept of graph neural networks (GNNs) in 2005, and Scarselli et al. [14] further elucidated it. The graph convolutional neural network (GCN) used in this paper is a branch of GNN, which, besides GCN, also includes graph attention networks (GAT), graph autoencoders (GAE), graph generative networks (GGN) and graph spatio-temporal networks (GSTN). The most widely used GCNs are the two graph convolutional neural networks based on the frequency (spectral) domain and the spatial domain, first proposed by Bruna et al. [15] in 2013. Numerous studies have shown that the results of various natural language processing tasks improve to some extent when graph convolutional neural network models are used. Graph structures allow the complex semantic relationships between objects to be mined effectively; compared with the traditional serialized modeling used in natural language processing, graph convolutional neural networks are able to mine nonlinear complex semantic relations.

3 A TEXT CLASSIFICATION MODEL INCORPORATING GCN AND BIGRU
Considering that text has the property of temporal sequence, this paper selects the BiGRU model as the baseline for text classification. GRU has the advantages of fewer parameters, high training efficiency and strong expressive power compared with RNN and LSTM, but the GRU model can only obtain past context information; in order to obtain complete past and future context information, the BiGRU model is selected to extract long-text contextual sequence features. In order to further extract contextual spatial features and deep complex semantics, the GCN model is selected. In this paper, we propose and design a new BiGRU_GCN text classification model by combining the respective advantages of the GCN model and the BiGRU model; it is shown in Fig. 3 and mainly consists of three parts: text preprocessing, text vectorization and the classifier.


Figure 3: BiGRU-GCN text classification model

In the text preprocessing stage, the text data is first cleaned by removing stop words, line breaks, special characters, etc., and the data is serialized; then the processed data is vectorized by Word2Vec to convert the text into word vectors; finally the word vectors are fed into the classifier for processing. In this part the BiGRU model and a GCN network whose nodes are initialized from the dependency syntax tree are used: the BiGRU model learns the data and extracts the important features, the GCN network extracts the spatial feature information in the text, and the classification results are finally obtained through the global pooling layer.
3.1 Embedding Layer
The embedding layer is the word embedding layer. Before analyzing the text, the words in the text need to be converted into word vectors before they can be used as input to the neural network. In this paper, the CBOW model of Word2Vec [11] is chosen to implement text vectorization, by which words are converted into 512-dimensional word vectors. The essence of the CBOW model is to predict a target word with a neural network given its context words. The CBOW model consists of an input layer, a hidden layer and an output layer. The input layer takes the vectors of the words neighboring the current word. The hidden layer maps the input matrix to a vector, and the output layer is a Huffman tree in which each leaf node is a word: the path from a high-frequency word to the root node is relatively short, there is only one path from a word to the root node, and each intermediate node is a sigmoid unit. Going from the root node to a specified word passes through multiple intermediate nodes, and each pass through an intermediate node is in fact a binary classification task. Each word on the path from the root node has a corresponding weight vector. The model is trained to find the weight vectors that maximize the probability of reaching the specified word from the root node; once the weight vectors of the intermediate nodes are found, the vector of the original word is known.
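As an illustration only (the paper gives no code), a minimal sketch of training CBOW word vectors with the gensim library (version 4 or later) is shown below; the corpus file name and most parameters are placeholders, while the 512-dimensional vector size follows Table 2 and sg=0 selects CBOW:

```python
from gensim.models import Word2Vec

# Each line of the (hypothetical) corpus file is one pre-segmented sentence.
with open("corpus_segmented.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f if line.strip()]

# sg=0 selects the CBOW training scheme; vector_size=512 matches the paper's setting.
model = Word2Vec(sentences, vector_size=512, window=5, min_count=1, sg=0, epochs=5)

first_token = sentences[0][0]
vector = model.wv[first_token]   # 512-dimensional embedding for one token
print(vector.shape)              # (512,)
```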
3.2 BiGRU Layer
Although the traditional LSTM algorithm performs well on sequential problems, it suffers from long training times, many parameters and a complex structure during training. The GRU model was proposed to compensate for these shortcomings of LSTM. The GRU uses internal gating to process temporal information efficiently and filters the input information through two gating systems, the update gate and the reset gate: a larger update gate value indicates that the hidden-layer output of the previous moment has more influence on the current hidden layer, and a smaller value indicates that more of it is ignored. While the one-way GRU model suffers from ignoring contextual information, the bidirectional GRU model learns the contextual information adequately. The bidirectional GRU model is composed of two GRU models with opposite directions of information transfer: the first layer transfers information in temporal order and the second layer transfers information in reverse temporal order. The computational flow of the bidirectional GRU model is shown below.

$$\vec{s}_t = \overrightarrow{GRU}(x_t, \vec{s}_{t-1})$$
$$\overleftarrow{s}_t = \overleftarrow{GRU}(x_t, \overleftarrow{s}_{t-1})$$
$$s_t = w_t \vec{s}_t + v_t \overleftarrow{s}_t + b_t = \left[\vec{s}_t, \overleftarrow{s}_t\right]$$

where $x_t$ denotes the input at moment t; $w_t$ and $v_t$ denote the weights associated with the forward hidden layer and the backward hidden layer of the BiGRU at moment t, respectively; $b_t$ denotes the bias of the hidden-layer state at moment t; $s_t$ is the hidden state at moment t; and $s_{t-1}$ is the hidden state at moment t-1.
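As a rough illustration of this layer (not code from the paper), a stacked bidirectional GRU can be written with tf.keras as sketched below; the layer widths, the sequence length of 30 and the residual (skip) connection are assumptions based on the description in the introduction:

```python
import tensorflow as tf

seq_len, emb_dim = 30, 512            # headline length bound and Word2Vec dimension from the paper
inputs = tf.keras.Input(shape=(seq_len, emb_dim))

# First BiGRU layer: forward and backward hidden states are concatenated per time step.
h1 = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(256, return_sequences=True))(inputs)

# Second BiGRU layer with an assumed skip ("hopping") connection back to the first layer.
h2 = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(256, return_sequences=True))(h1)
bigru_out = tf.keras.layers.Add()([h1, h2])

encoder = tf.keras.Model(inputs, bigru_out)
encoder.summary()
```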


3.3 GCN Layer
GCN is a convolutional neural network that operates on graph structures and expands its receptive field by receiving neighborhood information. The workflow of the GCN layer in this paper is as follows. First, the dependency syntax graph is generated with HIT's LTP (Language Technology Platform) tool, producing an undirected graph G with n nodes. The output of the BiGRU layer is then used as the input of the GCN layer to extract the spatial feature information and nonlinear complex semantic relations of the text through the GCN network. The GCN network does this by multiplying the adjacency matrix A with the feature matrix S, which aggregates the features of each vertex's neighbors; the result is then multiplied by a parameter matrix W and passed through an activation function σ, a nonlinear transformation that yields the matrix H of aggregated neighboring-vertex features. The reason for adding the identity matrix $I_N$ to the adjacency matrix A is to preserve each vertex's own features when propagating information, while the normalization operation $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ on the neighbor matrix $\tilde{A}$ keeps the scale of the feature matrix H stable. The computational procedure of the GCN model is shown below.

$$H = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} S W\right)$$

where $\tilde{A} = A + I_N$; A and $I_N$ denote the adjacency matrix and the identity matrix of the undirected graph G, respectively; $\tilde{D}$ denotes the degree matrix of $\tilde{A}$; S denotes the output of the BiGRU layer; W denotes the parameter matrix; σ denotes the activation function; and H denotes the output of the GCN layer.
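To make the propagation rule concrete, the following is a small NumPy sketch of a single GCN layer following the formula above; the toy graph, the feature dimensions and the choice of ReLU as the activation are illustrative assumptions, not values from the paper:

```python
import numpy as np

def gcn_layer(A, S, W):
    """One GCN propagation step: H = ReLU(D~^-1/2 (A + I) D~^-1/2 S W)."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                    # add self-loops (A + I_N)
    d = A_tilde.sum(axis=1)                    # degree vector of A_tilde
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # D~^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_hat @ S @ W, 0.0)      # ReLU as the activation sigma

# Toy example: 4 nodes, 8-dimensional BiGRU features, 16 output channels.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
S = np.random.randn(4, 8)
W = np.random.randn(8, 16)
print(gcn_layer(A, S, W).shape)                # (4, 16)
```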
4 EXPERIMENTATION AND ANALYSIS

4.1 Data Pre-processing
This paper uses the THUCNews dataset: 200,000 news headlines with lengths between 20 and 30 words were extracted from THUCNews, covering 10 categories (finance, property, stock, education, technology, society, current affairs, sports, games and entertainment) with 20,000 items per category. For better text classification, the dataset was preprocessed: special characters such as line breaks were removed, the data was word-segmented with the jieba tokenizer, and the word embeddings of the segmented text were initialized with Word2Vec. The dataset was then randomly divided into a training set and a test set in the ratio of 8:2, used for model training and performance evaluation, respectively. The structure of the dataset is shown in Table 1.

Table 1: Dataset structure

Dataset                                 Training set          Test set
Chinese text classification dataset     160,000 (articles)    40,000 (articles)
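A minimal sketch of this preprocessing stage is given below, assuming jieba for word segmentation and scikit-learn for the 8:2 split; the library choices, cleaning regex and sample texts are assumptions, while the split ratio comes from the paper:

```python
import re
import jieba
from sklearn.model_selection import train_test_split

def clean_and_segment(text):
    # Remove line breaks and special characters, then segment into words.
    text = re.sub(r"[\r\n]+", "", text)
    text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9]", "", text)
    return jieba.lcut(text)

headlines = ["央行发布最新货币政策报告", "某球队夺得联赛冠军"]   # placeholder samples
labels = ["finance", "sports"]

tokens = [clean_and_segment(t) for t in headlines]
X_train, X_test, y_train, y_test = train_test_split(
    tokens, labels, test_size=0.2, random_state=42)          # 8:2 split as in the paper
```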
4.2 Evaluation Indicators
In this paper, precision, recall and F1-score are used as evaluation indicators and are calculated as follows.

$$precision = \frac{TP}{TP + FP}$$
$$recall = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \cdot precision \cdot recall}{precision + recall}$$
where TP denotes the number of samples predicted to be positive and correctly classified, TN denotes the number of samples predicted to be negative and correctly classified, FP denotes the number of samples that are actually negative but incorrectly classified as positive, and FN denotes the number of samples that are actually positive but incorrectly classified as negative.
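For illustration, these indicators can be computed with scikit-learn as sketched below; macro averaging over the 10 classes is an assumption, since the paper does not state how per-class scores are aggregated:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]   # placeholder class labels
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```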


4.3 Comparison Experiments and Parameter Settings
In this experiment, the BiGRU_GCN model is compared with the following three models:
1. The RNN neural network model proposed in the literature [16].
2. The BiLSTM-Attention neural network model proposed in the literature [17].
3. The BiGRU-Attention neural network model proposed in the literature [18].

All experiments are divided into 3 parts: data preprocessing, feature extraction and text classification, and the preprocessing process and experimental hyperparameters are kept consistent across models. A dynamically adjusted learning rate is used: the experiments run for 30 epochs, with one learning rate for the first 10 epochs, another for epochs 10 to 20, and a third for the last 10 epochs. The parameter settings of the experimental model are shown in Table 2. A configuration sketch follows the table.

Table 2: Model parameter settings

Parameter                        Value
Word vector dimension            512
Learning rate (epochs 1-10)      1e-2
Learning rate (epochs 10-20)     1e-3
Learning rate (epochs 20-30)     1e-4
Loss function                    categorical_crossentropy
Epochs                           30
Batch size                       64
Dropout                          0.1
L2 regularization parameter      5e-4
Optimizer                        Adam
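The sketch below shows one hedged way the Table 2 settings could be wired up in tf.keras; the simplified model (the GCN part is omitted) and the placement of dropout and L2 regularization are assumptions, while the learning-rate schedule, loss, optimizer, batch size and epoch count follow Table 2:

```python
import tensorflow as tf

def staged_lr(epoch, lr):
    # Piecewise-constant schedule from Table 2: 1e-2, then 1e-3, then 1e-4.
    if epoch < 10:
        return 1e-2
    elif epoch < 20:
        return 1e-3
    return 1e-4

# Simplified stand-in for the BiGRU-GCN classifier, ending in a 10-way softmax.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(30, 512)),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(256)),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(10, activation="softmax",
                          kernel_regularizer=tf.keras.regularizers.l2(5e-4)),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-2),
              loss="categorical_crossentropy", metrics=["accuracy"])

# model.fit(X_train, y_train, epochs=30, batch_size=64,
#           callbacks=[tf.keras.callbacks.LearningRateScheduler(staged_lr)])
```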
4.4 Analysis of Experimental Results for Text Classification
In this paper, experiments were conducted with the RNN model, the BiLSTM-Attention model, the BiGRU-Attention model and the improved model proposed in this paper on the Chinese text classification dataset; the experimental results are shown in Figure 4.

Figure 4: Comparison of the results of the models

As can be seen from Fig. 4, there is not much difference between the models in terms of precision, recall and F1 values, which indicates that the performance of the GRU model and the LSTM model is relatively close when the dataset is large; on balance, the BiGRU-Attention model works better on the text classification task than the BiLSTM-Attention model. The models that use the Attention mechanism significantly outperform the RNN model that does not, indicating that increasing the complexity of the network is effective. Then, after choosing the better-performing of the latter two models and replacing its Attention mechanism with a GCN, performance improves significantly, indicating that the text classification model proposed in this paper can better relate to the context while effectively extracting spatial feature information and nonlinear complex semantic relationships of the text, making it outperform the other models in terms of precision, recall and F1 value.

5 SUMMARY AND OUTLOOK
Traditional text classification models do not sufficiently consider the relationships between contexts, the spatial features of the text, or the nonlinear complex semantic relationships in the text; most models are simply a stack of neural network layers and Attention mechanisms. To address these shortcomings, this paper proposes a text classification model fusing a graph convolutional neural network and BiGRU: the BiGRU network is used to obtain global contextual features and long-distance dependencies of the text's temporal order, the GCN network is used to effectively extract spatial feature information and nonlinear complex semantic relations of the text, and the model is optimized to further improve the accuracy of text classification. The experimental results show that the proposed model is effective.

In future work, further consideration will be given to fusing the Attention mechanism with BiGRU, incorporating more semantic information in the embedding layer, optimizing the loss function, and attempting to improve the model to further enhance its text classification capability.

ACKNOWLEDGMENTS
This work was supported by the National Center for Monitoring and Research on Language Resources (NCLR) Minority Languages Sub-Centre and the Contextually Relevant Tibetan Sentiment Resource Repository Construction Study: NMLR201601.
REFERENCES
[1] Du S, Yu H. Survey of text classification methods based on deep learning [J]. Chinese Journal of Network and Information Security, 6(4): 1.
[2] LeCun Y, Bengio Y, Hinton G. Deep learning [J]. Nature, 2015, 521(7553): 436-444.
[3] Kumar A, Rastogi R. Attentional recurrent neural networks for sentence classification [M]//Innovations in Infrastructure. Springer, Singapore, 2019: 549-559.
[4] Niu C, Zhan G, Li Z. Chinese Weibo sentiment analysis based on deep neural network [J]. Comput. Syst. Appl., 27: 205-210.
[5] Luong M T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation [J]. arXiv preprint arXiv:1508.04025, 2015.
[6] Liu G, Guo J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification [J]. Neurocomputing, 2019, 337: 325-338.
[7] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks [J]. arXiv preprint arXiv:1609.02907, 2016.
[8] Yao L, Mao C, Luo Y. Graph convolutional networks for text classification [C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 7370-7377.
[9] Fu Y L, Lu T L, Ma Z L. CNN malicious code detection technology based on One-Hot [J]. Computer Applications and Software, 2020, 37(1): 304-308, 333.
[10] Joulin A, Grave E, Bojanowski P, et al. Bag of tricks for efficient text classification [J]. arXiv preprint arXiv:1607.01759, 2016.
[11] Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space [J]. arXiv preprint arXiv:1301.3781, 2013.
[12] Chung J, Gulcehre C, Cho K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling [J]. arXiv preprint arXiv:1412.3555, 2014.
[13] Gori M, Monfardini G, Scarselli F. A new model for learning in graph domains [C]//Proceedings of the 2005 IEEE International Joint Conference on Neural Networks. IEEE, 2005, 2: 729-734.
[14] Scarselli F, Gori M, Tsoi A C, et al. The graph neural network model [J]. IEEE Transactions on Neural Networks, 2009, 20(1): 61-80.
[15] Bruna J, Zaremba W, Szlam A, et al. Spectral networks and locally connected networks on graphs [J]. arXiv preprint arXiv:1312.6203, 2013.
[16] Liu P, Qiu X, Huang X. Recurrent neural network for text classification with multi-task learning [J]. arXiv preprint arXiv:1605.05101, 2016.
[17] Chung J, Gulcehre C, Cho K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling [J]. arXiv preprint arXiv:1412.3555, 2014.
[18] Wang W, Sun Y, Qi Q, et al. Text sentiment classification model based on BiGRU-Attention neural network [J/OL]. Application Research of Computers, 2019, 10: 09-27.

