
2024 IEEE 2nd International Conference on Sensors, Electronics and Computer Engineering (ICSECE)

Ensemble BERT: A Student Social Network Text Sentiment Classification Model Based on Ensemble Learning and BERT Architecture
Kai Jiang
Business College, Southwest University, Chongqing, 400715, China

Honghao Yang
Foreign Languages College, Central China Normal University, Wuhan, 430079, China

Yuexiang Wang
School of Information and Business Management, Dalian Neusoft University of Information, Dalian, Liaoning, 116000, China

Qianru Chen
School of Art, Design and Fashion, Zhejiang University of Science and Technology, Hangzhou, 310000, China

Yiming Luo*
Business School, University of Sydney, Sydney, New South Wales, 2150, Australia
[email protected]

* Yiming Luo is the corresponding author and responsible for the design of the article.

Abstract—The mental health assessment of middle school students has always been one of the focuses in the field of education. This paper introduces a new ensemble learning network based on BERT, which enhances model performance by integrating multiple classifiers. We trained a set of BERT-based learners and combined them using the majority voting method. We collected social network text data of middle school students from China's Weibo platform and applied the method to the task of classifying emotional tendencies in middle school students' social network texts. Experimental results suggest that the ensemble learning network performs better than the base model, while the performance of an ensemble of three single-layer BERT models is nearly the same as that of a three-layer BERT model but requires 11.58% more training time. Therefore, when balancing prediction performance and efficiency, the deeper BERT network should be preferred for training; for interpretability, however, network ensembles can provide acceptable solutions.

Keywords—Social network text sentiment classification, natural language processing, BERT, ensemble learning

I. INTRODUCTION

Research based on neural network technology has gradually become widely used in many fields [1]. With the rapid growth of deep learning, especially in Natural Language Processing (NLP), a growing number of researchers are exploring deep networks for NLP tasks, especially text classification [2]. Deep networks such as the Transformer-based BERT have achieved unprecedented performance on several NLP tasks. However, significantly increasing the depth of a network often results in a substantial rise in computational cost, which is particularly problematic in resource-limited scenarios. Additionally, deep networks face certain barriers in terms of interpretability [3]. Therefore, this study introduces a novel ensemble learning framework that combines the powerful semantic capturing ability of Transformers with the efficiency of ensemble learning. Ensemble learning can increase robustness and reduce bias, while shallow networks facilitate interpretive analysis.

This paper aims to investigate the following questions, where N is the number of single-layer BERT base models:

Question 1: Can an ensemble of N single-layer base models achieve good results in student sentiment classification?

Question 2: Does the predictive ability of an ensemble model with N single-layer base models surpass that of a single N-layer BERT model?

Question 3: Is the training time of an ensemble of N single-layer base models shorter than that of a single N-layer BERT model when their predictive accuracy is equal?

II. DATASET

We demonstrate the effectiveness of our approach by applying it to sentiment trend analysis on social network text. A single-layer encoder version of the BERT model is used as the base model of the ensemble, and the classification results are combined by majority voting. To simplify the experiment, we set N to 3. The data are social network posts from students in three middle schools in Xiangtan City, Hunan Province, China. Weibo is the largest social network platform in China. After obtaining the students' consent, the students uploaded their Weibo accounts anonymously. We collected the account data of 324 students and captured up to 100 Weibo posts from each student over the last 3 years; if a student had fewer than 100 posts, all of their posts were used. In total, 30012 pieces of data were included. The students and their parents approved the use of these data, and the students and psychological education experts double-checked the data annotation.

Next, we will present related work, followed by an explanation of the model architecture. Then, we will present and analyze the experimental results.
III. RELATED WORK

Traditional text vectorization methods such as one-hot encoding, bag-of-words, and TF-IDF have inherent limitations [4]. They often produce sparse text representations, which lead to significant computational overhead and an insufficient understanding of contextual semantic relationships, so they cannot resolve the challenges posed by polysemy [5]. Extracting text features with neural network models, by contrast, can effectively address these semantic issues, including polysemy and language ambiguity in text classification [6]. In particular, pre-trained models improve the quality of the learned text semantic representations. Widely used neural text representation models are built on the relationship between context and target words. Commonly used models include pre-trained models based on static word embeddings, such as Word2Vec, and the BERT model, which is based on dynamic word embeddings. Word2Vec is a neural network that converts each token in text data into a vector in a k-dimensional space by training the network on a given corpus [7]. Models based on Word2Vec include the Skip-gram model, which predicts the context of a known target word or token [6]. BERT is a deep-learning-based pre-trained language representation model proposed by Google in 2018 [8] and remains one of the most important pre-trained models in NLP. It uses a bidirectional encoder structure, so it captures contextual semantic relationships well, and our base model is derived from it. The structure of BERT is shown in Figure 1: the first layer is the word embedding layer, after which the embedded input passes through 12 Transformer layers to obtain the final feature representation.

Fig. 1 The structure of BERT.

The embedding layer of the BERT model, shown in Figure 2, consists of word, segment, and position vectors that are added together to form a comprehensive embedding. Word embeddings map each word or sub-word unit to a vector in a high-dimensional space. Segment embeddings enable BERT to distinguish and process single texts or pairs of texts. Position embeddings are essential because the Transformer architecture used in BERT is non-recurrent; they provide the sequential information the model needs to understand the position of words in a sentence.

Fig. 2 The structure of the BERT embedding layer.
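For illustration, the additive construction of this embedding layer can be sketched in a few lines of PyTorch; the vocabulary size, maximum sequence length, and hidden size below are illustrative values rather than settings taken from the experiments.

```python
import torch
import torch.nn as nn

class BertStyleEmbedding(nn.Module):
    """Word + segment + position embeddings summed, as in BERT's embedding layer."""
    def __init__(self, vocab_size=21128, max_len=512, hidden_size=768, num_segments=2):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden_size)
        self.segment_emb = nn.Embedding(num_segments, hidden_size)
        self.position_emb = nn.Embedding(max_len, hidden_size)
        self.layer_norm = nn.LayerNorm(hidden_size)

    def forward(self, token_ids, segment_ids):
        # Position indices 0..seq_len-1, broadcast over the batch dimension.
        positions = torch.arange(token_ids.size(1), device=token_ids.device).unsqueeze(0)
        emb = (self.word_emb(token_ids)
               + self.segment_emb(segment_ids)
               + self.position_emb(positions))
        return self.layer_norm(emb)

# Example: a batch of two sequences of length 8.
emb_layer = BertStyleEmbedding()
tokens = torch.randint(0, 21128, (2, 8))
segments = torch.zeros(2, 8, dtype=torch.long)
print(emb_layer(tokens, segments).shape)  # torch.Size([2, 8, 768])
```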
During pre-training, the input data are used to train the model on a masked language modeling task: 15% of the words in the provided word sequence are randomly masked, and the model predicts the masked words. The masking strategy replaces the chosen word with the special '[MASK]' token in 80% of cases, with a random word in 10% of cases, and leaves the chosen word unchanged in the remaining 10% of cases, so that the training inputs more closely resemble actually observed text. Once the pre-training tasks are completed, the BERT model's representation of an input sentence can be obtained from the output of its last layer; researchers typically take the output corresponding to the [CLS] token as the feature representation, since it is assumed to summarize the main information of the sentence. These features are then fed into a multilayer perceptron (MLP) to produce the final classification output required by the task at hand.
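The 80/10/10 masking rule can be illustrated with the short sketch below, which applies it to the token IDs produced by the Chinese BERT tokenizer used later in this paper; the helper function is a hypothetical illustration, not the pre-training code itself.

```python
import random
import torch
from transformers import BertTokenizer

def mask_tokens(token_ids, tokenizer, mask_prob=0.15):
    """Return (masked_ids, labels) following BERT's 80/10/10 masking rule."""
    masked = list(token_ids)
    labels = [-100] * len(token_ids)  # -100 marks positions ignored by the MLM loss
    for i, tok in enumerate(token_ids):
        if tok in tokenizer.all_special_ids or random.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        r = random.random()
        if r < 0.8:
            masked[i] = tokenizer.mask_token_id                  # 80%: replace with [MASK]
        elif r < 0.9:
            masked[i] = random.randrange(tokenizer.vocab_size)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return torch.tensor(masked), torch.tensor(labels)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
ids = tokenizer("今天心情很好")["input_ids"]
print(mask_tokens(ids, tokenizer))
```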
Ensemble learning is a widely used technique in machine learning that improves the performance or robustness of a model by combining the predictions of multiple identical or different models. Common methods include majority (max) voting, averaging, stacking, blending, bagging, and boosting [9].

IV. MODEL ARCHITECTURE AND EXPERIMENTAL PROCESS

We use a single-layer BERT network as the base model, integrate three such base models, and output the final classification according to the majority voting principle. The structure of the model is shown in Figure 3.

Fig. 3 The structure of ensemble BERT.
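One plausible realization of this architecture with the Hugging Face transformers library is sketched below: three BERT classifiers restricted to a single hidden layer, whose predicted labels are combined by majority vote. The configuration values and helper functions are assumptions for illustration and do not reproduce the exact training setup.

```python
import torch
from transformers import BertForSequenceClassification

def build_base_model(num_labels=2):
    """A BERT classifier with a single Transformer layer; only the first encoder
    layer's pretrained weights from bert-base-chinese are loaded."""
    return BertForSequenceClassification.from_pretrained(
        "bert-base-chinese", num_hidden_layers=1, num_labels=num_labels)

def ensemble_predict(models, input_ids, attention_mask):
    """Majority vote over the class labels predicted by each base model."""
    votes = []
    with torch.no_grad():
        for model in models:
            logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
            votes.append(logits.argmax(dim=-1))
    votes = torch.stack(votes)              # shape: (n_models, batch_size)
    return torch.mode(votes, dim=0).values  # most frequent label per example

models = [build_base_model() for _ in range(3)]  # N = 3 single-layer base models
```

With N = 3 and two sentiment classes, a tie is impossible, which is one practical reason an odd number of base models is convenient for majority voting.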

The specific steps of the experiment are as follows. Python is used for the experimental processing, and pandas is used to read the data, which contain two columns: one for the review texts and one for the sentiment labels. The review texts are tokenized with the Chinese BERT tokenizer (bert-base-chinese). The tokenized inputs and sentiment labels are converted into tensors, and a tensor dataset is created, which is then divided into training and validation sets. Three base BERT models, each with a single hidden layer, are trained in a loop. Each model is trained independently on the training set and evaluated on the validation set. The weights of each model are initialized with the built-in initial weights of the BERT model, and the order of the data batches is randomized in every epoch during training. The predictions of all models are then combined by majority voting to determine the final prediction labels, and metrics such as accuracy, precision, recall, F1 score, and the confusion matrix of the ensemble prediction are calculated. For comparison, a three-layer BERT network and a full 12-layer BERT are also trained for text classification, using an MLP as the output layer. The training times of all models are recorded with the tqdm [10] toolkit. Training is conducted on a CPU (a Ryzen 5800H), and each model is trained for a fixed three epochs.
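The data preparation stage described above can be sketched as follows; the file name, column names, and split ratio are assumptions, since they are not specified in the text.

```python
import pandas as pd
import torch
from torch.utils.data import TensorDataset, random_split
from transformers import BertTokenizer

# Hypothetical file and column names; the paper only states there are two columns.
df = pd.read_csv("weibo_posts.csv")  # columns: "text", "label"
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

encodings = tokenizer(df["text"].tolist(),
                      padding=True, truncation=True, max_length=128,
                      return_tensors="pt")
labels = torch.tensor(df["label"].tolist())

dataset = TensorDataset(encodings["input_ids"], encodings["attention_mask"], labels)

# Split into training and validation sets (80/20 is an assumed ratio).
train_size = int(0.8 * len(dataset))
train_set, val_set = random_split(dataset, [train_size, len(dataset) - train_size])
```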
V. MODEL RESULTS

To answer Question 1, we compare the prediction performance of the base model with that of the ensemble. The results of the ensemble are shown in Table I, and the results of the base model are shown in Table II.

TABLE I MODEL EVALUATION FOR ENSEMBLE BERT

Evaluation index | Value
Accuracy | 0.9702
Precision | 0.9870
Recall | 0.9529
F1-score | 0.9697

TABLE II MODEL EVALUATION FOR BASE MODEL BERT

Evaluation index | Value
Accuracy | 0.9612
Precision | 0.9825
Recall | 0.9510
F1-score | 0.9665

The base model already achieves high accuracy and strong values on the other indicators, which we attribute to the powerful encoder architecture of the BERT model. The ensemble model raises the accuracy of the base model to more than 97%. Comparing the evaluation indicators of the ensemble BERT model and the base BERT model, the largest gap occurs in accuracy, at approximately 0.94%. For models that are already this accurate, this is a relatively significant gap, so we conclude that the ensemble model does improve the experimental results.

To answer Questions 2 and 3, the results of the three-layer BERT model are shown in Table III.

TABLE III MODEL EVALUATION FOR 3-LAYER BERT

Evaluation index | Value
Accuracy | 0.9707
Precision | 0.9937
Recall | 0.9470
F1-score | 0.9699

Comparing the evaluation indicators of the 3-layer BERT model and the ensemble BERT model, the largest percentage gap occurs in precision, a difference of about 0.68%, while the accuracy difference is only 0.05%, so we regard the prediction performance of the two methods as essentially the same. However, as Table IV shows, at this equal level of performance the training time of the ensemble exceeds that of the deeper network by about 11.58%. Therefore, all else being equal, choosing a deeper neural network may be the better option.

TABLE IV MODEL TRAINING TIME RESULTS

Model | Training Time (min) | Accuracy | Accuracy per min
Ensemble BERT | 212 | 0.9702 | 0.0046
3-layer BERT | 190 | 0.9707 | 0.0051
Standard BERT (12 layers) | 792 | 0.9982 | 0.0013

In our final analysis of the confusion matrix of the ensemble model, the results are as follows: 11858 True Negatives, 150 False Positives, 565 False Negatives, and 11427 True Positives. The model produces significantly more False Negatives than False Positives, indicating that it more often misclassifies posts with positive sentiment as negative. In addition, students' social media content may include Internet slang and culturally specific ways of expressing emotions, which can challenge the model's ability to identify sentiment accurately and contribute to the high rate of False Negatives.
VI. CONCLUSION

This paper presents a novel ensemble learning network based on BERT. We trained a set of BERT-based learners and combined them using the majority voting method. We proposed three questions and applied the method to classify emotional tendencies in the social network texts of middle school students. Experimental results indicate that ensemble learning shows some improvement over the base model, but the performance of the ensemble composed of three single-layer BERT models is roughly equivalent to that of a three-layer BERT model while requiring additional training time. Therefore, in terms of predictive effectiveness, deeper networks should be considered the preferred choice; in terms of interpretability, however, an ensemble of shallow networks may provide an acceptable solution. A limitation of this work is that it only examines the case where N equals 3. Future research could explore the optimal balance between time and efficiency for different numbers of layers and the impact of different base models on sentiment classification of student texts.

REFERENCES

[1] Liu C, Pang Z, Ni G, Mu R, Shen X, Gao W, Miao S, 2023, A comprehensive methodology for assessing river ecological health based on subject matter knowledge and an artificial neural network, Ecological Informatics, 77, 102199.
[2] Minaee S, Kalchbrenner N, Cambria E, Nikzad N, Chenaghlu M, Gao J, 2021, Deep learning-based text classification: a comprehensive review, ACM Computing Surveys (CSUR), 54, 1-40.
[3] Varghese J, 2020, Artificial intelligence in medicine: chances and challenges for wide clinical adoption, Visceral Medicine, 36, 443-449.
[4] Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J, 2013, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, 26.

[5] Lauriola I, Lavelli A, Aiolli F, 2022, An introduction to deep learning in natural language processing: Models, techniques, and tools, Neurocomputing, 470, 443-456.
[6] Church KW, 2017, Word2Vec, Natural Language Engineering, 23, 155-162.
[7] Guthrie D, Allison B, Liu W, Guthrie L, Wilks Y, 2006, A closer look at skip-gram modelling, In LREC, Vol. 6, 1222-1225.
[8] Devlin J, Chang MW, Lee K, Toutanova K, 2018, BERT: Pre-training of deep bidirectional transformers for language understanding.
[9] Polikar R, 2012, Ensemble learning, Ensemble Machine Learning: Methods and Applications, 1-34.
[10] Chen XGH, 2023, Wearable sensors for human activity recognition based on a self-attention CNN-BiLSTM model, Sensor Review, 43, 347-358.

