Ensemble BERT: A Student Social Network Text Sentiment Classification Model Based on Ensemble Learning and BERT Architecture
Abstract—The mental health assessment of middle school students has long been a focus in the field of education. This paper introduces a new ensemble learning network based on BERT, following the idea of enhancing model performance by integrating multiple classifiers. We trained a set of BERT-based learners and combined them using the majority voting method. We collected social network text data of middle school students from China's Weibo platform and applied the method to the task of classifying the emotional tendencies in these students' social network texts. Experimental results suggest that the ensemble network performs better than the base model, and that an ensemble of three single-layer BERT models performs nearly the same as a three-layer BERT model while requiring about 11.58% more training time. In terms of balancing predictive performance and efficiency, the deeper BERT network should therefore be preferred for training; where interpretability matters, however, network ensembles can provide an acceptable solution.

Keywords—Social network text sentiment classification, natural language processing, BERT, ensemble learning

I. INTRODUCTION

Research based on neural network technology has gradually come into wide use across many fields [1]. With the rapid growth of deep learning, especially in Natural Language Processing (NLP), a growing number of researchers are exploring deep networks for NLP tasks, particularly text classification [2]. Deep networks such as the transformer-based BERT have achieved unprecedented performance on several NLP tasks. However, significantly increasing network depth often brings a substantial rise in computational cost, which is particularly problematic in resource-limited scenarios. Deep networks also face certain barriers in terms of interpretability [3]. This study therefore introduces a novel ensemble learning framework that combines the powerful semantic-capturing ability of transformers with the efficiency of ensemble learning: ensemble learning can increase robustness and reduce bias, while shallow networks facilitate interpretive analysis.

This paper investigates the following questions, where N denotes the number of single-layer BERT base models:

Question 1: Can an ensemble of N single-layer base models achieve good results in student sentiment classification?

Question 2: Does the predictive ability of an ensemble of N single-layer base models surpass that of a single N-layer BERT model?

Question 3: When predictive accuracy is equal, is the training time of an ensemble of N single-layer base models shorter than that of a single N-layer BERT model?

II. DATASET

We demonstrate the effectiveness of our approach by applying it to a sentiment tendency analysis task on social network text. A single-layer encoder variant of the BERT model serves as the base model of the ensemble, and the classification results are produced with the majority voting method. To simplify the experiment, we set N = 3. The data are social network posts from students at three middle schools in Xiangtan City, Hunan Province, China. Weibo is the largest social network platform in China. After giving their consent, students uploaded their Weibo accounts anonymously. We collected the account data of 324 students and captured up to 100 Weibo posts from each student over the last three years; when fewer than 100 posts were available, all of a student's posts were used. In total, 30,012 pieces of data were included. The students and their parents approved the use of these data, and the students and psychological education experts double-checked the data annotation.
Next, we will present related work, followed by the experiments and results.
The specific steps of the experiment are as follows. Python is used for the experimental processing, and pandas is used to read the data file, which contains two columns: one for the post texts and one for the sentiment labels. The texts are tokenized with the Chinese BERT tokenizer (bert-base-chinese). The tokenized inputs and sentiment labels are then converted into tensors, from which a tensor dataset is created and divided into training and validation sets. Three base BERT models, each with a single hidden layer, are trained in a loop: every model is trained independently on the training set and evaluated on the validation set. Model weights are initialized from the built-in pre-trained BERT weights, and the order of the data batches is reshuffled in every epoch during training. The predictions of all models are then combined with the majority voting method to determine the final prediction labels, and the accuracy, precision, recall, F1 score, and confusion matrix of the ensemble prediction are calculated. For comparison, a three-layer BERT network and a full 12-layer BERT are also trained for text classification, using an MLP as the output layer. Training times are recorded with the tqdm [10] toolkit. All training is conducted on a CPU (a Ryzen 5800H), and each model is trained for a fixed three epochs.
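To make these steps concrete, the following is a minimal sketch of the pipeline using the Hugging Face transformers library. The file name weibo_posts.csv, the column names text and label, and the hyperparameters (maximum length, batch size, learning rate, 80/20 split) are illustrative assumptions rather than the exact configuration used in the experiments.

import pandas as pd
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split
from transformers import BertTokenizer, BertForSequenceClassification

# Assumed layout: a CSV with a post-text column and a 0/1 sentiment label column.
df = pd.read_csv("weibo_posts.csv")  # hypothetical file name
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
enc = tokenizer(list(df["text"]), padding=True, truncation=True,
                max_length=128, return_tensors="pt")
labels = torch.tensor(df["label"].values)
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], labels)

# Split into training and validation sets (80/20 is an assumed ratio).
n_train = int(0.8 * len(dataset))
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)  # reshuffled per epoch
val_loader = DataLoader(val_set, batch_size=16)

def make_single_layer_bert():
    # Load pre-trained BERT but keep only one encoder layer; the unused layer
    # weights are discarded and the classification head is freshly initialized.
    return BertForSequenceClassification.from_pretrained(
        "bert-base-chinese", num_hidden_layers=1, num_labels=2)

models = []
for _ in range(3):  # N = 3 base learners
    model = make_single_layer_bert()
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _epoch in range(3):  # each model is trained for a fixed three epochs
        for input_ids, attention_mask, y in train_loader:
            loss = model(input_ids=input_ids, attention_mask=attention_mask,
                         labels=y).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    model.eval()
    models.append(model)

@torch.no_grad()
def ensemble_predict(loader):
    # Majority vote over the base learners' hard predictions. With three
    # binary voters, the rounded mean equals the majority and ties cannot occur.
    votes, gold = [], []
    for input_ids, attention_mask, y in loader:
        preds = torch.stack([m(input_ids=input_ids,
                               attention_mask=attention_mask).logits.argmax(dim=-1)
                             for m in models])
        votes.append(preds.float().mean(dim=0).round().long())
        gold.append(y)
    return torch.cat(votes), torch.cat(gold)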
V. MODEL RESULTS

To answer Question 1, we compare the predictive performance of the base model with that of the ensemble. The results of the ensemble are shown in Table I, and the results of the base model are shown in Table II.
TABLE I MODEL EVALUATION FOR ENSEMBLE BERT

Evaluation index    Value
Accuracy            0.9702
Precision           0.9870
Recall              0.9529
F1-score            0.9697
TABLE II MODEL EVALUATION FOR BASE MODEL BERT

Evaluation index    Value
Accuracy            0.9612
Precision           0.9825
Recall              0.9510
F1-score            0.9665
From the results, the base model already achieves high accuracy and strong values on the other indicators, owing to the powerful encoder architecture of BERT. The ensemble raises the accuracy of the base model to more than 97%. When comparing the evaluation indicators of the ensemble BERT model and the base BERT model, the largest relative gap occurs in accuracy, at approximately 0.94%. For models that are already this accurate, this is a relatively significant gap. We therefore conclude that the ensemble model does improve the experimental results.
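The metrics reported in Tables I-III can be computed from the validation predictions with scikit-learn. This brief sketch reuses the illustrative ensemble_predict helper defined in the pipeline sketch above.

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

preds, gold = ensemble_predict(val_loader)   # helper from the pipeline sketch
preds, gold = preds.numpy(), gold.numpy()

print("Accuracy :", accuracy_score(gold, preds))
print("Precision:", precision_score(gold, preds))
print("Recall   :", recall_score(gold, preds))
print("F1-score :", f1_score(gold, preds))
print(confusion_matrix(gold, preds))         # rows: true class, columns: predicted class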
To answer Questions 2 and 3, the results of the three-layer BERT model are shown in Table III.
TABLE III MODEL EVALUATION FOR 3-LAYER BERT

Evaluation index    Value
Accuracy            0.9707
Precision           0.9937
Recall              0.9470
F1-score            0.9699
When comparing the evaluation indicators of the three-layer BERT model and the ensemble BERT model, the largest gap occurs in precision, a difference of about 0.68%; the accuracy difference is only 0.05%. We therefore regard the predictive performance of the two methods as essentially the same. However, Table IV shows that, for the same level of performance, the training time of the ensemble exceeds that of the deep network by about 11.58%. Under these circumstances, choosing the deeper neural network may be the better option.

TABLE IV MODEL TRAINING TIME RESULTS

Model                       Training Time (min)    Accuracy    Accuracy per min
Ensemble BERT               212                    0.9702      0.0046
3-layer BERT                190                    0.9707      0.0051
Standard BERT (12 layers)   792                    0.9982      0.0013
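The 11.58% overhead and the accuracy-per-minute column follow directly from the raw values in Table IV, as this small check shows:

# Relative training-time overhead of the ensemble vs. the 3-layer model.
ensemble_min, deep_min = 212, 190
print(f"{(ensemble_min - deep_min) / deep_min:.2%}")   # 11.58%

# Accuracy per minute of training, as reported in Table IV.
for name, minutes, acc in [("Ensemble BERT", 212, 0.9702),
                           ("3-layer BERT", 190, 0.9707),
                           ("Standard BERT (12 layers)", 792, 0.9982)]:
    print(name, round(acc / minutes, 4))               # 0.0046, 0.0051, 0.0013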
Turning to the confusion matrix of the ensemble model, the results are as follows: True Negatives 11858, False Positives 150, False Negatives 565, True Positives 11427. The model produced noticeably more False Negatives than False Positives, indicating that it more often misclassified posts with positive sentiment as negative. In addition, students' social media content may include Internet slang and culturally specific ways of expressing emotion, which can challenge the model's ability to identify sentiment accurately and contribute to the high False Negative rate.
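These four counts reproduce the ensemble metrics reported in Table I, which can be verified directly from the metric definitions:

# Confusion-matrix counts of the ensemble model, as given in the text.
tn, fp, fn, tp = 11858, 150, 565, 11427

accuracy = (tp + tn) / (tp + tn + fp + fn)            # ≈ 0.9702
precision = tp / (tp + fp)                            # ≈ 0.9870
recall = tp / (tp + fn)                               # ≈ 0.9529
f1 = 2 * precision * recall / (precision + recall)    # ≈ 0.9697
print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))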
VI. CONCLUSION

This paper presents a novel ensemble learning network based on BERT. We trained a set of BERT-based learners, combined them using the majority voting method, posed three research questions, and applied the method to classifying emotional tendencies in the social network texts of middle school students. The experimental results indicate that ensemble learning yields some improvement over the base model, but the performance of an ensemble of three single-layer BERT models is roughly equivalent to that of a three-layer BERT model while requiring additional training time. In terms of predictive effectiveness, deeper networks should therefore be the preferred choice; in terms of interpretability, however, an ensemble of shallow networks may provide an acceptable solution. A limitation of this work is that it only considers the case N = 3. Future research could explore the optimal balance between time and efficiency for different numbers of layers and examine the impact of different base models on the sentiment classification of student texts.

REFERENCES
[1] Liu C, Pang Z, Ni G, Mu R, Shen X, Gao W, Miao S, 2023, A comprehensive methodology for assessing river ecological health based on subject matter knowledge and an artificial neural network, Ecological Informatics, 77, 102199.
[2] Minaee S, Kalchbrenner N, Cambria E, Nikzad N, Chenaghlu M, Gao J, 2021, Deep learning-based text classification: a comprehensive review, ACM Computing Surveys (CSUR), 54, 1-40.
[3] Varghese J, 2020, Artificial intelligence in medicine: chances and challenges for wide clinical adoption, Visceral Medicine, 36, 443-449.
[4] Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J, 2013, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, 26.
[5] Lauriola I, Lavelli A, Aiolli F, 2022, An introduction to deep learning in natural language processing: Models, techniques, and tools, Neurocomputing, 470, 443-456.
[6] Church KW, 2017, Word2Vec, Natural Language Engineering, 23, 155-162.
[7] Guthrie D, Allison B, Liu W, Guthrie L, Wilks Y, 2006, A closer look at skip-gram modelling, In LREC, Vol. 6, 1222-1225.
[8] Devlin J, Chang MW, Lee K, Toutanova K, 2018, BERT: Pre-training of deep bidirectional transformers for language understanding.
[9] Polikar R, 2012, Ensemble learning, Ensemble Machine Learning: Methods and Applications, 1-34.
[10] Chen XGH, 2023, Wearable sensors for human activity recognition based on a self-attention CNN-BiLSTM model, Sensor Review, 43, 347-358.