Can We Predict Student Performance Based On Tabular and Textual Data
Received 22 July 2022, accepted 7 August 2022, date of publication 16 August 2022, date of current version 22 August 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3198682
ABSTRACT With the emergence of more new teaching systems, such as Massive Open Online Courses (MOOCs), massive amounts of data are constantly being collected. There is huge value in these massive teaching data. However, the data, including both student behavior data and student comment data about the course, is not processed to discover models and paradigms which can be useful for school management. There is no multimodal dataset with tabular and textual data for educational data mining yet. We first collect a dataset that includes student behavior data and course comments textual data. Then we fuse the student behavior data with the course comments textual data to predict student performance, using a Transformer-based framework with a uniform vector representation. The empirical results on the collected dataset show the effectiveness of our proposed method: in terms of F1 and AUC, the performance of our method improves by up to 3.33% and 4.37%, respectively. We find that the uniform feature vector representation learned by our proposed method can indeed improve the classifier's performance compared with existing works. Further, we validate our approach on an open dataset. The results of the empirical study show that our proposed method has a strong generalization capability. Moreover, we perform interpretability analysis using the SHapley Additive exPlanation (SHAP) method and find that text features have a more important influence on the classification model. This further illustrates that fusing text features can improve the performance of classification models.
INDEX TERMS Educational data mining, deep learning, multimodal, data fusion, random forest.
for educational institutions today [3]. The applications of data mining techniques to the specific data from educational environments are called educational data mining (EDM) [4]. EDM data comes from a wide variety of educational systems, such as traditional face-to-face education, computer-based educational systems, and blended learning systems. Each of the different educational systems provides different data sources [5]. Using machine learning techniques, such as clustering, text mining, and classification, these different types of data are analyzed to solve various educational problems. The taxonomy comprises thirteen tasks addressed by EDM systems, including predicting student performance, detecting undesirable student behaviors, profiling and grouping students, social network analysis, providing reports, creating alerts for stakeholders, planning and scheduling, creating courseware, developing concept maps, generating recommendations, adaptive systems, evaluation, and scientific inquiry [6], [7].

D'Mello discussed the ubiquity and importance of emotion to learning [8]. The emotions may not always be consciously experienced, but they exist and influence cognition nonetheless [9]. Language can express feelings very well, so text mining-based sentiment analysis techniques have great potential for analyzing the relationship between students' thoughts and learning experiences. Yang et al. applied sentiment analysis techniques to students' posts on MOOCs courses. They found a negative correlation between the ratio of positive to negative terms and dropout across time [10]. Methods to automatically identify student confusion were developed from MOOCs posts [11]. This analysis method only uses unimodal data; MOOCs are now able to provide researchers with multimodal data, including students' behavioral data, textual data, audio, video, brainwave data, and more.

The DataShop dataset was one of the first and biggest datasets and also provided a tool for intelligent tutoring systems [12]. While the student learned from the software, the student's actions and the tutor's responses were stored in a log database or file, which was imported into DataShop for storage and analysis. The Graphical Interactive Student Monitoring Tool for Moodle (GISMO) is another popular public resource: a graphical interactive monitoring tool that provides useful visualization of students' activities in online courses to instructors. With GISMO, instructors can examine various aspects of distance students, such as attendance to courses, reading of materials, and submission of assignments. Users of the popular learning management system Moodle may benefit from GISMO for their teaching activities [13]. Unimodal sentiment features and classifications (e.g., text, audio, and video) are used for sentiment discovery and analysis (SDA) [14]. The Multimodal Teaching and Learning Analytics (MUTLA) dataset was very well described and covered many academic subjects (i.e., Mathematics, English, Physics, and Chemistry). User records with question-level logs of student responses, brainwave data, and webcam data were collected [15]. The MUTLA dataset is the first rich multimodal dataset for EDM, but the MUTLA dataset is not open now. Cano et al. developed a multiview early warning system built with comprehensible Genetic Programming classification rules adapted to specifically target underrepresented and underperforming student populations. The system integrated many student information repositories using multi-view learning to improve the accuracy and timing of the predictions [16].

There are no open multimodal educational assessment datasets available. To address the lack of multimodal datasets, we collected multimodal data from several teaching management systems and MOOCs platforms. The data includes student behavior as well as students' course comments. The reason for choosing course comments instead of other data formats, such as audio or video data, is that the course commenting module exists in most MOOCs platforms. This makes data collection less expensive, and our proposed multimodal data fusion model has a strong generalization capability.

Research on multimodal data fusion has focused more on the processing of text and images [17]; however, educational multimodal data fusion has not been fully exploited. To address the problem of heterogeneous data mining, students' behavior data and comment textual data are collected and manually aligned. Then a multimodal data fusion approach is designed to fuse structured students' behavior data and unstructured students' comment textual data into a unified semantic representation to predict student performance. Based on the dataset we collected, we conducted an empirical study. The study results show that the classification method can achieve better classification results in terms of recall, F1, and AUC.

In our study, to better elucidate our proposed research idea of multimodal data fusion for educational data mining, we design the following four research questions (RQs):
RQ1: Can a multimodal dataset be used to obtain a better classification model than a unimodal dataset?
RQ2: Can our proposed method outperform other data fusion methods when performing teaching effectiveness evaluation?
RQ3: Does our proposed model have a strong generalization ability?
RQ4: Can we perform interpretable analysis on our proposed deep multimodal data fusion model?

In summary, the contributions of this paper can be summarized as follows:
• To the best of our knowledge, we are the first to propose the use of student behavior data with course comments textual data to predict student performance.
• We are the first to propose an open dataset that includes student behavior data as well as course comments textual data.
• We are the first to propose a Transformer-based framework for creating deep multimodal data fusion algorithms with a uniform vector representation.
• Empirical results on real-world datasets show the effectiveness of our proposed method.

The rest of this paper is organized as follows. Section II introduces the background of educational data mining and multimodal data fusion. Section III describes our proposed method in detail, including the framework of deep teaching quality assessment based on multimodal data fusion, the Transformer-based semantic representation of course comment texts, and the deep multimodal data fusion algorithm. Section IV reports our experimental setup, including experimental subjects, performance evaluation measures, strategies for experimental comparison, and experimental design. Section V discusses the results of our experiments. Section VI analyzes the potential threats to the validity of our empirical results. Section VII concludes the paper with some future work.
II. BACKGROUND AND RELATED WORK
In this section, we mainly discuss the related studies on educational data mining, sentiment analysis, and multimodal data fusion.

One study assessed EDM/LA techniques in terms of chi-squared, information gain, symmetrical uncertainty, information gain ratio, and weighted information gain; the data source was collected once per semester from the Spring of 2015 to the Fall of 2017 [21]. Chui et al. proposed an improved conditional generative adversarial network-based deep support vector machine (ICGAN-DSVM) algorithm to predict students' performance under supportive learning via school and family tutoring [22]. For learning management systems, a Partial Least Squares Structural Equation Model (PLS-SEM) was used to analyze collaborative learning and to predict the team grade in teamwork groups; the data source was collected from a CS2 course [23]. For e-Learning management systems, an interpretable rule-based Genetic Programming classifier was used to predict student performance and students at risk as soon as possible, so as to intervene early and facilitate student success, evaluated in terms of Geometric mean, AUC, and Kappa; the student data was from Virginia Commonwealth University [16]. In addition to analyzing computer-based educational systems from students' behavior data, sentiment analysis during online learning was also used to predict learning performance.

Cano et al. built a multiview early warning system with comprehensible Genetic Programming classification rules adapted to target underrepresented and underperforming student populations [15]. The system integrated many student information repositories using multi-view learning to improve the accuracy and timing of the predictions [16]. For MOOCs courses, student behavior data can be obtained from the logs of the software system, and course comments can reflect the emotional state of the student learning process. The datasets for student behavior and course comments are easy to collect, and the cost of collecting these data is manageable compared to collecting brainwave data, video data, etc. Therefore, fusing student behavior data with course comments can better reflect the learning process of students and enable the prediction of student performance. Previous research in educational data mining has been conducted in a relatively isolated manner, either from student behavioural data or from the perspective of student sentiment analysis. It is difficult for such studies to comprehensively measure the behaviour of students during their online learning process. Especially with the popularity of MOOCs, more and more students are involved in the learning process, and they express their attitudes towards the course by leaving comments. These student comments and student behaviour provide a good basis for our data modelling: we can predict student performance based on tabular and textual data.
III. OUR PROPOSED METHOD
In this section, we first briefly describe the framework of deep teaching quality assessment based on multimodal data fusion; then, the Transformer-based semantic representation of review texts and the deep multimodal data fusion algorithm are proposed.
A. FRAMEWORK FOR DEEP EDUCATION QUALITY ASSESSMENT BASED ON MULTIMODAL FUSION DATA
Online education platforms, like MOOCs, provide a fast, interactive platform for educational data mining. From the MOOCs platform, students' learning process data can be collected, including both student behavior data and student interaction information, such as student comments on the learning course. The data can be extracted from relational databases at a low cost. We can intuitively feel that students who study hard will be more motivated to complete their assignments and will eventually achieve better performance. In addition, we can also get the students' learning status just from their course comment text. For example, students who are more optimistic about their course tend to have more positive attitudes toward learning, leading to better academic performance. The process of extracting data from a relational database is shown in Figure 1. Based on the Transformer architecture's powerful learning capability for natural language, we have the potential to learn more information about students' learning status from course comments, which will ultimately enable deeper mining of student learning data.

The student learning process data of different modalities contain rich user information, and data mining can be performed on the student learning process data of different modalities to build a student teaching quality assessment model. The framework for deep education quality assessment is shown in Figure 2, which is based on the semantic vector representation of students' comment text as well as students' behavior data. In the deep teaching quality assessment framework, the problem of predicting student performance is formalized as a binary classification problem, and the model classifies the results as excellent learning effect or average learning effect. The feature vector classification function is defined as:

y' = \arg\max_{c \in \{0,1\}} f_\theta(x)    (1)

In Equation 1, x represents the input student learning status data, including student behavior data, such as MOOC learning progress, learning progression for objective practice questions, etc., and also the students' course comments, for example, the student's comment ''The course is rather obscure and covers a lot of underlying principles.'' Student behavior data and course comments are persistently stored in a relational database from online education platforms, such as MOOC and SPOC Academy, as well as from third-party open data interfaces, such as Golden Classroom. f_\theta(\cdot) denotes the classifier obtained from historical training data of student learning, such as a random forest. The training data of the model is constructed by aligning multiple databases; the excellent learning effect is labeled as 1, and the average learning effect is labeled as 0. The training dataset D_{tr} containing N training samples is defined as D_{tr} = \{x_n, y_n\}_{n=1}^{N}, where the samples are labeled y_n \in \{0, 1\} and the training samples are x_n = (x_n^1, x_n^2, x_n^3, x_n^4, x_n^5, x_n^6, x_n^7). x_n^1 to x_n^6 denote the behavioral characteristics of student learning, including learning progress (LP), learning progression for objective practice questions (LPO), learning progression for subjective practice questions (LPS), in-class discussion participation (DP), number of posts, and number of replies, respectively. The definitions of each behavioral characteristic are as follows.

LP = \frac{\text{Number of studied chapters}}{\text{Total number of course chapters}}    (2)

LPO = \frac{\text{Number of completed objective questions}}{\text{Total number of objective questions}}    (3)

LPS = \frac{\text{Number of completed subjective questions}}{\text{Total number of subjective questions}}    (4)

DP = \frac{\text{Number of submitted class exercises}}{\text{Total number of class exercises}}    (5)

The number of posts and the number of replies refer to the posts made by students in the forum of the MOOC platform. The above data features are collected from the MOOC platform and are exported after students finish a course on the platform. x_n^7 represents the one-dimensional feature vector of students' course comments, as shown in Figure 2, which is computed from a deep semantic vector learning model based on the Transformer. x_n is represented as a uniform feature vector of the student learning state data. Multiple decision trees are constructed to form a random forest based on the training data D_{tr}, and ensemble learning is used in the random forest. x_n^7 differs from the other features in that its conditional entropy must be calculated considering the domain feature migration of the Transformer network; the parameters of the Transformer network are determined based on the training data of the review text, and the specific calculation formula is given in Eq. 6.

H(D \mid A) = w_\theta \left( \sum_{n=1}^{N} \frac{|D_n|}{|D|} H(D_n) \right)    (6)

w_\theta represents the Transformer network that determines the optimal network parameters, H(D \mid A) denotes the empirical conditional entropy under condition A of the feature selected for the information entropy calculation, |D_n| indicates the number of samples of a given class for the selected characteristic, |D_n|/|D| indicates the probability of a class for the selected feature, and H(D_n) denotes the empirical information entropy of D.

B. THE TRANSFORMER-BASED SEMANTIC REPRESENTATION OF REVIEW TEXTS
The Transformer architecture has gained wide application in natural language processing. Pre-trained BERT models can achieve better classification performance after fine-tuning on domain-specific data, and classification is done by a linear classifier that computes a cross-entropy loss over the feature vectors [26]. The attention mechanism and the feature vector representation of text provide a unified representation for the fusion of multimodal data.
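To make the construction of the fused sample x_n concrete, the following is a minimal Python sketch (not the authors' released code) of the behavioral ratios in Eqs. (2)-(5) and their concatenation with the comment vector x_n^7; the field names and the 768-dimensional embedding size are assumptions made only for illustration.

import numpy as np

def behavioral_features(record):
    # Six behavioral features: Eqs. (2)-(5) plus the raw post and reply counts.
    lp  = record["studied_chapters"] / record["total_chapters"]        # Eq. (2), LP
    lpo = record["completed_objective"] / record["total_objective"]    # Eq. (3), LPO
    lps = record["completed_subjective"] / record["total_subjective"]  # Eq. (4), LPS
    dp  = record["submitted_exercises"] / record["total_exercises"]    # Eq. (5), DP
    return np.array([lp, lpo, lps, dp,
                     record["num_posts"], record["num_replies"]], dtype=np.float32)

def fuse_sample(record, comment_vector):
    # x_n = (x_n^1, ..., x_n^6, x_n^7): behavioral features concatenated with the comment vector.
    return np.concatenate([behavioral_features(record), comment_vector])

# Toy example with an assumed 768-dimensional comment embedding.
toy = {"studied_chapters": 8, "total_chapters": 10,
       "completed_objective": 45, "total_objective": 50,
       "completed_subjective": 9, "total_subjective": 12,
       "submitted_exercises": 14, "total_exercises": 16,
       "num_posts": 3, "num_replies": 5}
print(fuse_sample(toy, np.zeros(768, dtype=np.float32)).shape)  # (774,)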
Algorithm 1 Deep Multimodal Data Fusion Algorithm
Input: training set D_{tr} = \{x_n, y_n\}_{n=1}^{N}; pre-trained model BERT;
Output: unified feature vector representation R^x;
1  for data in D_{tr} do
2      Feed x_n^7 forward through BERT, compute the loss value and back-propagate;
3      Record the neural network parameters and obtain the domain representation of the text;
4  end
5  for data in D_{tr} do
6      Freeze the deep neural network and perform a forward pass;
7      Obtain a one-dimensional semantic vector of the comment text, v_text;
8      Concatenate: R^x_i = (x_i^1, x_i^2, x_i^3, x_i^4, x_i^5, x_i^6, v_text);
9  end
10 Use R^x to train the random forest classifier RF_quality;
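A rough illustration of how Algorithm 1 could be realized with the Hugging Face transformers library and scikit-learn is sketched below. It is not the authors' implementation: the model name, sequence length, and training-loop granularity are assumptions, and the fine-tuning of steps 1-4 is abbreviated to a single pass.

import numpy as np
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from sklearn.ensemble import RandomForestClassifier

def finetune_bert(comments, labels, model_name="bert-base-uncased", epochs=1):
    # Steps 1-4: fine-tune BERT on the comment texts to obtain the domain representation.
    tok = BertTokenizer.from_pretrained(model_name)
    clf = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
    opt = torch.optim.AdamW(clf.parameters(), lr=2e-5)
    clf.train()
    for _ in range(epochs):
        for text, y in zip(comments, labels):
            enc = tok(text, truncation=True, max_length=128, return_tensors="pt")
            out = clf(**enc, labels=torch.tensor([y]))
            out.loss.backward()            # compute loss value and back-propagate
            opt.step(); opt.zero_grad()
    return tok, clf.bert                   # keep the fine-tuned encoder weights

def comment_vectors(tok, encoder, comments):
    # Steps 5-7: freeze the network and extract a one-dimensional semantic vector per comment.
    encoder.eval()
    vecs = []
    with torch.no_grad():
        for text in comments:
            enc = tok(text, truncation=True, max_length=128, return_tensors="pt")
            hidden = encoder(**enc).last_hidden_state        # (1, seq_len, 768)
            vecs.append(hidden[:, 0, :].squeeze(0).numpy())  # [CLS] state as v_text
    return np.stack(vecs)

def train_rf(behavior_matrix, comments, labels):
    # Steps 8-10: concatenate behavioral features with v_text and train the random forest.
    tok, encoder = finetune_bert(comments, labels)
    v_text = comment_vectors(tok, encoder, comments)
    R_x = np.hstack([behavior_matrix, v_text])   # unified feature representation R^x
    return RandomForestClassifier(n_estimators=100, random_state=0).fit(R_x, labels)

The [CLS] hidden state is used here as the one-dimensional comment vector v_text; any other pooling of the frozen encoder's output would fit the same pipeline.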
IV. EXPERIMENTAL SETUP
In this section, we introduce the experimental setup, including experimental subjects, performance evaluation metrics, multimodal data fusion methods, and experimental design.
A. EXPERIMENTAL SUBJECTS
To compare the data fusion methods, we collected one dataset for predicting student performance and used one publicly available dataset to evaluate the generalizability of our proposed method.

The first dataset we collected comes from the MOOCs platform we are using. The courses are intended for college students. The collected data comes from three teaching systems: a MOOCs platform, the student course evaluation system, and the academic management system. Learning progress, learning progression for objective practice questions, learning progression for subjective practice questions, in-class discussion participation, the number of posts, and the number of replies were obtained from the MOOCs platform; the students' comments were obtained from the student course evaluation system. Students' course grades were obtained from the academic management system. A brief description of the first teaching quality assessment dataset is given in Table 1.

TABLE 1. The teaching quality assessment dataset.
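As a rough sketch of how the three exports could be aligned into the dataset summarized in Table 1, the following pandas snippet joins them on a shared student identifier; the file names, column names, and the grade threshold used for the ''excellent'' label are hypothetical.

import pandas as pd

behavior = pd.read_csv("mooc_behavior.csv")        # LP, LPO, LPS, DP, posts, replies per student
comments = pd.read_csv("course_evaluation.csv")    # free-text course comments
grades   = pd.read_csv("academic_management.csv")  # final course grades

dataset = (behavior.merge(comments, on="student_id", how="inner")
                   .merge(grades, on="student_id", how="inner"))
# Binary label: 1 = excellent learning effect, 0 = average learning effect (threshold assumed).
dataset["label"] = (dataset["grade"] >= 85).astype(int)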
The second dataset is Women’s E-Commerce Clothing r
Reviews dataset, collected by Nick Brooks in 2018. This k × (k + 1)
dataset is used to evaluate the generalization ability of our CD = qa × (13)
6N
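The metrics of Eqs. (9)-(12) and the critical distance of Eq. (13) can be computed as in the generic sketch below (not the authors' evaluation script); q_alpha must be taken from the studentized range table for the chosen significance level, and the Friedman test itself is available in scipy.

import numpy as np
from sklearn.metrics import recall_score, f1_score, roc_auc_score
from scipy.stats import friedmanchisquare  # e.g. stat, p = friedmanchisquare(scores_m1, scores_m2, scores_m3)

def evaluate(y_true, y_pred, y_score):
    # Recall and F1 (Eqs. 10 and 12) from hard predictions; AUC from predicted scores.
    return {"recall": recall_score(y_true, y_pred),
            "f1": f1_score(y_true, y_pred),
            "auc": roc_auc_score(y_true, y_score)}

def critical_distance(q_alpha, k, n):
    # Nemenyi critical distance, Eq. (13): k compared methods over n datasets/runs.
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * n))

print(critical_distance(q_alpha=2.569, k=4, n=10))  # 2.569 is the tabulated value for k=4, alpha=0.05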
other research settings. There are no commercial datasets available for testing yet, and we need to keep an eye on developments based on multimodal tabular and textual data fusion.
VII. CONCLUSION AND FUTURE WORK
With the emergence of more new teaching systems, such as MOOCs, massive amounts of data are constantly being collected. This massive amount of data is a vast gold mine. However, the multimodal data, including both student behavior data and student course comments textual data, is not processed to discover models and paradigms which can be useful for school management. All these state data during the learning process can reflect the effectiveness of student learning. There is no multimodal dataset with tabular data and textual data yet. So we first collected an open dataset that includes student behavior data as well as course comments textual data. We fused student behavior data with course comments textual data to predict student performance. Then a Transformer-based framework for creating deep multimodal data fusion algorithms with a uniform vector representation was proposed. The empirical results on the collected dataset show the effectiveness of our proposed method in terms of recall, F1, and AUC. The empirical research indicates that: (1) our proposed method can fully fuse two different kinds of data by learning feature vectors from text, and thereby achieves the best performance; course comment texts should be considered when creating student academic assessment models; (2) our proposed method achieves the best classification performance compared to the baseline methods, which implies that the uniform feature vector representation learned by our proposed method can indeed improve the classifier's performance.

Further, we validated our approach on an open clothing dataset. The results of the empirical study showed that our proposed method had a strong generalization capability. Moreover, we performed interpretability analysis using the SHAP method and found that text features had a more important influence on the classification model. This further illustrated that fusing text features can improve the performance of classification models.

In the future, we will continue to expand our dataset and apply our proposed method to other domains to validate its generalization capability continuously. In addition, we will also continue our in-depth research on the representation of unified feature vectors based on natural language processing techniques. We will work on additional ways to fuse data to improve the classification performance of student learning classification models.
REFERENCES
[1] C. Romero and S. Ventura, ''Educational data science in massive open online courses,'' Wiley Interdiscipl. Rev., Data Mining Knowl. Discovery, vol. 7, no. 1, p. e1187, Jan. 2017. [Online]. Available: https://onlinelibrary.wiley.com/doi/full/10.1002/widm.1187
[2] R. S. Baker and P. S. Inventado, ''Educational data mining and learning analytics,'' in Learning Analytics: From Research to Practice. Springer, Jan. 2014, pp. 61–75. [Online]. Available: https://link.springer.com/chapter/10.1007/978-1-4614-3305-7_4
[3] M. I. Baig, L. Shuib, and E. Yadegaridehkordi, ''Big data in education: A state of the art, limitations, and future research directions,'' Int. J. Educ. Technol. Higher Educ., vol. 17, no. 1, pp. 1–23, Dec. 2020. [Online]. Available: https://educationaltechnologyjournal.springeropen.com/articles/10.1186/s41239-020-00223-0
[4] B. Bakhshinategh, O. R. Zaiane, S. ElAtia, and D. Ipperciel, ''Educational data mining applications and tasks: A survey of the last 10 years,'' Educ. Inf. Technol., vol. 23, no. 1, pp. 537–553, Jul. 2017. [Online]. Available: https://link.springer.com/article/10.1007/s10639-017-9616-z
[5] C. Romero and S. Ventura, ''Educational data mining: A survey from 1995 to 2005,'' Exp. Syst. Appl., vol. 33, no. 1, pp. 135–146, Jul. 2007.
[6] A. Hernández-Blanco, B. Herrera-Flores, D. Tomás, and B. Navarro-Colorado, ''A systematic review of deep learning approaches to educational data mining,'' Complexity, vol. 2019, May 2019, Art. no. 1306039.
[7] M. D. Laddha, V. T. Lokare, A. W. Kiwelekar, and L. D. Netak, ''Performance analysis of the impact of technical skills on employability,'' Int. J. Performability Eng., vol. 17, no. 4, p. 371, Apr. 2021. [Online]. Available: http://www.ijpe-online.com/EN/10.23940/ijpe.21.04.p5.371378
[8] C. Lang, G. Siemens, A. Wise, and D. Gasevic. (2017). Handbook of Learning Analytics. [Online]. Available: https://www.academia.edu/download/56326181/hla17.pdf
[9] A. Öhman and J. J. Soares, '''Unconscious anxiety': Phobic responses to masked stimuli,'' J. Abnormal Psychol., vol. 103, no. 2, pp. 231–240, 1994.
[10] M. Wen, D. Yang, and C. P. Rosé. Sentiment Analysis in MOOC Discussion Forums: What Does It Tell Us? Citeseer. Accessed: Apr. 2022. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.660.5804&rep=rep1&type=
[11] D. Yang, M. Wen, I. Howley, R. Kraut, and C. Rosé, ''Exploring the effect of confusion in discussion forums of massive open online courses,'' in Proc. 2nd ACM Conf. Learn. Scale, Mar. 2015, pp. 121–130.
[12] (2010). A Data Repository for the EDM Community. [Online]. Available: https://www.researchgate.net/publication/254199600_A_Data_Repository_for_the_EDM_Community
[13] Graphical Interactive Student Monitoring Tool for Moodle. Accessed: Apr. 2022. [Online]. Available: http://gismo.sourceforge.net/index.html
[14] Z. Han, J. Wu, C. Huang, Q. Huang, and M. Zhao, ''A review on sentiment discovery and analysis of educational big-data,'' Wiley Interdiscipl. Rev., Data Mining Knowl. Discovery, vol. 10, no. 1, p. e1328, Jan. 2020. [Online]. Available: https://onlinelibrary.wiley.com/doi/full/10.1002/widm.1328
[15] F. Xu, L. Wu, K. P. Thai, C. Hsu, W. Wang, and R. Tong, ''MUTLA: A large-scale dataset for multimodal teaching and learning analytics,'' Oct. 2019, arXiv:1910.06078v1.
[16] A. Cano and J. D. Leonard, ''Interpretable multiview early warning system adapted to underrepresented populations,'' IEEE Trans. Learn. Technol., vol. 12, no. 2, pp. 198–211, Apr. 2019.
[17] D. Kiela, S. Bhooshan, H. Firooz, E. Perez, and D. Testuggine, ''Supervised multimodal bitransformers for classifying images and text,'' 2019, arXiv:1909.02950.
[18] C. Romero and S. Ventura, ''Educational data science in massive open online courses,'' Wiley Interdiscipl. Rev., Data Mining Knowl. Discovery, vol. 7, no. 1, p. e1187, Jan. 2017.
[19] C. Lang, G. Siemens, A. Wise, and D. Gašević, Handbook of Learning Analytics. Society for Learning Analytics Research. Accessed: Apr. 2022. [Online]. Available: https://www.solarresearch.com
[20] J. Campbell, P. DeBlois, and D. Oblinger. (2007). Academic Analytics: A New Tool for a New Era. [Online]. Available: https://er.educause.edu/articles/2007/7/academic-analytics-a-new-tool-for-a-new-era
[21] Exploring Induced Pedagogical Strategies Through a Markov Decision Process Framework: Lessons Learned. Accessed: Apr. 2022. [Online]. Available: https://par.nsf.gov/biblio/10105557
[22] K. T. Chui, R. W. Liu, M. Zhao, and P. O. de Pablos, ''Predicting students' performance with school and family tutoring using generative adversarial network-based deep support vector machine,'' IEEE Access, vol. 8, pp. 86745–86752, 2020.
[23] Z. Li and S. Edwards. (2018). Applying Recent-Performance Factors Analysis to Explore Student Effort Invested in Programming Assignments. [Online]. Available: https://search.proquest.com/openview/1344abae126cd4240dfdce3764087786/1?pq-origsite=gscholar&cbl=1976352
[24] M. Birjali, M. Kasri, and A. Beni-Hssane, ''A comprehensive survey on sentiment analysis: Approaches, challenges and trends,'' Knowl.-Based Syst., vol. 226, Aug. 2021, Art. no. 107134.
[25] A. G. Etemad, A. I. Abidi, and M. Chhabra, ''Fine-tuned T5 for abstractive summarization,'' Int. J. Performability Eng., vol. 17, no. 10, pp. 900–906, Oct. 2021. [Online]. Available: http://www.ijpe-online.com/EN/10.23940/ijpe.21.10.p8.900906
[26] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, ''BERT: Pre-training of deep bidirectional transformers for language understanding,'' 2018, arXiv:1810.04805.
[27] K. Gu and A. Budhkar, ''A package for learning on tabular and text data with transformers,'' in Proc. 3rd Workshop Multimodal Artif. Intell., 2021, pp. 69–73. [Online]. Available: https://aclanthology.org/2021.maiworkshop-1.10
[28] L. Fang, Q. Yubin, C. Xiang, L. Long, and Y. Fan, ''A sentiment analysis method based on class imbalance learning,'' J. Jilin Univ. Sci. Ed., vol. 59, no. 4, pp. 929–935, 2021. [Online]. Available: http://xuebao.jlu.edu.cn/lxb/CN/abstract/abstract4404.shtml
[29] Y. Qu, X. Chen, F. Li, F. Yang, J. Ji, and L. Li, ''Empirical evaluation on the impact of class overlap for EEG-based early epileptic seizure detection,'' IEEE Access, vol. 8, pp. 180328–180340, 2020.
[30] D. Lakens, ''Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs,'' Frontiers Psychol., vol. 4, p. 863, Nov. 2013. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3840331/
[31] S. S. Sawilowsky, ''New effect size rules of thumb,'' J. Modern Appl. Stat. Methods, vol. 8, no. 2, p. 26, Nov. 2009. [Online]. Available: https://digitalcommons.wayne.edu/jmasm/vol8/iss2/26
[32] W. Rahman, M. Kamrul Hasan, S. Lee, A. Zadeh, C. Mao, L.-P. Morency, and E. Hoque, ''Integrating multimodal information in large pretrained transformers,'' 2019, arXiv:1908.05787.
FANG LI was born in Baoji, China, in 1982. She received the M.S. degree in computer science and technology from Henan Polytechnic University, China, in 2011. Since 2014, she has been a Lecturer with the Jiangsu College of Engineering and Technology. Her research interests include network ideological and political education and computer application.

LONG LI (Member, IEEE) received the Ph.D. degree from the Guilin University of Electronic Technology, Guilin, China, in 2018. He is currently a Lecturer with the School of Computer Science and Information Security, Guilin University of Electronic Technology. His research interests include cryptographic protocols, privacy-preserving technologies in big data, and the IoT.

XIANZHEN DOU was born in Xuzhou, China, in 1987. He received the M.S. degree from the School of Electronics and Information, Nantong University, China, in 2013. Since 2019, he has been a Lecturer with the Information Engineering Institute, Jiangsu College of Engineering and Technology. His research interests include software engineering and machine learning.