
Real-Time EEG-Based Driver Drowsiness Detection Based On Convolutional Neural Network With Gumbel-Softmax Trick

This document presents a novel deep learning framework for real-time driver drowsiness detection using electroencephalography (EEG) signals, employing a convolutional neural network (CNN) integrated with a Gumbel-Softmax technique for efficient channel selection. The proposed model achieves an average accuracy of 80.84% and an F1 score of 79.65% in cross-subject drowsiness identification, outperforming existing state-of-the-art methods. Additionally, a graphical user interface (GUI) has been developed to facilitate practical applications of the model in real-world scenarios.


1860 IEEE SENSORS JOURNAL, VOL. 25, NO. 1, 1 JANUARY 2025

Real-Time EEG-Based Driver Drowsiness Detection Based on Convolutional Neural Network With Gumbel-Softmax Trick

Weibin Feng, Xiaoping Wang, Senior Member, IEEE, Jialan Xie, Wanqing Liu, Yinghao Qiao, and Guangyuan Liu

Abstract—Severe traffic accidents attributed to driver drowsiness have become increasingly frequent, prompting widespread interest among researchers in electroencephalography (EEG)-based driver drowsiness detection. However, due to the significant differences in EEG signals between participants, the prevalence of redundant information in multichannel EEG data, and the computational burden of combining channel selection with neural networks, accurate and efficient real-time driver drowsiness recognition remains challenging. To overcome these limitations, this article proposes a novel deep learning framework that utilizes a separable convolutional neural network (CNN) to mine the intricate spatiotemporal information in EEG signals, combined with a channel selection layer that jointly optimizes the EEG channels and the network parameters. This layer employs an efficient embedded Gumbel-Softmax technique for discrete sampling and differentiable approximation. To prevent the introduction of duplicate channels, we impose penalties on the row sums of the selection matrix to encourage the selection neurons to learn distinct channels, enabling the neural network to train in an end-to-end manner. The proposed model achieves an average accuracy of 80.84% and an F1 score of 79.65% in cross-subject drowsiness identification for 11 subjects on the publicly available sustained-attention driving task dataset. Compared to the results of recent relevant works, our model exhibits superior performance, surpassing state-of-the-art (SOTA) deep learning methods by 1.47%. Furthermore, building upon the model's advantages, we have implemented a real-time driver drowsiness detection graphical user interface (GUI), providing a practical reference for real-world applications.
Index Terms—Driver drowsiness detection, electroencephalography (EEG), graphical user interface (GUI), Gumbel-Softmax trick, real-time.

Received 22 July 2024; accepted 29 October 2024. Date of publication 12 November 2024; date of current version 2 January 2025. This work was supported in part by the National Natural Science Foundation of China under Grant 62236005 and Grant 61936004. The associate editor coordinating the review of this article and approving it for publication was Dr. Pedro Oliveira Conceição Junior. (Corresponding author: Xiaoping Wang.)
Weibin Feng and Xiaoping Wang are with the School of Artificial Intelligence and Automation, the Key Laboratory of Image Processing and Intelligent Control of Education Ministry of China, and Hubei Key Laboratory of Brain-inspired Intelligent Systems, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: [email protected]; [email protected]).
Jialan Xie, Wanqing Liu, Yinghao Qiao, and Guangyuan Liu are with the College of Electronic and Information Engineering and the Institute of Affective Computing and Intelligent Information Processing, Southwest University, Chongqing 400715, China.
Digital Object Identifier 10.1109/JSEN.2024.3492176

1558-1748 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: MIT-World Peace University. Downloaded on February 19,2025 at 18:12:03 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

DRIVEN by the Internet and artificial intelligence (AI) technology, the transportation industry has ushered in a transformative epoch, thereby changing the way humans live. While vehicles bring personal portability, they also pose hidden dangers. According to the recent public statistics of the National Highway Traffic Safety Administration (NHTSA), there were 42 939 people killed in motor vehicle traffic accidents on U.S. roadways in 2021, with 684 or 1.6 percent of total fatalities involving drowsy drivers, which represents an increase of 8.2% from 632 in 2020 [1]. Notably, driver drowsiness refers to a state where individuals experience temporary lapses in consciousness or diminished awareness [2], leading to impaired performance and an elevated risk of traffic accidents. Consequently, the effective detection and timely warning of driver drowsiness are crucial to personal safety.
At present, various technologies, including computer vision-based [3], [4] and physiological signal-based [5], [6], have been reported for detecting driver drowsiness. For computer vision-based methods, the primary focus lies in assessing
drowsiness by detecting facial landmarks, measuring eye closure duration, monitoring yawning frequency, and estimating head pose [7]. However, these methods are highly sensitive to variations in external conditions, such as occlusion, illumination, and clothing effects, and generally produce unreliable identification results [8]. In contrast, physiological signal-based approaches assess the driver's alertness level and detect drowsy states by analyzing physiological signals such as the electrooculography (EOG), the electrocardiography (ECG), and the electromyography (EMG). Although these detection methodologies have made notable strides, they are still hampered by limitations, including limited sensitivity to neural activity and challenges in wearable comfort and practicality. In comparison, electroencephalography (EEG) can more effectively measure the brain dynamics associated with drowsiness and reflect functional and physiological changes in the central nervous system [9]. Therefore, EEG-based driver drowsiness detection has gradually attracted the attention of researchers.
Approaches to EEG-based drowsiness recognition fall into two main categories: machine learning and deep learning. Initially, driver drowsiness recognition was accomplished by extracting handcrafted features from EEG signals in the time–frequency or nonlinear domains, which were fed into conventional machine learning models. However, these approaches typically require substantial domain expertise and yield suboptimal performance. In contrast, deep learning, with its advanced architecture, allows end-to-end learning of task-relevant high-level feature representations directly from raw EEG data, thus achieving superior performance. Convolutional neural networks (CNNs), as one of the prominent deep models, exhibit enhanced efficiency in modeling EEG signals for classification tasks [10]. Despite the remarkable performance achieved in driver drowsiness recognition based on CNN architectures, such models tend to underperform in cross-subject classification tasks due to significant intersubject variability in EEG signals [11]. In addition, EEG signals are usually recorded from multiple channels, yet not all EEG channels correspond to the brain regions activated by driver drowsiness [12]. Therefore, inputting all channels increases the model complexity and computational burden, which is not conducive to real-time driver drowsiness detection.
In response to these limitations, we propose a novel deep learning framework based on the CNN and the Gumbel-Softmax trick. First, we employ Gumbel-Softmax discrete sampling to learn, in an end-to-end manner, a selection of channels from the multichannel EEG data that is relevant to the drowsiness classification task. This approach effectively reduces data dimensionality and enhances computational efficiency. Subsequently, a separable CNN is utilized to capture crucial feature information from the time–frequency domain. Finally, a fully connected (FC) layer and a softmax layer are applied to complete the drowsiness classification. The main contributions of the article are summarized as follows.
1) We utilize a separable CNN model to capture the spatiotemporal correlations in EEG signals, in conjunction with the Gumbel-Softmax trick to optimize the channel selection and network parameters, achieving an end-to-end approach for driver drowsiness recognition based on EEG signals.
2) To prevent the selection of duplicate channels, we impose penalties on the row sums of weight matrices in the selection neurons, which enhances model efficiency and reduces computational costs.
3) The performance of our network architecture on driver drowsiness recognition is better than that of eight other recent works, and, in particular, achieves an average accuracy of 80.84% and an F1 score of 79.65% for cross-subject drowsiness classification, demonstrating the effectiveness and broad applicability of our methodology.
4) Faced with the scarcity of work simulating EEG-based real-time drowsiness detection, we leverage the high accuracy and efficiency of the proposed model to develop a graphical user interface (GUI) for simulating real-time driver drowsiness detection, providing a valuable application for intelligent driver drowsiness detection in the transportation industry.
The layout of the remaining part of this article is as follows. Section II reviews research related to EEG-based driver drowsiness detection and Gumbel-Softmax methods in neural networks. The detailed description of the public dataset and approach used in the study is in Section III, which is followed by the results and discussion of the validated methods in Sections IV and V, respectively. Finally, we conclude the entire paper in Section VI.

II. RELATED WORKS

This section reviews recent advances in EEG-based driver drowsiness recognition and introduces the principles and applications of the Gumbel-Softmax trick to neural networks.

A. EEG-Based Driver Drowsiness Detection
With the rapid iteration of neural network technologies, EEG-based drowsiness recognition methods have evolved from shallow machine learning to deep architectures, which provides a robust foundation for developing efficient and accurate driver fatigue detection systems. In general, shallow models achieve reasonable prediction ability with minimal complexity, with a modest requirement for training data, and rely on extracting features according to prior knowledge or expert-informed approaches [13]. In contrast, deep learning models incorporate a learned representation of the data and can find intrinsic feature expressions from the training data.
The recognition of drowsiness by shallow machine learning models first requires transforming the raw EEG signal into a set of feature vectors based on traditional handcrafted feature extraction approaches and then feeding them into a classifier or regression model for detection. Along these lines, Ogino and Mitsukura [14] compared the features extracted from EEG by three methods, namely power spectral density (PSD), autoregressive (AR), and multiscale entropy (MSE), and adopted step-wise linear discriminant analysis (SWLDA) to sieve out informative PSD features, which were then fed into a support vector machine (SVM) with a radial basis function (RBF)
kernel to detect drowsiness, achieving a classification accuracy of 72.7% through tenfold cross-validation evaluation. Shahbakhti et al. [15] extracted blink-related features from low-channel frontal EEG data using a variational mode method and compared the synergistic effect of blinking and EEG features before and after filtering for driver drowsiness detection based on the SVM classifier, with a significant improvement in average accuracy of 6.9 percentage points. In another study, Chen et al. [16] mined nonlinear features, including sample entropy (SaEn), approximate entropy (ApEn), and Renyi entropy (RenEn) from the EEG subbands and fused eyelid motion information to recognize drowsiness using an extreme learning machine (ELM), achieving high detection accuracies. Given the high variability between individuals, specific EEG features and shallow models make it difficult to achieve outstanding cross-subject drowsiness detection performance.
The emergence of deep learning models has led to a substantial leap forward in the performance and accuracy of myriad classification tasks, thereby garnering substantial scholarly interest. For instance, Gao et al. [5] introduced core blocks to extract temporal dependencies and dense layers to fuse spatial information from the EEG and proposed an EEG-based spatiotemporal CNN (ESTCNN) to detect driver drowsiness with a classification accuracy of 97.37%. To assess the performance of drowsiness recognition across subjects, Cui et al. [17] designed an interpretable CNN model, achieving an average accuracy of 78.35% on 11 participants under leave-one-out cross-subject evaluation, and explained that the network identifies biologically significant features from EEG signals. In addition, Chen et al. [18] proposed a self-attention channel-connectivity capsule network (SACC-CapsNet) for EEG-based driver drowsiness detection to study the critical temporal information and significant channels, with comparative assessments validating its superiority over prevailing methodologies. Despite these successes, deep learning architectures grapple with an array of challenges, such as large parameter counts, heavy computation, and insufficient interpretability, and are also affected by extraneous EEG channels [19]. It is therefore of great practical significance to combine an EEG channel selection strategy with network parameter optimization to construct an efficient real-time driver drowsiness detection system.

B. Gumbel-Softmax in Neural Networks
Categorical variables are innately suited for embodying discrete structures. Nonetheless, due to the inability to backpropagate through samples, stochastic neural networks rarely use categorical latent variables, rendering the training of such models a formidable task [20]. The introduction of Gumbel-Softmax addressed this problem by replacing nondifferentiable samples from a discrete distribution with a differentiable approximation.
The concrete distribution is a probability distribution that assigns a probability to each of $N$ different classes, and its pivotal characteristic is that it can smoothly transition into a categorical distribution [21]. There are three distinct parameterizations of the same distribution, namely the normalized probability $p$, the nonnormalized probability $\theta$, and the nonnormalized logarithmic probability $\log\theta$. The probability $p$ for each class $i \in D = \{1, 2, \ldots, N\}$ can be expressed as

$$p_i = \frac{\theta_i}{\sum_{j=1}^{N} \theta_j} = \frac{\exp(\alpha_i / T)}{\sum_{j=1}^{N} \exp(\alpha_j / T)} = \frac{\exp(\alpha_i / T)}{Z_D(\theta)} \tag{1}$$

where $T$ is the temperature parameter and $Z_D(\theta)$ denotes the partition function that normalizes the distribution. Frequently, $T$ is set to 1, making $\alpha$ an unnormalized logarithmic probability, and (1) is then known as the softmax function. Assuming $z$ is a class variable, the Gumbel-Softmax trick can elegantly and expediently draw the sample $z$ from a categorical distribution with probabilities $p_i$ as

$$z = \mathrm{one\_hot}\Big\{\arg\max_i \big[g_i + \log(p_i)\big]\Big\} \tag{2}$$

where $g_i$ is a sample drawn from the Gumbel(0, 1) distribution, and it can be generated from a uniform distribution by the inverse of the Gumbel distribution, denoted as

$$g_i = -\log(-\log(u_i)) \tag{3}$$

where $u_i$ represents a value sampled from the Uniform(0, 1) distribution. The discrete sampling is thus implemented by drawing $g_i$ from a fixed distribution, which removes the nondifferentiable sampling step and facilitates differentiable backpropagation within the neural network [22].
Capitalizing on the strengths of the Gumbel-Softmax trick, it was introduced to the redetection module using the VGG19 network in the visual tracking domain for accurate sampling, producing a smaller number of output target candidate boxes, thereby reducing the computational effort of the algorithm and improving the long-term tracking performance [23]. In the training of a mixture of variational autoencoder (MVAE) networks, Ye and Bors [24] proposed an inventive component selection approach to generate dropout masking parameters from a differentiable categorical Gumbel-Softmax distribution by controlling the number of variational autoencoder (VAE) components, ensuring end-to-end backpropagation throughout the network's training. Confronting the discontinuity of neural architecture search (NAS) algorithms in the sampling of candidates, Pang et al. [22] devised a Gumbel-Softmax scheme that converts the discrete distribution logits into continuous probabilities that sum to one and are differentiable, and proposed the Gumbel-Softmax-based NAS (GS-NAS) to automate the architectural design of the hierarchical functional brain network (FBN) decomposition for a deep belief network (DBN). So far, Gumbel-Softmax approaches have become increasingly prevalent within neural networks for addressing differentiable optimization problems.

III. MATERIALS AND METHODS

A. Dataset Description
To study driver drowsiness detection in the driving environment, we used an openly available sustained-attention driving task dataset released in 2019 by Cao et al. [25]. The dataset encompasses 62 EEG recordings of 27 students or staff members, aged between 22 and 28 years, from the National Chiao Tung University. Volunteers took part in a 90-min sustained-attention driving experiment numerous times on the same or different days.


The virtual reality (VR) driving environment was built on a six-degree-of-freedom Stewart motion platform, which could induce an event-related lane-departure paradigm. Participants' reaction time (RT), a proven correlate of drowsiness in the pertinent literature [26], served as the drowsiness metric. The experimental procedure simulated night-time driving on a four-lane highway, where subjects needed to cruise the car down the center of the lane. During this driving process, the VR device randomly generated lane-departure events at 5–10-s intervals, causing the automobile to drift from the original cruising lane toward the left or right side (deviation onset). Participants were required to compensate promptly for this perturbation by steering the wheel (response onset), bringing the vehicle back onto its original trajectory (response offset). Each lane-departure event was defined as a trial, which included four phases: a baseline period, deviation onset, response onset, and response offset.
During the experiment, 500-Hz EEG signals were recorded employing the Scan SynAmps2 Express system (Compumedics Ltd., VIC, Australia), which had a wired EEG cap with 32 Ag/AgCl electrodes, including 30 valid EEG electrodes and two reference electrodes (opposite lateral mastoids). The EEG electrodes were placed according to a modified international 10–20 system.

B. Data Preprocessing and Extraction
The study utilized the preprocessed version of the sustained-attention driving task dataset, which is accessible online from [27]. The preprocessing of the EEG signals consists of two main steps. Initially, the raw EEG data underwent denoising via finite impulse response (FIR) filters (1-Hz high pass and 50-Hz low pass). Subsequently, artifacts and obvious blink contamination were removed by manual correction, followed by automatic correction of apparent eye blink contamination and muscle artifacts in the EEG signal employing the automatic artifact removal (AAR) plug-in for the EEGLAB toolbox in MATLAB. To enhance the computational efficiency, the sampling rate of the EEG signals was subsequently reduced from 500 to 128 Hz.
Given the likelihood of heterogeneous mental states exhibited by participants during the sustained-attention driving task, we followed the steps presented in [28] for sample extraction and EEG data calibration to assess the level of neurocognitive drowsiness. Specifically, we extracted 3-s-long EEG data before the onset of the lane-departure event for each trial as a sample, which had a dimension of 30 (channels) × 384 (sample points). Each sample could be categorized as "alert" or "drowsiness" according to the delineation criteria [29] and through computations involving varying RTs. As depicted in Fig. 1, the time interval between deviation onset and response onset in each lane-departure event was known as RT, also regarded as the "local_RT." Furthermore, the degree of subjects' drowsiness over an extended period was evaluated by introducing the indicator called "global_RT," which was the mean value of local RT across all trials within the 90-s sliding time window before the onset of the deviation. For each driving session, we defined a baseline "alert_RT" to represent the RT that the participant could achieve during alertness, which was the fifth percentile of the local RT in the entire session [28]. Samples were then divided according to the local, global, and alert RTs: a sample was labeled as "alert" when both local and global RT were shorter than 1.5× alert RT, and as "drowsiness" when both local and global RT were above 2.5× alert RT.
To ensure the reliability of the EEG data, we adopted the method used in [30] to exclude samples with moderate performance for drivers. In brief, each subject had at least 50 trials in both states. When the same participant had multiple driving sessions, we selected the session with the most balanced class distribution. The samples were balanced between the two categories for each subject. In this way, we obtained a balanced dataset with 2022 samples from 11 different participants, and the specific distribution of samples for each participant is shown in Table I.

Fig. 1. Schematic calculation for different RTs. When the sliding time window is less than 90 s, global_RT also needs to be calculated.

TABLE I
SAMPLE SIZE OF ELIGIBLE SUBJECTS USED IN THE EXPERIMENT

C. Network Architecture
During driver drowsiness detection, temporal–spatial features play a crucial role in capturing the dynamic characteristics of multichannel EEG signals [31]. The temporal dimension encapsulates evolving patterns and transitions over time, while the spatial dimension provides information about the distribution of brain activities. An overview of the network framework is depicted in Fig. 2. First, to efficiently extract the most relevant channel information, we designed an efficient and embedded channel selection layer based on the Gumbel-Softmax trick. Then, a separable CNN is employed to capture
more comprehensive representations from the temporal–spatial domain. Finally, an FC layer and a softmax layer are adopted to accomplish the classification.

Fig. 2. Proposed deep learning network architecture based on the CNN and the Gumbel-Softmax trick. $x_n$ denotes the feature vector derived from the EEG channel $n$. $z_k$ represents the output of the $k$th selection neuron. The green arrows indicate the input with the highest probability of selection from each neuron.

1) Channel Selection With the Gumbel-Softmax Trick: To curtail noise and reduce redundant information from irrelevant EEG channels, we introduced a concrete channel selection layer with the Gumbel-Softmax trick [32], which was trained simultaneously with the classification network. Assume that a dataset $D = \{(X_i, y_i) \mid i = 1, 2, \ldots, m\}$ consists of $m$ EEG samples $X_i$ with corresponding class labels $y_i$. Each $X_i \in \mathbb{R}^{N \times F}$ is recorded from $N$ electrodes with $F$ features per channel. Embedded within the channel selection layer, $K$ selection neurons are arranged in a stacked configuration, in which each neuron takes $N$ channels as input and produces a single output channel [33]. Each selection neuron $k$ is parameterized by a learnable vector $\alpha_k \in \mathbb{R}^N$. When an EEG sample $X_i$ is input to the network during training, the neuron randomly draws a weight vector $w_k \in \mathbb{R}^N$ from the concrete distribution [34]. The weight vector can be expressed as follows:

$$w_{nk} = \frac{\exp\big((\log \alpha_{nk} + G_{nk}) / T\big)}{\sum_{j=1}^{N} \exp\big((\log \alpha_{jk} + G_{jk}) / T\big)} \tag{4}$$

where the $G_{nk}$ are independent and identically distributed samples from the Gumbel distribution [35], and the temperature parameter $T \in (0, +\infty)$ controls the relaxation of the one-hot vector. Each selection neuron samples its own weights and computes the output channel as $z_k = w_k^{T} X$. As $T$ approaches 0, the weight vector $w_k$ approaches a one-hot vector, so the output of the selection neuron approaches one of the input channels.
Indeed, the softmax form of (4) was proposed to overcome the nondifferentiability of the argmax operation during backpropagation in neural networks, while still allowing $w_k$ to be sampled efficiently from a categorical distribution with class probabilities $\alpha_k$ [36]. During network training, the temperature parameter decreases exponentially. As the epoch increases, the sampled values converge to one-hot vectors [21]. The decay formula of the temperature parameter in the channel selection network can be expressed as follows:

$$T(t) = T_s \times \left(\frac{T_e}{T_s}\right)^{t / N_e} \tag{5}$$

where $T(t)$ denotes the value of the temperature parameter $T$ at epoch $t$. $T_s$ and $T_e$ represent the initial and terminal temperature parameters, respectively. $N_e$ is the number of epochs. At the start of each epoch, each selection neuron computes the selection probabilities of all input channels. As the sampling probability of a particular channel increases, the neuron performs a selection operation based on the highest channel probability at the end of the epoch [33].
A limitation of the EEG channel selection strategy described above is that the selection neurons may yield identical channels, engendering redundant channel information within the classification network. This could affect overall network performance and lead to lower classification accuracy. Hence, it is imperative to improve the channel selection strategy to ensure that each neuron samples unique weights and produces nonrepetitive channel outputs. To the best of our knowledge, the Gumbel-Softmax trick was initially introduced for precision sampling, aiming to reduce the dimensionality of features and enhance computational efficiency [23]. In that setting, the probability of repeated selection during the sampling process is negligible. However, in the case of EEG channel selection, there is a high probability that several selection neurons output the same channel, giving rise to the problem of repeated channel selection [33].
To this end, we integrate a regularization component [37] into the channel selection layer to encourage each selection neuron to learn a different channel output in a single training stage. For each selection neuron's parameter vector $\alpha_k$, we use the columns of the selection matrix $P$ to store the normalized values of $\alpha_k$ such that $p_{nk} = \alpha_{nk} / \sum_{j=1}^{N} \alpha_{jk}$. As the temperature parameter $T$ approaches 0, the $k$th column of $P$ represents the probability distribution of neuron $k$ selecting the input channel. At the end of each epoch, the position with a value of 1 in that column indicates the selected channel. In this way, different channels can be output by penalizing the row sums of the selection matrix. The penalization term can be described as
follows:

$$\Gamma(P) = \delta \sum_{n=1}^{N} \mathrm{ReLU}\left(\sum_{k=1}^{K} p_{nk} - \theta\right) \tag{6}$$

where $\delta$ denotes the regularized loss weight, ReLU represents the rectified linear unit, $p_{nk}$ signifies the probability of neuron $k$ selecting a particular channel $n$, and $\theta$ is a threshold parameter. As the network trains, $\theta$ undergoes an exponential decay process. When the sum over the selection neurons of the probabilities of selecting a given channel exceeds $\theta$, it is penalized. This results in distinct channels being chosen as outputs, thus avoiding the phenomenon of duplicate channels.
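The row-sum penalty in (6) can be illustrated with a small pure-Python sketch. It assumes the selection parameters are stored as a nested list `alpha[n][k]` (channel `n`, neuron `k`). The function names `normalize_columns` and `duplicate_penalty` and the values of `delta` and `theta` are hypothetical choices made only for this example.

```python
def normalize_columns(alpha):
    """Build P with p_nk = alpha_nk / sum_j alpha_jk (each column sums to 1)."""
    n_channels, n_neurons = len(alpha), len(alpha[0])
    col_sums = [sum(alpha[n][k] for n in range(n_channels))
                for k in range(n_neurons)]
    return [[alpha[n][k] / col_sums[k] for k in range(n_neurons)]
            for n in range(n_channels)]


def duplicate_penalty(P, delta, theta):
    """Eq. (6): delta * sum_n ReLU(sum_k p_nk - theta)."""
    return delta * sum(max(0.0, sum(row) - theta) for row in P)


if __name__ == "__main__":
    # Two neurons that both favor channel 0: the first row sum is 1.8.
    duplicated = [[0.90, 0.90], [0.05, 0.05], [0.05, 0.05]]
    # Two neurons that favor different channels: every row sum is below 1.
    distinct = [[0.90, 0.05], [0.05, 0.90], [0.05, 0.05]]
    print(duplicate_penalty(duplicated, delta=1.0, theta=1.0))  # about 0.8
    print(duplicate_penalty(distinct, delta=1.0, theta=1.0))    # 0.0
```

Only the configuration in which two neurons concentrate on the same channel is penalized, which matches the intent of the threshold described above.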
It is worth noting that this regularization function does not
affect the channel selection strategy, as exceeding the threshold
is the only condition for penalizing channel duplication.
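To summarize the forward pass of the channel selection layer, the sketch below combines the temperature decay in (5), the relaxed weights in (4), and the output $z_k = w_k^{T} X$. It is a minimal illustration in pure Python, not the authors' implementation: here `alpha` is stored as a list of K per-neuron vectors of length N, and the temperatures, seed, and toy data are assumptions chosen only for the example.

```python
import math
import random


def temperature(epoch, n_epochs, t_start, t_end):
    """Eq. (5): T(t) = T_s * (T_e / T_s) ** (t / N_e)."""
    return t_start * (t_end / t_start) ** (epoch / n_epochs)


def gumbel(rng):
    """One Gumbel(0, 1) sample via the inverse transform."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def sample_weights(alpha_k, temp, rng):
    """Eq. (4): relaxed one-hot weights of one selection neuron."""
    scores = [(math.log(a) + gumbel(rng)) / temp for a in alpha_k]
    shift = max(scores)  # numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_channels(X, alpha, temp, rng):
    """Return the K output channels z_k = w_k^T X for an N x F sample X."""
    n_channels, n_features = len(X), len(X[0])
    outputs = []
    for alpha_k in alpha:
        w = sample_weights(alpha_k, temp, rng)
        outputs.append([sum(w[n] * X[n][f] for n in range(n_channels))
                        for f in range(n_features)])
    return outputs


def hard_selection(alpha):
    """Channel index with the largest parameter for each neuron."""
    return [max(range(len(a)), key=lambda n: a[n]) for a in alpha]


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0], [-1.0, -2.0, -3.0, -4.0]]
    alpha = [[0.98, 0.01, 0.01], [0.01, 0.01, 0.98]]  # K = 2 neurons, N = 3
    for epoch in (0, 50, 100):
        T = temperature(epoch, 100, t_start=10.0, t_end=0.1)
        print(epoch, round(T, 3), select_channels(X, alpha, T, rng)[0])
    print(hard_selection(alpha))
```

At high temperature each output is a blend of all input channels, and as the temperature decays the sampled weights sharpen toward a single channel per neuron.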
2) Classification Network: CNNs have been widely applied in numerous domains, ranging from time-series prediction and signal classification to image recognition and object detection, with impressive results [38]. In this study, the classification network is therefore built on depthwise separable convolution, which effectively reduces computational resources and cost [39]. The network comprises two integral components: pointwise convolutions and depthwise convolutions. The pointwise convolution (also known as 1 × 1 convolution) mixes information from different channels, thereby capturing time dimension information, while the depthwise convolution effectively extracts spatial dimension features from each input channel [40]. The classification network contains two blocks with a total of seven layers: the first two layers implement pointwise and depthwise convolution, followed by an activation layer, a batch normalization layer, a global average pooling (GAP) layer, an FC layer, and a softmax classification layer, in that order. The model construction is described in detail as follows.

1) The sample input X′ produced by the k selection neurons has a shape of k × f, that is, k selected channels of 1-D signal, each of length f. We use N1 pointwise convolution nodes in the first block to output new signal channels of the same length, and N1 is set to approximately half of k to minimize redundancy and expedite network convergence [17]. Depthwise convolution operations are then applied to these N1 derived channels to capture local features and spatial information. After the convolution operation, we construct the activation layer using the ReLU function to converge the network and maintain global stability [41]. Next, a batch normalization layer [42] eliminates the internal covariate shift by normalizing each feature dimension within a mini-batch, and a GAP layer compresses its data to reduce model parameters.
2) The second block consists of an FC layer and a softmax layer, where the FC layer consolidates the high-dimensional representations from the preceding layers and feeds them into the softmax layer to estimate the probabilities associated with the driver's two states of alertness and drowsiness.

Fig. 3. Model evaluation using a nested cross-validation approach. Both the outer and inner loops adopted the leave-one-subject-out cross-validation (LOSOCV). For each outer training set, the optimal hyperparameter models were tuned in the inner cross-validation and then assessed on the outer test data. The green and orange rectangular blocks represent the training and test data for each cross-validation, respectively. The purple boxes indicate the best hyperparameter models selected from the cross-validation in the inner loop, denoted by Mi (i = 1, 2, . . . , 11). Purple arrows represent the prediction for the outer test data using Mi. Si indicates the ith division of the dataset in the outer loop cross-validation. Si_j (j = 1, 2, . . . , 10) denotes the jth split of the corresponding inner loop in the ith split of the outer loop. subIdx (Idx = 01, 02, . . . , 11) represents the samples of the matching subject index.

Fig. 4. Average performance for 11 subjects at training epochs from 1 to 200.

Fig. 5. Accuracy, precision, recall, and F1-score for each subject calculated using the nested LOSOCV method.

TABLE II
MEAN AND STANDARD DEVIATION OF ASSESSMENT INDICATORS FOR ALL SUBJECTS

D. Training Procedure
To validate the performance of our proposed framework for cross-subject driver drowsiness recognition, a nested LOSOCV approach was implemented to obtain an unbiased estimate of the model's performance [43]. For this purpose, we split the dataset by following the process illustrated in Fig. 3. In brief, each partition sets aside one subject's samples for testing, while the remaining subjects are used as the training set. The nested LOSOCV approach additionally divides this training set into training and testing sets again, according to the same LOSOCV principle, in order to iteratively fine-tune hyperparameters in the inner loop and select the optimal model for evaluating the outer loop's test set.

The experiments were conducted on the Windows 11 platform powered by an Intel® Core™ i7-12700F processor and an 8-GB NVIDIA GeForce RTX 3070 graphics card. We implemented the network model with the PyTorch library using Python 3.11.4. In the training phase, we used a batch size of 64 for 80 epochs and employed the Adam optimizer [44] with an initial learning rate of 0.001 to minimize the cross-entropy loss function. Considering that too many epochs may cause the model to overfit, we used an early stopping strategy to truncate iterations once the cost function ceased improving significantly [45]. For the parameters of the channel selection layer, we set the regularization weight δ to 0.1, decayed the temperature parameter T from 10 to 0.1 and the regularization threshold θ from 3 to 1.1 as the epochs increased [33], and finally adjusted the number of selection neurons to obtain the best recognition result.
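The nested LOSOCV partitioning described in the training procedure can be sketched as follows. This is a minimal illustration in pure Python that only enumerates subject-level splits; it does not train any model. The helper names `loso_splits` and `nested_loso` are hypothetical, and the subject labels follow the sub01 to sub11 naming used in the Fig. 3 caption.

```python
def loso_splits(subjects):
    """Yield (training_subjects, held_out_subject) pairs for leave-one-subject-out CV."""
    for held_out in subjects:
        training = [s for s in subjects if s != held_out]
        yield training, held_out


def nested_loso(subjects):
    """Yield (outer_test_subject, inner_splits) for nested LOSOCV.

    inner_splits is a list of (inner_training_subjects, inner_validation_subject)
    pairs built only from the outer training subjects, so the outer test subject
    never influences hyperparameter selection.
    """
    for outer_training, outer_test in loso_splits(subjects):
        inner_splits = list(loso_splits(outer_training))
        yield outer_test, inner_splits


# Eleven subjects, as in the dataset split shown in Fig. 3.
subjects = [f"sub{idx:02d}" for idx in range(1, 12)]
plan = list(nested_loso(subjects))
```

Each of the 11 outer splits carries 10 inner splits, matching the indices i = 1, ..., 11 and j = 1, ..., 10 in Fig. 3. In a full pipeline, the inner splits would be used to tune hyperparameters and pick a model Mi, which is then evaluated once on the corresponding outer test subject.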

IV. RESULTS
A. Overall Performance of Drowsiness Recognition
To evaluate model performance, we adopt four standard metrics, namely accuracy (Acc), precision (Pre), recall (Rec), and F1-score (F1), to comprehensively assess the classification performance of the network.

Considering the influence of factors such as gender, age, and physical condition among different subjects, the models trained in the inner loop of the nested LOSOCV approach exhibit variations. To select the optimal model, we employed a strategy similar to a voting mechanism. Specifically, we analyzed the model validation outcomes from the inner loop and conducted a vote on each of the four evaluation metrics (Acc, Pre, Rec, and F1) across the different subject validation results. For each metric, the model attaining the maximum value received one additional vote, and the model amassing the most votes was considered optimal. In the case of a tie in the vote count, the model with the highest F1 was selected. Since F1 takes both Pre and Rec into account, it is a comprehensive performance index for evaluating classification algorithms [46].

Through preliminary training and validation of our proposed model, we observed that the model exhibited excellent performance in the driver drowsiness classification task when configured with eight selection neurons. To optimize the training effect of the model, we focused on the selection of the epoch count. Fig. 4 shows the trend of the average performance of all subjects over training epochs from 1 to 200 with eight selection neurons. The graph demonstrates that as the number of epochs increases, the average performance of the model exhibits an overall upward trend until reaching an optimal epoch count. It is worth noting that when the number of training epochs is around 80, the four performance evaluation metrics reach their peak values, all exceeding 80%. This indicates that the overall performance of the model is favorable at this epoch count. Subsequently, as the number of training epochs ranges from 100 to 200, the overall performance curve fluctuates up and down.

In this study, we determined the optimal number of epochs to be 80 by analyzing the results in Fig. 4. In addition, we evaluated the classification performance of driver drowsiness for each participant using the LOSOCV method, with the selection neuron count fixed at 8, as depicted in Fig. 5. During each test stage, we calculated the average Acc, Pre, Rec, and F1. As can be observed from Fig. 5, the precision for each subject was higher than the recall, indicating the better performance of our proposed model in identifying drowsiness states. In practical driving scenarios, we would prefer higher precision values to minimize spurious classification of drowsiness as alertness [47]. However, due to individual differences among the subjects, the model's predictive ability decreased significantly for subjects 02, 07, and 11, which is similar to the prior work [48]. In addition, Table II presents the average classification performance and standard deviations over all participants to analyze the differences in the results. Our proposed model achieved an Acc of 80.84% and an F1 of 79.65%, with corresponding standard deviations of 8.22% and 8.39%, respectively. This indicates that our model attains satisfactory performance in cross-subject driver drowsiness detection, supporting its robustness and practical utility.

B. Comparison With Existing Works
To highlight the superiority of our proposed model for EEG-based cross-subject drowsiness detection within the

driving context, we compared it with other recent advanced baselines, which are briefly described below.

1) CAGNN [49]: It uses a self-attention mechanism-based connection-aware graph neural network and introduces a squeeze-and-excitation block to capture important features, generating task-relevant connected networks via end-to-end training.
2) Tree-FL+CNN-GAP [50]: A fusion model based on tree-based federated learning and CNNs.
3) Conformer [51]: It utilizes 1-D temporal and spatial convolutional layers to learn low-level local features and a self-attention module to extract global correlations in local temporal features.
4) TSANet [52]: A temporal–spectral fused and attention-based deep neural network model.
5) SedRVFL [53]: A spectral-ensemble deep random vector functional link network that focuses on feature learning in the frequency domain.
6) TFAC-Net [31]: It first employs continuous wavelet transform to generate the corresponding spectral–temporal representation and then adopts the temporal-frequential attention mechanism to reveal critical feature regions.
7) FGloWD-edRVFL [54]: It utilizes the CNN to extract features integrated into a deep random vector functional link network.
8) TFormer [55]: A time–frequency transformer that automatically learns global time–frequency patterns from raw EEG data.

TABLE III
COMPARISON OF THE EEG-BASED CROSS-SUBJECT CLASSIFICATION ACCURACIES (%) BETWEEN THE PROPOSED MODEL AND RECENT EXISTING WORK ON THE SUSTAINED-ATTENTION DRIVING TASK DATASET

In Table III, we can observe that the proposed approach achieved superior performance in EEG-based driver drowsiness detection, attributed to the learnable EEG channel selection via the Gumbel-Softmax methodology and the effective spatiotemporal feature extraction by the separable CNN. Compared with the methods based on deep learning, our model improves the recognition accuracy by an average of 4.61% and a maximum of 10.34%. This indicates that our framework can effectively combine the channel selection network and separable CNN architecture to mine the potential information of EEG channels and extract the crucial features hidden in the spatial and temporal domains of multichannel EEG signals. Compared with the state-of-the-art (SOTA) TFormer model introduced by Li et al. [55], our proposed network achieves competitive performance, with an average accuracy improvement of 1.16%. This further attests to the capability of our framework to learn more discriminative and task-relevant feature representations for driver drowsiness detection through an end-to-end approach. Despite the reduction in available EEG channel information due to the discrete sampling inherent in the Gumbel-Softmax trick, the combination of efficient, embedded channel selection and the classification network has achieved superior or comparable classification performance, indicating its potential practical value.

In the comparison of eight methodologies, our model emerged as the foremost in driver drowsiness assessment, demonstrating a significant advantage over other methods. The outcome underscores the efficacy of our framework in leveraging the Gumbel-Softmax technique to select effective EEG channels, combined with a separable CNN that can robustly capture information from both temporal and spatial dimensions.

C. Simulation for Real-Time Detection
According to our survey of relevant studies, few works simulate real-time EEG-based drowsiness detection due to the complexity of EEG data. Therefore, to verify the feasibility of the proposed model in practical applications, and considering that the artifacts and blink contamination of raw EEG signals require manual correction [56], we employ preprocessed, openly accessible EEG signals [27] to build a GUI that emulates real-time driver drowsiness detection. The overall GUI consists of three parts: an administrator login, a user login and registration, and the core drowsiness detection functionality. Considering the relative importance of these three interfaces, we present the first two in the Appendix, and the drowsiness detection interface is shown in Fig. 6.

Fig. 6. GUI for EEG-based real-time driver drowsiness detection simulation. In this screenshot, the simulated data is derived from the EEG signal of the participant indexed as 01. The subject's current drowsiness state can be successfully predicted by loading the pretrained model. A detection is performed every 3 s using the EEG data, corresponding to one event in the outcome. It should be noted that there is a time delay in the program execution process.

The GUI is implemented using PySide6, a Python binding library for the development of Qt applications. Within the detection interface, the user can select an EEG data file by clicking the “Browse” button, after which the file path is displayed in a single-line text box on the left side. Since the EEG signal contains 30 valid channels, to keep the interface uncluttered in the simulation, the user chooses which channel's EEG signal to display using a drop-down list box. We utilize FP1 as the initial EEG channel for display, and users can select other channels to visualize the corresponding EEG waveforms. Upon clicking the “Run” button, the upper portion of the interface is for display purposes only, visualizing the selected EEG channel waveform. The lower part presents the drowsiness detection results from the pretrained model, labeled as “Alert” and “Drowsy,” which remain unaffected by the user's channel selection. We consider 3-s EEG data as a sample for drowsiness detection, corresponding to one lane departure event. Considering the multithreaded parallelism, a delay of 0.375 s is set for result updates to better align with real-time conditions. When 30 s of EEG data are present within the time window, they correspond to ten events in the output results, and the waveform shifts dynamically with each time update.

V. DISCUSSION
A. Ablation Study
We conduct an ablation study to investigate the impact of the Gumbel-Softmax trick in the proposed model on classification performance. Considering that the Gumbel-Softmax approach is designed for EEG channel selection, we remove this component in the ablation analysis, resulting in a model input consisting of all EEG channel data. With model parameters held constant, Table IV reports the results for cross-subject driver drowsiness recognition using the same dataset and training strategy.

TABLE IV
ABLATION STUDY FOR CROSS-SUBJECT CLASSIFICATION PERFORMANCE ON THE SUSTAINED-ATTENTION DRIVING TASK DATASET

From Table IV, it becomes apparent that the four performance evaluation metrics used for classification show varying degrees of degradation without the Gumbel-Softmax trick. Notably, the average Pre has the largest decrease, dropping by 5.96%, followed by Acc, which decreased by 3.46%. F1 is reduced by 2.88%, while Rec remains relatively stable. This indicates that the Gumbel-Softmax strategy contributes to improving the performance of the model in the cross-subject classification task. On the other hand, there are no significant changes in the standard deviations of the four performance evaluation metrics, indicating that the introduction of the Gumbel-Softmax trick does not amplify intersubject variability in recognition outcomes, possibly due to the compensatory effect of the separable CNN. In addition, the validation method used in this article is nested LOSOCV, which has been employed in comparable studies. To exclude the possibility that the validation method itself causes differences in the results, we note that the classification network in the ablation study is similar to the model proposed by Cui et al. [17] and that the average accuracies of the two are 77.38% and 77.70%, respectively, a difference of only 0.32%. This suggests that the validation strategy has no substantial bearing on the recognition performance. Overall, the introduction of the Gumbel-Softmax method yields higher drowsiness detection performance, and together with the CNN, it constitutes a model with strong learning capabilities.

B. Impact of Different Selection Neuron Numbers
As is widely recognized, when all channel data of multichannel EEG signals are fed into a neural network for training, it not only leads to complex model parameters and increases computational consumption but also introduces the possibility of data redundancy caused by irrelevant EEG channel information. Therefore, this study analyzes the

parameter sensitivity for the number of selection neurons to explore the impact of diverse channel subset compositions on the results of driver drowsiness recognition.

Fig. 7. Comparison of the overall performance for different numbers of selection neurons.

Fig. 7 illustrates the overall performance evaluation of all participants under different numbers of selection neurons. Because closely spaced neuron counts are likely to differ only marginally and each setting requires considerable computational resources, we calculate the average performance at one channel interval and discuss the single-channel case. As the number of EEG channels increases from 1 to 8, the mean performance metrics show an upward trend. Subsequently, as the number of selected neurons increases further, the curve exhibits a slight decline and slow oscillations. When a single channel is selected, the gap in Acc relative to the optimal number of channels is the greatest, reaching 9.17%. Conversely, using the full set of EEG channels narrows this discrepancy to 3.25%. From this, it can be observed that the network achieves optimal performance with eight selected channels.

To further reveal which channels are extracted by the channel selection strategy, we investigated the principles of the channel selection layer based on the Gumbel-Softmax. As the probability vector of a selection neuron approximates a one-hot vector, the uncertainty of each selection neuron gradually decreases. The continuous softmax can then be approximated as a discrete argmax, so that the output of each selection neuron yields a single channel index [33]. We correlated these output channel indices with the EEG electrode positions, referencing the layout of EEG electrodes used in the publicly available dataset [25] employed for our study. From the outcomes of the channel selection layer, we observed that the most frequently selected channels were O1, P4, and T6, which, respectively, correspond to the occipital, parietal, and temporal lobe regions of the scalp. Notably, existing studies [57], [58] suggest that these channels may carry information pivotal for classifying driver drowsiness, reinforcing the relevance of our channel selection findings.

VI. CONCLUSION
In this work, we propose a novel architecture based on a separable CNN with a Gumbel-Softmax approach for deep learning strategies, which aims to detect drowsiness in driving tasks. The model architecture is not only capable of spatiotemporal sequential processing of EEG signals but can also be applied to EEG channel selection using the reparameterization technique that deals with discreteness problems. To substantiate the efficacy of the proposed architecture, we compare it with eight recent competitive research results on a challenging dataset centered on sustained-attention driving tasks. The results reveal that our framework obtains superior performance for cross-subject drowsiness recognition. Furthermore, we show through ablation analysis that the introduction of the Gumbel-Softmax trick can enhance classification performance, and we probe the impact of the number of selection neurons on overall performance. Addressing the scarcity of studies on real-time simulation of driver drowsiness recognition, we build on the advantages of the model and develop a GUI for simulating real-time detection, which opens the possibility of practical applications in real-world scenarios.

Although the approach presented in this article surpasses several deep learning models in driver drowsiness detection, there are still some limitations. We summarize the shortcomings of our work from two perspectives.

1) The dataset employed herein comprises solely EEG signals, gathered in a simulated driving setting rather than a genuine driving environment.
2) Our network architecture is relatively shallow and lacks substantial interpretability, thereby potentially hindering its proficiency in capturing intricate features within EEG data.

Looking ahead, on the one hand, a promising direction is to extend our current network into a deeper architecture that integrates multisource signal fusion in real-world scenarios, which could bolster its data representation learning and lead to more robust frameworks; on the other hand, exploring ways to impart interpretability to the model would enable it to learn significant biological characteristics, such as the impact of alpha and beta waves in the EEG frequency band on driver drowsiness detection.

APPENDIX
DRIVER DROWSINESS DETECTION LOGIN INTERFACE
The login interface developed in our work consists of an administrator login interface and a user login and registration interface. For administrator login, the username and password are preset in the program and cannot be changed in the GUI. The user login and registration interface, by contrast, offers more functions. It requires new users to register, subject to strict criteria: the username must be at least four characters long, and the password must be at least six characters long and contain at least one uppercase letter, one lowercase letter, and one number. The registration data is stored in a local JavaScript Object Notation (JSON) file. The username and password are hashed using the Secure Hash Algorithm 3 with a 256-bit digest (SHA3-256). If there is an input error during login, a message dialog box pops up indicating that the username or password is incorrect.


REFERENCES
[1] T. Stewart, “Overview of motor vehicle traffic crashes in 2021,” Nat. Highway Traffic Saf. Admin., Washington, DC, USA, Tech. Rep. DOT HS 813 435, Apr. 2023.
[2] A. Chowdhury, R. Shankaran, M. Kavakli, and M. M. Haque, “Sensor applications and physiological features in drivers’ drowsiness detection: A review,” IEEE Sensors J., vol. 18, no. 8, pp. 3055–3067, Apr. 2018.
[3] X. Li, S. Lai, and X. Qian, “DBCFace: Towards pure convolutional neural network face detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 4, pp. 1792–1804, Apr. 2022.
[4] B. Mandal, L. Li, G. S. Wang, and J. Lin, “Towards detection of bus driver fatigue based on robust visual analysis of eye state,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 3, pp. 545–557, Mar. 2017.
[5] Z. Gao et al., “EEG-based spatio-temporal convolutional neural network for driver fatigue evaluation,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 9, pp. 2755–2763, Sep. 2019.
[6] M. Shen, B. Zou, X. Li, Y. Zheng, L. Li, and L. Zhang, “Multi-source signal alignment and efficient multi-dimensional feature classification in the application of EEG-based subject-independent drowsiness detection,” Biomed. Signal Process. Control, vol. 70, Sep. 2021, Art. no. 103023.
[7] Y. Gu, Y. Jiang, T. Wang, P. Qian, and X. Gu, “EEG-based driver mental fatigue recognition in COVID-19 scenario using a semi-supervised multi-view embedding learning model,” IEEE Trans. Intell. Transp. Syst., vol. 25, no. 1, pp. 859–868, Jan. 2024.
[8] J. R. Paulo, G. Pires, and U. J. Nunes, “Cross-subject zero calibration driver’s drowsiness detection: Exploring spatiotemporal image encoding of EEG signals for convolutional neural network classification,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 29, pp. 905–915, 2021.
[9] X. Zhang et al., “Fatigue detection with covariance manifolds of electroencephalography in transportation industry,” IEEE Trans. Ind. Informat., vol. 17, no. 5, pp. 3497–3507, May 2021.
[10] Y. Yang, Z. Gao, Y. Li, and H. Wang, “A CNN identified by reinforcement learning-based optimization framework for EEG-based state evaluation,” J. Neural Eng., vol. 18, no. 4, May 2021, Art. no. 046059.
[11] X. Gu et al., “EEG-based brain-computer interfaces (BCIs): A survey of recent studies on signal sensing technologies and computational intelligence approaches and their applications,” IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 18, no. 5, pp. 1645–1666, Sep. 2021.
[12] A. Ahmadi, H. Bazregarzadeh, and K. Kazemi, “Automated detection of driver fatigue from electroencephalography through wavelet-based connectivity,” Biocybern. Biomed. Eng., vol. 41, no. 1, pp. 316–332, Jan. 2021.
[13] G. Sikander and S. Anwar, “Driver fatigue detection systems: A review,” IEEE Trans. Intell. Transp. Syst., vol. 20, no. 6, pp. 2339–2352, Jun. 2019.
[14] M. Ogino and Y. Mitsukura, “Portable drowsiness detection through use of a prefrontal single-channel electroencephalogram,” Sensors, vol. 18, no. 12, p. 4477, Dec. 2018.
[15] M. Shahbakhti et al., “Simultaneous eye blink characterization and elimination from low-channel prefrontal EEG signals enhances driver drowsiness detection,” IEEE J. Biomed. Health Informat., vol. 26, no. 3, pp. 1001–1012, Mar. 2022.
[16] L.-L. Chen, Y. Zhao, J. Zhang, and J.-Z. Zou, “Automatic detection of alertness/drowsiness from physiological signals using wavelet-based nonlinear features and machine learning,” Expert Syst. Appl., vol. 42, no. 21, pp. 7344–7355, Nov. 2015.
[17] J. Cui, Z. Lan, O. Sourina, and W. Müller-Wittig, “EEG-based cross-subject driver drowsiness recognition with an interpretable convolutional neural network,” IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 10, pp. 7921–7933, Oct. 2022.
[18] C. Chen, Z. Ji, Y. Sun, A. Bezerianos, N. Thakor, and H. Wang, “Self-attentive channel-connectivity capsule network for EEG-based driving fatigue detection,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 31, pp. 3152–3162, 2023.
[19] Y. Zhang, R. Guo, Y. Peng, W. Kong, F. Nie, and B.-L. Lu, “An auto-weighting incremental random vector functional link network for EEG-based driving fatigue detection,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–14, 2022.
[20] E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with Gumbel–Softmax,” in Proc. Int. Conf. Learn. Represent., vol. 1611, Nov. 2016, p. 1144.
[21] I. A. M. Huijben, W. Kool, M. B. Paulus, and R. J. G. van Sloun, “A review of the Gumbel-max trick and its extensions for discrete stochasticity in machine learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 2, pp. 1353–1371, Feb. 2023.
[22] T. Pang, S. Zhao, J. Han, S. Zhang, L. Guo, and T. Liu, “Gumbel–Softmax based neural architecture search for hierarchical brain networks decomposition,” Med. Image Anal., vol. 82, Nov. 2022, Art. no. 102570.
[23] Z. Hou, J. Ma, W. Yu, Z. Yang, S. Ma, and J. Fan, “Multi-template global re-detection based on Gumbel–Softmax in long-term visual tracking,” Appl. Intell., vol. 53, no. 18, pp. 20874–20890, Sep. 2023.
[24] F. Ye and A. G. Bors, “Deep mixture generative autoencoders,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 10, pp. 5789–5803, Oct. 2022.
[25] Z. Cao, C.-H. Chuang, J.-K. King, and C.-T. Lin, “Multi-channel EEG recordings during a sustained-attention driving task,” Sci. Data, vol. 6, no. 1, p. 19, Apr. 2019.
[26] R. Langner, M. B. Steinborn, A. Chatterjee, W. Sturm, and K. Willmes, “Mental fatigue and temporal preparation in simple reaction-time performance,” Acta Psycholog., vol. 133, no. 1, pp. 64–72, Jan. 2010.
[27] Z. Cao, C. H. Chuang, J. K. King, and C. T. Lin, “Multi-channel EEG recordings during a sustained-attention driving task (pre-processed dataset),” Sci. Data, vol. 6, no. 1, p. 19, Apr. 2019, doi: 10.6084/m9.figshare.7666055.v3.
[28] C.-S. Wei, Y.-T. Wang, C.-T. Lin, and T.-P. Jung, “Toward drowsiness detection using non-hair-bearing EEG-based brain-computer interfaces,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 26, no. 2, pp. 400–406, Feb. 2018.
[29] C.-S. Wei, Y.-T. Wang, C.-T. Lin, and T.-P. Jung, “Toward non-hair-bearing brain-computer interfaces for neurocognitive lapse detection,” in Proc. 37th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Aug. 2015, pp. 6638–6641.
[30] J. Cui et al., “A compact and interpretable convolutional neural network for cross-subject driver drowsiness detection from single-channel EEG,” Methods, vol. 202, pp. 173–184, Jun. 2022.
[31] P. Gong, P. Wang, Y. Zhou, X. Wen, and D. Zhang, “TFAC-Net: A temporal-frequential attentional convolutional network for driver drowsiness recognition with single-channel EEG,” IEEE Trans. Intell. Transp. Syst., vol. 25, no. 7, pp. 7004–7016, Jul. 2024.
[32] A. Abid, M. Balin, and J. Zou, “Concrete autoencoders for differentiable feature selection and reconstruction,” in Proc. Int. Conf. Mach. Learn., vol. 97, Jun. 2019, pp. 444–453.
[33] T. Strypsteen and A. Bertrand, “End-to-end learnable EEG channel selection for deep neural networks with Gumbel–Softmax,” J. Neural Eng., vol. 18, no. 4, Aug. 2021, Art. no. 0460a9.
[34] C. J. Maddison, A. Mnih, and Y. W. Teh, “The concrete distribution: A continuous relaxation of discrete random variables,” in Proc. Int. Conf. Learn. Represent., Nov. 2016, p. 712.
[35] E. J. Gumbel, “Statistics of extremes,” Annu. Rev. Stat. Appl., vol. 2, pp. 203–235, Apr. 2015.
[36] H. Sun, J. Ren, H. Zhao, P. Yuen, and J. Tschannerl, “Novel Gumbel–Softmax trick enabled concrete autoencoder with entropy constraints for unsupervised hyperspectral band selection,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5506413.
[37] R. Flamary, N. Jrad, R. Phlypo, M. Congedo, and A. Rakotomamonjy, “Mixed-norm regularization for brain decoding,” Comput. Math. Methods Med., vol. 2014, pp. 1–13, Apr. 2014.
[38] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, “A survey of convolutional neural networks: Analysis, applications, and prospects,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 12, pp. 6999–7019, Dec. 2022.
[39] E. Fernandez-Blanco, D. Rivero, and A. Pazos, “EEG signal processing with separable convolutional neural network for automatic scoring of sleeping stage,” Neurocomputing, vol. 410, pp. 220–228, Oct. 2020.
[40] H. Qi, T. Xu, G. Wang, Y. Cheng, and C. Chen, “MYOLOv3-tiny: A new convolutional neural network architecture for real-time detection of track fasteners,” Comput. Ind., vol. 123, Dec. 2020, Art. no. 103303.
[41] I. M. Elfadel, “On the stability of analog ReLU networks,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 40, no. 11, pp. 2426–2430, Nov. 2021.
[42] S. Ioffe and C. Szegedy, “Normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. Mach. Learn. Res., Jun. 2015, pp. 448–456.
[43] G. C. Cawley and N. L. Talbot, “On over-fitting in model selection and subsequent selection bias in performance evaluation,” J. Mach. Learn. Res., vol. 11, pp. 2079–2107, Jul. 2010.
[44] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. Learn. Represent., vol. 1412, Dec. 2014, p. 6980.
[45] L. Prechelt, “Automatic early stopping using cross validation: Quantifying the criteria,” Neural Netw., vol. 11, no. 4, pp. 761–767, Jun. 1998.


[46] D. J. Hand, P. Christen, and N. Kirielle, “F*: An interpretable transformation of the F-measure,” Mach. Learn., vol. 110, no. 3, pp. 451–456, Mar. 2021.
[47] M. Ahmed, S. Masood, M. Ahmad, and A. A. A. El-Latif, “Intelligent driver drowsiness detection for traffic safety based on multi CNN deep model and facial subsampling,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 10, pp. 19743–19752, Oct. 2022.
[48] R. Li, L. Wang, and O. Sourina, “Subject matching for cross-subject EEG-based recognition of driver states related to situation awareness,” Methods, vol. 202, pp. 136–143, Jun. 2022.
[49] Z. Zhuang, Y.-K. Wang, Y.-C. Chang, J. Liu, and C.-T. Lin, “A connectivity-aware graph neural network for real-time drowsiness classification,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 32, pp. 83–93, 2024.
[50] X. Qin, Y. Niu, H. Zhou, X. Li, W. Jia, and Y. Zheng, “Driver drowsiness EEG detection based on tree federated learning and interpretable network,” Int. J. Neural Syst., vol. 33, no. 3, Mar. 2023, Art. no. 2350009.
[51] Y. Song, Q. Zheng, B. Liu, and X. Gao, “EEG conformer: Convolutional transformer for EEG decoding and visualization,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 31, pp. 710–719, 2023.
[52] G. Fu, Y. Zhou, P. Gong, P. Wang, W. Shao, and D. Zhang, “A temporal–spectral fused and attention-based deep model for automatic sleep staging,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 31, pp. 1008–1018, 2023.
[53] R. Li, R. Gao, P. N. Suganthan, J. Cui, O. Sourina, and L. Wang, “A spectral-ensemble deep random vector functional link network for passive brain–computer interface,” Expert Syst. Appl., vol. 227, Oct. 2023, Art. no. 120279.
[54] R. Li, R. Gao, L. Yuan, P. N. Suganthan, L. Wang, and O. Sourina, “An enhanced ensemble deep random vector functional link network for driver fatigue recognition,” Eng. Appl. Artif. Intell., vol. 123, Aug. 2023, Art. no. 106237.
[55] R. Li, M. Hu, R. Gao, L. Wang, P. N. Suganthan, and O. Sourina, “TFormer: A time–frequency transformer with batch normalization for driver fatigue recognition,” Adv. Eng. Informat., vol. 62, Oct. 2024, Art. no. 102575.
[56] Y.-J. Liu, M. Yu, G. Zhao, J. Song, Y. Ge, and Y. Shi, “Real-time movie-induced discrete emotion recognition from EEG signals,” IEEE Trans. Affect. Comput., vol. 9, no. 4, pp. 550–562, Oct. 2018.
[57] Y. Liu, Z. Lan, J. Cui, O. Sourina, and W. Müller-Wittig, “Inter-subject transfer learning for EEG-based mental fatigue recognition,” Adv. Eng. Informat., vol. 46, Oct. 2020, Art. no. 101157.
[58] I. Stancin, M. Cifrek, and A. Jovic, “A review of EEG signal features and their application in driver drowsiness detection systems,” Sensors, vol. 21, no. 11, p. 3786, May 2021.

Weibin Feng received the M.S. degree from the College of Electronic and Information Engineering, Southwest University, Chongqing, China, in 2023. He is currently pursuing the Ph.D. degree with the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China.
His current research interests include graph neural networks, computational intelligence, and event-based vision.

Xiaoping Wang (Senior Member, IEEE) received the B.S. and M.S. degrees in automation from Chongqing University, Chongqing, China, in 1997 and 2000, respectively, and the Ph.D. degree in systems engineering from Huazhong University of Science and Technology, Wuhan, China, in 2003.
Since 2011, she has been a Professor with the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology. Her current research interests include memristors and their applications to memory storage, modeling, and simulation.

Jialan Xie received the M.S. and Ph.D. degrees from the College of Electronic and Information Engineering, Southwest University, Chongqing, China, in 2018 and 2024, respectively.
Her current research interests include affective computing, virtual reality, and machine learning.

Wanqing Liu received the M.S. degree from the College of Electronic and Information Engineering, Southwest University, Chongqing, China, in 2024.
Her main research interests include affective computing and EEG-based emotion recognition.

Yinghao Qiao received the M.S. degree from the College of Electronic and Information Engineering, Southwest University, Chongqing, China, in 2024.
Her main research interests include deep learning and EEG-based emotion recognition.

Guangyuan Liu received the B.E. degree in physics from Southwest University, Chongqing, China, in 1983, and the M.A. and Ph.D. degrees in electronic engineering from the University of Electronic Science and Technology, Chengdu, China, in 1995 and 1999, respectively.
He is currently a Professor with Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing and the Assistant President with Southwest University. His research interests include affective computing, neural networks, computational intelligence, and fuzzy systems and applications.
