EEG-Signals Based Cognitive Workload Detection of Vehicle Driver Using Deep Learning
Abstract— A vehicle driver's ability to maintain optimal performance and attention is essential to ensure the safety of traffic. Electroencephalography (EEG) signals have been proven to be effective in evaluating a human's cognitive state under specific tasks. In this paper, we propose the use of deep learning on EEG signals to detect the driver's cognitive workload under high and low workload tasks. Data used in this research are collected throughout multiple driving sessions conducted on a high fidelity driving simulator. Preliminary experimental results conducted on only 4 channels of EEG show that the proposed system is capable of accurately detecting the cognitive workload of the driver with an enormous potential for improvement.

Keywords— Deep Learning, EEG, Neural Networks, Cognitive Workload, Driving, Stress

I. INTRODUCTION

Maintaining safe levels of cognitive workload is crucial to ensure optimal performance and attention whilst driving automobiles. Different methods have been widely investigated to monitor a driver's cognitive workload, including heart rate monitoring, galvanic skin response, facial expression and so on. One of the most popular measures for assessing the mental state of humans is electroencephalography (EEG) signals. EEG has reportedly shown success in evaluating mental states such as drowsiness [1], mind wandering [2] and alertness [3].

Despite being an active research area, understanding and applying EEG are still limited due to the lack of true understanding of the brain's activities, which prevents good features from being engineered. Besides, EEG signals are usually strongly affected by noise and interference. Accurate reading requires delicate equipment and sealing.

Conventional analysis approaches make extensive use of the Fourier transform in order to decompose the EEG signal into multiple frequencies. Only well-known frequency bands are then used for the feature design process. For example, the alpha band (8 to 15[Hz]) is correlated with relaxation, and the beta band (16 to 31[Hz]) is associated with mental stress. Pre-processing methods such as the Butterworth bandpass filter and stationary wavelet transform filters [4] are used to remove high and low frequency noise [3]. Several efforts in applying deep learning on EEG have been carried out [1, 8, 9, 10]. However, none of the previous studies has come up with a system that can handle raw data directly. Thus, valuable information might be discarded during the pre-processing.

The purpose of this paper is to introduce an end-to-end deep neural network model that can directly infer cognitive workload from raw EEG, where signal pre-processing as well as conventional feature design are not required.

To evaluate the proposed model, EEG signal recordings have been carried out on a subject driving a vehicle in a relatively realistic simulated environment under two different types of cognitive workload, high workload and low workload.

This paper is organized as follows: Section II explains the model used to classify the data, Section III introduces how the data are collected and prepared, Section IV describes how the model is trained and evaluates the results of the proposed model, and the conclusion is given in Section V.

II. MODEL

One of the major advantages of convolutional neural networks (CNN) is the ability to extract, learn and generalize features without pre-processing. In this study, we propose the use of a deep CNN to predict the cognitive workload of the driver by using raw EEG signals with no additional pre-processing.

A. EEG

Monitoring electrical brain activity by non-invasive means is referred to as EEG in the clinical context. Brain activities produce electrical charges caused by the neurons inside the brain, and voltage fluctuations are measured using metal pieces called electrodes and a reference electrode attached to the head and scalp [5]. These voltages pass through an amplifier for analysis. Figure 1 shows the international 10-20 system that is used to describe the location of the electrodes on the head. Thanks to recent advancements, compact lightweight devices have been introduced into the market to allow a more consumer friendly means to monitor EEG signals.

In this research, EEG recording has been done using the Muse [6] headband. Figure 1 also explains the electrodes used in this device, where a 4-channel configuration has been utilized: two dry electrodes on the forehead, AF7 (Left) and AF8 (Right), two behind the ears, TP9 (Left) and TP10 (Right), and a reference electrode (Fpz) in the middle of the forehead above the nasion.

Figure 1. Locations of Scalp Electrodes in International 10-20 System [6]

B. Network Architecture

CNN is inspired by the biological visual system [7], where different levels of the visual hierarchy are handled by different parts of the brain. Analogously, the multiple layers of a CNN generalize visual features at multiple levels. Since each layer of a CNN acts as a filter, it is expected that a CNN can also be used to extract high level features in the case of EEG signals.

Different from images, EEG comprises multiple time series, one from each channel. For that reason, it is essential to design the CNN filters to stride along the time direction only. Thus, each filter must fully cover all other dimensions of the input data.

The architecture of the proposed network is visualized in Figure 2. As shown in the figure, in the case of 150[sec] of EEG data as the input window size, the input consists of 37500 data samples for each channel, thus raw EEG data of 37500 × 4 are used as the input. The optimum input window size is discussed in Section IV. Then, seven convolution layers transform the raw EEG data down to 3 × 8 features. These features are then flattened and processed by a fully connected (FC) classifier block of 3 layers. All CNN filters stride along the time dimension.

In a proposal such as [9], separate CNN networks are used on each band of each EEG channel independently. However, our CNN filters are applied to all EEG channels at the same time. This method ensures that all possible combination features are captured. The small filter size and stride together with the high number of CNN layers also ensure that features from multiple hierarchical levels are extracted.

The FC block outputs a two-dimensional vector. The softmax function is used to interpret this vector as the probability of each cognitive workload, one for high cognitive workload and one for low cognitive workload. Except for the final layer of the FC block, all layers in this deep network utilize the rectified linear unit (RELU) [8] activation function.

[Figure 2 layer sequence, as extracted from the figure: 18749 × 4 → (4 × 4 × 4, strides 2) → 9373 × 4 → (4 × 4 × 4, strides 2) → 4685 × 4 → (8 × 4 × 4, strides 4) → 1170 × 4 → (8 × 4 × 4, strides 4) → 291 × 4 → (16 × 4 × 8, strides 8) → 35 × 8 → (16 × 4 × 8, strides 8) → 3 × 8 → flatten (24) → fc 1 (64) → fc 2 (32) → fc 3 (2) → Softmax]

Figure 2. Proposed Deep Neural Network Model in the case of 150[sec] of EEG Input (37500 Samples in Each Channel)

III. DATA COLLECTION
Experiment recordings have been conducted on a popular driving simulation video game (GTAV free driving mode) that is highly capable of representing real life driving situations. Other equipment used includes a Logitech G27 driving wheel and an ultrawide (21:9) curved monitor, as shown in Figure 3.

Figure 3. Representation of the Used Simulation Environment with an Example of High Workload (Left) and Low Workload (Right)

A. Driving Sessions

For the purpose of evaluating the cognitive workload of the subject, we choose two different types of driving sessions. The first type imposes high cognitive workload, i.e. dense traffic and a complex metropolitan scene, and the second type imposes the opposite, i.e. empty roads and a monotonous highway. The subject is advised to drive as they would in an actual situation. We implicitly assume that driving in the first type is more mentally demanding than in the second type.

Each of the two types of session was recorded for an interval of 15 to 30 minutes over a span of one month. Data collections were randomly done in daytime and night-time with random intervals between them, and each session had a different random route to remove any bias. The subject was required to disengage from the test-bed upon completing each session.

In this experiment, one male subject aged 29 participated, who drives frequently and has 11 years of driving experience. The number of sessions for the experiments is 24, where 12 are for high workload and 12 are for low workload.

B. EEG Sampling Rate

Data were collected at a 256[Hz] sampling rate by Muse as mentioned in Section II-A. Hence, the data in one second consist of 256 samples, and those of a 15 to 30 minute session consist of 230,000 to 460,000 samples for each of the 4 channels.

IV. EVALUATION

A. Data Preparation

The data of all the sessions are used for training the model except for those of the last session, which are used for the evaluation. By doing so, the real-life situation, where the model is required to estimate the workload from future data by learning from previous data, can be simulated. Thus, over 6 million samples for each of the 4 channels are used for the training and 460 thousand for the evaluation.

In order to train our proposed model, raw EEG data are normalized by z-score for each of the 4 channels. A slicing window approach is utilized to prepare the data for training. Table 1 explains the different slicing window sizes used for training the model as well as the shape of the input data, where each slice is denoted by the time segment of the EEG data. The slicing windows are overlapped with a step of 1/256[sec], i.e. one sample.

B. Experiment

1) Training the model: The model is trained for a total of 10 epochs using the RMSProp optimizer with a learning rate of 0.002. Each epoch represents the training of the model throughout all the data. Input data are divided into batches of 64. During this training process, 50% drop-out is applied to all layers (except the last layer) within the FC block. Figure 4 shows the learning curve of the best resulting window size of 150[sec] alongside the evaluation curve.

2) Evaluating the model: For every iteration of the training process, random batches of the test data and their corresponding labels are used for cross-validation to evaluate the model's performance. Figure 4 also shows the evaluation, where a slight overfitting can be observed towards the end of the training. This can potentially be resolved by collecting and adding more data into the training process.

Figure 4. Learning Curve for Training and Evaluation of the Model with the Best Window Size (150 [sec])

C. Results and Discussion

The proposed model in this study is composed of 7 CNN layers that can learn the necessary transformations to extract features, and 3 fully connected layers that can accurately classify the cognitive workload of the driver. This end-to-end network utilizes only raw EEG signals from 4 channels as its input. Table 1 summarises the achieved evaluation accuracies when running the model on different window sizes of the raw EEG data. As shown in the table, the 150[sec] window size achieves the best accuracy in classifying the cognitive workload of the driver between high and low workloads. As an additional metric to evaluate our model, the classification accuracy is 95.76% for high cognitive workload sessions and 92.57% for low cognitive workload sessions.

The proposed network is originally designed and optimized for the 150[sec] window size. Therefore, other network configurations may be optimal for other window sizes. In this experiment, the 7th convolutional layer was removed for the 30 and 60 [sec] window sizes.

TABLE 1 EEG DATA WINDOW SIZES USED TO EVALUATE THE MODEL AND THE CORRESPONDING EVALUATION ACCURACIES.

EEG Data Window Size    Input Shape    Evaluation Accuracy
30 Seconds              7680 × 4       87.25%
60 Seconds              15360 × 4      92.02%
90 Seconds              23040 × 4      88.38%
120 Seconds             30720 × 4      82.24%
150 Seconds             38400 × 4      95.31%
180 Seconds             46080 × 4      84.69%
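
The relation between window size, input shape and network depth can be checked with a short calculation. The sketch below is not the authors' code: the function names are hypothetical, the convolutions are assumed to be unpadded, and the first layer's filter (length 4, stride 2) is an assumption chosen because it reproduces the 37500 to 18749 reduction shown in Figure 2. The remaining filter lengths and strides are read from Figure 2. Under these assumptions the sketch reproduces the layer sizes of Figure 2 and suggests why the 7th convolutional layer does not fit the 30 and 60 [sec] windows.

```python
SAMPLING_RATE_HZ = 256  # Muse sampling rate stated in Section III-B

# (filter length, stride) along the time axis for the seven convolution layers.
# Entries 2-7 are read from Figure 2. Entry 1 is an ASSUMPTION: it is not
# legible in the figure, but (4, 2) matches the 37500 -> 18749 reduction.
CONV_LAYERS = [(4, 2), (4, 2), (4, 2), (8, 4), (8, 4), (16, 8), (16, 8)]


def conv_output_length(n_samples, kernel, stride):
    """Length after an unpadded 1-D convolution (assumed); 0 if the filter does not fit."""
    if n_samples < kernel:
        return 0
    return (n_samples - kernel) // stride + 1


def trace_lengths(n_samples, layers=CONV_LAYERS):
    """Time-axis length after each layer, stopping at the first layer that does not fit."""
    lengths = []
    for kernel, stride in layers:
        n_samples = conv_output_length(n_samples, kernel, stride)
        if n_samples == 0:
            break
        lengths.append(n_samples)
    return lengths


if __name__ == "__main__":
    # Figure 2 case: 37500 samples per channel.
    print("Figure 2:", trace_lengths(37500))
    # Table 1 cases: window length in seconds times the sampling rate.
    for seconds in (30, 60, 90, 120, 150, 180):
        n_input = seconds * SAMPLING_RATE_HZ
        lengths = trace_lengths(n_input)
        print(f"{seconds:>3} s  input {n_input} x 4  "
              f"conv layers that fit: {len(lengths)}  lengths: {lengths}")
```

With these assumed settings, only six of the seven layers fit the 30 and 60 [sec] inputs, because the sixth layer's output is shorter than the 16-sample filter of the seventh layer, while all seven layers fit the windows of 90 [sec] and longer.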

Conventional approaches to classifying EEG signals consist of decomposing the signals to extract features to be used for classification. Attempts at EEG analysis that utilize deep learning with minimal pre-processing are shown in Table 2. [9] evaluates drivers' cognitive performance in a simulated environment using frequency filtered EEG. In [1], FFT is used before classifying the signals to detect driver's drowsiness, and [10] also proposes a model that can predict right and left hand movements with frequency filtered EEG signals. [11] apply a spatial filter before classifying pathological from normal EEG recordings.

This study does not attempt in any way a direct comparison with these distinguished previous works, because the data used, the experimental conditions and the classification targets are different in each. Rather, it explores and introduces the potential of using a deep CNN architecture in classifying raw EEG signals without any pre-processing.

TABLE 2 RECENT RESEARCHES ON EEG ANALYSIS WITH DEEP LEARNING

Ref.    Pre-processing         Feature-Extraction    Classifier     Accuracy
[9]     Frequency filtering    CNN                   1-layer ANN    86.06%
[1]     FFT                    N/A                   1-layer ANN    86.50%
[11]    Spatial Filter         CNN                   1-layer ANN    84.80%
[10]    Frequency filtering    CNN                   1-layer ANN    86.41%

V. CONCLUSION

In this research, we propose an end-to-end deep neural network that can accommodate raw EEG signals from 4 channels, collected within one month from numerous driving sessions, as its input. The results show that the proposed model is capable of successfully generalizing the EEG signals and making highly accurate classification of the driver's cognitive workload.

Future work includes testing the proposed model with publicly available data sets. Further investigation will be carried out by collecting more data from more subjects with different driving experiences and classifying more than two types of cognitive workload.

REFERENCES

[1] I. Belakhdar, W. Kaaniche, R. Djmel, and B. Ouni, “Detecting driver drowsiness based on single electroencephalography channel,” 13th Int. Multi-Conference Syst. Signals Devices, SSD 2016, pp. 16–21, 2016.
[2] C. L. Baldwin, D. M. Roberts, D. Barragan, J. D. Lee, N. Lerner, and J. S. Higgins, “Detecting and Quantifying Mind Wandering during Simulated Driving,” Front. Hum. Neurosci., vol. 11, no. August, pp. 1–15, 2017.
[3] L. Bi, R. Zhang, and Z. Chen, “Study on Real-time Detection of Alertness Based on EEG,” 2007 IEEE/ICME Int. Conf. Complex Med. Eng., pp. 1490–1493, 2007.
[4] S. S. Daud and R. Sudirman, “Butterworth Bandpass and Stationary Wavelet Transform Filter Comparison for Electroencephalography Signal,” Proc. - Int. Conf. Intell. Syst. Model. Simulation, ISMS, vol. 2015–Octob, pp. 123–126, 2015.
[5] S. Donald L. and F. H. L. da Silva, Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, 2011.
[6] Interaxon, “Muse: the brain sensing headband,” Tech. Specif. Valid. Res. use, pp. 4–9, 2017.
[7] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biol. Cybern., vol. 36, no. 4, pp. 193–202, 1980.
[8] V. Nair and G. E. Hinton, “Rectified Linear Units Improve Restricted Boltzmann Machines,” Proc. 27th Int. Conf. Mach. Learn., no. 3, pp. 807–814, 2010.
[9] M. Hajinoroozi, Z. Mao, T. P. Jung, C. T. Lin, and Y. Huang, “EEG-based prediction of driver’s cognitive performance by deep convolutional neural network,” Signal Process. Image Commun., vol. 47, pp. 549–555, 2016.
[10] Z. Tang, C. Li, and S. Sun, “Single-trial EEG classification of motor imagery using deep convolutional neural networks,” Optik (Stuttg)., vol. 130, pp. 11–18, 2017.
[11] R. T. Schirrmeister, L. Gemein, K. Eggensperger, F. Hutter, and T. Ball, “Deep learning with convolutional neural networks for decoding and visualization of EEG pathology,” 2017.

Mohammad A. Almogbel (S’14) received his bachelor’s degree in Information Systems from King Saud University, Riyadh, Saudi Arabia in 2009. He joined King Abdul-Aziz City for Science and Technology in Saudi Arabia as a researcher in the same year and received a scholarship to complete his graduate school in 2010. He then received his master’s degree in computer science from Waseda University in 2014 and has been pursuing his Ph.D. since then. He is a member of IEEE, ITS and JSAE.

Anh H. Dang (S’09) received his bachelor’s degree in business administration, information & communication technology from Ritsumeikan Asia Pacific University (Beppu, Oita, Japan) in 2010. He then received the master’s degree in computer science from Waseda University (Shinjuku, Tokyo, Japan) in 2012. Since 2012, he has been a Ph.D. candidate at Waseda University. He is a member of IEEE, ACM, and IEICE. His research interests are machine learning, artificial intelligence, and computer vision.

Wataru Kameyama (M’86) received the bachelor’s, master’s, and D.Eng. degrees from the School of Science and Engineering, Waseda University, in 1985, 1987, and 1990, respectively. He joined ASCII Corporation in 1992, and was transferred to France Telecom CCETT from 1994 to 1996 for his secondment. After joining Waseda University as an Associate Professor in 1999, he has been a Professor with the Department of Communications and Computer Engineering, School of Fundamental Science and Engineering, Waseda University, since 2014. He has been involved in MPEG, MHEG, DAVIC, and the TV-Anytime Forum activities. He was a Chairman of ISO/IECTC1/SC29/WG12, and a Secretariat and Vice Chairman of the TV-Anytime Forum. He is a member of IEICE, IPSJ, ITE, IIEEJ, and ACM. He received the Best Paper Award of Niwa-Takayanagi in 2006, the Best Author Award of Niwa-Takayanagi in 2009 from the Institute of Image Information and Television Engineers, and the International Cooperation Award from the ITU Association of Japan in 2012.