Instruction Set Extension of A RiscV Based SoC For Driver Drowsiness Detection
June 6, 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3177743
ABSTRACT This paper describes the design and implementation of a driver drowsiness detection (DDD) system using a modified RiscV processor on a field-programmable gate array (FPGA). To detect drowsiness, a convolutional neural network (CNN) is implemented on the RiscV processor. The CNN is trained to classify four primary driver expressions: distraction, natural, sleep, and yawn. The trained CNN reaches an accuracy of 81.07% on validation data. Furthermore, due to FPGA memory limitations, the C code written for the trained CNN is optimized in several ways, including the use of dynamic fixed-point data types and dynamic memory allocation. On the hardware side, the processor is modified by adding three custom instructions, namely custom store, conv2d(2 × 2), and multiply and accumulation (MAC), to enhance the computation rate. As a result, the processor with custom store, conv2d(2 × 2), and MAC as custom instructions achieved the best latency, with an improvement factor of 1.7 over the base processor and 1.25 over the processor with only custom store and MAC, in exchange for a slight increase in area.
INDEX TERMS Convolutional neural network, driver drowsiness detection, FPGA, hardware implementation, modified RiscV processor.
is a noticeable sign of sleepiness. As an immediate result, designing a system that can detect driver drowsiness is highly recommended. This paper aims to decrease car accidents by considering driver drowsiness and distraction. The hardware architecture of our designed system is such that it can form a system on a chip (SoC) compatible with future ASIC implementation. Moreover, RiscV is chosen as the instruction set architecture (ISA) of the embedded processor [5]. Because it is simple and completely open-source, the ISA can easily be modified to optimize the processor for any application. According to the research firm Semico, the number of chips that include at least some RISC-V technology will grow 73.6 percent per year to 2027, when there will be some 25 billion AI chips produced, accounting for US $291 billion in revenue [6]. In addition, using an embedded processor makes the implementation of the CNN less challenging, because the designed CNN can be implemented in C/C++. C/C++ code can then be compiled into RiscV machine-level code with the reliable RiscV GNU toolchain [7]. Optimizations on the software side are done by managing memory efficiently, for instance in the compiler, with appropriate linkers, and by decreasing the number of parameters. Likewise, optimizations of the written code are considered because of the shortage of memory resources on low-cost FPGAs. These optimizations reduce the size of the machine-level code.
The organization of the paper is as follows. Section II provides information regarding previous research in designing driver drowsiness detection systems and RiscV processors. Section III describes the software design of the CNN and the dataset utilized for system training and validation. Moreover, the model's modifications for software optimizations on the implemented CNN are explained in this section. Section IV provides information concerning hardware implementation and optimizations. In Section V, results and achievements of the paper are illustrated. Finally, Section VI concludes the paper.
II. RELATED WORK
As the number of cars and trucks increases, the rate of car accidents regrettably increases as well. In these accidents, driver drowsiness plays the most dominant role. Consequently, researchers in both industry and academia endeavor to reduce the rate of car accidents by designing systems that detect drowsiness.
A significant share of the research on driver drowsiness detection utilizes biomedical methods. In [1], the authors provide valuable information about physiological signals that can indicate a driver's drowsiness level. The authors also measured drowsiness levels utilizing various methods such as Electroencephalography (EEG), Electrocardiography (ECG), Electrooculogram (EOG), and Electromyogram (EMG). The measured accuracy of some methods, like EEG and ECG, is above 95%. In [8], brain activity is recorded with an EEG headset, and accuracies of 95.8% and 93.8% were achieved with k-nearest neighbor and support vector machine (SVM) classifiers. In [9], driver drowsiness was detected from the ECG signal with the same classifiers as [8], and above 90% accuracy was achieved for a two-class classification only. In [10], the validity of detecting driver drowsiness by measuring the muscle activity associated with steering wheel grip with the help of EMG was analyzed. The results indicated that the validity of the proposed algorithm needs to be tested more robustly with additional subjects and different experimental designs. In [11], five methods for drowsiness detection, i.e., subjective reporting, driver biological features, driver physical features, vehicular features while driving, and hybrid features, are discussed and compared. The comparison indicates that all methods have pros and cons, and that even though some of the stated methods are very complex and need many sensors, they still need improvement. As a result, although the accuracy of these systems is considerably high in certain situations, the usage of numerous sensors, sometimes simultaneously, and the increased complexity of the system are the main downsides of this method. Moreover, the cost of these methods is not affordable, and the flexibility of modifying these systems to add other options is poor.
The other method for detecting drowsiness is image processing. These methods are well suited to high-level and software approaches, since they are hard to implement on hardware without the help of high-level design languages such as Matlab or Python. In [12], an intelligent surveillance system is designed based on signal processing and embedded tools, with three interlinked modules: driver fatigue detection, alcohol content detection, and vehicular crash detection. Although the system is complete and practical, the brain of the system is a Raspberry Pi microcomputer, which is not flexible and far from optimized for such a huge task. In [13], however, an FPGA-based system for drowsiness detection is proposed. The system utilizes two signs to detect the driver's drowsiness: closed eyes and an open mouth. Although the system is on FPGA, which means it can be designed to be optimum in terms of power and delay and suitable for ASIC implementations, the use of high-level design tools, especially Matlab HDL Coder, reduces the hardware designers' control over the final architecture. In [14], a real-time system is proposed that uses a computerized camera to automatically track and process the driver's eye using Python, dlib, and OpenCV. Because it relies on Python and high-level libraries like OpenCV and dlib, small bare-metal processors cannot host this method.
Another method that can be used for detecting drowsiness is neural networks, especially convolutional neural networks (CNNs). In [15], DriCare, a driver drowsiness detection method, is proposed using face landmark detection with the help of a CNN. Although the accuracy of the DriCare method is about 92%, the system is tested on an Intel Core i7 CPU, which is a supercomputer in comparison with small embedded processors. In [16], a real-time model based on deep neural networks on an embedded processor is proposed.
FIGURE 1. Extracted frames of driver expressions (a) Normal, (b) Distracted, (c) Sleep, (d) Yawn, (e) Gaussian Noise, (f) Brighter, (g) Translation,
(h) Rotation.
Model Summary 1: Implemented CNN Structure
Conv2D(4, (2,2), input_shape = (100,100,1), activation = 'relu')
MaxPooling2D(pool_size = (2,2))
Conv2D(4, (2,2), activation = 'relu')
MaxPooling2D(pool_size = (2,2))
Conv2D(4, (2,2), activation = 'relu')
MaxPooling2D(pool_size = (2,2))
Conv2D(4, (2,2), activation = 'relu')
MaxPooling2D(pool_size = (2,2))
Flatten()
Dense(32, activation = 'relu')
Dense(classes, activation = 'softmax')
As illustrated by Model Summary 1, the system is extremely lightweight and has a low number of parameters. The input image size is 100 × 100, and the images for training and testing are grayscale. It must be noted that the CNN model is designed based on the hardware limitations. When implementing the model on hardware, the input image size and the number and size of the filters in the first convolutional layer have a significant effect on the area of the implemented model. Therefore, memory usage is kept to a minimum because of the lack of memory on low-cost FPGAs. Besides the input image and convolutional layers, the activation function of all convolution layers in this model is the rectified linear unit (ReLU). In the max-pooling layers, the stride is two and no padding is added. There are also two fully connected layers at the end of the model, the first with ReLU activation and the second with SoftMax activation, as four classes are considered. After training the model on the described dataset, the achieved accuracy on validation data is 81.07%. Fig. 2 (c) and Fig. 2 (f) depict the system accuracy and loss in terms of the number of epochs for the CNN model.
In this paper, due to the hardware limitations, the designed CNN is compact. The implemented CNN's layers are listed in Model Summary 1. It must be noted that a system used for driver drowsiness detection must be as accurate as possible. For this reason, two additional models, CNN I and CNN II, are designed to obtain a system with high accuracy. These models are implementable like Model Summary 1, but they require more memory. The CNN II model is depicted in Fig. 3. Regarding CNN I, the first two convolution layers have three channels, similar to CNN II. However, the remaining convolution layers in CNN I have 4, 8, 16, and 32 channels. A max-pooling layer follows every two convolution layers. Additionally, the first dense layer has 128 neurons. Furthermore, the size of the input images in CNN I and CNN II is 160 × 120, and the numbers of parameters in CNN I and CNN II are 21,282 and 45,546, respectively. Moreover, the accuracy and loss of CNN I and CNN II are depicted in Fig. 2.
To evaluate the proposed models, we trained several well-known models with our dataset, including MobileNet V2, VGG-16, and Inception. While the network of Model Summary 1 was trained with grayscale images, this time the networks are trained with an RGB version of our dataset. In Table 1, the accuracy and number of parameters of CNN II are compared with these well-known state-of-the-art networks. The reason for training with RGB data is that MobileNet V2, VGG-16, and Inception are pre-trained on the ImageNet dataset, which contains RGB data, so their weights are extracted for RGB images. As a result, for a fair comparison between CNN II and the pre-trained models, CNN II is trained with RGB data as well. While the accuracy of MobileNet is significantly higher than that of CNN II, the accuracy of VGG-16 is approximately the same as that of CNN II trained with RGB data. Moreover, the accuracy of Inception is lower than that of CNN II. It must be noted that the number of parameters in this pre-trained model is far larger than in CNN II. These results indicate that although CNN I
FIGURE 2. (a): CNN I Accuracy, (b): CNN II Accuracy, (c): Implemented CNN Accuracy, (d): CNN I Loss, (e): CNN II Loss, (f): Implemented CNN Loss.
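The layer list in Model Summary 1 can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: it assumes unpadded, stride-1 convolutions and pooling windows whose stride equals their size (the Keras defaults, consistent with the stride of two and absence of padding stated in the text), and the helper names are hypothetical rather than taken from the authors' code.

```python
def conv2d_valid(h, w, c_in, filters, k):
    """Shape and parameter count of an unpadded, stride-1 k x k convolution."""
    params = k * k * c_in * filters + filters  # weights plus one bias per filter
    return (h - k + 1, w - k + 1, filters), params


def max_pool(h, w, c, pool=2):
    """Shape after unpadded pooling whose stride equals the window size."""
    return (h // pool, w // pool, c)


def dense(n_in, n_out):
    """Output width and parameter count of a fully connected layer."""
    return n_out, n_in * n_out + n_out


def summarize(input_shape=(100, 100, 1), filters=4, kernel=2, hidden=32, classes=4):
    """Walk the four Conv2D/MaxPooling2D pairs and the two dense layers."""
    h, w, c = input_shape
    total = 0
    shapes = []
    for _ in range(4):
        (h, w, c), p = conv2d_valid(h, w, c, filters, kernel)
        total += p
        shapes.append((h, w, c))
        h, w, c = max_pool(h, w, c)
        shapes.append((h, w, c))
    flat = h * w * c
    n1, p = dense(flat, hidden)
    total += p
    _, p = dense(n1, classes)
    total += p
    return shapes, flat, total


if __name__ == "__main__":
    shapes, flat, total = summarize()
    for s in shapes:
        print(s)
    print("flattened inputs:", flat, "parameters:", total)
```

Under these assumptions the last pooling layer yields a 5 × 5 × 4 feature map, so the first dense layer sees 100 inputs and dominates the parameter count. The total computed here is derived from the layer list, not quoted from the paper.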
FIGURE 3. The structure design of the improved CNN II model.
C. SOFTWARE OPTIMIZATION
This subsection covers the optimizations considered in writing the C code to achieve minimum code size for the implemented CNN. As mentioned in subsection III-B, the lack of memory in low-cost FPGA models leads to selecting a CNN model with a low number of weights and biases. Even with a lightweight model, the memory issue requires further consideration. For some systems, especially memory-hungry ones, the memory unit on FPGAs might be insufficient. For this reason, to decrease the memory block size needed on the FPGA, all of the weights and biases are converted into fixed-point numbers. For the conversion from floating-point to fixed-point, a Python package named fxpmath is utilized [23]. To clarify the fixed-point configuration used in our design, Fig. 4 shows it alongside the well-known IEEE 754 floating-point standard.
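As a rough illustration of the floating-point to fixed-point conversion described above, the sketch below quantizes a value to a signed fixed-point code in plain Python. It does not use fxpmath and does not reproduce the configuration of Fig. 4 or Table 3; the word length of 8 bits with 6 fractional bits and the function names are assumptions chosen only for the example.

```python
def to_fixed(value, n_word=8, n_frac=6):
    """Quantize a float to a signed fixed-point integer code.

    n_word and n_frac are illustrative assumptions, not the paper's settings.
    Values outside the representable range saturate instead of wrapping.
    """
    scale = 1 << n_frac
    lo = -(1 << (n_word - 1))
    hi = (1 << (n_word - 1)) - 1
    code = int(round(value * scale))
    return max(lo, min(hi, code))


def to_float(code, n_frac=6):
    """Map a fixed-point integer code back to its real value."""
    return code / (1 << n_frac)


if __name__ == "__main__":
    for weight in (0.5, -0.25, 0.3, 10.0):
        code = to_fixed(weight)
        print(weight, "->", code, "->", to_float(code))
```

With 6 fractional bits the resolution is 1/64, so the rounding error of any in-range weight is at most 1/128. A per-layer choice of the fractional width, as the dynamic fixed-point scheme in the text suggests, trades range against this resolution.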
FIGURE 6. Output histogram of (a) First, (b) Second, (c) Third, and (d) Fourth convolution layer, (e) First dense layer.
TABLE 3. Fixed-point configuration of each layer's output.
Finally, for the configuration of the last dense layer, we decided to keep a 16-bit representation to better differentiate between the four classes of the network.
For the second optimization, because the number of intermediate variables increases after each convolution layer, we utilized Dynamic Memory Allocation (DMA) functions, especially malloc() and free(), in the code. With these functions, the heap memory of the processor is allocated to and deallocated from the variables dynamically. For example, after the first layer's output has been calculated, there is no need for the first layer's input to occupy memory anymore. The occupied memory is therefore released with the free() function and automatically reallocated to the following layer's variables. However, due to the architecture of the RiscV compiler, dynamic memory allocation functions in C or C++ cannot be used unless the target embedded processor has an operating system. The libgloss library contains these functions, but it applies to processors with an operating system. In this case, we use a bare-metal processor on FPGA, so the usage of libgloss must be avoided in the Makefile of the compiler. Then a specific version of the _sbrk() function for the bare-metal compiler is considered to implement dynamic memory allocation. Fig. 7 provides a block diagram indicating how to compile C code that contains DMA functions for bare-metal CPUs. Regarding Fig. 7, it should be mentioned that _sbrk() must be added at the first line of the written C code, before other functions, to allow the compiler to utilize DMA functions. Moreover, we endeavored to use the points stated in [24] to optimize the written Makefile and linker file for the compiler.
Alongside the memory-usage optimizations mentioned above, we optimized the code in terms of run time and latency with the help of custom instructions. Overall, the convolution function is the most frequently called function in this code. Algorithm 2 depicts the pseudo-code of a standard convolution, which indicates that the multiply and accumulation (MAC) operation frequently happens inside for loops. For this reason, it was convenient to add the MAC operation, or even the whole convolution operation, to the hardware. Besides, the MAC operation can also be used for the two fully connected layers of our implemented CNN to reduce the run time of the code even more. In Section IV, the hardware optimizations are explained by considering the process of adding a custom store as custom0, the conv2d(2 × 2) operation as custom1, and MAC as custom2 to the Arithmetic Logic Unit (ALU) and processor decoder. For the C code optimization, Fig. 8 displays a block diagram of how to modify the C code and the compiler to make use of the custom instructions. It must be noted that for the conv2d(2 × 2) operation as custom1, considering the 8-bit representation and that the size of the convolution filters is 2 × 2, only the first operand of the 32bit ALU is
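To make the role of the MAC operation concrete, the following Python sketch spells out a 2 × 2 convolution and a fully connected neuron as explicit multiply-accumulate loops. It is a software reference model under stated assumptions (unpadded, stride-1, integer data, ReLU applied to the result), not the authors' Algorithm 2 and not a model of the custom1 or custom2 instruction encodings; the names mac, conv2d_2x2, and dense_neuron are hypothetical.

```python
def mac(acc, a, b):
    """Multiply-accumulate: the inner-loop operation discussed in the text."""
    return acc + a * b


def conv2d_2x2(image, kernel, bias=0, relu=True):
    """Unpadded, stride-1 2x2 convolution (cross-correlation) on a 2-D list."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 1):
        out_row = []
        for c in range(cols - 1):
            acc = bias
            # Four MACs per output pixel: the work a fused conv2d(2 x 2)
            # instruction would cover in one step.
            for i in range(2):
                for j in range(2):
                    acc = mac(acc, image[r + i][c + j], kernel[i][j])
            out_row.append(max(acc, 0) if relu else acc)
        out.append(out_row)
    return out


def dense_neuron(inputs, weights, bias=0):
    """One fully connected neuron expressed as a chain of MACs."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc = mac(acc, x, w)
    return acc


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(conv2d_2x2(image, [[1, 0], [0, 1]]))
    print(dense_neuron([1, 2, 3], [4, 5, 6], bias=1))
```

Every output pixel costs four multiply-accumulates, and every dense neuron costs one per input, which is why moving these operations into hardware shortens the run time of both layer types.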
FIGURE 16. Real-world test of designed systems.
are used. For illustrating the system output and the images captured from the camera module, an HDMI monitor used in Fig. 14 is connected to the VGA connectors of the FPGA board via VGA to HDMI converter 15. However, the monitor block is optional and can be omitted when implementing the real-world application. For latency evaluation, we enabled a counter at the beginning of the C code and disabled it at the end of the code to measure the run time of each code precisely. For an example picture from the dataset, the latency is 390 ms for ibex with no custom instruction, 289 ms for ibex with custom store and MAC used in the convolution and fully connected functions, and 231 ms for ibex with custom store, conv2d(2 × 2) for convolution, and MAC for the fully connected function, as shown in Table 4. These figures correspond to speedups of 390/231 ≈ 1.7 over the base processor and 289/231 ≈ 1.25 over the processor with only custom store and MAC. For accuracy measurements, we tested both the FPGA and Raspberry Pi based systems in the real world and deployed the system inside a car, as can be seen in Fig. 16. Moreover, the accuracy results can be seen in Tables 4 and 6. Besides, for the FPGA implementation, the Vivado utilization table is considered to compare resource utilization with and without custom instructions, as shown in Table 5. Finally, Table 6 compares the FPGA implementation with the Raspberry Pi implementation.
Overall, the results indicate that the processor with added custom store, conv2d(2 × 2), and MAC operation achieves the best result in terms of latency, with a negligible increase in the usage of DSP blocks of the board.
VI. CONCLUSION
This paper discusses the implementation of a modified embedded processor based on the RiscV ISA for driver drowsiness detection systems. The whole system consists of an embedded processor, a camera, and a monitor connected to the VGA port of the FPGA. The drowsiness is detected with a convolutional neural network. The implemented CNN classifies input images taken from the driver into four classes: distraction, natural, sleep, and yawning. Moreover, the system's hardware and software parts are optimized for this application. On the software side, Dynamic Memory
REFERENCES
[1] A. Chowdhury, R. Shankaran, M. Kavakli, and M. M. Haque, “Sensor applications and physiological features in drivers' drowsiness detection: A review,” IEEE Sensors J., vol. 18, no. 8, pp. 3055–3067, Apr. 2018.
[2] A. Quddus, A. S. Zandi, L. Prest, and F. J. E. Comeau, “Using long short term memory and convolutional neural networks for driver drowsiness detection,” Accident Anal. Prevention, vol. 156, Jun. 2021, Art. no. 106107.
[3] M. Hashemi, A. Mirrashid, and A. B. Shirazi, “Driver safety development: Real-time driver drowsiness detection system based on convolutional neural network,” Social Netw. Comput. Sci., vol. 1, no. 5, p. 289, Sep. 2020, doi: 10.1007/s42979-020-00306-9.
[4] D. T. Nguyen, T. N. Nguyen, H. Kim, and H. J. Lee, “A high-throughput and power-efficient FPGA implementation of YOLO CNN for object detection,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 8, pp. 1861–1873, Aug. 2019.
[5] RISC-V Specification, vol. 1, RISC-V, Unprivileged Spec V, Cham, Switzerland, 2019.
[6] S. K. Moore, “RISC-V AI chips will be everywhere,” IEEE Spectrum, New York, NY, USA, Tech. Rep., 2022.
[7] P. Dabbelt et al. (2020). GNU Toolchain for RISC-V, Including GCC. [Online]. Available: https://github.com/riscv/riscv-gnu-toolchain
[8] S. Arif, M. Arif, S. Munawar, Y. Ayaz, M. J. Khan, and N. Naseer, “EEG spectral comparison between occipital and prefrontal cortices for early detection of driver drowsiness,” in Proc. Int. Conf. Artif. Intell. Mechatronics Syst. (AIMS), Apr. 2021, pp. 1–6.
[9] S. Murugan, J. Selvaraj, and A. Sahayadhas, “Detection and analysis: Driver state with electrocardiogram (ECG),” Phys. Eng. Sci. Med., vol. 43, no. 2, pp. 525–537, Jun. 2020.
[10] A. T. Satti, J. Kim, E. Yi, H.-Y. Cho, and S. Cho, “Microneedle array electrode-based wearable EMG system for detection of driver drowsiness through steering wheel grip,” Sensors, vol. 21, no. 15, p. 5091, Jul. 2021.
[11] G. Sikander and S. Anwar, “Driver fatigue detection systems: A review,” IEEE Trans. Intell. Transp. Syst., vol. 20, no. 6, pp. 2339–2352, Jun. 2018.
[12] V. S. Kumar, S. N. Ashish, I. V. Gowtham, S. P. A. Balaji, and E. Prabhu, “Smart driver assistance system using raspberry pi and sensor networks,” Microprocessors Microsyst., vol. 79, Nov. 2020, Art. no. 103275.
[13] S. Gupta, P. Jain, and E. Rufus, “Drowsy driver alerting system,” in Proc. 2nd Int. Conf. Electron., Commun. Aerosp. Technol. (ICECA), Mar. 2018, pp. 1665–1670.
[14] A. A. Suhaiman, Z. May, and N. A. Rahman, “Development of an intelligent drowsiness detection system for drivers using image processing technique,” in Proc. IEEE Student Conf. Res. Develop. (SCOReD), Sep. 2020, pp. 233–236.
[15] W. Deng and R. Wu, “Real-time driver-drowsiness detection system using facial features,” IEEE Access, vol. 7, pp. 118727–118738, 2019.
[16] B. Reddy, Y.-H. Kim, S. Yun, C. Seo, and J. Jang, “Real-time driver drowsiness detection for embedded system using model compression of deep neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, Jul. 2017, pp. 121–128.
[17] B. K. Savas and Y. Becerikli, “Real time driver fatigue detection system based on multi-task ConNN,” IEEE Access, vol. 8, pp. 12491–12498, 2020.
[18] G. Zhang, K. Zhao, B. Wu, Y. Sun, L. Sun, and F. Liang, “A RISC-V based hardware accelerator designed for Yolo object detection system,” in Proc. IEEE Int. Conf. Intell. Appl. Syst. Eng. (ICIASE), Apr. 2019, pp. 9–11.
[19] A. Gonzalez and C. Hong, “A chipyard comparison of NVDLA and Gemmini,” Berkeley, CA, USA, Tech. Rep. EE 290-2, 2020.
[20] F. Farshchi, Q. Huang, and H. Yun, “Integrating NVIDIA deep learning accelerator (NVDLA) with RISC-V SoC on FireSim,” in Proc. 2nd Workshop Energy Efficient Mach. Learn. Cognit. Comput. Embedded Appl. (EMC), Feb. 2019, pp. 21–25.
[21] E. Gholizadehazari, T. Ayhan, and B. Ors, “An FPGA implementation of a RISC-V based SoC system for image processing applications,” in Proc. 29th Signal Process. Commun. Appl. Conf. (SIU), Jun. 2021, pp. 1–4.
[22] S. Abtahi, M. Omidyeganeh, S. Shirmohammadi, and B. Hariri, “YawDD: A yawning detection dataset,” in Proc. 5th ACM Multimedia Syst. Conf. (MMSys), 2014, pp. 24–28.
[23] A. Franco, J. Charlong, and E. Badger. A Python Library for Fractional Fixed-Point (Base 2) Arithmetic and Binary Manipulation With Numpy Compatibility. Accessed: 2020. [Online]. Available: https://github.com/francof2a/fxpmath
[24] M. Perotti, P. D. Schiavone, G. Tagliavini, D. Rossi, T. Kurd, M. Hill, L. Yingying, and L. Benini, “HW/SW approaches for RISC-V code size reduction,” in Proc. Workshop Comput. Archit. Res. RISC-V (CARRV), 2020, pp. 1–8.
[25] LowRISC. Ibex Core Documentation. Accessed: 2017. [Online]. Available: https://ibex-core.readthedocs.io/en/latest/
[26] A. Amid et al., “Chipyard: Integrated design, simulation, and implementation framework for custom SoCs,” IEEE Micro, vol. 40, no. 4, pp. 10–21, 2020, doi: 10.1109/MM.2020.2996616.
[27] R. Herveille, “WISHBONE system-on-chip (SoC) interconnection architecture for portable IP cores,” OpenCores, Amsterdam, The Netherlands, Tech. Rep., 2010.
[28] (2011). ARM Limited. AXI Spec. [Online]. Available: http://www.gstitt.ece.ufl.edu/courses/fall15/eel4720_5721/labs/refs/AXI%4_specification.pdf
[29] MathWorks. MATLAB Support Package for Raspberry PI Hardware. Accessed: 2014. [Online]. Available: https://www.mathworks.com/hardware-support/raspberry-pi-MATLAB.html
[30] MathWorks. Deep Learning Toolbox Converter for Tensorflow Models. Accessed: 2017. [Online]. Available: https://www.mathworks.com/matlabcentral/fileexchange/64649-deep-learning-toolbox-converter-for-tensorflow-models
[31] Digilent. Nexys 4 DDR Board. Accessed: 2013. [Online]. Available: https://digilent.com/reference/programmable-logic/nexys-4-ddr/start
ERFAN GHOLIZADEHAZARI was born in Urmia, Iran. He received the B.Sc. and M.Sc. degrees (Hons.) in electronics from Istanbul Technical University, Turkey, in 2021. From 2017 to 2018, he was a Researcher at Urmia University, Urmia, where he was involved in the IR signal decoding project using NEC protocol for controlling projection systems. His undergraduate research was about wireless charger designing and implementation. From 2020 to 2021, he was a Graduate Researcher for the project between TUBİTAK (The Scientific and Technological Research Institution of Turkey) and Ministry of Science, Research and Technology, Iran, developing RISC-V based on SoC system for driver fatigue detection algorithms. His research interests include microprocessor architecture, embedded systems, and HW/SW codesign.
MORTEZA MOUSAZADEH was born in Urmia, Iran. He received the B.S. degree in electrical engineering from the Iran University of Science and Technology, Tehran, Iran, in 2003, and the M.S. degree in electrical engineering and the Ph.D. degree in microelectronics from Urmia University, Urmia, in 2006 and 2014, respectively. He is currently a Professor at Urmia University. His research interests include mixed mode IC design, data converter, and AI accelerator.