On-Device Training of Artificial Intelligence Models On Microcontrollers
Bao-Toan Thai 1, Vy-Khang Tran 1, Hai Pham 1,2, Chi-Ngon Nguyen 1, Van-Khanh Nguyen 1
1 Faculty of Automation Technology, College of Engineering, Can Tho University, Can Tho, Vietnam
2 School of Engineering, RMIT University, Melbourne, Australia
Corresponding Author:
Van-Khanh Nguyen
Faculty of Automation Technology, College of Engineering, Can Tho University
Campus II, 3/2 street, Ninh Kieu district, Can Tho city, Vietnam
Email: [email protected]
1. INTRODUCTION
Tiny machine learning (tinyML) is an approach that enables the direct application of
machine learning (ML) on embedded systems. It focuses on integrating ML techniques directly into
micro-controller units (MCUs) with limited computing power and memory resources [1]. The tinyML
approach has been applied in various fields such as healthcare, smart agriculture, environmental monitoring,
and anomaly detection [2]–[4]. Most of these applications utilize models pre-trained on powerful computers and
then deploy them directly onto MCUs. This provides flexibility and efficiency in processing information
directly on the device [5]. According to [6], the typical tinyML deployment process involves the following
steps: training the model on a powerful computing device, quantizing the model within the TensorFlow Lite
framework [7], and finally deploying the quantized model on an MCU to perform inference tasks.
However, such tinyML models cannot be retrained with new data when needed. Recently, some
artificial intelligence (AI) models in general, and ML and deep learning (DL) models in particular, have been
trained directly on MCUs, garnering significant interest [8], [9]. This approach is also referred to as
on-device training (ODT). ODT trains the model directly on small computing devices, such as MCUs,
without pre-training. This method lets the model learn from data acquired during the device's
operation [9]. However, ODT still faces challenges such as limited computational capability and memory constraints.
Many recent studies, such as [10]–[14], have proposed various techniques to optimize models and memory,
enabling the computation of complex models on small devices.
A real-time operating system (RTOS) is an operating system that supports scheduling mechanisms to
ensure tasks can be completed within specific time constraints [15], [16]. FreeRTOS [17] is one of the RTOS
kernels developed for embedded systems, and it supports the most common micro-controller families. Using
FreeRTOS not only helps manage tasks more efficiently but also leverages the capabilities of multi-core
MCUs for parallel execution, enhancing the performance of embedded devices. FreeRTOS has been
applied to accelerate the computation of artificial neural networks (ANNs) by dividing them into two
corresponding tasks and assigning them to two cores for scheduling [18]. It has also been applied to enhance the
efficiency of signal preprocessing for tinyML applications [19]. Accelerating tinyML with FreeRTOS is
challenging due to the undisclosed structure of pre-trained models. However, for ODT, FreeRTOS can
effectively leverage its capabilities when running on multi-core MCUs, scheduling training and prediction
tasks on two separate cores. This allows devices to make continuous predictions without interruption from
the model training process when needed.
Current ODT solutions still have some inherent limitations. Sudharsan et al. [20] utilized the
Train++ algorithm to address classification and regression problems on common MCUs. However,
the embedded algorithm is not general enough to accommodate the addition of hidden layers; as a result,
creating complex ANNs is challenging. Craighero et al. [21] developed the
capability to train convolutional neural network (CNN) models directly on the STM32 MCU to address
human action recognition problems. However, deploying this research might face challenges if the data
cannot be labeled, for example, in cases like electrocardiogram (ECG) signals requiring cardiac expert
evaluation. That study also focuses on a specific application and has only been tested on one MCU. This
suggests that if training ANN models using unsupervised learning algorithms on MCUs were possible, the
applicability could be broader, such as models for anomaly detection based on autoencoders and ANN
models for prediction in internet of things (IoT) applications.
This research focuses on developing a feature-rich and highly customizable capability for creating
and training ANNs on embedded systems. To achieve this, fundamental functions of ANN models, such as
forward and backward propagation, activation functions, loss functions, and ANN creation and training
functions (i.e., add, use_loss, fit, and predict), will be generically programmed in an object-oriented
programming language. Subsequently, these functions will be used to train and predict on a PC and some common
MCUs to evaluate performance. Additionally, FreeRTOS will be applied to perform parallel training and
prediction tasks on multi-core MCUs to enhance the efficiency of the ODT method.
2. EXECUTION METHODS
First, Figure 1 illustrates the relevant mathematical analyses of the neural network model.
Subsequently, the ANN library is created based on them. Next, several ANN models ranging from simple to
complex are directly implemented on both a PC and various types of MCUs using this library to evaluate
their performance.
An ANN model consists of a chain of layers whose parameters are adjusted during training to learn
the characteristics of the data. To provide a more precise understanding, Figure 2 illustrates each layer's
forward propagation computation process. Notably, each layer's output becomes the following layer's input,
forming a linked chain between layers in the model. The relevant mathematical symbols of the AI model are
depicted in Figure 1: x represents the input data, W is the weight matrix, b is the bias parameter, z is the linear
combination, a is the activation output, ŷ is the output of the model, l is the loss function, L indexes the last
layer, and n denotes the number of layers in the model.
$a = f(z)$ (1)

where f is the activation function, which includes non-linear functions such as ReLU, Tanh, and Sigmoid [23], and z is the value of the linear function calculated by (2):

$z = x^T W + b$ (2)
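To make (1) and (2) concrete, the following minimal C++ sketch computes the forward pass of one fully connected layer with a ReLU activation. It is an illustration only; the DenseLayer name and member layout are assumptions, not the library's actual internals.

#include <vector>
#include <algorithm>

// Hypothetical dense layer computing z = x^T W + b per (2), then a = f(z) per (1).
struct DenseLayer {
    int in_size, out_size;
    std::vector<std::vector<float>> W;  // in_size x out_size weight matrix
    std::vector<float> b;               // out_size bias vector

    std::vector<float> forward(const std::vector<float>& x) const {
        std::vector<float> a(out_size);
        for (int j = 0; j < out_size; ++j) {
            float z = b[j];                          // bias term of (2)
            for (int i = 0; i < in_size; ++i)
                z += x[i] * W[i][j];                 // x^T W term of (2)
            a[j] = std::max(0.0f, z);                // ReLU as f(z) in (1)
        }
        return a;
    }
};

Chaining such layers, with each forward output fed to the next layer, reproduces the linked structure shown in Figure 2.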
During the model training process, the backward propagation algorithm is applied. This algorithm proceeds from
the last layer to the first layer by applying the chain rule. In this process, the gradients of the loss function are
calculated as in (3) to adapt the parameters W, x, and b to the data, as presented in [21]:

$\frac{\partial l}{\partial W} = x^T \frac{\partial l}{\partial \hat{y}}, \qquad \frac{\partial l}{\partial x} = W^T \frac{\partial l}{\partial \hat{y}} \odot f'(x), \qquad \frac{\partial l}{\partial b} = \frac{\partial l}{\partial \hat{y}}$ (3)
The model parameters are updated by the stochastic gradient descent (SGD) algorithm [24], as presented in (4):

$\theta = \theta - lr \frac{\partial l}{\partial \theta}$ (4)

where $\theta$ is the set of parameters to be optimized, such as W, b, and x, and lr is the learning rate.
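A minimal C++ sketch of how (3) and (4) combine for one dense layer is shown below; the function name and the decision to fold the SGD update of (4) into the backward pass are illustrative assumptions. The activation derivative f'(x) in (3) is assumed to be applied by a separate activation layer.

#include <vector>

// Sketch: given grad_out = dl/dŷ from the next layer, compute the gradients
// of (3) and apply the SGD update of (4) in place; returns dl/dx for the
// previous layer. (The ⊙ f'(x) factor is handled by the activation layer.)
std::vector<float> backward_sgd(std::vector<std::vector<float>>& W,
                                std::vector<float>& b,
                                const std::vector<float>& x,
                                const std::vector<float>& grad_out,
                                float lr) {
    std::vector<float> grad_in(x.size(), 0.0f);
    for (size_t i = 0; i < x.size(); ++i) {
        for (size_t j = 0; j < grad_out.size(); ++j) {
            grad_in[i] += W[i][j] * grad_out[j];   // dl/dx = W^T dl/dŷ
            W[i][j] -= lr * x[i] * grad_out[j];    // dl/dW = x^T dl/dŷ, then (4)
        }
    }
    for (size_t j = 0; j < grad_out.size(); ++j)
        b[j] -= lr * grad_out[j];                  // dl/db = dl/dŷ, then (4)
    return grad_in;
}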
After each training cycle, the loss function is used to evaluate the training performance. In this
study, three types of loss functions have been implemented: mean squared error (MSE) [25], binary
cross-entropy (BCE), and categorical cross-entropy (CE) [26], represented mathematically in (5) to (7):

$MSE(y, \hat{y}) = \frac{1}{M} \sum_{i=1}^{M} (y_i - \hat{y}_i)^2$ (5)

where y is the actual value, ŷ is the predicted value, and M is the total number of samples. BCE and CE follow their standard definitions:

$BCE(y, \hat{y}) = -\frac{1}{M} \sum_{i=1}^{M} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$ (6)

$CE(y, \hat{y}) = -\frac{1}{M} \sum_{i=1}^{M} \sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c}$ (7)

where C is the number of classes.
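For illustration, a straightforward C++ version of the MSE loss (5) and its derivative, which seeds the backward propagation described above, could look as follows (the function names are assumptions, not the library's published API):

#include <vector>

// MSE per (5): mean of squared differences between actual and predicted values.
float mse(const std::vector<float>& y, const std::vector<float>& y_hat) {
    float sum = 0.0f;
    for (size_t i = 0; i < y.size(); ++i) {
        float d = y[i] - y_hat[i];
        sum += d * d;
    }
    return sum / y.size();
}

// Derivative of (5) with respect to ŷ, i.e., 2(ŷ - y)/M, used as derivative_loss.
std::vector<float> mse_prime(const std::vector<float>& y, const std::vector<float>& y_hat) {
    std::vector<float> grad(y.size());
    for (size_t i = 0; i < y.size(); ++i)
        grad[i] = 2.0f * (y_hat[i] - y[i]) / y.size();
    return grad;
}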
Figure 3. Implementing an ANN model on an embedded system: (a) program segments to train and predict the ANN model and (b) structure of the ANN model to be created
The algorithm for the fit method is presented in Algorithm 1 and executed in four steps. Firstly, err is
initialized to 0. Subsequently, each sample of the x_train data is passed through ForwardPropagation to
find the output of the ANN with the current set of weights and biases. Next, the error relative to
the actual values is calculated using the defined loss function. Simultaneously, the rate of change of the loss,
derivative_loss, is computed. Finally, the BackwardPropagation algorithm is executed based on
derivative_loss (i.e., $\frac{\partial l}{\partial \theta}$ in (4)), and the learning rate (lr) is used to update the parameters of the network. The
optimization algorithm SGD is explicitly implemented as follows:

weights[i] -= lr * weights_error[i][j];
bias[i] -= lr * output_error[i];

In the statements above, weights, bias, weights_error, and output_error correspond to W, b, $\frac{\partial l}{\partial W}$, and $\frac{\partial l}{\partial b}$, respectively,
where i, j range from 1 to n. The training algorithm is repeated for epochs iterations. The value of err after each
epoch is collected to assess the success of the training process.
Algorithm 1. Training
fit(x_train, y_train, epochs, lr):
  For e ← each value of epochs:
    err ← 0
    For i ← each row of x_train:
      output[i] ← ForwardPropagation(x_train[i])
      err ← err + loss(y_train[i], output[i])
      BackwardPropagation(derivative_loss, lr)
    End For
    err ← err / number of samples
  End For
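A C++ rendering of Algorithm 1 might look like the sketch below, with the library's ForwardPropagation and BackwardPropagation passed in as callbacks; the free-function form and parameter packaging are illustrative assumptions rather than the library's actual signature.

#include <cstdio>
#include <functional>
#include <vector>

using Vec = std::vector<float>;

// Sketch of Algorithm 1: one SGD pass per sample, repeated for 'epochs' rounds.
void fit(const std::vector<Vec>& x_train, const std::vector<Vec>& y_train,
         int epochs, float lr,
         const std::function<Vec(const Vec&)>& ForwardPropagation,
         const std::function<void(const Vec&, float)>& BackwardPropagation,
         const std::function<float(const Vec&, const Vec&)>& loss,
         const std::function<Vec(const Vec&, const Vec&)>& loss_prime) {
    for (int e = 0; e < epochs; ++e) {
        float err = 0.0f;                                 // err <- 0
        for (size_t i = 0; i < x_train.size(); ++i) {
            Vec output = ForwardPropagation(x_train[i]);  // forward pass
            err += loss(y_train[i], output);              // accumulate loss
            Vec derivative_loss = loss_prime(y_train[i], output);
            BackwardPropagation(derivative_loss, lr);     // backprop + SGD update
        }
        err /= x_train.size();                            // mean error of this epoch
        std::printf("epoch %d/%d, err = %.6f\n", e + 1, epochs, err);
    }
}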
The predict method is presented in Algorithm 2. This method calls the ForwardPropagation method
as in (2) to find the output of the trained ANN. It is used to evaluate the ANN model after training with a
test dataset or on new data collected by devices applying an ANN model created with this library.
Algorithm 2. Predict
predict(data):
  For i ← each row of data:
    output[i] ← ForwardPropagation(data[i])
  End For
The three proposed models to be evaluated on both MCUs and a PC are summarized in Table 3.
All these models have one neuron in the input and output layers. Models 1, 2, and 3 have 1, 2, and 3 hidden
layers, respectively, with corresponding parameter counts of 193, 4,353, and 8,513. These models
will be implemented to evaluate metrics such as training and prediction time for a single input sample. Each
model is trained for 30 epochs with a learning rate of 0.001, using ReLU as the activation function and MSE as
the loss function.
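As a usage illustration, Model-1 could be assembled with the library's add, use_loss, fit, and predict functions roughly as follows. The 64-neuron hidden layer is inferred from the reported parameter count (1×64 + 64 + 64×1 + 1 = 193), and the Network, DenseLayer, and ActivationLayer type names are assumptions rather than the published API.

// Hypothetical construction of Model-1 (1 input, one 64-neuron hidden layer, 1 output).
Network net;
net.add(new DenseLayer(1, 64));      // input -> hidden: 64 weights + 64 biases
net.add(new ActivationLayer(ReLU));  // ReLU activation
net.add(new DenseLayer(64, 1));      // hidden -> output: 64 weights + 1 bias
net.use_loss(MSE);                   // loss function of (5)
net.fit(x_train, y_train, 30, 0.001f);   // 30 epochs, learning rate 0.001
auto y_pred = net.predict(x_test);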
The performance of the ANN models is evaluated using a regression problem with a cubic
polynomial of the form presented in (8). The training data, randomly generated from this polynomial, is
illustrated in Figure 4. The black dots on the graph represent the data used during training, and the red line
represents the actual values of the polynomial.
$g(x) = ax^3 + bx^2 + cx + d$ (8)

The function g(x) is defined as a cubic polynomial with coefficients a, b, c, and d equal to 1, 2, -3, and -4,
respectively.
Two datasets were created to evaluate the performance and efficiency of multi-threaded
programming in the training and prediction of ANN models. Dataset-1 contains 150 data points (x, g(x)), with x ranging from -3 to 3, divided into two
sets: train and test, consisting of 100 and 50 data points, respectively. The MSE of the model is used for evaluation
on both the PC and the MCUs. Dataset-2 consists of 300 data points (x, g(x)), with x ranging from -3 to 3, divided
into two sets: train and test, consisting of 200 and 100 data points, respectively. Dataset-2 is
further divided into four smaller subsets, each containing 50 samples. These subsets are
sequentially used to train the model through Task Train, evaluating the parallel execution capability during
training and prediction on a multi-core MCU.
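A possible C++ routine for generating Dataset-1 from (8) is sketched below; the uniform sampling with std::rand is an assumption about how the random data were produced.

#include <cstdlib>
#include <vector>

// g(x) = x^3 + 2x^2 - 3x - 4, the cubic polynomial of (8).
float g(float x) { return x * x * x + 2 * x * x - 3 * x - 4; }

// Generate 150 (x, g(x)) pairs with x uniform in [-3, 3]; first 100 train, last 50 test.
void make_dataset1(std::vector<float>& x_train, std::vector<float>& y_train,
                   std::vector<float>& x_test, std::vector<float>& y_test) {
    for (int i = 0; i < 150; ++i) {
        float x = -3.0f + 6.0f * static_cast<float>(std::rand()) / RAND_MAX;
        if (i < 100) { x_train.push_back(x); y_train.push_back(g(x)); }
        else         { x_test.push_back(x);  y_test.push_back(g(x));  }
    }
}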
The models will be trained and tested with single-task programming running on a single-core MCU
or a single core of a multi-core MCU, and with multi-task programming using FreeRTOS running on a multi-core
MCU. Figure 5 presents two flowcharts for implementing the ANN model on single-core and multi-core
MCUs. Figure 5(a) illustrates the process of deploying the ANN model using sequential programming. The
program creates the ANN model upon startup using the provided functions. If this is the first training,
weights and biases are randomly initialized, and the model is trained with Dataset-1. Conversely,
the weights are loaded from the system's non-volatile memory if the ANN model has already been trained.
Then, the program performs the main loop, consisting of two sequential tasks: retraining the ANN and
making predictions. These tasks are executed at different rates, set by the prediction and training times. In this
case, it is evident that while the MCU retrains the model, it cannot predict new data.
To address this challenge, Figure 5(b) illustrates the training and prediction algorithms
on a multi-core MCU using multi-task programming with FreeRTOS. The training and prediction processes
are implemented as two FreeRTOS tasks, each assigned to a separate core of the MCU. FreeRTOS is
responsible for scheduling these two tasks to run in parallel, ensuring that the training process does not
interrupt the prediction process. Similar to the algorithm in Figure 5(a), the program creates the ANN
model using the provided functions after startup. If it is the first training, weights and biases are randomly
initialized, and the flag wait is set to true to signal that the prediction process must wait until the first training
process is completed. Conversely, if the model has already been trained, its
parameters are loaded from the MCU's non-volatile memory, and the flag wait is set to false to
allow Task Predict to operate. After each completion of Task Train, the weights and biases are
saved to the MCU's non-volatile memory, and the trained flag is set to true to allow Task Predict to
update the new parameters and then reset the trained flag to false.
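The dual-core scheme of Figure 5(b) can be sketched in FreeRTOS as follows, assuming an ESP32-style dual-core MCU where the ESP-IDF extension xTaskCreatePinnedToCore pins a task to a core; the flag handling mirrors the flowchart, while the training, prediction, and storage details are omitted.

#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

volatile bool wait_flag = true;   // predict must wait until the first training ends
volatile bool trained = false;    // set after training to trigger a weight reload

void TaskTrain(void*) {
    for (;;) {
        // ... train on the next data chunk, then save weights/biases to
        // non-volatile memory (details omitted) ...
        trained = true;           // let Task Predict pick up the new parameters
        wait_flag = false;        // first training completed: predictions may run
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

void TaskPredict(void*) {
    for (;;) {
        if (!wait_flag) {
            if (trained) {
                // ... reload the latest weights and biases ...
                trained = false;
            }
            // ... run ForwardPropagation on newly acquired data ...
        }
        vTaskDelay(pdMS_TO_TICKS(50));
    }
}

extern "C" void app_main(void) {
    xTaskCreatePinnedToCore(TaskTrain, "train", 8192, nullptr, 1, nullptr, 0);
    xTaskCreatePinnedToCore(TaskPredict, "predict", 8192, nullptr, 1, nullptr, 1);
}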
Figure 5. Flowchart of implementing the ANN model: (a) single-core MCU and (b) dual-core MCU
3. RESULTS
3.1. Evaluate artificial neural network performance on micro-controller units
The real-time execution time evaluation results for the three models on Dataset-1 are presented in
Tables 4 and 5. In Table 4, MCU-1 and MCU-3 exhibit similar training speeds, significantly faster than
MCU-2 and MCU-4. This difference can be attributed to the lower clock speeds of MCU-2 and MCU-4
compared to MCU-1 and MCU-3. A similar pattern is observed in the prediction speeds of the models in
Table 5. However, the overall training times are relatively long. Even the model with the lowest number of
parameters, Model-1, requires up to 12 seconds to complete training on 100 samples, while Model-3,
with the highest number of parameters, takes up to 135.52 seconds to finish training on the MCU with the
highest clock speed. Therefore, training ANN models directly on MCUs using sequential programming is not
practical, as it makes it challenging to ensure real-time performance on the devices. Conversely, the prediction
speeds of the ANN models range from approximately 1.7 Hz to 23 Hz on high-clock-speed MCUs. This
suggests that the trained models can be applied to MCUs for prediction tasks, but retraining them is difficult
due to the lengthy interruption caused by the device's training process.
The MSE results on the PC and the MCUs are presented in Figure 6, demonstrating significant
similarity between the two platforms, with only a tiny difference of approximately 0.1. The main reason for
this slight discrepancy is that MCUs support data representation and operations with lower precision than a
PC. Nevertheless, this proves that trained ANN models can be directly deployed on resource-constrained
devices. Therefore, a parallel programming mechanism needs to be implemented to achieve simultaneous
training and prediction on these MCUs.
Figure 8 presents the prediction results of Model-1 after five training sessions, where the lines
represent the prediction results of the model on the test set of Dataset-2. The results show that, after
each training session, the prediction curve gradually approaches the original graph of the cubic function.
The prediction MSE decreases from 0.17 to 0.04 after five training sessions, indicating a 76.5% improvement
in the Model-1 MSE. This training process can continue to improve the accuracy of the model further because
the device's predictions are not interrupted by the training process. The parameters are updated after each
training session, almost without affecting prediction. In summary, although the training time increases
with the complexity of the model, this drawback has been overcome with the multi-task programming
technique running on multi-core MCUs. This demonstrates that the ANN model library and the multi-task
algorithm can be applied to deploy traditional ANN models directly on common MCUs.
4. DISCUSSION
In this study, we developed a library for deploying ANN models directly on MCUs. The created
models are trained and make predictions on the MCUs. Experimental results show that the prediction time of the
models after training is relatively fast. However, the training time is long, making it impractical to
deploy traditional ANN models directly on MCUs with sequential programming, as the lengthy training process makes it
challenging to ensure real-time performance. The application of multi-task programming based on FreeRTOS
has addressed this drawback: the training process can be iterated to improve the model's accuracy while the
prediction process runs continuously with the latest parameter set, ensuring the system's functionality.
By applying multi-task programming to enable parallel training and prediction processes, this
research allows the direct implementation of traditional ANN models on MCUs without optimizing
algorithms. This is advantageous because the relevant mathematical operations have been well-established
and thoroughly evaluated. The developed ANN library is highly flexible, allowing easy addition and
modification of layers, activation functions, and loss functions through the library's application programming interface
(API) functions. Moreover, this ANN library has the potential for scalability and customization across
various platforms. Indeed, in addition to the integrated activation and loss functions, new functions can be
easily programmed using their mathematical representations, as illustrated by the sketch below. The library is written in the object-oriented
programming language C++ and focuses on core features, making it easy to transition to other programming
languages or platforms in the future. It can also be customized for deployment on various multi-core MCU
models. Compared to the Train++ algorithm [20], both studies support multiple MCUs; however, this
research demonstrates greater flexibility in implementing various ANN models with different structures,
whereas Train++ does not support hidden layers.
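For instance, a new activation could be added from its mathematical form alone; the sketch below defines Tanh and its derivative, with the ActivationLayer constructor taking a function/derivative pair being an assumed extension pattern rather than the published API.

#include <cmath>

// Tanh activation and its derivative, f'(z) = 1 - tanh^2(z).
float tanh_act(float z) { return std::tanh(z); }
float tanh_prime(float z) { float t = std::tanh(z); return 1.0f - t * t; }

// Hypothetical registration with the library:
// net.add(new ActivationLayer(tanh_act, tanh_prime));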
Despite these advantages, the ANN library in this study also has three main limitations. First,
optimization algorithms have not yet been applied to accelerate processing speed, posing challenges for
applications demanding high computational speeds and making it difficult to implement complex, large-scale
models due to the limited memory of MCUs. Second, the degree of multitasking is limited,
utilizing only two main parallel tasks and not fully exploiting the computational capabilities of
multi-core MCUs; finer task partitioning for training and prediction should be prioritized in future work.
Finally, the library is currently limited to ANNs, and integrating data storage capabilities may
prove challenging when applied to supervised learning-based classification applications.
In reality, the ANN model can not only handle regression problems but can also be extended to
perform various tasks, such as classification and anomaly detection. However, implementing anomaly
detection based on unsupervised learning methods would be more feasible, since storing labeled data on the
MCU is complex. This research can be highly applicable in real-world scenarios, especially in IoT
applications, where devices are being explored to integrate self-learning, analysis, and decision-making
capabilities locally instead of relying on an AI network operating on the system's cloud server.
5. CONCLUSION
In this study, a library has been developed to deploy ANN models directly on multi-core MCUs.
The models, once created, are trained and make predictions on the MCUs. Experimental results demonstrate that the
prediction time for the models after training is relatively fast. In contrast, training takes at least
12 seconds, even for the simplest model. However, this issue has been addressed by multitasking using
FreeRTOS on multi-core MCUs, allowing the training and prediction processes to occur concurrently without
interference. Furthermore, the training process can continue to improve the accuracy of the ANN model
when additional relevant data is collected. This research has the potential to integrate ANNs into embedded
devices, especially in the IoT domain. For example, when estimating irrigation needs in agriculture based on
soil moisture, soil temperature, and air temperature to determine the amount of irrigation water, the edge
device can be retrained regularly to adapt to climate conditions.
REFERENCES
[1] R. S.-Iborra and A. F. Skarmeta, "TinyML-enabled frugal smart objects: challenges and opportunities," IEEE Circuits and Systems Magazine, vol. 20, no. 3, pp. 4–18, 2020, doi: 10.1109/MCAS.2020.3005467.
[2] Y. Abadade, A. Temouden, H. Bamoumen, N. Benamar, Y. Chtouki, and A. S. Hafid, "A comprehensive survey on TinyML," IEEE Access, vol. 11, pp. 96892–96922, 2023, doi: 10.1109/ACCESS.2023.3294111.
[3] Y. Y. Siang, M. R. Ahamd, M. Shafinaz, and Z. Abidin, "Anomaly detection based on tiny machine learning: a review," Open International Journal of Informatics, vol. 9, special issue 2, pp. 67–78, 2021.
[4] B. Sun, S. Bayes, A. M. Abotaleb, and M. Hassan, "The case for tinyML in healthcare: CNNs for real-time on-edge blood pressure estimation," in Proceedings of the ACM Symposium on Applied Computing (SAC '23), ACM, 2023, pp. 629–638, doi: 10.1145/3555776.3577747.
[5] D. L. Dutta and S. Bharali, "TinyML meets IoT: a comprehensive survey," Internet of Things (Netherlands), vol. 16, 2021, doi: 10.1016/j.iot.2021.100461.
[6] N. Schizas, A. Karras, C. Karras, and S. Sioutas, "TinyML for ultra-low power AI and large scale IoT deployments: a systematic review," Future Internet, vol. 14, no. 12, 2022, doi: 10.3390/fi14120363.
[7] P. Warden and D. Situnayake, TinyML: machine learning with TensorFlow Lite on Arduino and ultra-low-power microcontrollers, Sebastopol, USA: O'Reilly Media, Inc., 2019.
[8] J. Lee and H.-J. Yoo, "An overview of energy-efficient hardware accelerators for on-device deep-neural-network training," IEEE Open Journal of the Solid-State Circuits Society, vol. 1, pp. 115–128, 2021, doi: 10.1109/ojsscs.2021.3119554.
[9] O. D. Incel and S. O. Bursa, "On-device deep learning for mobile and wearable sensing applications: a review," IEEE Sensors Journal, vol. 23, no. 6, pp. 5501–5512, 2023, doi: 10.1109/JSEN.2023.3240854.
[10] D. Nadalini, M. Rusci, L. Benini, and F. Conti, "Reduced precision floating-point optimization for deep neural network on-device learning on microcontrollers," Future Generation Computer Systems, vol. 149, pp. 212–226, 2023, doi: 10.1016/j.future.2023.07.020.
[11] L. Wu, J. Liu, S. Vazquez, and S. K. Mazumder, "Sliding mode control in power converters and drives: a review," IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 3, pp. 392–406, 2022, doi: 10.1109/JAS.2021.1004380.
[12] A. N. Mazumder et al., "A survey on the optimization of neural network accelerators for micro-AI on-device inference," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 11, no. 4, pp. 532–547, 2021, doi: 10.1109/JETCAS.2021.3129415.
[13] J. Lin, L. Zhu, W. M. Chen, W. C. Wang, C. Gan, and S. Han, "On-device training under 256KB memory," Advances in Neural Information Processing Systems, vol. 35, 2022.
[14] M. Chowdhary and S. S. Saha, "On-sensor online learning and classification under 8 KB memory," in 2023 26th International Conference on Information Fusion (FUSION 2023), IEEE, 2023, pp. 1–8, doi: 10.23919/FUSION52260.2023.10224228.
[15] E. L. Lamie, Real-Time Embedded Multithreading Using ThreadX and MIPS, CRC Press, pp. 1–465, Apr. 2019, doi: 10.1201/9780429187858.
[16] R. V. Aroca and G. Caurin, "A real time operating systems (RTOS) comparison," WSO - Workshop de Sistemas Operacionais, vol. 12, pp. 2441–2452, 2009.
[17] A. Thomas, "Enclaves in real-time operating systems," M.Sc. dissertation, Department of Electrical Engineering and Computer Science, University of California, Berkeley, USA, 2021.
[18] M. Z. H. Zim, "TinyML: analysis of Xtensa LX6 microprocessor for neural network applications by ESP32 SoC," ArXiv - Computer Science, pp. 1–6, 2021, doi: 10.13140/RG.2.2.28602.11204.
[19] V. K. Nguyen, V. K. Tran, H. Pham, V. M. Nguyen, H. D. Nguyen, and C. N. Nguyen, "A multi-microcontroller-based hardware for deploying Tiny machine learning model," International Journal of Electrical and Computer Engineering, vol. 13, no. 5, pp. 5727–5736, 2023, doi: 10.11591/ijece.v13i5.pp5727-5736.
[20] B. Sudharsan, P. Yadav, J. G. Breslin, and M. Intizar Ali, "Train++: an incremental ML model training algorithm to create self-learning IoT devices," in Proceedings - 2021 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Internet of People, and Smart City Innovations (SmartWorld/ScalCom/UIC/ATC/IoP/SCI 2021), IEEE, 2021, pp. 97–106, doi: 10.1109/SWC50871.2021.00023.
[21] M. Craighero, D. Quarantiello, B. Rossi, D. Carrera, P. Fragneto, and G. Boracchi, "On-device personalization for human activity recognition on STM32," IEEE Embedded Systems Letters, vol. 16, no. 2, pp. 106–109, Jun. 2024, doi: 10.1109/LES.2023.3293458.
[22] O. I. Abiodun, A. Jantan, A. E. Omolara, K. V. Dada, N. A. E. Mohamed, and H. Arshad, "State-of-the-art in artificial neural network applications: a survey," Heliyon, vol. 4, no. 11, Nov. 2018, doi: 10.1016/j.heliyon.2018.e00938.
[23] A. D. Jagtap and G. E. Karniadakis, "How important are activation functions in regression and classification? A survey, performance comparison, and future directions," Journal of Machine Learning for Modeling and Computing, vol. 4, no. 1, pp. 21–75, 2023, doi: 10.1615/jmachlearnmodelcomput.2023047367.
[24] L. Bottou, "Stochastic gradient descent tricks," in Neural Networks: Tricks of the Trade, Berlin, Heidelberg: Springer, 2012, pp. 421–436, doi: 10.1007/978-3-642-35289-8_25.
[25] D. Chicco, M. J. Warrens, and G. Jurman, "The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation," PeerJ Computer Science, vol. 7, pp. 1–24, 2021, doi: 10.7717/PEERJ-CS.623.
[26] Y. Ho and S. Wookey, "The real-world-weight cross-entropy loss function: modeling the costs of mislabeling," IEEE Access, vol. 8, pp. 4806–4813, 2020, doi: 10.1109/ACCESS.2019.2962617.
BIOGRAPHIES OF AUTHORS
Hai Pham received his master’s degree from University of South Australia
(UniSA) in 2010. Since 2012, he has been a lecturer at Faculty of Automation Technology,
College of Engineering, Can Tho University. His research interests focus on Bistatic LIDAR
system for gas measurement in environmental and agricultural applications. He can be
contacted at email: [email protected].
Van-Khanh Nguyen received his master's degree from Ho Chi Minh University
of Technology, Vietnam, in 2014 and his Doctor of Engineering degree from Tokyo
University of Marine Science and Technology, Japan, in 2020. Since 2007, he has been a
lecturer at the Faculty of Automation Technology, College of Engineering, Can Tho
University. Currently, he is the head of PLC Technology and Industrial IoT Lab. His
research interests concentrate on embedded systems and AIoT- and IoT-based applications in
environmental and agricultural control, electrocardiogram (ECG) real-time classification, and
anomaly detection. He can be contacted at email: [email protected].