
IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 3, September 2024, pp. 2829-2839
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i3.pp2829-2839

On-device training of artificial intelligence models on microcontrollers

Bao-Toan Thai1, Vy-Khang Tran1, Hai Pham1,2, Chi-Ngon Nguyen1, Van-Khanh Nguyen1
1Faculty of Automation Technology, College of Engineering, Can Tho University, Can Tho, Vietnam
2School of Engineering, RMIT University, Melbourne, Australia

Article Info

Article history:
Received Nov 28, 2023
Revised Jan 26, 2024
Accepted Feb 14, 2024

Keywords:
Artificial intelligence
Free real-time operating system
Micro-controllers
On-device training
Real-time operating system

ABSTRACT

Numerous studies are currently training artificial intelligence (AI) models on tiny devices constrained by computing power and memory limitations by implementing model optimization algorithms. The question arises whether implementing traditional AI models directly on small devices like micro-controller units (MCUs) is feasible. In this study, a library has been developed to train and predict with artificial neural network (ANN) models on common MCUs. The evaluation results on a regression problem indicate that, despite the extensive training time, when combined with multitasking programming on multi-core MCUs, the training does not adversely affect the system's execution. This research contributes an additional solution that enables the direct construction of ANN models on MCU systems with limited resources.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Van-Khanh Nguyen
Faculty of Automation Technology, College of Engineering, Can Tho University
Campus II, 3/2 street, Ninh Kieu district, Can Tho city, Vietnam
Email: [email protected]

1. INTRODUCTION
The tiny machine learning (tinyML) model is an approach that enables the direct application of machine learning (ML) on embedded systems. It focuses on integrating ML techniques directly into micro-controller units (MCUs) with limited computing power and memory resources [1]. The tinyML approach has been applied in various fields such as healthcare, smart agriculture, environmental monitoring, and anomaly detection [2]–[4]. Most of these applications utilize models pre-trained on powerful computers and then deploy them directly onto MCUs, which provides flexibility and efficiency by processing information directly on the device [5]. Based on the research in [6], the typical tinyML deployment process involves the following steps: training the model on a powerful computing device, quantizing the model within the TensorFlow Lite framework [7], and finally deploying the quantized model on an MCU to perform inference tasks.
However, such tinyML models cannot be retrained with new data when needed. Currently, some artificial intelligence (AI) models in general, and ML and deep learning (DL) models in particular, have been trained directly on MCUs, garnering significant interest [8], [9]. This approach is also referred to as on-device training (ODT). ODT trains the model directly on small computing devices, such as MCUs, without pre-training. This method lets the model learn from data acquired during the device's operation [9]. However, ODT still faces challenges such as computational capability and memory constraints. Many recent studies, such as [10]–[14], have proposed various techniques to optimize models and memory, enabling the computation of complex models on small devices.
A real-time operating system (RTOS) is an operating system that supports scheduling mechanisms to ensure tasks can be completed within specific time constraints [15], [16]. FreeRTOS [17] is one of the RTOS kernels developed for embedded systems, and it supports the most common micro-controller families. Using FreeRTOS not only helps manage tasks more efficiently but also leverages the capabilities of multi-core MCUs for parallel execution, enhancing the performance of embedded devices. FreeRTOS has been applied to accelerate the computation of artificial neural networks (ANNs) by dividing them into two corresponding tasks and assigning them to two cores for scheduling [18]. It has also been applied to enhance the efficiency of signal preprocessing for tinyML applications [19]. Accelerating tinyML with FreeRTOS is challenging due to the undisclosed structure of pre-trained models. However, for ODT, FreeRTOS can effectively leverage its capabilities when running on multi-core MCUs, scheduling training and prediction tasks on two separate cores. This allows devices to make continuous predictions without interruption from the model training process when needed.
Current ODT solutions still have some inherent limitations. According to Sudharsan et al. [20], the Train++ algorithm is utilized to address classification and regression problems on common MCUs. However, the embedded algorithm is not general enough to accommodate the addition of hidden layers, so creating complex ANNs is challenging. Craighero et al. [21] developed the capability to train convolutional neural network (CNN) models directly on the STM32 MCU to address human action recognition problems. However, deploying this research might face challenges if the data cannot be classified, for example, in cases like electrocardiogram (ECG) signals requiring cardiac expert evaluation. This study also focuses on a specific application and has only been tested on one MCU. This suggests that if training ANN models using unsupervised learning algorithms on MCUs were possible, the applicability could be broader, such as models for anomaly detection based on autoencoders and ANN models for prediction in internet of things (IoT) applications.
This research focuses on developing a feature-rich and highly customizable capability for creating and training ANNs on embedded systems. To achieve this, fundamental functions of ANN models, such as forward and backward propagation, activation functions, loss functions, and ANN creation and training functions (i.e., add, use_loss, fit, and predict), are generically programmed in an object-oriented programming language. Subsequently, these functions are used to train and predict on a PC and several common MCUs to evaluate performance. Additionally, FreeRTOS is applied to perform parallel training and prediction tasks on multi-core MCUs to enhance the efficiency of the ODT method.

2. EXECUTION METHODS
First, Figure 1 illustrates the relevant mathematical analyses of the neural network model. Subsequently, the ANN library is created based on them. Next, several ANN models ranging from simple to complex are implemented directly on both a PC and various types of MCUs using this library to evaluate their performance.

Figure 1. Overall neural network

2.1. Related mathematics


An overview of the ANN [22] model is illustrated in Figure 1. An ANN model typically has an input layer and an output layer, with multiple hidden layers in between, chosen to suit the characteristics of the data. To provide a more precise understanding, Figure 2 illustrates each layer's forward propagation computation process. Notably, each layer's output becomes the following layer's input, forming a linked chain between layers in the model. The relevant mathematical symbols of the AI model are depicted in Figure 1: x represents the input data, W is the weight matrix, b is the bias parameter, z is the linear parameter, a is the activation output, ŷ is the output of the model, l is the loss function, L represents the number of layers in the model, and n denotes the number of neurons in a layer.

Figure 2. Forward propagation for each layer

The output of each layer is calculated according to (1):

$a = f(z)$ (1)

where $f$ is a non-linear activation function such as ReLU, Tanh, or Sigmoid [23], and $z$ is the value of the linear function calculated by (2):

$z = x^T W + b$ (2)
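To make (1) and (2) concrete, the following is a minimal C++ sketch of one fully connected layer's forward pass with a ReLU activation. The function name and data layout are illustrative assumptions, not the library's published internals.

#include <cstddef>
#include <vector>

// Minimal sketch of (1) and (2) for one fully connected layer with ReLU.
// Illustrative only; the library's actual internals may differ.
std::vector<float> layer_forward(const std::vector<float>& x,               // layer input
                                 const std::vector<std::vector<float>>& W,  // weights [in][out]
                                 const std::vector<float>& b) {             // biases [out]
    std::vector<float> a(b.size());
    for (std::size_t j = 0; j < b.size(); ++j) {
        float z = b[j];                          // z = x^T W + b, one output at a time (2)
        for (std::size_t i = 0; i < x.size(); ++i)
            z += x[i] * W[i][j];
        a[j] = (z > 0.0f) ? z : 0.0f;            // a = f(z) with f = ReLU (1)
    }
    return a;                                     // becomes the next layer's input
}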

During the model training process, the backward propagation algorithm is applied. This algorithm proceeds from the last layer to the first layer, applying the chain rule. In this process, the gradients of the loss function are calculated as in (3) to adapt the parameters W, x, and b to the data, as presented in [21].

$\frac{\partial l}{\partial W} = x^T \frac{\partial l}{\partial \hat{y}}, \quad \frac{\partial l}{\partial x} = W^T \frac{\partial l}{\partial \hat{y}} \odot f'(x), \quad \frac{\partial l}{\partial b} = \frac{\partial l}{\partial \hat{y}}$ (3)

The model parameters are updated by the stochastic gradient descent (SGD) [24] algorithm, as presented in (4).

$\theta = \theta - lr \frac{\partial l}{\partial \theta}$ (4)

where $\theta$ is any parameter to be optimized (such as W, b, or x) and lr is the learning rate.
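A matching sketch of one layer's backward step combines the gradients in (3) with the SGD update in (4). Here the $f'(x)$ factor of (3) is assumed to be applied separately by the activation layer, and all names are assumptions rather than the published source.

#include <cstddef>
#include <vector>

// Sketch of one layer's backward step from (3), fused with the SGD update (4).
// dl_dy is the error arriving from the following layer; the activation
// derivative f'() is assumed to be handled by a separate ActivationLayer.
std::vector<float> layer_backward(std::vector<std::vector<float>>& W,  // weights [in][out]
                                  std::vector<float>& b,               // biases [out]
                                  const std::vector<float>& x,         // input cached in forward
                                  const std::vector<float>& dl_dy,     // dl/dŷ
                                  float lr) {
    std::vector<float> dl_dx(x.size(), 0.0f);
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t j = 0; j < dl_dy.size(); ++j) {
            dl_dx[i] += W[i][j] * dl_dy[j];   // dl/dx from (3), using pre-update W
            W[i][j] -= lr * x[i] * dl_dy[j];  // dl/dW = x^T dl/dŷ, then SGD step (4)
        }
    }
    for (std::size_t j = 0; j < dl_dy.size(); ++j)
        b[j] -= lr * dl_dy[j];                // dl/db = dl/dŷ, then SGD step (4)
    return dl_dx;                              // error passed to the previous layer
}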
After each training cycle, the loss function is used to evaluate the training performance. In this study, three types of loss functions have been implemented: mean squared error (MSE) [25], binary cross-entropy (BCE), and categorical cross-entropy (CE) [26], represented mathematically in (5) to (7).

$MSE(y, \hat{y}) = \frac{1}{M} \sum (y - \hat{y})^2$ (5)

$BCE(y, \hat{y}) = -[y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})]$ (6)

$CE(y, \hat{y}) = -y \log(\hat{y})$ (7)

where y is the actual value, ŷ is the predicted value, and M is the total number of samples.
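A direct C++ rendering of (5) to (7) could look as follows, assuming each loss is averaged over the M samples of the batch; the signatures are illustrative assumptions.

#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the three implemented losses (5)-(7); assumed to be batch-averaged.
float mse(const std::vector<float>& y, const std::vector<float>& y_hat) {
    float s = 0.0f;
    for (std::size_t i = 0; i < y.size(); ++i)
        s += (y[i] - y_hat[i]) * (y[i] - y_hat[i]);         // (y − ŷ)²
    return s / y.size();                                     // (5)
}

float bce(const std::vector<float>& y, const std::vector<float>& y_hat) {
    float s = 0.0f;
    for (std::size_t i = 0; i < y.size(); ++i)
        s -= y[i] * std::log(y_hat[i])
           + (1.0f - y[i]) * std::log(1.0f - y_hat[i]);      // (6)
    return s / y.size();
}

float ce(const std::vector<float>& y, const std::vector<float>& y_hat) {
    float s = 0.0f;
    for (std::size_t i = 0; i < y.size(); ++i)
        s -= y[i] * std::log(y_hat[i]);                      // (7)
    return s / y.size();
}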


2.2. Programming artificial neural network library on micro-controller units


According to the mathematical analysis of the ANN model, the library is built with three classes and four methods, as presented in Table 1. The FCLayer class performs computations for both the forward and backward processes. The ActivationLayer class integrates non-linear activation functions. Finally, the Network class is the main class of the ANN library, comprising four main methods: add is used to add layers and activation functions to the model; use_loss is used to define the loss function for evaluating the model's performance; fit and predict are the methods for training and prediction, respectively. The add method can dynamically define the number of neurons, making ANN model creation highly flexible.

Table 1. Classes and methods of the ANN model

Class/Method      Function            Description
FCLayer           Class               A fully connected layer in the neural network.
ActivationLayer   Class               The activation functions used in the network, including ReLU and Tanh.
Network           Class               The main class for managing the neural network.
add               Method of Network   Adds a new layer to the neural network model.
use_loss          Method of Network   Specifies the loss function for the model: MSE, BCE, or CE.
fit               Method of Network   Trains the neural network model.
predict           Method of Network   Predicts the output of the model.
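A declaration-only C++ skeleton consistent with Table 1 might look as follows. The exact signatures, the shared Layer interface, and the function-pointer loss members are assumptions based on the descriptions above, not the published API.

#include <vector>

// Declaration-only skeleton matching Table 1; signatures are assumptions.
class Layer {                                        // shared layer interface
public:
    virtual std::vector<float> forward(const std::vector<float>& input) = 0;
    virtual std::vector<float> backward(const std::vector<float>& error, float lr) = 0;
    virtual ~Layer() = default;
};

class FCLayer : public Layer { /* weights and biases, forward/backward as sketched above */ };
class ActivationLayer : public Layer { /* ReLU or Tanh and its derivative */ };

class Network {
public:
    void add(Layer* layer);                          // append a layer or activation
    void use_loss(float (*loss_fn)(const std::vector<float>&, const std::vector<float>&),
                  std::vector<float> (*d_loss_fn)(const std::vector<float>&,
                                                  const std::vector<float>&));
    void fit(const std::vector<std::vector<float>>& x_train,
             const std::vector<std::vector<float>>& y_train,
             int epochs, float lr);                  // Algorithm 1
    std::vector<std::vector<float>>
    predict(const std::vector<std::vector<float>>& data);  // Algorithm 2
private:
    std::vector<Layer*> layers;                      // ordered list of layers
    float (*loss)(const std::vector<float>&, const std::vector<float>&) = nullptr;
    std::vector<float> (*d_loss)(const std::vector<float>&,
                                 const std::vector<float>&) = nullptr;
};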

A descriptive overview of the ANN model implementation method on an embedded device is shown in Figure 3. Figure 3(a) illustrates a code snippet to create an ANN model with one input neuron, one output neuron, and two hidden layers. Firstly, the model object of the ANN model is instantiated using the constructor of the Network class. Next, the add method is used to create the two hidden layers (one with 64 neurons and the other with 32 neurons). Each command that creates a hidden layer is followed by a command that adds an activation function for the neurons in that layer, as demonstrated by adding the ReLU activation function. Finally, the output neuron is created and the loss function is declared using the use_loss method. The ANN model is trained by the fit method with the training data x_train and y_train, the learning rate lr, and the number of iterations epochs. The predict method is used to make predictions with new data. Figure 3(b) depicts the structure of the ANN model implemented in Figure 3(a).


Figure 3. Implementing an ANN model on an embedded system: (a) program segment to train and predict the ANN model and (b) structure of the created ANN model
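Since Figure 3(a) is reproduced here only as an image, the following fragment reconstructs the described program segment against the skeleton above. The constructor arguments, the d_mse derivative, and the prepared x_train, y_train, and x_test vectors are all assumptions.

// Hypothetical reconstruction of the Figure 3(a) program segment.
Network model;
model.add(new FCLayer(1, 64));      // input neuron -> first hidden layer (64 neurons)
model.add(new ActivationLayer());   // ReLU after the first hidden layer
model.add(new FCLayer(64, 32));     // second hidden layer (32 neurons)
model.add(new ActivationLayer());   // ReLU after the second hidden layer
model.add(new FCLayer(32, 1));      // output neuron
model.use_loss(mse, d_mse);         // MSE loss and its derivative (assumed names)
model.fit(x_train, y_train, /*epochs=*/30, /*lr=*/0.001f);
auto y_pred = model.predict(x_test);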

The algorithm for the fit method is presented in Algorithm 1 and executed in four steps. Firstly, err is initialized to 0. Subsequently, each sample of the x_train data is used to compute ForwardPropagation, finding the output of the ANN with the current set of weights and biases. Next, the error relative to the actual values is calculated using the defined loss function; simultaneously, the rate of change of the loss, derivative_loss, is computed. Finally, the BackwardPropagation algorithm is executed based on derivative_loss (i.e., $\partial l/\partial \theta$ in (4)), and the learning rate (lr) is used to update the parameters of the network. The optimization algorithm SGD is explicitly implemented as follows:

Int J Artif Intell, Vol. 13, No. 3, September 2024: 2829-2839


Int J Artif Intell ISSN: 2252-8938  2833

weights[i][j] -= lr * weights_error[i][j];  // W <- W - lr * dl/dW, per (4)
bias[i] -= lr * output_error[i];            // b <- b - lr * dl/db, per (4)

In the code above, weights, bias, weights_error, and output_error correspond to $W$, $b$, $\partial l/\partial W$, and $\partial l/\partial b$, respectively, where i and j range from 1 to n. The training algorithm is iterated for epochs iterations. The value of err after each epoch is collected to assess the success of the training process.

Algorithm 1. Training
fit(x_train, y_train, epochs, lr):
  For e ← 1 to epochs:
    err ← 0
    For i ← each row of x_train:
      output[i] ← ForwardPropagation(x_train[i])
      err ← err + loss(y_train[i], output[i])
      BackwardPropagation(derivative_loss, lr)
    End For
    err ← err / number of samples
  End For
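One way Algorithm 1 could be rendered in C++ against the skeleton above is shown below; the loss and d_loss members follow the use_loss description, but the body is a sketch under those assumptions, not the published source.

// Sketch of Algorithm 1 as a Network member; loss/d_loss members are assumed.
void Network::fit(const std::vector<std::vector<float>>& x_train,
                  const std::vector<std::vector<float>>& y_train,
                  int epochs, float lr) {
    for (int e = 0; e < epochs; ++e) {
        float err = 0.0f;                                   // err <- 0
        for (std::size_t i = 0; i < x_train.size(); ++i) {
            std::vector<float> out = x_train[i];
            for (Layer* l : layers)
                out = l->forward(out);                      // ForwardPropagation
            err += loss(y_train[i], out);                   // accumulate epoch loss
            std::vector<float> grad = d_loss(y_train[i], out);  // derivative_loss
            for (auto it = layers.rbegin(); it != layers.rend(); ++it)
                grad = (*it)->backward(grad, lr);           // BackwardPropagation + SGD
        }
        err /= x_train.size();                              // mean error per epoch
    }
}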

The predict method is presented in Algorithm 2. This method calls the ForwardPropagation routine, as in (1) and (2), to find the output of the trained ANN. It evaluates the ANN model after training with a test dataset, or with new data collected by devices applying an ANN model created by this library.

Algorithm 2. Predict
predict(data):
  For i ← each row of data:
    output[i] ← ForwardPropagation(data[i])
  End For
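Algorithm 2 admits a similarly direct rendering: a forward pass per row, with no weight updates. As above, the body is a sketch against the assumed skeleton.

// Sketch of Algorithm 2: a pure forward pass over each row of the data.
std::vector<std::vector<float>>
Network::predict(const std::vector<std::vector<float>>& data) {
    std::vector<std::vector<float>> outputs;
    outputs.reserve(data.size());
    for (const std::vector<float>& row : data) {
        std::vector<float> out = row;
        for (Layer* l : layers)
            out = l->forward(out);      // ForwardPropagation only
        outputs.push_back(out);
    }
    return outputs;
}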

2.3. Deployment on micro-controller units


To deploy the ANN model on the selected MCUs presented in Table 2, the Arduino integrated development environment (Arduino IDE) is used to upload the code directly to the MCUs. This ensures the flexible transferability of the model from a personal computer to the MCUs. The chosen MCUs come from different manufacturers, with diverse CPU architectures, frequencies, and memory characteristics, allowing the compatibility of the model with various hardware conditions to be assessed.

Table 2. MCUs selected for implementation

Name    Development board          CPU                           Frequency   Flash   RAM
MCU-1   ESP32-WROOM-32             Dual-core Xtensa 32-bit LX6   240 MHz     4 MB    520 KB
MCU-2   Arduino Nano 33 BLE        Single-core Arm Cortex-M4F    64 MHz      1 MB    256 KB
MCU-3   Sipeed Maix Bit K210       Dual-core Kendryte K210       400 MHz     16 MB   8192 KB
MCU-4   Raspberry Pi Pico RP2040   Dual-core Arm Cortex-M0+      133 MHz     2 MB    264 KB
PC      MacBook M1 2020            Apple M1                      3.2 GHz     -       8 GB

The three proposed models to be evaluated on both the MCUs and a PC are summarized in Table 3. All these models have one neuron in the input and output layers. Models 1, 2, and 3 have 1, 2, and 3 hidden layers, respectively, with corresponding parameter counts of 193, 4,353, and 8,513. These models are implemented to evaluate metrics such as training and prediction time for a single input sample. Each model is trained for 30 epochs with a learning rate of 0.001, using ReLU as the activation function and MSE as the loss function.

Table 3. ANN models selected for evaluation

Name      Structure      Parameters
Model-1   1×64×1         193
Model-2   1×64×64×1      4,353
Model-3   1×64×64×64×1   8,513

The performance of the ANN models is evaluated using a regression problem with a cubic polynomial of the form presented in (8). The training data, randomly generated from this polynomial, is illustrated in Figure 4. The black dots on the graph represent the data used during training, and the red line represents the actual values of the polynomial.

$g(x) = ax^3 + bx^2 + cx + d$ (8)

The function g(x) is a cubic polynomial with coefficients a, b, c, and d equal to 1, 2, -3, and -4, respectively.
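The paper does not include its data-generation code; a plausible sketch for producing pairs (x, g(x)) from (8) on a PC is shown below. The sample count matches Dataset-1, but the seed and generator choice are assumptions.

#include <random>
#include <vector>

// Sketch of generating (x, g(x)) pairs from (8) with a = 1, b = 2, c = -3, d = -4.
// Seed and generator are assumptions; the paper's generator is not published.
int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(-3.0f, 3.0f);      // x in [-3, 3]
    std::vector<float> xs, ys;
    for (int i = 0; i < 150; ++i) {                               // Dataset-1: 150 points
        float x = dist(rng);
        xs.push_back(x);
        ys.push_back(x * x * x + 2.0f * x * x - 3.0f * x - 4.0f); // g(x)
    }
    return 0;
}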

Figure 4. Data used to evaluate the model

Two datasets were created to evaluate the performance and efficiency of multi-threaded programming in the training and prediction of ANN models. Dataset-1 contains 150 data points (x, g(x)), with x ranging from -3 to 3, divided into a train set and a test set of 100 and 50 data points, respectively. The MSE of the model is used for evaluation on both the PC and the MCUs. Dataset-2 consists of 300 data points (x, g(x)), with x ranging from -3 to 3, divided into a train set and a test set of 200 and 100 data points, respectively. Dataset-2 is further divided into four smaller subsets, each containing 50 samples. These subsets are sequentially used to train the model through Task Train, evaluating the parallel execution capability during training and prediction on a multi-core MCU.
The models are trained and tested with single-task programming running on a single-core MCU or a single core of a multi-core MCU, and with multi-task programming using FreeRTOS running on a multi-core MCU. Figure 5 presents the two flowcharts for implementing the ANN model on single-core and multi-core MCUs. Figure 5(a) illustrates the process of deploying the ANN model using sequential programming. The program creates the ANN model upon startup using the provided functions. If this is the first training, weights and biases are randomly initialized, and the model is trained with Dataset-1. Otherwise, the weights are loaded from the system's non-volatile memory, since the ANN model has already been trained. The program then enters the main loop, consisting of two sequential tasks: retraining the ANN and making predictions. These tasks are executed at different rates, set by the prediction and training times. In this case, it is evident that while the MCU retrains the model, it cannot predict new data.
To address this challenge, Figure 5(b) illustrates the training and prediction algorithm on a multi-core MCU using multi-task programming with FreeRTOS. The training and prediction processes are implemented as two FreeRTOS tasks, each assigned to a separate core of the MCU. FreeRTOS is responsible for scheduling these two tasks to run in parallel, ensuring that the training process does not interrupt the prediction process. Similar to the algorithm in Figure 5(a), the program creates the ANN model using the provided functions after startup. If it is the first training, weights and biases are randomly initialized, and the flag wait is set to true to notify the prediction process that it must wait until the first training is completed. Conversely, if the model has already been trained, the model's parameters are loaded from the MCU's non-volatile memory, and the flag wait is set to false to allow Task Predict to operate. After each completion of Task Train, the weights and biases are saved to the MCU's non-volatile memory and the trained flag is set to true, allowing Task Predict to update the new parameters and then reset the trained flag to false.
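On an ESP32-class MCU, the Figure 5(b) scheme could be sketched with FreeRTOS task pinning as below. xTaskCreatePinnedToCore is the standard ESP32 FreeRTOS call, but the task bodies, stack sizes, and helper names are placeholder assumptions, not the authors' code.

#include <Arduino.h>

// Sketch of the Figure 5(b) scheme on an ESP32: Task Train and Task Predict
// pinned to separate cores, coordinated by the wait/trained flags described above.
volatile bool wait_flag = true;     // predict waits until the first training ends
volatile bool trained = false;      // signals predict to reload parameters

void taskTrain(void*) {
    for (;;) {
        // train on the next data subset here, then save weights and biases
        // to non-volatile memory (e.g., a hypothetical saveParams() helper)
        wait_flag = false;
        trained = true;             // let Task Predict pick up new parameters
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

void taskPredict(void*) {
    for (;;) {
        if (!wait_flag) {
            if (trained) {          // reload parameters once per training cycle
                // loadParams();    // hypothetical helper
                trained = false;
            }
            // run the model's predict() on fresh sensor data here
        }
        vTaskDelay(pdMS_TO_TICKS(50));
    }
}

void setup() {
    xTaskCreatePinnedToCore(taskTrain,   "train",   8192, nullptr, 1, nullptr, 0);
    xTaskCreatePinnedToCore(taskPredict, "predict", 8192, nullptr, 1, nullptr, 1);
}

void loop() {}   // unused; the FreeRTOS tasks do the work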



Figure 5. Flowcharts of implementing the ANN model: (a) single-core MCU and (b) dual-core MCU

3. RESULTS
3.1. Evaluate artificial neural network performance on micro-controller units
The real-time execution time evaluation results for the three models on Dataset-1 are presented in Tables 4 and 5. In Table 4, MCU-1 and MCU-3 exhibit similar training speeds, significantly faster than MCU-2 and MCU-4. This difference can be attributed to the lower clock speeds of MCU-2 and MCU-4 compared to MCU-1 and MCU-3. A similar pattern is observed in the prediction speeds of the models in Table 5. However, the overall training times are relatively long. Even the model with the lowest number of parameters, Model-1, requires up to 12 seconds to complete training on 100 samples, while Model-3, with the highest number of parameters, takes up to 135.52 seconds to finish training on the MCU with the highest clock speed. Therefore, training ANN models directly on MCUs using sequential programming is not feasible, as it is challenging to ensure real-time performance on the devices. Conversely, the prediction speeds of the ANN models range from approximately 1.7 Hz to 23 Hz on high-clock-speed MCUs. This suggests that the trained models can be applied to MCUs for prediction tasks, but retraining them is difficult due to the lengthy interruption caused by the device's training process.

Table 4. Evaluation of training time

         Training time on PC and MCUs (s)
Name     Model-1   Model-2   Model-3
MCU-1    12.00     76.64     149.60
MCU-2    27.51     217.64    401.31
MCU-3    16.86     76.96     135.52
MCU-4    13.64     215.62    417.61
PC       0.11      0.85      1.57

Table 5. Evaluation of prediction time

         Inference time on PC and MCUs (s)
Name     Model-1   Model-2   Model-3
MCU-1    0.047     0.357     0.660
MCU-2    0.093     1.221     2.247
MCU-3    0.043     0.322     0.594
MCU-4    0.074     1.111     2.137
PC       0.0005    0.0033    0.0058


The MSE results on the PC and MCUs are presented in Figure 6, demonstrating significant similarity between the two platforms, with only a tiny difference of approximately 0.1. The main reason for this slight discrepancy is that MCUs support data representation and operations with lower precision than a PC. Nevertheless, this proves that trained ANN models can be deployed directly on resource-constrained devices. Therefore, a parallel programming mechanism needs to be implemented to achieve simultaneous training and prediction on these MCUs.

Figure 6. MSE between the predictions and the cubic function

3.2. Dual-core performance on micro-controller units


Figure 7 presents the results of parallel training and prediction on Dataset-2. The results show that,
except for the case of training the ANN models for the first time, training and prediction occur in parallel in
subsequent training sessions. This addresses the issue of interrupting prediction during training in sequential
programming. Dataset-2 is divided into four subsets to perform sequential training, so the model's loss values
gradually decrease over the training sessions, as illustrated in the chart. Indeed, after completing the six
training sessions, the average loss value is in the range of 0.02, and the MSE of prediction is 0.04,
representing a significant improvement compared to previous training sessions. This is particularly suitable
for supervised learning applications on embedded systems based on real-time sensor data.

Figure 7. Experimental results of parallel training and prediction on MCU-1

Figure 8 presents the prediction results of Model-1 after five training sessions, where the lines represent the prediction results of the model on the test dataset of Dataset-2. The results show that, after each training session, the prediction function gradually approaches the original graph of the cubic function. The prediction MSE decreases from 0.17 to 0.04 after five training sessions, indicating a 76.5% improvement in the Model-1 MSE. This training process can continue to improve the accuracy of the model further because the device's prediction is not interrupted by the training process. The parameters are updated after each training session, almost without affecting the prediction. In summary, although the training time increases with the complexity of the model, this drawback has been overcome with the multi-task programming technique running on multi-core MCUs. This demonstrates that the ANN model library and the multi-task algorithm can be applied to deploy traditional ANN models directly on common MCUs.


Figure 8. Prediction results on MCU-1

4. DISCUSSION
In this study, we developed a library for deploying ANN models directly on MCUs. The created models are trained and make predictions on the MCUs. Experimental results show that the prediction time of the models after training is relatively fast. However, the training time is long, making it impractical to deploy traditional ANN models directly on MCUs with sequential programming, as the lengthy training process makes it challenging to ensure real-time performance. The application of multi-task programming based on FreeRTOS has addressed this drawback: the training process can be iterated to improve the model's accuracy while the prediction process runs continuously with the latest parameter set, ensuring the system's functionality.
By applying multi-task programming to enable parallel training and prediction processes, this research allows the direct implementation of traditional ANN models on MCUs without optimization algorithms. This is advantageous because the relevant mathematical operations are well-established and thoroughly evaluated. The developed ANN library is highly flexible, allowing easy addition and modification of layers, activation functions, and loss functions through the library's application programming interface (API) functions. Moreover, this ANN library has the potential for scalability and customization across various platforms. Indeed, in addition to the integrated activation and loss functions, new functions can easily be programmed from their mathematical representations. The library is written in the object-oriented programming language C++ and focused on core features, making it easy to transition to other programming languages or platforms in the future. It can also be customized for deployment on various multi-core MCU models. Compared to the Train++ algorithm [20], both studies support multiple MCUs. However, this research demonstrates greater flexibility in implementing various ANN models with different structures, whereas Train++ does not support hidden layers.
Despite these advantages, the ANN library in this study also has three main limitations. First, optimization algorithms have not yet been applied to accelerate processing speed, posing challenges for applications demanding high computational speeds and making it difficult to implement complex models given the limited memory of MCUs. Second, the level of multitasking is limited, utilizing only two main parallel tasks and not fully exploiting the computational capabilities of multi-core MCUs; finer task partitioning for training and prediction should be prioritized in future work. Finally, the library is currently limited to ANNs, and integrating data storage capabilities may prove challenging when applied to supervised learning-based classification applications.
In reality, the ANN model can not only handle regression problems but can also be extended to perform various tasks, such as classification and anomaly detection. However, implementing anomaly detection based on unsupervised learning methods would be more feasible, since storing labeled data on the MCU is complex. This research can be highly applicable in real-world scenarios, especially in IoT applications, where devices are being explored to integrate self-learning, analysis, and decision-making capabilities locally instead of relying on an AI network operating on the system's cloud server.

5. CONCLUSION
In this study, a library has been developed to deploy ANN models directly on multi-core MCUs. The models, once created, are trained and make predictions on the MCUs. Experimental results demonstrate that the prediction time for the models after training is relatively fast. In contrast, even the shortest training time takes up to 12 seconds, for the simplest model. However, this issue has been addressed by multitasking using FreeRTOS on multi-core MCUs, allowing the training and prediction processes to occur concurrently without interference. Furthermore, the training process can continue to improve the accuracy of the ANN model when additional relevant data is collected. This research has the potential to integrate ANNs into embedded devices, especially in the IoT domain. For example, when estimating irrigation needs in agriculture based on soil moisture, soil temperature, and air temperature to determine the amount of irrigation water, the edge device can be retrained regularly to adapt to climate conditions.

REFERENCES
[1] R. S.-Iborra and A. F. Skarmeta, "TinyML-enabled frugal smart objects: challenges and opportunities," IEEE Circuits and Systems Magazine, vol. 20, no. 3, pp. 4–18, 2020, doi: 10.1109/MCAS.2020.3005467.
[2] Y. Abadade, A. Temouden, H. Bamoumen, N. Benamar, Y. Chtouki, and A. S. Hafid, "A comprehensive survey on TinyML," IEEE Access, vol. 11, pp. 96892–96922, 2023, doi: 10.1109/ACCESS.2023.3294111.
[3] Y. Y. Siang, M. R. Ahamd, M. Shafinaz, and Z. Abidin, "Anomaly detection based on tiny machine learning: a review," Open International Journal of Informatics, vol. 9, no. Special Issue 2, pp. 67–78, 2021.
[4] B. Sun, S. Bayes, A. M. Abotaleb, and M. Hassan, "The case for tinyML in healthcare: CNNs for real-time on-edge blood pressure estimation," in Proceedings of the ACM Symposium on Applied Computing (SAC '23), ACM, 2023, pp. 629–638, doi: 10.1145/3555776.3577747.
[5] D. L. Dutta and S. Bharali, "TinyML meets IoT: a comprehensive survey," Internet of Things (Netherlands), vol. 16, 2021, doi: 10.1016/j.iot.2021.100461.
[6] N. Schizas, A. Karras, C. Karras, and S. Sioutas, "TinyML for ultra-low power AI and large scale IoT deployments: a systematic review," Future Internet, vol. 14, no. 12, 2022, doi: 10.3390/fi14120363.
[7] P. Warden and D. Situnayake, TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers, Sebastopol, USA: O'Reilly Media, Inc., 2019.
[8] J. Lee and H.-J. Yoo, "An overview of energy-efficient hardware accelerators for on-device deep-neural-network training," IEEE Open Journal of the Solid-State Circuits Society, vol. 1, pp. 115–128, 2021, doi: 10.1109/ojsscs.2021.3119554.
[9] O. D. Incel and S. O. Bursa, "On-device deep learning for mobile and wearable sensing applications: a review," IEEE Sensors Journal, vol. 23, no. 6, pp. 5501–5512, 2023, doi: 10.1109/JSEN.2023.3240854.
[10] D. Nadalini, M. Rusci, L. Benini, and F. Conti, "Reduced precision floating-point optimization for deep neural network on-device learning on microcontrollers," Future Generation Computer Systems, vol. 149, pp. 212–226, 2023, doi: 10.1016/j.future.2023.07.020.
[11] L. Wu, J. Liu, S. Vazquez, and S. K. Mazumder, "Sliding mode control in power converters and drives: a review," IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 3, pp. 392–406, 2022, doi: 10.1109/JAS.2021.1004380.
[12] A. N. Mazumder et al., "A survey on the optimization of neural network accelerators for micro-AI on-device inference," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 11, no. 4, pp. 532–547, 2021, doi: 10.1109/JETCAS.2021.3129415.
[13] J. Lin, L. Zhu, W. M. Chen, W. C. Wang, C. Gan, and S. Han, "On-device training under 256KB memory," Advances in Neural Information Processing Systems, vol. 35, 2022.
[14] M. Chowdhary and S. S. Saha, "On-sensor online learning and classification under 8 KB memory," in 2023 26th International Conference on Information Fusion (FUSION 2023), IEEE, 2023, pp. 1–8, doi: 10.23919/FUSION52260.2023.10224228.
[15] E. L. Lamie, Real-Time Embedded Multithreading Using ThreadX and MIPS, CRC Press, pp. 1–465, Apr. 2019, doi: 10.1201/9780429187858.
[16] R. V. Aroca and G. Caurin, "A real time operating systems (RTOS) comparison," WSO - Workshop de Sistemas Operacionais, vol. 12, pp. 2441–2452, 2009.
[17] A. Thomas, "Enclaves in real-time operating systems," M.Sc. dissertation, Department of Electrical Engineering and Computer Science, University of California, Berkeley, USA, 2021.
[18] M. Z. H. Zim, "TinyML: analysis of Xtensa LX6 microprocessor for neural network applications by ESP32 SoC," ArXiv - Computer Science, pp. 1–6, 2021, doi: 10.13140/RG.2.2.28602.11204.
[19] V. K. Nguyen, V. K. Tran, H. Pham, V. M. Nguyen, H. D. Nguyen, and C. N. Nguyen, "A multi-microcontroller-based hardware for deploying tiny machine learning model," International Journal of Electrical and Computer Engineering, vol. 13, no. 5, pp. 5727–5736, 2023, doi: 10.11591/ijece.v13i5.pp5727-5736.
[20] B. Sudharsan, P. Yadav, J. G. Breslin, and M. Intizar Ali, "Train++: an incremental ML model training algorithm to create self-learning IoT devices," in Proceedings - 2021 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Internet of People, and Smart City Innovations (SmartWorld/ScalCom/UIC/ATC/IoP/SCI 2021), IEEE, 2021, pp. 97–106, doi: 10.1109/SWC50871.2021.00023.
[21] M. Craighero, D. Quarantiello, B. Rossi, D. Carrera, P. Fragneto, and G. Boracchi, "On-device personalization for human activity recognition on STM32," IEEE Embedded Systems Letters, vol. 16, no. 2, pp. 106–109, June 2024, doi: 10.1109/LES.2023.3293458.
[22] O. I. Abiodun, A. Jantan, A. E. Omolara, K. V. Dada, N. A. E. Mohamed, and H. Arshad, "State-of-the-art in artificial neural network applications: a survey," Heliyon, vol. 4, no. 11, Nov. 2018, doi: 10.1016/j.heliyon.2018.e00938.
[23] A. D. Jagtap and G. E. Karniadakis, "How important are activation functions in regression and classification? A survey, performance comparison, and future directions," Journal of Machine Learning for Modeling and Computing, vol. 4, no. 1, pp. 21–75, 2023, doi: 10.1615/jmachlearnmodelcomput.2023047367.
[24] L. Bottou, "Stochastic gradient descent tricks," in Neural Networks: Tricks of the Trade, Berlin, Heidelberg: Springer, 2012, pp. 421–436, doi: 10.1007/978-3-642-35289-8_25.
[25] D. Chicco, M. J. Warrens, and G. Jurman, "The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation," PeerJ Computer Science, vol. 7, pp. 1–24, 2021, doi: 10.7717/PEERJ-CS.623.
[26] Y. Ho and S. Wookey, "The real-world-weight cross-entropy loss function: modeling the costs of mislabeling," IEEE Access, vol. 8, pp. 4806–4813, 2020, doi: 10.1109/ACCESS.2019.2962617.


BIOGRAPHIES OF AUTHORS

Bao-Toan Thai is a B.S. degree student in Automation and Control Engineering at the Faculty of Automation Technology, College of Engineering, Can Tho University, Vietnam. He can be contacted at email: [email protected].

Vy-Khang Tran is an M.Sc. degree student in Automation and Control Engineering at the Faculty of Automation Technology, College of Engineering, Can Tho University, Vietnam. His research interests focus on embedded systems and AIoT- and IoT-based applications in environmental and agricultural control. He can be contacted at email: [email protected].

Hai Pham received his master’s degree from University of South Australia
(UniSA) in 2010. Since 2012, he has been a lecturer at Faculty of Automation Technology,
College of Engineering, Can Tho University. His research interests focus on Bistatic LIDAR
system for gas measurement in environmental and agricultural applications. He can be
contacted at email: [email protected].

Chi-Ngon Nguyen received B.S. and M.S. degrees in Electronic Engineering from Can Tho University and Ho Chi Minh City University of Technology, Vietnam National University, in 1996 and 2001, respectively. He was awarded a Ph.D. in Control Engineering by the University of Rostock, Germany, in 2007. Since 1996, he has worked at Can Tho University. He is an associate professor in automation at the Faculty of Automation Technology and a former dean of the College of Engineering at Can Tho University. Currently, he is a Vice Chairman of the Board of Trustees of Can Tho University. His research interests are intelligent control, medical control, pattern recognition, classification, speech recognition, computer vision, and agricultural automation. He can be contacted at email: [email protected].

Van-Khanh Nguyen received his master’s degree from Ho Chi M inh University
of Technology, Vietnam, in 2014 and his Doctor of Engineering degree from Tokyo
University of M arine Science and Technology, Japan, in 2020. Since 2007, he has been a
lecturer at the Faculty of Automation Technology, College of Engineering, Can Tho
University. Currently, he is the head of PLC Technology and Industrial IoT Lab. His
research interests concentrate on embedded systems and AIoT- and IoT-based applications in
environmental and agricultural control, electrocardiogram (ECG) real-time classification, and
anomaly detection. He can be contacted at email: [email protected].
