
The Fusion of Deep Reinforcement Learning and Edge Computing for Real-Time Monitoring and Control Optimization in IoT Environments
Jingyu Xu 1,*, Weixiang Wan 2,a, Linying Pan 3,b, Wenjian Sun 4,c, Yuxiang Liu 5,d

1 Northern Arizona University, 1900 S Knoles Dr, Flagstaff, Arizona, USA
2 University of Electronic Science and Technology of China, Chengdu, China
3 Trine University, Phoenix, Arizona, USA
4 Yantai University, Tokyo, Japan
5 Northwestern University, Atlanta, Georgia, USA

* [email protected]
a [email protected]
b [email protected]
c [email protected]
d [email protected]

Abstract: In response to the demand for real-time performance and control quality in industrial Internet of Things (IoT) environments, this paper proposes an optimization control system based on deep reinforcement learning and edge computing. The system leverages cloud-edge collaboration, deploys lightweight policy networks at the edge, predicts system states, and outputs controls at a high frequency, enabling monitoring and optimization of industrial objectives. Additionally, a dynamic resource allocation mechanism is designed to ensure rational scheduling of edge computing resources, achieving global optimization. Results demonstrate that this approach reduces cloud-edge communication latency, accelerates response to abnormal situations, reduces system failure rates, extends average equipment operating time, and saves costs for manual maintenance and replacement. This ensures real-time and stable control.

Keywords: Deep reinforcement learning; Edge computing; Industrial Internet of Things; Lightweight policy network; Dynamic resource allocation

I. INTRODUCTION

With the rapid development of the industrial Internet of Things, there is a growing demand for real-time monitoring and control of systems. However, relying on cloud computing centers for computation and decision-making often fails to meet the constraints of real-time responsiveness [1]. In this regard, this study proposes a novel industrial system control architecture that actively senses the environment and makes rapid decisions through the organic combination of deep reinforcement learning and edge computing [2]. This approach deploys lightweight policy networks at the network edge to predict and control local states at a high frequency. Simultaneously, multiple edge nodes collaborate with the cloud center, enhancing real-time control performance at the edge while the cloud center tracks strategies and performs global optimization. This paper provides a detailed construction of the system's overall architecture and functional modules, and designs a lightweight policy network structure and dynamic resource allocation mechanism for edge computing. Experimental validation demonstrates the effectiveness of this approach, significantly reducing control latency and improving control quality and cost-effectiveness compared to architectures relying solely on the cloud center.

II. OPTIMIZATION CONTROL SYSTEM BASED ON DEEP REINFORCEMENT LEARNING AND EDGE COMPUTING

A. Overall System Architecture

This system employs a layered architecture comprising a sensor acquisition layer, an edge computing layer, and a cloud computing layer [7]. The sensor layer collects environmental and system data such as temperature and machine status. This data is sent to the edge computing layer, where edge servers perform real-time analysis and local decision-making. Deep reinforcement learning models in this layer predict and control system behavior, creating a digital twin to forecast and optimize operations. The cloud computing layer oversees the entire system, providing powerful computing and storage to refine control strategies and system logic. The architecture is service-oriented and modular, with components such as data acquisition, storage, deep learning, control, and scheduling modules connected via a message bus. This design enhances flexibility and scalability, allowing the system to automate and intelligently control operations, adapt to various scenarios, and efficiently manage complex tasks [8].

Fig. 1. Overall System Architecture
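As a concrete illustration of the decoupled, message-bus style of integration described above, the following minimal publish/subscribe sketch shows how acquisition, storage, and control modules could exchange data. The paper does not name a specific middleware, so the MessageBus class, topic name, and payload fields here are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable


class MessageBus:
    """Minimal in-process publish/subscribe bus connecting decoupled modules."""

    def __init__(self) -> None:
        self._subscribers: dict = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


# Example wiring: the acquisition module publishes sensor readings, while the
# storage and control modules consume them independently of each other.
bus = MessageBus()
bus.subscribe("sensor/boiler", lambda m: print("store:", m))
bus.subscribe("sensor/boiler", lambda m: print("control decision for:", m))
bus.publish("sensor/boiler", {"temperature_c": 182.5, "pressure_mpa": 1.1})
```

Because producers and consumers only share topic names, modules can be added or replaced without touching the rest of the system, which is the flexibility and scalability property the architecture aims for.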
B. System Functional Modules

The functional modules of this system mainly consist of the Deep Reinforcement Learning module and the Edge Computing module, as shown in Figure 2. The Deep Reinforcement Learning module is responsible for learning and optimizing system control strategies and is primarily divided into a modeling unit, a policy network, and a decision unit [9]. Among them, the modeling unit constructs an environment model to predict system states; the policy network represents and approximates control strategies using neural networks; and the decision unit provides control decisions based on the network outputs. The Edge Computing module primarily offers local data storage, preprocessing, caching, and other functions to assist in the training of the Deep Reinforcement Learning module. Additionally, it includes a task distributor that dynamically allocates edge computing tasks and a data collector that aggregates data from edge nodes.

Fig. 2. System Functional Modules

III. KEY TECHNOLOGIES AND ALGORITHMS

A. Lightweight Deep Reinforcement Learning for Edge

To accommodate the limited computational and storage resources of edge computing nodes, the system employs a customized, lightweight deep reinforcement learning algorithm. This algorithm uses simpler network structures, such as a three-layer perceptron, instead of complex deep convolutional neural networks, reducing the number of parameters and the model's space footprint. The experience replay buffer size is also limited to around 5000 transition samples to manage capacity. During training, the batch size is set to 16, matching the parallel computing capabilities of edge servers. The lightweight model contains about 1 million parameters and occupies less than 400 MB, making it suitable for deployment on less powerful edge computing nodes. It can deliver near real-time control strategy outputs, even on inexpensive hardware. In practical applications such as monitoring machine tools in smart manufacturing, the model can re-plan and issue control strategy instructions every 5 seconds. This optimizes the machine's dynamic performance and extends its lifespan. Compared to traditional cloud-based models, which may have an average delay of over 1 minute in computing and issuing control commands, this system significantly reduces control loop latency, demonstrating more efficient and responsive control in real-time scenarios.
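A minimal sketch of such a lightweight policy network and capped replay buffer is shown below, using TensorFlow (the framework named in Section IV). The replay-buffer capacity of 5000 and batch size of 16 come from the text; the state and action dimensions, the hidden-layer widths, and the value-per-action output head are assumptions for illustration, since the paper does not report them.

```python
import random
from collections import deque

import numpy as np
import tensorflow as tf

STATE_DIM, ACTION_DIM = 12, 4            # assumed dimensions for illustration
BUFFER_CAPACITY, BATCH_SIZE = 5000, 16   # limits quoted in the paper


def build_policy_network() -> tf.keras.Model:
    """Three-layer perceptron used in place of a deep convolutional network."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(STATE_DIM,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(ACTION_DIM, activation="linear"),  # one score per action
    ])


class ReplayBuffer:
    """Experience replay capped at ~5000 transitions to fit edge-node memory."""

    def __init__(self, capacity: int = BUFFER_CAPACITY) -> None:
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done) -> None:
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int = BATCH_SIZE):
        batch = random.sample(self.buffer, batch_size)
        return map(np.array, zip(*batch))


policy = build_policy_network()
scores = policy(np.zeros((1, STATE_DIM), dtype=np.float32))  # single forward pass
```

Keeping the network to a few fully connected layers and bounding the buffer are the two levers that hold the memory footprint within the roughly 400 MB budget the paper targets for edge nodes.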
B. Dynamic Collaborative Distributed Optimization Algorithm

To achieve globally optimal control through cloud-edge collaboration, this system designs a dynamic optimization allocation distributed algorithm. The algorithm is coordinated by the cloud-side master node, which can request or release edge servers' computational resources on demand and maintain a list of available resources while monitoring the load on each edge server. Based on the current system state and control requirements, it runs an environmental monitoring program that dynamically selects a group of edge servers with the best combined indicators, such as bandwidth and computing capacity. It also allocates critical control modules with higher computational intensity to servers with stronger computing power. By aggregating intermediate states and control results from various edge nodes, it collaboratively optimizes and obtains a global control strategy. The optimization algorithm can be expressed using the following formula:

Maximize $\sum_{i=1}^{N} \sum_{j=1}^{M} A(c_j, r_i)$    (1)

where $r_i$ represents the resources of the $i$-th server and $c_j$ represents the $j$-th control module.

It must satisfy the following conditions:

$\sum_{i=1}^{N} x_{ij} \le 1, \quad \forall j$    (2)

Each control module can be assigned to at most one resource.

Server resource and load constraints:

$l_i + \sum_{j=1}^{M} x_{ij} \cdot \mathrm{Load}(c_j) \le \mathrm{Capacity}(r_i), \quad \forall i$    (3)

where $x_{ij}$ is a binary decision variable indicating whether control module $c_j$ is assigned to resource $r_i$ (1 if assigned, 0 otherwise), $l_i$ is the current load on server $r_i$, $\mathrm{Load}(c_j)$ represents the computational density of control module $c_j$, and $\mathrm{Capacity}(r_i)$ is the maximum capacity of resource $r_i$.
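The paper formulates the allocation as the assignment problem in (1)-(3) but does not describe the solver run by the master node. The sketch below is a simple greedy heuristic under those constraints: each control module is assigned to at most one server (constraint 2) and no server exceeds its capacity given its current load (constraint 3). The benefit function A, the server names, and the load figures are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeServer:
    name: str
    capacity: float      # Capacity(r_i)
    load: float = 0.0    # current load l_i
    modules: list = field(default_factory=list)


def benefit(module_load: float, server: EdgeServer) -> float:
    """Assumed A(c_j, r_i): reward placing heavy modules on the server
    with the most spare capacity; the paper does not give A explicitly."""
    return (server.capacity - server.load) - module_load


def greedy_allocate(module_loads: dict, servers: list) -> dict:
    """Greedy sketch of objective (1) subject to constraints (2) and (3)."""
    assignment = {}
    # place the most computation-intensive control modules first
    for name, load in sorted(module_loads.items(), key=lambda kv: -kv[1]):
        feasible = [s for s in servers if s.load + load <= s.capacity]
        if not feasible:
            continue  # module stays unassigned and falls back to the cloud
        best = max(feasible, key=lambda s: benefit(load, s))
        best.modules.append(name)
        best.load += load
        assignment[name] = best.name
    return assignment


servers = [EdgeServer("edge-1", capacity=8.0, load=2.0),
           EdgeServer("edge-2", capacity=4.0, load=1.0)]
print(greedy_allocate({"boiler-control": 3.5, "anomaly-detect": 1.0}, servers))
```

Sorting modules by computational density mirrors the text's rule of sending the heaviest control modules to the servers with the strongest remaining capacity.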
IV. EXPERIMENT AND RESULTS ANALYSIS

A. Experimental Environment and Dataset

The experimental environment for this research is based on the TensorFlow framework and utilizes an NVIDIA Tesla V100 GPU for constructing, training, and testing the deep neural networks. To thoroughly validate the effectiveness of the proposed method, the experiments use an open-source IoT simulation environment, IoTSim, as the dataset. IoTSim includes data from sensors, edge-layer resource configurations, and network parameters. The dataset encompasses readings from various heterogeneous sensors, such as temperature, humidity, and voltage, spanning a month with a sampling frequency of one sample per minute. Considering the real-time control requirements of the system, the research sets up an environment where the system reports its state and outputs control commands every 5 seconds, and the dataset is resampled accordingly.
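A minimal sketch of that resampling step is shown below, assuming the minute-level readings sit in a pandas DataFrame; the column names and values are placeholders rather than fields from IoTSim.

```python
import numpy as np
import pandas as pd

# Assumed layout for illustration: one month of minute-level sensor readings.
minutes = pd.date_range("2023-01-01", periods=60 * 24 * 30, freq="min")
readings = pd.DataFrame(
    {"temperature": np.random.normal(180.0, 2.0, len(minutes)),
     "humidity": np.random.normal(40.0, 5.0, len(minutes))},
    index=minutes,
)

# Resample to the 5-second control step used by the system; intermediate
# values are linearly interpolated between the original minute samples.
control_steps = readings.resample("5s").interpolate(method="linear")
print(control_steps.head())
```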
B. Model Hyperparameter Settings

In deep reinforcement learning, the choice of hyperparameters plays a crucial role in the model's performance and convergence speed. Hyperparameters are those parameters that need to be manually set when training deep reinforcement learning models, and they can influence the training process and the model's final performance. These hyperparameters are listed in Table 1.

TABLE I. MODEL HYPERPARAMETERS

Hyperparameter                        Initial Setting   Impact
Experience Pool Size                  5000              A larger experience pool provides more training data, aiding in model convergence.
Learning Rate                         0.01              Adjustment of the learning rate can affect the model's convergence speed and stability.
Discount Factor (γ)                   0.95              A smaller γ value focuses more on short-term rewards, while a larger γ value emphasizes long-term rewards.
Target Network Update Frequency (τ)   100               The choice of update frequency (τ) can impact the model's stability and learning speed.
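For illustration, the Table I values can be gathered into a single configuration consumed by a DQN-style training loop; the optimizer choice and the target-network update helper below are assumptions, as the paper does not specify them.

```python
import tensorflow as tf

# Hyperparameters from Table I; everything around them is illustrative.
HYPERPARAMS = {
    "experience_pool_size": 5000,
    "learning_rate": 0.01,
    "discount_factor": 0.95,          # gamma
    "target_update_frequency": 100,   # tau, in training steps
}

optimizer = tf.keras.optimizers.Adam(learning_rate=HYPERPARAMS["learning_rate"])


def maybe_update_target(step: int, online: tf.keras.Model, target: tf.keras.Model) -> None:
    """Copy online-network weights into the target network every tau steps."""
    if step % HYPERPARAMS["target_update_frequency"] == 0:
        target.set_weights(online.get_weights())
```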

C. Evaluation Metric Setup

This system consists of three layers: the sensor acquisition layer, the edge computing layer, and the cloud computing layer. At the base, the sensor acquisition layer gathers environmental data such as temperature, humidity, and machine status. This data is sent to the edge computing layer for real-time analysis and local decision-making. Here, deep reinforcement learning models predict system behavior and create control strategies for closed-loop control. These models learn operational patterns, establish digital twins, and predict future states to devise optimal strategies, enabling quick response and self-adjustment for automation and intelligence. The top layer, cloud computing, oversees the system, fine-tuning strategies and optimizing control logic with its superior computing and storage capabilities. The architecture is service-oriented and modular, with units such as data acquisition, storage, learning models, control optimization, and task scheduling. These components are decoupled and communicate via a message bus, enhancing flexibility and scalability. The system's deep learning capabilities, combined with its hierarchical, service-oriented design, make it efficient and adaptable to various complex scenarios.

D. Experimental Results and Analysis

Experimental results show that using distributed deep reinforcement learning with edge computing significantly reduces communication time between cloud and edge, lowering control latency. Traditional cloud-centered control had a delay of up to 1.5 seconds, slowing response to sudden changes. By deploying deep reinforcement learning agents at the edge, this delay dropped to about 0.3 seconds, meeting real-time control needs. This approach also improved resource utilization, with CPU usage at the edge layer increasing from 53% to 67%. The system achieved a higher cumulative reward (890 points) compared to cloud-only systems (750 points), indicating better performance. Control stability improved with a 22% reduction in control loss, and action accuracy reached 88%, showing enhanced responsiveness to environmental changes. Overall, the cloud-edge collaborative deep reinforcement learning framework greatly improved real-time control effectiveness and quality. Future enhancements aim to increase the adaptability of edge agents.

Fig. 3. Changing trends of the experimental results over time/iterations.

V. CASE STUDY

A. Scenario Modeling

To assess the practical effectiveness of the method, an application scenario for industrial state monitoring and fault prediction was constructed. The scenario involves monitoring the operation of an industrial boiler. Data sources include boiler inlet and outlet temperatures, water level, and operating pressure signals. For actuators such as water pumps and valves, corresponding states were also set, with state transitions based on control instructions from the deep reinforcement learning model. The entire boiler operation process constitutes a complex state mechanism, requiring precise control of parameters such as water quantity and temperature to maximize efficiency while avoiding accidents. Using the dataset from this application scenario, an environment dynamics model was built, and a deep reinforcement learning controller was trained. To conserve edge computing resources, this research employed a two-layer fully connected neural network as the policy function approximator. The state space includes the current process parameters, information from the last 10 observed states, rewards, and more. The deep reinforcement learning model outputs deterministic control actions that can be directly applied to actuators, enabling monitoring and optimization of the boiler's operational status, as shown in Figure 4.

Fig. 4. Scenario Modeling
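A minimal sketch of this setup is given below: a toy boiler environment keeping a rolling window of the last 10 observations, and a two-layer fully connected policy that outputs deterministic pump and valve commands. The transition dynamics, reward shaping, and layer widths are invented for illustration; in the paper the environment dynamics model is learned from the scenario dataset.

```python
import numpy as np
import tensorflow as tf

HISTORY = 10  # the state includes the last 10 observations, per the paper


class BoilerEnv:
    """Toy stand-in for the boiler scenario; transitions and rewards here are
    invented solely to make the sketch runnable."""

    def __init__(self) -> None:
        # columns: inlet temp, outlet temp, water level, pressure
        self.history = np.zeros((HISTORY, 4), dtype=np.float32)

    def reset(self) -> np.ndarray:
        self.history[:] = [[60.0, 180.0, 0.5, 1.0]]
        return self.history.flatten()

    def step(self, action: np.ndarray):
        pump, valve = np.clip(action, 0.0, 1.0)
        obs = self.history[-1].copy()
        obs[2] += 0.05 * (pump - 0.5)                    # pump raises water level
        obs[3] += 0.02 * (0.5 - valve)                   # valve releases pressure
        self.history = np.roll(self.history, -1, axis=0)
        self.history[-1] = obs
        reward = -abs(obs[2] - 0.5) - abs(obs[3] - 1.0)  # stay near setpoints
        return self.history.flatten(), reward, False, {}


# Two-layer fully connected policy producing deterministic actuator commands.
policy = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(HISTORY * 4,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(2, activation="sigmoid"),  # pump and valve settings in [0, 1]
])

env = BoilerEnv()
state = env.reset()
action = policy(state[None, :]).numpy()[0]
next_state, reward, done, _ = env.step(action)
```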
B. Performance Evaluation

In a month-long experiment comparing traditional PID control to a deep reinforcement learning (DRL) approach for boiler operation, the DRL method showed significant improvements. It scored an average reward of 3820 points over the month, 36% higher than the 2810 points achieved by the PID method. The DRL algorithm's reward curve stabilized over time, unlike the PID method's fluctuating curve. Notably, the DRL controller, using predictive models, greatly reduced water and temperature anomalies, leading to a 29% decrease in boiler system failures and extending uninterrupted operation by 15 days. This also resulted in lower costs for manual maintenance and parts replacement. Overall, from reward, stability, and economic perspectives, the DRL method excelled in boiler state control and optimization.

Fig. 5. Average Rewards Over 20 Days Under the PID Method

VI. CONCLUSION

This research has introduced an intelligent monitoring and optimization method for industrial systems based on deep reinforcement learning and edge computing. The method leverages edge computing resources to the fullest extent by deploying lightweight deep reinforcement learning models at the network's edge, enabling real-time prediction and control of system states. Additionally, it facilitates collaboration between the edge and the cloud, ensuring more timely control policy updates and flexible resource allocation. Experimental results have shown that this system can reduce control loop latency and enhance responsiveness to sudden state changes. When applied to an industrial boiler control scenario, the method outperforms rule-based control by increasing operational rewards, reducing failure probabilities, extending the fault-free running time, and lowering manual intervention and maintenance costs. The approach designed in this research ensures control quality while improving the real-time nature of control and decision-making. Future work will involve validating the method's effectiveness in more complex industrial environments.

REFERENCES

[1] P. Zhou, X. Chen, Z. Liu, et al., "DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control in the IoV," IEEE, 2021(4).
[2] S. A. Celtek and A. Durdu, "A Novel Adaptive Traffic Signal Control Based on Cloud/Fog/Edge Computing," International Journal of Intelligent Transportation Systems Research, 2022.
[3] I. A. Elgendy, A. Muthanna, M. Hammoudeh, et al., "Advanced Deep Learning for Resource Allocation and Security Aware Data Offloading in Industrial Mobile Edge Computing," Big Data, 2021.
[4] Z. Mlika and S. Cherkaoui, "Network Slicing with MEC and Deep Reinforcement Learning for the Internet of Vehicles," 2022.
[5] M. Laroui, H. Khedher, A. C. Moussa, et al., "SO-MEC: Service Offloading in Virtual Mobile Edge Computing Using Deep Reinforcement Learning," Transactions on Emerging Telecommunications Technologies, 2021.
[6] F. An, B. Zhao, B. Cui, and R. Bai, "Multi-Functional DC Collector for Future All-DC Offshore Wind Power System: Concept, Scheme, and Implement," IEEE Transactions on Industrial Electronics, 2022.
[7] F. An, B. Zhao, B. Cui, and Y. Chen, "Selective Virtual Synthetic Vector Embedding for Full-Range Current Harmonic Suppression of the DC Collector," IEEE Transactions on Power Electronics.
[8] C. Che, B. Liu, S. Li, J. Huang, and H. Hu, "Deep learning for precise robot position prediction in logistics," Journal of Theory and Practice of Engineering Science, 3(10):36-41, 2023.
[9] H. Hu, S. Li, J. Huang, B. Liu, and C. Che, "Casting product image data for quality inspection with Xception and data augmentation," Journal of Theory and Practice of Engineering Science, 3(10):42-46, 2023.
[10] T. Song, W. Hu, J. Cai, W. Liu, Q. Yuan, and K. He, "Bio-inspired Swarm Intelligence: A Flocking Project With Group Object Recognition," in 2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE), pp. 834-837, IEEE, 2023.
