The Fusion of Deep Reinforcement Learning and Edge Computing for Real-Time Monitoring and Control Optimization in IoT Environments
1 Northern Arizona University, 1900 S Knoles Dr, Flagstaff, Arizona, USA
2 University of Electronic Science and Technology of China, Chengdu, China
3 Trine University, Phoenix, Arizona, USA
4 Yantai University, Tokyo, Japan
5 Northwestern University, Atlanta, Georgia, USA
Abstract: In response to the demand for real-time performance and control quality in industrial Internet of Things (IoT) environments, this paper proposes an optimization control system based on deep reinforcement learning and edge computing. The system leverages cloud-edge collaboration, deploys lightweight policy networks at the edge, predicts system states, and outputs controls at a high frequency, enabling monitoring and optimization of industrial objectives. Additionally, a dynamic resource allocation mechanism is designed to ensure rational scheduling of edge computing resources, achieving global optimization. Results demonstrate that this approach reduces cloud-edge communication latency, accelerates response to abnormal situations, reduces system failure rates, extends average equipment operating time, and saves costs for manual maintenance and replacement. This ensures real-time and stable control.

Keywords: Deep reinforcement learning; Edge computing; Industrial Internet of Things; Lightweight policy network; Dynamic resource allocation

I. INTRODUCTION

With the rapid development of the industrial Internet of Things, there is a growing demand for real-time monitoring and control of systems. However, relying on cloud computing centers for computation and decision-making often fails to meet the constraints of real-time responsiveness [1]. In this regard, this study proposes a novel industrial system control architecture that actively senses the environment and makes rapid decisions through the organic combination of deep reinforcement learning and edge computing [2]. This approach deploys lightweight policy networks at the network edge to predict and control local states at a high frequency. Simultaneously, multiple edge nodes collaborate with the cloud center, enhancing control real-time performance at the edge while the cloud center tracks strategies and performs global optimization. This paper provides a detailed construction of the system's overall architecture and functional modules, and designs a lightweight policy network structure and a dynamic resource allocation mechanism for edge computing. Experimental validation demonstrates the effectiveness of this approach, which significantly reduces control latency and improves control quality and cost-effectiveness compared to architectures relying solely on the cloud center.

II. OPTIMIZATION CONTROL SYSTEM BASED ON DEEP REINFORCEMENT LEARNING AND EDGE COMPUTING
A. Overall System Architecture

This system employs a layered architecture comprising a sensor acquisition layer, an edge computing layer, and a cloud computing layer [7]. The sensor layer collects environmental and system data like temperature and machine status. This data is sent to the edge computing layer, where edge servers perform real-time analysis and local decision-making. Deep reinforcement learning models in this layer predict and control system behavior, creating a digital twin to forecast and optimize operations. The cloud computing layer oversees the entire system, providing powerful computing and storage to refine control strategies and system logic. The architecture is service-oriented and modular, with components like data acquisition, storage, deep learning, control, and scheduling modules connected via a message bus. This design enhances flexibility and scalability, allowing the system to automate and intelligently control operations, adapt to various scenarios, and efficiently manage complex tasks [8].

Fig. 1. Overall System Architecture
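As a rough illustration of the service-oriented, message-bus design described above, the sketch below wires hypothetical acquisition and control modules through a minimal publish/subscribe bus. The topic and module names are assumptions for illustration, not the paper's implementation.

```python
# Minimal publish/subscribe message bus sketch; module and topic names are
# illustrative assumptions, not the paper's actual implementation.
from collections import defaultdict
from typing import Callable, Dict, List


class MessageBus:
    """Routes messages between loosely coupled system modules."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


bus = MessageBus()

# Edge-layer control module reacts to sensor readings published by the
# acquisition module (hypothetical topics "sensor/state" and "control/command").
def control_module(reading: dict) -> None:
    command = {"machine_id": reading["machine_id"], "action": "adjust_speed"}
    bus.publish("control/command", command)

bus.subscribe("sensor/state", control_module)
bus.subscribe("control/command", lambda cmd: print("dispatch:", cmd))

# Sensor acquisition layer pushes a reading onto the bus.
bus.publish("sensor/state", {"machine_id": "m-01", "temperature": 71.3})
```

Because modules only exchange messages through topics, new acquisition, storage, or scheduling components can be attached without modifying existing ones, which matches the flexibility and scalability goals stated above.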
B. System Functional Modules

The functional modules of this system mainly consist of the Deep Reinforcement Learning module and the Edge Computing module, as shown in Figure 2. The Deep Reinforcement Learning module is responsible for learning and optimizing system control strategies and is primarily divided into a modeling unit, a policy network, and a decision unit [9]. Among them, the modeling unit constructs an environment model to predict system states; the policy network represents and approximates control strategies using neural networks; and the decision unit provides control decisions based on the network outputs. The Edge Computing module primarily offers local data storage, preprocessing, caching, and other functions to assist in the training of the Deep Reinforcement Learning module. Additionally, it includes a task distributor that dynamically allocates edge computing tasks and a data collector that aggregates data from edge nodes.

Fig. 2. System Functional Modules
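One possible reading of this module decomposition is sketched below as three cooperating units; all class and method names are assumed for illustration and are not taken from the paper.

```python
# Skeleton of the three units of the Deep Reinforcement Learning module as
# described above; class names, method names, and dynamics are assumptions.
import numpy as np


class ModelingUnit:
    """Environment model: predicts the next system state from state and action."""

    def predict_next_state(self, state: np.ndarray, action: int) -> np.ndarray:
        # Placeholder dynamics; a learned model would be used in practice.
        return state + 0.01 * action


class PolicyNetwork:
    """Represents and approximates the control strategy with a neural network."""

    def action_values(self, state: np.ndarray) -> np.ndarray:
        # Placeholder scores; the lightweight perceptron is described in Sec. III-A.
        return np.array([-np.abs(state).sum(), 0.0, np.abs(state).sum()])


class DecisionUnit:
    """Turns network outputs into a concrete control decision."""

    def decide(self, q_values: np.ndarray) -> int:
        return int(np.argmax(q_values))


state = np.array([0.2, -0.1])
policy, decider = PolicyNetwork(), DecisionUnit()
action = decider.decide(policy.action_values(state))
next_state = ModelingUnit().predict_next_state(state, action)
```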
III. KEY TECHNOLOGIES AND ALGORITHMS

A. Lightweight Deep Reinforcement Learning for the Edge

To accommodate the limited computational and storage resources of edge computing nodes, the system employs a customized, lightweight deep reinforcement learning algorithm. This algorithm uses simpler network structures, such as a three-layer perceptron, instead of complex deep convolutional neural networks, reducing the number of parameters and the model's space footprint. The experience replay buffer size is also limited to around 5000 transition samples to manage capacity. During training, the batch size is set to 16, matching the parallel computing capabilities of edge servers. The lightweight model contains about 1 million parameters and occupies less than 400 MB, making it suitable for deployment on less powerful edge computing nodes. It can deliver near real-time control strategy outputs, even on inexpensive hardware. In practical applications like monitoring machine tools in smart manufacturing, the model can re-plan and issue control strategy instructions every 5 seconds, optimizing the machine's dynamic performance and extending its lifespan. Compared to traditional cloud-based models, which may have an average delay of over 1 minute in computing and issuing control commands, this system significantly reduces control loop latency, demonstrating more efficient and responsive control in real-time scenarios.
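To make the description above concrete, the following is a minimal sketch of a lightweight policy network with a bounded replay buffer in TensorFlow/Keras (the framework named in Section IV). Only the buffer size (5000), batch size (16), and learning rate (0.01, from Table 1) come from the paper; the state dimension, action count, and layer widths are assumptions and do not reproduce the stated parameter count.

```python
# Sketch of a lightweight policy network with a bounded replay buffer.
# Buffer size (5000), batch size (16), and learning rate (0.01) follow the
# paper; layer widths, state dimension, and action count are assumptions.
import random
from collections import deque

import numpy as np
import tensorflow as tf

STATE_DIM, NUM_ACTIONS = 16, 4
BUFFER_SIZE, BATCH_SIZE = 5000, 16

# Three-layer perceptron instead of a deep convolutional network.
inputs = tf.keras.Input(shape=(STATE_DIM,))
x = tf.keras.layers.Dense(128, activation="relu")(inputs)
x = tf.keras.layers.Dense(128, activation="relu")(x)
outputs = tf.keras.layers.Dense(NUM_ACTIONS)(x)  # one value per control action
policy_net = tf.keras.Model(inputs, outputs)
policy_net.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                   loss="mse")

# Experience replay buffer limited to ~5000 transitions.
replay_buffer = deque(maxlen=BUFFER_SIZE)


def store(state, action, reward, next_state):
    replay_buffer.append((state, action, reward, next_state))


def sample_batch():
    batch = random.sample(replay_buffer, BATCH_SIZE)
    states, actions, rewards, next_states = map(np.array, zip(*batch))
    return states, actions, rewards, next_states


# Example: select a control action for the current state.
state = np.random.rand(1, STATE_DIM).astype(np.float32)
action = int(np.argmax(policy_net.predict(state, verbose=0)))
```

Using a deque with a fixed maxlen discards the oldest transitions automatically, which keeps the buffer within the 5000-sample budget on memory-constrained edge nodes.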
B. Dynamic Collaborative Distributed Optimization Algorithm

To achieve global optimal control through cloud-edge collaboration, this system designs a dynamic optimization allocation distributed algorithm. The algorithm is coordinated by the cloud-side master node, which can request or release edge servers' computational resources on demand and maintains a list of available resources while monitoring the load on each edge server. Based on the current system state and control requirements, it runs an environmental monitoring program that dynamically selects a group of edge servers with the best combined indicators, such as bandwidth and computing capacity. It also allocates critical control modules with higher computational intensity to servers with stronger computing power. By aggregating intermediate states and control results from various edge nodes, it collaboratively optimizes and obtains a global control strategy. The optimization can be expressed using the following formulas:

Maximize \sum_{i=1}^{n} \sum_{j=1}^{m} A(c_j, r_i)    (1)

where r_i represents the resources of the i-th server, and c_j represents the j-th control module.

\sum_{i=1}^{n} x_{ij} \le 1, \quad \forall j    (2)

Each control module can be assigned to at most one resource.

Server resource and load constraints:

l_i + \sum_{j=1}^{m} x_{ij} \cdot \mathrm{Load}(c_j) \le \mathrm{Capacity}(r_i), \quad \forall i    (3)

where x_{ij} is a binary decision variable indicating whether control module c_j is assigned to resource r_i (1 if assigned, 0 otherwise), \mathrm{Load}(c_j) represents the computational density of control module c_j, \mathrm{Capacity}(r_i) is the maximum capacity of resource r_i, and l_i denotes the load already on server r_i.
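Equations (1)–(3) describe an assignment problem. The sketch below solves it with a simple greedy heuristic (heavier, critical modules first, each placed on the feasible server with the highest score); this is an illustrative approximation rather than the paper's solver, and the affinity function A(c_j, r_i) is assumed.

```python
# Greedy heuristic for the assignment problem in Eqs. (1)-(3); an illustrative
# approximation, not the paper's solver. The affinity score A(c, r) is assumed.
from dataclasses import dataclass
from typing import List


@dataclass
class Server:               # resource r_i
    name: str
    capacity: float         # Capacity(r_i)
    load: float             # current load l_i
    bandwidth: float


@dataclass
class ControlModule:        # control module c_j
    name: str
    demand: float           # Load(c_j), computational density
    critical: bool = False


def affinity(c: ControlModule, r: Server) -> float:
    """Assumed score A(c_j, r_i): prefer strong, lightly loaded servers for critical modules."""
    headroom = r.capacity - r.load
    return headroom + r.bandwidth + (headroom if c.critical else 0.0)


def greedy_assign(modules: List[ControlModule], servers: List[Server]) -> dict:
    assignment = {}
    # Place critical / heavier modules first.
    for c in sorted(modules, key=lambda m: (m.critical, m.demand), reverse=True):
        feasible = [r for r in servers if r.load + c.demand <= r.capacity]  # Eq. (3)
        if not feasible:
            assignment[c.name] = None   # each module gets at most one server, Eq. (2)
            continue
        best = max(feasible, key=lambda r: affinity(c, r))
        best.load += c.demand
        assignment[c.name] = best.name
    return assignment


servers = [Server("edge-1", capacity=10, load=2, bandwidth=5),
           Server("edge-2", capacity=6, load=1, bandwidth=8)]
modules = [ControlModule("spindle-control", demand=4, critical=True),
           ControlModule("logging", demand=1)]
print(greedy_assign(modules, servers))
```

An exact solution could instead be obtained with an integer-programming solver on the cloud-side master node, at higher computational cost.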
IV. EXPERIMENT AND RESULTS ANALYSIS

A. Experimental Environment and Dataset

The experimental environment for this research is based on the TensorFlow framework and utilizes an NVIDIA Tesla V100 GPU for constructing, training, and testing deep neural networks. To thoroughly validate the effectiveness of the proposed method, the experiments use an open-source IoT simulation environment, IoTSim, as the dataset. IoTSim includes data from sensors, edge-layer resource configurations, and network parameters. The dataset encompasses readings from various heterogeneous sensors, such as temperature, humidity, and voltage, spanning a month with a sampling frequency of one sample per minute. Considering the real-time control requirements of the system, the research sets up an environment where the system reports its state and outputs control commands every 5 seconds, and the dataset is resampled accordingly.
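A minimal sketch of this resampling step is shown below, assuming a pandas time series and time-based interpolation when upsampling the one-minute readings to the 5-second control period; the column name and interpolation choice are assumptions, since the paper only states that the dataset is resampled.

```python
# Sketch of resampling 1-minute sensor readings to the 5-second control period.
# Column name and interpolation choice are assumptions; the paper only states
# that the dataset is "resampled accordingly."
import numpy as np
import pandas as pd

# One month of per-minute readings from a hypothetical temperature sensor.
idx = pd.date_range("2023-01-01", periods=30 * 24 * 60, freq="1min")
df = pd.DataFrame({"temperature": 20 + np.random.randn(len(idx)).cumsum() * 0.01},
                  index=idx)

# Upsample to a 5-second grid and interpolate between the minute samples.
df_5s = df.resample("5s").interpolate(method="time")

print(df_5s.head())
```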
B. Model Hyperparameter Settings

In deep reinforcement learning, the choice of hyperparameters plays a crucial role in the model's performance and convergence speed. Hyperparameters are those parameters that need to be manually set when training deep reinforcement learning models, and they can influence the training process and the model's final performance. These hyperparameters are listed in Table 1.

TABLE I. MODEL HYPERPARAMETERS

Hyperparameter | Value | Description
Experience Pool Size | 5000 | A larger experience pool provides more training data, aiding in model convergence.
Learning Rate | 0.01 | Adjustment of the learning rate can affect the model's convergence speed and stability.
Discount Factor (γ) | 0.95 | A smaller γ value focuses more on short-term rewards, while a larger γ value emphasizes long-term rewards.
Target Network Update Frequency (τ) | 100 | The choice of update frequency (τ) can impact the model's stability and learning speed.
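To show where the hyperparameters in Table 1 enter training, the outline below applies the discount factor γ = 0.95 in the one-step TD target and syncs a target network every τ = 100 steps; it is a simplified, assumed sketch of a DQN-style update, not the paper's training code.

```python
# Where the hyperparameters in Table 1 enter a DQN-style update (simplified,
# illustrative outline; not the paper's training code).
import numpy as np

HYPERPARAMS = {
    "experience_pool_size": 5000,
    "learning_rate": 0.01,
    "discount_factor": 0.95,          # gamma
    "target_update_frequency": 100,   # tau, in training steps
}


def td_target(reward: np.ndarray, next_q_max: np.ndarray) -> np.ndarray:
    """One-step discounted target: r + gamma * max_a' Q_target(s', a')."""
    return reward + HYPERPARAMS["discount_factor"] * next_q_max


def should_sync_target(step: int) -> bool:
    """Copy policy-network weights to the target network every tau steps."""
    return step % HYPERPARAMS["target_update_frequency"] == 0


# Example: targets for a sampled batch and a weight-sync check at step 200.
targets = td_target(np.array([1.0, 0.0]), np.array([0.5, 0.2]))
print(targets, should_sync_target(200))
```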
The cloud-edge collaborative deep reinforcement learning framework greatly improved real-time control effectiveness and quality. Future enhancements aim to increase the adaptability of edge agents.