Deep Reinforcement Learning-Based Adaptive Scheduling
Article
Deep Reinforcement Learning-Based Adaptive Scheduling for
Wireless Time-Sensitive Networking
Hanjin Kim 1 , Young-Jin Kim 2 and Won-Tae Kim 1, *
1 Future Convergence Engineering Major, Department of Computer Science and Engineering, Korea University
of Technology and Education, Cheonan-si 31253, Republic of Korea; [email protected]
2 Department of Artificial Intelligence Big Data, Sehan University, Dangjin-si 31746, Republic of Korea;
[email protected]
* Correspondence: [email protected]; Tel.: +82-41-560-1485
Abstract: Time-sensitive networking (TSN) technologies have garnered attention for supporting
time-sensitive communication services, with recent interest extending to the wireless domain. How-
ever, adapting TSN to wireless areas faces challenges due to the competitive channel utilization in
IEEE 802.11, necessitating exclusive channels for low-latency services. Additionally, traditional TSN
scheduling algorithms may cause significant transmission delays due to dynamic wireless character-
istics, which must be addressed. This paper proposes a wireless TSN model of IEEE 802.11 networks
for the exclusive channel access and a novel time-sensitive traffic scheduler, named the wireless
intelligent scheduler (WISE), based on deep reinforcement learning. We designed a deep reinforce-
ment learning (DRL) framework to learn the repetitive transmission patterns of time-sensitive traffic
and address potential latency issues from changing wireless conditions. Within this framework, we
identified the most suitable DRL model, presenting the WISE algorithm with the best performance.
Experimental results indicate that the proposed mechanisms satisfy the latency requirements for up to 99.9% of frames under various
wireless communication scenarios. In addition, they show that the processing delay is successfully
limited within the specific time requirements and the scalability of TSN streams is guaranteed by the
proposed mechanisms.
standards from the wireless physical layer, including IEEE 802.11 and 5G [12]. Our research
focuses on IEEE 802.11-based WTSN, taking advantage of the cost benefits of unlicensed
bands and the ease of integration using the same protocol stack [13]. This paper primarily
discusses applying the IEEE 802.1Qbv standard’s TAS to wireless communications.
Applying TSN standard technologies to wireless systems based on the IEEE 802.11
standard involves more than just adopting the TAS algorithm. IEEE 802.11 fundamentally
uses a carrier sense multiple access with collision avoidance (CSMA/CA) method for
wireless medium access, where devices on the same channel compete for access based on
random timing. This unpredictability in CSMA/CA makes it difficult to adhere to the
TAS’s transmission gate on/off schedule [14]. Fortunately, the IEEE 802.11 standard also
provides mechanisms under the control of the access point (AP) to manage the access times
of the stations, allowing the timely transmission of TS streams if the WTSN system model
is well designed to meet the requirements of the TS streams [15,16].
Another challenge in implementing WTSN is the variability of wireless channel con-
ditions depending on the surrounding environment, a core issue addressed in this paper.
IEEE 802.11 devices select the optimal bit rate for data transmission based on changing
channel conditions, which can make the amount of data that can be transmitted per unit of
time inconsistent [17]. Using the fixed traffic scheduling from wired TSN in WTSN could
lead to delays in time-sensitive streams. WTSN requires a new scheduling algorithm that
considers variations in channel conditions.
Rule-based and optimal solution search algorithms, such as integer linear program-
ming (ILP) and satisfiability modulo theories (SMT), used for generating gate control lists
(GCL) in TSN, may not function effectively in the highly variable wireless communication
environment [8,18,19]. Rule-based algorithms rely on static rules, making it difficult to
adapt to changing conditions, and designing rules that consider all the variable factors in
wireless communication is challenging. Optimal solution-based algorithms can consider
environmental changes, but as the number of TS streams increases, the time required to
find the optimal solution can also increase.
In this paper, we address the issues mentioned by employing reinforcement learning.
Reinforcement learning is a method capable of adapting to dynamic environmental changes
and deriving optimal scheduling outcomes [20]. We design a wireless TSN model for
performing time-sensitive communications in IEEE 802.11 networks and propose WISE:
WTSN Intelligent SchEduler, a reinforcement learning-based WTSN scheduler that can
respond to changes in wireless channel conditions.
The main contributions of this paper are as follows:
1. To apply the TAS functionality of TSN to IEEE 802.11-based wireless networks, we
design a WTSN network model. In this model, wireless stations can receive exclusive
periods from the AP to transmit TS streams. We present potential delay issues caused
by changing wireless channel conditions and outline the problem scenarios that the
scheduler needs to address.
2. We propose WISE, a deep reinforcement learning (DRL)-based WTSN scheduler that
adapts to changes in wireless channel conditions and meets the latency requirements
of TS streams. Our DRL framework is designed to learn and adapt to these changing
conditions while also learning repetitive stream patterns to satisfy latency require-
ments in the WTSN model. By comparing variants of WISE, we identify the most
suitable DRL model for solving the given problem.
3. Through a comparative evaluation of non-ML schedulers and WISE variants in terms
of latency and algorithm processing time, we validate the effectiveness of the proposed
WISE and analyze the factors contributing to its performance. Our findings indicate
that WISE is the only scheduler that consistently meets the 99.9% latency satisfaction
requirement in all scenarios. This achievement, within an acceptable processing time
of approximately 95 ms in the evaluated WTSN network environment, is significant
compared to the ILP algorithm’s processing time of 3.3 s.
Sensors 2024, 24, 5281 3 of 21
The remainder of this paper is organized as follows: Section 2 discusses related works.
Section 3 presents the WTSN network model and the issues arising from changes in wireless
channel conditions. In Section 4, we introduce the WISE scheduler for the WTSN network.
To demonstrate the advantages of the proposed method, we compare WISE with other
schedulers in Section 5. Finally, Section 6 provides the conclusions. The main notations
used throughout the paper are summarized in Table 1.
Notation                               Definition
$TS$                                   Set of time-sensitive streams (flows)
$ts_i$                                 i-th stream in set $TS$; each $ts$ is defined as $\langle ts.D, ts.P, ts.L \rangle$
$f_{i,j}$                              Frame generated in the i-th stream; $j \in [1, T_H / ts_i.P]$
$T_H$                                  Time duration of the hyperperiod
$T_S$                                  Time duration of a time slot within the hyperperiod
$T_{h,i}$                              The i-th hyperperiod of the hyperperiod cycles
$T_{s,i}$                              The i-th time slot in $T_h$
$\phi^{\mathrm{start}}_{i,j,k}$        The start time of the j-th frame transmission in the i-th stream from node k
$\phi^{\mathrm{end}}_{i,j,k}$          The end time of the j-th frame transmission in the i-th stream from node k
$\tau_{i,j,k}$                         Time when the j-th frame of the i-th stream from node k enters the queue
$s(t)$                                 The state at step t
$a(t)$                                 The action at step t
$r(t)$                                 The reward at step t
2. Related Works
2.1. TSN in Wireless Networks
The strengths of TSN technologies in providing time-sensitive communication services
have garnered significant attention from many researchers, and numerous studies are still
ongoing [4,8]. Recently, this trend has led to attempts to apply TSN technology to the field
of wireless communications [11,13,21]. Since TSN is a family of standards rather than a single standalone
standard, research on its expansion into wireless is also addressing various aspects such as
architecture and protocol design [12], time synchronization [9], scheduling [22], and frame
replication [23].
In [24], the authors considered the combination of TSN technology with 5G new radio
(NR)’s ultra-reliable low-latency communication for providing mission-critical communi-
cations, with a focus on industrial and factory automation. They designed architectures
and protocols that enable 5G to operate as part of a time-sensitive network, a TSN bridge.
Shrestha, D. [25] proposed an enhanced precision time protocol, extending its application to
industrial wireless sensor networks from its traditional use for time synchronization in TSN
networks. The results of this study confirmed the feasibility of time synchronization over
wireless networks. Considering the vulnerability of wireless channels to propagation
effects, noise, and interference, Fang, J. [23] showed that TSN redundancy techniques can
be extended from Ethernet to WiFi. This addresses the challenge of maintaining low latency
and high reliability in networks faced with unmanaged interference in unlicensed bands.
The TSN standards operate at the link layer and can use several standard technologies,
including wired Ethernet as well as IEEE 802.11 and 5G, as their base media
technologies [12]. However, integrating the 5G standard technology with TSN networks
may involve complexities, as it requires considering the 5G network as a logical TSN bridge
for connection. On the other hand, IEEE 802.11 is based on the same IEEE 802 standard
family as IEEE 802.3, facilitating smoother extension of TSN functionalities compared to
the 3GPP/5G standards, which are based on a different protocol stack [13]. Aligning with
this perspective, we also consider TSN over IEEE 802.11.
processing time increases with the complexity of the conditions, making it challenging to
respond promptly to environmental changes.
In the WTSN network, WTSN nodes (including access points, APs) facilitate time-
sensitive communications via TS streams. A TS stream, or flow, entails periodic data
transmissions from one sender (talker) to one or more receivers (listeners) [38]. Each TS
stream exhibits different data characteristics with specific delay requirements. These data
characteristics of the TS stream can be defined as a tuple [39]:
\[ ts_i = \langle ts_i.D,\ ts_i.P,\ ts_i.e2e \rangle, \quad ts_i \in TS, \tag{1} \]
where $ts_i.D$, $ts_i.P$, and $ts_i.e2e$ represent the data size, the period of the stream, and the
maximum allowed end-to-end latency, respectively.
Considering the connection to wired TSN, a TS stream can traverse the Ethernet link
of the wired TSN [40]. Hence, tsi .e2e includes latency requirements from both wireless and
wired links. Our primary focus in the WTSN is the delay in the wireless area, denoted as
tsi .L. Consequently, the WTSN tuple is redefined, with the latency requirement specified
solely for wireless links:
\[ ts_i = \langle ts_i.D,\ ts_i.P,\ ts_i.L \rangle, \quad ts_i \in TS. \tag{2} \]
streams tsi ∈ TS passing through wireless links, the schedule for each stream needs to
be determined.
\[ \Psi(ts_i, E) = \{ (\phi^{\mathrm{start}}_{i,j,k}),\ (\phi^{\mathrm{end}}_{i,j,k}) \mid f_{i,j,k} \}, \tag{3} \]
where $E$ is the set of wireless links, $e_k \in E$, and $f_{i,j}$ is the j-th frame generated from the i-th
stream. $\phi^{\mathrm{start}}_{i,j,k}$ and $\phi^{\mathrm{end}}_{i,j,k}$ are the start and end times of the j-th frame transmission in the i-th
stream from node k.
Unlike switched Ethernet, where TSN is commonly applied and provides indepen-
dent connections between network devices through switch ports, IEEE 802.11 uses the
CSMA/CA mechanism for medium access, with nodes sharing the medium. Therefore,
scheduling in WTSN must consider these differences.
As mentioned in the introduction, the IEEE 802.11 standard offers features such as
point coordination function (PCF), trigger frame, and TWT that provide nodes with exclu-
sive channel access time periods under the management of an AP within the CSMA/CA
mechanism. In a shared medium, nodes can be granted exclusive time durations to transmit
TS frames. Time synchronization between communication nodes becomes crucial since
channel access is timed. Such synchronization can be achieved through beacon signals or
fine-timing measurement protocols [41,42].
In the subsequent network model design process, we will examine how to configure
the channel for scheduling frames generated from TS streams in a shared medium, based
on the features provided by IEEE 802.11.
In the WTSN network, the AP configures an exclusive channel for managing nodes
using the period, ts.P, from the tuples of registered TS streams. As defined earlier, a ts
stream generates TS data periodically every ts.P. Similar to TSN, WTSN uses the periods of
all registered streams to configure and manage an exclusive channel based on a repetitive
channel management cycle called the hyperperiod, using the least common multiple of all
registered stream periods. The hyperperiod is determined by the following equation:
\[ T_H = \mathrm{lcm}\left( \{ ts_i.P \mid ts_i \in TS \} \right). \tag{4} \]
In terms of the hyperperiod, $f_{i,j}$ represents the j-th frame generated from the i-th
stream, where $j \in [1, T_H / ts_i.P]$. Within each repeating hyperperiod, the AP allows
WTSN nodes to transmit TS frames during assigned time slots of duration $T_S$.
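The hyperperiod calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the example periods (10 ms and 100 ms) are borrowed from the stream types used later in the evaluation.

```python
from math import lcm  # variadic lcm is available from Python 3.9


def hyperperiod(periods_ms):
    """Least common multiple of all registered stream periods (in ms)."""
    return lcm(*periods_ms)


def frames_per_hyperperiod(periods_ms):
    """Frames each stream generates in one hyperperiod, i.e. T_H / ts_i.P."""
    t_h = hyperperiod(periods_ms)
    return [t_h // p for p in periods_ms]


# Illustrative registration: two streams with a 10 ms period, two with 100 ms.
periods = [10, 10, 100, 100]
print(hyperperiod(periods))             # 100
print(frames_per_hyperperiod(periods))  # [10, 10, 1, 1]
```

With these periods the index j of a 10 ms stream runs from 1 to 10 within one hyperperiod, while a 100 ms stream produces a single frame.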
In our WTSN design, the channel management resolution based on the hyperperiod is
TS . TS is a parameter set by the WTSN network designer, considering the size of the data
to be sent and the data rate of the channel, DR. The network designer also considers the
overhead caused by the control data required to send TS data. This overhead can vary
depending on the media access technique of the IEEE 802.11 standard and the presence
of ACK.
\[ T_S > \frac{D_i}{DR}, \quad \text{where } D_i = D_{\mathrm{overhead}} + ts_i.D. \tag{5} \]
Unlike TSN based on full-duplex Ethernet, nodes in IEEE 802.11-based wireless TSN
cannot always occupy the channel due to the competitive nature of channel access. To
maximize the efficiency of wireless resource utilization and minimize potential delays,
TS data with significantly smaller sizes than TS can be sent sequentially if they share the
same destination.
\[ T_S > \frac{D_I}{DR}, \quad \text{where } D_I = D_{\mathrm{overhead}} + \sum_{i \in I} ts_i.D, \tag{6} \]
where I represents a set of indices for streams that not only share the same source and
destination, but also have identical periods P.
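The slot-size conditions in Equations (5) and (6) amount to a simple feasibility check. The sketch below is an assumption-laden illustration: the function names are hypothetical, the overhead is modelled purely as extra bytes (timing gaps such as inter-frame spaces are ignored), and the data rates in the usage lines are arbitrary example values rather than rates tied to a particular MCS index.

```python
def tx_time_us(payload_bytes, overhead_bytes, data_rate_mbps):
    """Time to send payload plus control overhead, in microseconds.

    bits / (Mbit/s) yields microseconds directly.
    """
    bits = 8 * (payload_bytes + overhead_bytes)
    return bits / data_rate_mbps


def fits_in_slot(payload_sizes, overhead_bytes, data_rate_mbps, slot_us):
    """Check T_S > D_I / DR for one frame or a group sent back to back."""
    total_payload = sum(payload_sizes)
    return slot_us > tx_time_us(total_payload, overhead_bytes, data_rate_mbps)


# Example values only (assumed): 1 ms slot, 22 bytes of overhead.
print(fits_in_slot([100, 100], 22, 10.0, 1000.0))  # two small frames grouped
print(fits_in_slot([1000], 22, 6.5, 1000.0))       # one large frame, low rate
```

A single-element `payload_sizes` list corresponds to Equation (5); a longer list corresponds to the grouped case of Equation (6).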
Based on the designed model, the WTSN AP manages node access in each TH cycle
according to the characteristics of the registered TS streams. The access times of the nodes
are managed based on the scheduling results of the TS frames by the WTSN scheduler for
each Th,i . The AP can support time-sensitive communication by distributing access times
to the nodes or triggering exclusive period usage according to the scheduling results. Th is
a set of Ts , and Ts,i ∈ Th .
We have modified the scheduling problem to find the schedule for all periodic streams
tsi ∈ TS transmitted over the wireless channel so that each stream is assigned to an
appropriate time slot within the Th . Therefore, the problem that the scheduler solves in our
WTSN network model is as follows:
\[ \Psi(ts_i, T_h) = \{ (\tau_{i,j,k}),\ (\phi^{\mathrm{start}}_{i,j,k}),\ (\phi^{\mathrm{end}}_{i,j,k}) \mid f_{i,j,k} \}, \tag{7} \]
where τi,j,k is the time when the TS frame of node k enters the queue. Since it is a shared
medium, this variable accounts for the delay caused by the frame being queued when node
k is not granted a time slot.
Variables
• $\phi^{\mathrm{start}}_{i,j,k}$: start time of the j-th frame transmission in the i-th stream from node k.
• $\phi^{\mathrm{end}}_{i,j,k}$: end time of the j-th frame transmission in the i-th stream from node k.
• $\tau_{i,j,k}$: time when the j-th frame of the i-th stream from node k enters the queue.
• $x_{s,k}$: binary variable indicating whether slot $s \in T_h$ is allocated to node k (1 if allocated,
0 otherwise).
• $\delta_{i,j,k}$: binary variable indicating whether the j-th frame of the i-th stream from node k
meets the deadline (1 if met, 0 otherwise).
Objective Function
\[ \max \sum_{i,j,k} \delta_{i,j,k}. \tag{8} \]
Constraints
1. Slot allocation constraint: Each slot $s \in T_h$ can be allocated to at most one node k.
\[ \sum_{k} x_{s,k} \le 1 \quad \forall s \in T_h. \tag{9} \]
3. Deadline constraint: Each frame in the stream must complete transmission within
its allowed latency.
\[ \phi^{\mathrm{end}}_{i,j,k} - \tau_{i,j,k} \le ts_i.L \quad \forall i, j, k. \tag{11} \]
4. Periodic transmission constraint: The frames of each stream must be transmitted
periodically according to their period.
5. Transmission time constraint: The end time of a frame transmission must be greater
than or equal to the start time plus the transmission duration.
\[ \sum_{i,j} \left( \phi^{\mathrm{end}}_{i,j,k} - \phi^{\mathrm{start}}_{i,j,k} \right) \le T_S \quad \forall s \in T_h,\ k. \tag{14} \]
where $DR_k(\phi^{\mathrm{start}}_{i,j,k})$ represents the data rate for each link k at the start time of the frame's
transmission, reflecting the dynamic wireless channel conditions.
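To make the slot allocation and deadline constraints more concrete, the following sketch checks a candidate schedule against Equations (9) and (11) and reports the fraction of frames that meet their deadlines. The data layout (a slot-by-node 0/1 matrix and a list of per-frame tuples) and all function names are assumptions made for illustration; this is a checker, not an ILP solver.

```python
def slot_allocation_ok(alloc):
    """Constraint (9): alloc[s][k] plays the role of x_{s,k}.

    Every slot s may be granted to at most one node k.
    """
    return all(sum(row) <= 1 for row in alloc)


def deadline_met(tau, phi_end, latency):
    """Constraint (11) for one frame: phi_end - tau <= ts_i.L."""
    return (phi_end - tau) <= latency


def satisfaction_rate(frames):
    """Share of frames whose indicator delta would be 1.

    frames is a list of (tau, phi_end, latency) tuples in the same time unit.
    """
    met = [deadline_met(tau, end, lat) for tau, end, lat in frames]
    return sum(met) / len(met)


alloc = [[1, 0], [0, 0], [0, 1]]          # three slots, two nodes
frames = [(0.0, 2.0, 3.0), (0.0, 4.0, 3.0)]  # second frame is late
print(slot_allocation_ok(alloc))   # True
print(satisfaction_rate(frames))   # 0.5
```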
However, performing adaptive scheduling in WTSN based on channel conditions is
not easy. Firstly, changes in wireless conditions are linked to the individual links of each
node, making it difficult to design rules that consider the number of nodes and the various
MCS indices each node can select. Additionally, simple rule-based algorithms may not be
sufficient to meet the registered delay requirements.
An alternative approach is to add the changes in channel conditions as constraints in
the previously mentioned optimal solution-based algorithm. However, designing these
constraints is also challenging due to the number of nodes and the various MCS indices
they can select. Moreover, ILP inherently has a long processing time, making it difficult
for the scheduler to quickly produce scheduling results in response to changing channel
conditions. The limitations of these scheduling algorithms can be seen in Section 5.
To address these challenges, we aim to explore an approach that uses deep reinforce-
ment learning to learn and adapt to channel changes, producing adaptive scheduling results.
where
• $ChannelState_t = [mcs_1, mcs_2, \ldots, mcs_N]$, representing the current MCS index for N
nodes at time t, where each $mcs_k$ is the MCS index for node k.
• $FutureAlloc_t = [(id_t, fn_t), (id_{t+1}, fn_{t+1}), \ldots, (id_{t+i-1}, fn_{t+i-1})]$, detailing the
allocations for the next i steps, where each pair $(id_{t+j}, fn_{t+j})$ consists of the node ID $id_{t+j}$ and
the number of frames $fn_{t+j}$ that node is scheduled to transmit in the (t + j)-th step.
• $TransHistory_t = [h_{t-i}, h_{t-i+1}, \ldots, h_{t-1}]$, where each $h_{t-k} = [fn_{t-k,1}, fn_{t-k,2}, \ldots, fn_{t-k,N}]$
records the number of frames transmitted by each of the N nodes at time t − k,
reflecting the transmission history over the past i steps.
The channel state information can be obtained from the physical layer, and it is as-
sumed that the AP periodically acquires channel state information through communication
with the nodes [43]. The information captures the current MCS index for the link state of
each node, allowing the agent to adjust its scheduling decisions based on the condition
of $DR_k(\phi^{\mathrm{start}}_{i,j,k})$ in Equation (15), taking into account changes in transmission rates. The
FutureAlloct represents the scheduled transmission plan, indicating the node IDs and the
number of frames that each node is scheduled to transmit in each slot for the next i steps,
including step t. This information helps the agent learn the TS stream pattern. The third
component of st , TransHistoryt , is the history of frames transmitted in the past, which
the RL agent uses to reference past transmission data and perform actions at step t. The
TransHistoryt has a size of N by i (the number of past steps to be observed).
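As a rough illustration of how the three state components could be combined into a single observation, the sketch below flattens them into one list. The function name `build_state` and the flat layout are assumptions, not the paper's implementation; the example sizes (N = 4 nodes, i = 10 steps) are the ones used in the evaluation section.

```python
def build_state(mcs, future_alloc, trans_history):
    """Flatten ChannelState, FutureAlloc and TransHistory into one vector.

    mcs           : N MCS indices, one per node
    future_alloc  : i pairs (node_id, n_frames) for the next i steps
    trans_history : i rows of N frame counts for the past i steps
    """
    state = list(mcs)
    for node_id, n_frames in future_alloc:
        state.extend([node_id, n_frames])
    for row in trans_history:
        state.extend(row)
    return state


n_nodes, horizon = 4, 10
mcs = [6] * n_nodes
future = [(step % n_nodes, 1) for step in range(horizon)]
history = [[0] * n_nodes for _ in range(horizon)]

state = build_state(mcs, future, history)
print(len(state))  # 4 + 10 * 2 + 4 * 10 = 64
```

Leaving out the `mcs` part gives a 60-element vector, which corresponds to a state that ignores channel conditions.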
The agent determines what action to take at step t based on $s_t$. The action $a_t$ is
defined as:
\[ a_t \in \{1, 2, \ldots, N\}. \tag{17} \]
The agent determines which station to allocate at step t as its action. In designing the
agent, we considered defining the possible actions not only as which node to allocate at
step t, but also how much TS data to send. Alternatively, actions could be defined to derive
a schedule for a set of slots corresponding to a hyperperiod all at once. (This approach
might be more intuitive from the perspective of synchronizing with the hyperperiod cycle
of the WTSN network). However, these configurations can result in an excessively large
action space, leading to issues with the agent’s model convergence [44]. Large action spaces
require additional strategies for learning convergence. For the sake of ease of learning, we
minimized the action space by selecting only the allocation of stations to a single slot as the
action. Configurations based on a large action space remain an open challenge.
The reward received from the environment after performing an action is shown in
Equation (18).
\[ r(t) = \sum_{i=1}^{N} \sum_{j \in Q_i} -w(s_j) \cdot \mathrm{penalty}(\tau_j, t, L_j), \tag{18} \]
where:
\[ \mathrm{penalty}(\tau_j, t, L_j) = \begin{cases} 1 & \text{if } (t - \tau_j) \ge L_j, \\ 0 & \text{otherwise.} \end{cases} \]
The components of the equation are defined as follows:
• $r(t)$ represents the reward at step t.
• $N$ is the number of nodes in the system.
• $Q_i$ represents the set of frames in the queue of node i.
• $s_j$ is the size of frame j.
• $\tau_j$ is the time when frame j entered the queue.
• $L_j$ is the maximum allowed latency of frame j.
• $w(s_j)$ is a weight function that adjusts the penalty based on the size of the frame, e.g.,
$w(s_j) = s_j / 100$.
To clearly define the goal for the agent, we designed the reward function to count
the number of TS frames in all nodes’ queues that exceed the latency requirements as a
negative reward at each step. The reward function is designed to directly correlate with
the objective function in Equation (8), which represents the scheduling problem’s goal in
the ILP model. By minimizing the penalty, the RL agent effectively maximizes the sum of
δi,j,k . The reward function penalizes the agent whenever a frame misses its deadline, thus
encouraging the agent to learn policies that maximize the number of frames meeting their
deadlines. Additionally, we have introduced a weight function to address the issue where
the RL agent more frequently allocates TS streams with smaller TS frame sizes over those
with larger TS frame sizes.
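The reward in Equation (18) can be written out directly. The sketch below follows the stated definitions of the penalty and the example weight $w(s_j) = s_j / 100$; the queue representation (one list of `(size_bytes, tau, latency)` tuples per node) and the function names are assumptions for illustration.

```python
def penalty(tau, t, latency):
    """1 if the frame has waited at least its allowed latency, else 0."""
    return 1 if (t - tau) >= latency else 0


def weight(size_bytes):
    """Example weight from the text: w(s_j) = s_j / 100."""
    return size_bytes / 100


def reward(queues, t):
    """Equation (18): sum of weighted penalties over all queued frames.

    queues holds one list per node of (size_bytes, tau, latency) tuples.
    """
    return sum(
        -weight(size) * penalty(tau, t, latency)
        for queue in queues
        for (size, tau, latency) in queue
    )


# One 100-byte frame (3 ms budget) and one 1000-byte frame (10 ms budget),
# both queued at t = 0.
queues = [[(100, 0, 3)], [(1000, 0, 10)]]
print(reward(queues, 3))   # -1.0
print(reward(queues, 10))  # -11.0
```

The second call shows the effect of the weight: a late 1000-byte frame costs ten times as much as a late 100-byte frame.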
Based on the defined state, action, and reward, the RL agent follows the RL framework
to learn and perform scheduling. We applied proximal policy optimization (PPO) as the
DRL agent. The WISE algorithm based on PPO is outlined in Algorithm 1. The PPO
algorithm takes as input the hyperperiod TH , the number of nodes N, Rreset (which resets
the environment when the cumulative reward drops below a certain threshold), and MCSlist
(which defines the range of transmission rates).
The model is trained over a total number of training steps, with the transmission
rate for each node’s link being randomly selected from the MCS list at every TH . At each
step, the agent takes an action, and the total number of frames in the nodes’ queues that
fail to meet the stream-defined latency requirements results in a negative reward. This
negative reward accumulates in Rcum , and if it falls below the specific threshold Rreset , the
environment is reset to facilitate model convergence. The remainder of the algorithm is
identical to PPO-clip [45].
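The reset rule described above, in which the accumulated negative reward is compared with a threshold, can be sketched independently of any particular DRL library. The class name `RewardResetTracker` and its interface are hypothetical; in practice this logic would typically live inside the environment's step function.

```python
class RewardResetTracker:
    """Accumulates per-step rewards and signals when a reset is due."""

    def __init__(self, r_reset):
        # r_reset is a negative threshold for the cumulative reward.
        self.r_reset = r_reset
        self.r_cum = 0.0

    def step(self, reward):
        """Add one step's reward; return True if the episode should reset."""
        self.r_cum += reward
        if self.r_cum < self.r_reset:
            self.r_cum = 0.0  # start accumulating afresh after the reset
            return True
        return False


tracker = RewardResetTracker(r_reset=-5.0)
for r in [0.0, -2.0, -2.0, -2.0]:
    print(tracker.step(r))  # False, False, False, True
```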
The computational complexity of the algorithm is calculated by the product of the
number of nodes Nl in each layer and the number of nodes Nl −1 in the previous layer of
a fully connected neural network model. The total complexity is given by the sum of the
computational complexities of all layers,
\[ \text{Total Complexity} = \sum_{l=1}^{L} N_l \times N_{l-1}. \tag{19} \]
Given the input dimension din , the hidden layer dimension H, and the output dimen-
sion dout , the total computational complexity is expressed as follows:
\[ O\left( H \times d_{in} + \sum_{i=2}^{L-1} H_i \times H_{i-1} + d_{out} \times H_{L-1} \right). \tag{20} \]
In our WTSN network, the agent aims to reschedule every TH cycle as the rate changes.
According to the computational complexity, the processing time for deriving the schedule
is related to the sizes of the input and output dimensions, i.e., |st | and | at |. This implies
that the network architecture, including the number of layers and the dimensions of each
layer, should be carefully selected to ensure efficient scheduling, taking into account the
state and action space parameters.
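The layer-wise sum in Equation (19) is easy to evaluate for a concrete network. In the sketch below, the function name is hypothetical and the two hidden layers of 64 units are an assumed example architecture; only the input size of 64 and the output size of 4 come from the evaluation setup.

```python
def mlp_multiply_count(layer_sizes):
    """Sum of N_l * N_{l-1} over consecutive layers of a fully connected net.

    layer_sizes lists the widths from input to output, e.g. [d_in, H, H, d_out].
    """
    return sum(prev * cur for prev, cur in zip(layer_sizes[:-1], layer_sizes[1:]))


# Assumed architecture: 64 inputs, two hidden layers of 64 units, 4 outputs.
print(mlp_multiply_count([64, 64, 64, 4]))  # 64*64 + 64*64 + 64*4 = 8448
```

Because the count depends only on layer widths, the per-decision cost stays fixed once the state and action sizes are chosen, regardless of how the channel conditions change.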
5. Evaluation
To evaluate the proposed WISE, we conducted a comparison between non-ML al-
gorithms and WISE. For non-ML algorithms, we selected earliest deadline first (EDF),
credit-based scheduler (CBS), and ILP algorithms. The EDF algorithm selects the node with
the shortest deadline for the first frame in each communication node’s queue at every step.
We also included a modified version of EDF called weighted EDF (WEDF), which assigns
higher weights to nodes with larger amounts of queued frames. In WEDF, the weight is
simply calculated by multiplying the deadline by the reciprocal of the total length of frames
in the queue, making nodes with more frames more likely to be selected.
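The EDF and WEDF selection rules can be sketched as a single scoring function. Several details here are assumptions: "deadline" is interpreted as the time remaining until the head-of-line frame's deadline, the "total length of frames" is taken as the sum of queued frame sizes in bytes, and the function name and queue layout are hypothetical.

```python
def select_node(queues, t, weighted=False):
    """Pick the node to serve at step t; returns None if all queues are empty.

    queues[k] is a list of (size_bytes, tau, latency); index 0 is the head.
    EDF  : smallest time remaining until the head frame's deadline.
    WEDF : that remaining time multiplied by 1 / (total queued bytes).
    """
    best_node, best_score = None, None
    for k, queue in enumerate(queues):
        if not queue:
            continue
        _, tau, latency = queue[0]
        score = (tau + latency) - t
        if weighted:
            score /= sum(size for size, _, _ in queue)
        if best_score is None or score < best_score:
            best_node, best_score = k, score
    return best_node


queues = [[(100, 0, 3)], [(1000, 0, 10), (1000, 0, 10)], []]
print(select_node(queues, t=0))                 # 0
print(select_node(queues, t=0, weighted=True))  # 1
```

In this example plain EDF serves node 0 because its frame has the nearest deadline, whereas WEDF prefers node 1 because of its much larger backlog.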
The CBS is adapted from the credit-based shaper [46] commonly used in TSN and
applied to our WTSN network. In this scheduler, at each step, the node with frames to send
and the highest credit (with credit > 0) is selected. The credit increases as the frame waits
to be sent and decreases when the frame is sent, with the rate of increase and decrease set
equal to the data rate of that step. The ILP algorithm was previously mentioned in Section 3.
Although EDF and CBS algorithms cannot instantly know the deadlines of frames or the
credit of communication nodes in real situations, we assumed they could for the purpose
of deriving scheduling results.
Performance evaluation first compared the latency requirement satisfaction rates of
non-ML models and WISE models (Section 5.2). We constructed WISE algorithms based
on PPO, as well as deep Q-Learning (DQN) and advantage actor-critic (A2C), to analyze
which DRL model adopted by WISE performs best under varying channel conditions. This
analysis focused on how the characteristics of each DRL model influence the achievement
of the given objectives.
Based on the DRL model with the highest performance, Section 5.3 compares the
empirical cumulative distribution function (ECDF) of latency for both ST-A and ST-B under
varying channel conditions between non-ML models and WISE. The ECDF helps infer the
probability that a random variable will not exceed a certain value through repeated trials.
This allows us to verify if at least 99.9% of the frames generated in the scenario meet the
delay requirements and observe the impact of scenario changes on the scheduling results
for different stream types through a comparison of their ECDF graphs. At this point, we
compared two versions of WISE, WISE with MCS and WISE without MCS (WISE w/o
MCS), to determine the importance of considering channel conditions in the MDP design.
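For reference, an ECDF and the corresponding satisfaction check can be computed from raw latency samples as follows. The function names and the sample values are illustrative assumptions, not data from the experiments.

```python
def ecdf(samples):
    """Return (value, cumulative fraction) pairs of the empirical CDF."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def fraction_within(samples, bound):
    """Fraction of samples that do not exceed the given bound."""
    return sum(1 for x in samples if x <= bound) / len(samples)


# Made-up latencies in ms, checked against a 3 ms requirement.
latencies_ms = [0.4, 0.9, 1.2, 3.5]
print(ecdf(latencies_ms)[-1])              # (3.5, 1.0)
print(fraction_within(latencies_ms, 3.0))  # 0.75
```

A scheduler meets the target discussed here when `fraction_within` evaluated at the stream's latency requirement is at least 0.999.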
Finally, the evaluation of algorithm processing time addresses whether the algorithms
can immediately derive scheduling results in response to changing channel conditions
in Section 5.4. Under the assumption that the WTSN network manages the network in
hyperperiod cycles, the ability to produce scheduling results within this time period is
crucial for the evaluation.
Parameter               Value
Medium Access Method    Point Coordination Function
Channel Bandwidth       20 MHz
Spatial Stream          1
Guard Interval          800 ns
Slot Size $T_S$         1 ms
Hypercycle $T_H$        100 ms
Tx Overhead             SIFS: 16 µs; Poll Frame: 22 byte
For the experiments, we designed three evaluation scenarios that differ in channel conditions
and number of streams. The detailed configurations of the scenarios are provided
in Table 3, and the flowcharts of the scenarios are illustrated in Figure 3. Two nodes register
a type A stream (ST-A), and the other two nodes register a type B stream (ST-B).
Scenario  Channel Condition Variation                                                     Num of Streams
S1        Constant channel conditions (MCS6)                                              ST-A: 400/400, ST-B: 50/50
S2        Sequential changes in channel conditions (MCS4->MCS2->MCS4 per each STA)        ST-A: 300/300, ST-B: 40/40
S3        Overall decline in channel conditions (MCS3->MCS1)                              ST-A: 200/200, ST-B: 30/30

Stream Type A (ST-A), all scenarios: Data Size: 100 byte; Period: 10 ms; Latency: ≤ 3 ms
Stream Type B (ST-B), all scenarios: Data Size: 1000 byte; Period: 100 ms; Latency: ≤ 10 ms
In each scenario, after the WTSN network is initialized, the channel conditions between
each WTSN station and the AP change every TH until the simulation ends. Streams are
registered, and frames begin to be generated at each node according to the scheduling
results derived from the ILP algorithm using the initial channel condition values. Each
stream type has different delay requirements of 3 ms and 10 ms, respectively, and frames
are generated repetitively at defined periods from the moment the first frame is transmitted.
The timing of stream registration and frame generation varies with each simulation. We
performed 10 simulations for each of the three scenarios. Frame delay data for the first TH
cycle of this initial period are excluded.
The first scenario simulates a constant channel condition over 10 s with a fixed MCS
index of 6, serving as an ideal baseline to observe the ECDF delay distribution for each
algorithm without channel variation. The second scenario involves sequential changes in
channel conditions for each node over the same duration, with the MCS index decreasing
from 4 to 2 and then increasing back to 4, simulating environments like factories with
moving objects. In the third scenario, all nodes experience a simultaneous gradual reduction
in channel conditions, with the MCS index dropping from 3 to 1 over 10 s, representing a
deteriorating network environment due to factors like dust.
In developing and implementing WISE, we adopted DQN, A2C, and PPO models
using Gymnasium [48] and Stable-Baselines3 [49]. The corresponding model parameters
are detailed in Table 4. Since the agent uses state information comprising channel status
for four WTSN nodes, future allocation for the next 10 time slots, and transmission history
from the past 10 slots, the input layer consists of 4 + 10 × 2 + 4 × 10 = 64 units. WISE is
also evaluated in a version that does not consider MCS in the state, referred to as WISE
w/o MCS, which has a state size of 60. The output layer assigns one of four stations, thus
it includes four units. We conducted training over 1,000,000 total steps. During training,
if the cumulative reward (negative reward) of a single episode drops below 0.1% of the
total number of frames generated in the scenario (total frames × 0.001), the environment is
reinitialized. For WISE-DQN, which is prone to significant Q value estimation errors, we
used Huber Loss, while WISE-A2C and WISE-PPO employed mean squared error (MSE)
as the loss function. The replay buffer size for WISE-DQN is 1,000,000, and the target
update interval is set to 100, corresponding to the TH . The clip range for PPO is 0.2. The
remaining parameters for each model followed the default settings in Stable-Baselines3. To
accommodate learning about changes in channel conditions, the MCS level of each node in
the environment is randomly set between 0 and 9 at each hyperperiod.
WISE-PPO and WISE-A2C, as shown in the results table, effectively learn the traffic
patterns of TS streams, unlike WISE-DQN. PPO and A2C, which collect data on relatively
short transitions in a rollout buffer for batch learning, demonstrate advantageous results in
learning repetitive patterns. Among the compared algorithms, WISE-PPO and WISE-A2C
exhibited the highest performance; however, only WISE-PPO met the 99.9% threshold in all
scenarios, whereas WISE-A2C fell short in scenarios 2 and 3. This comparison highlighted
that the stability provided by PPO’s clipping mechanism led to better scheduling results.
            Non-ML                              WISE
            EDF     WEDF    ILP     CBS         DQN     A2C     PPO
Scenario 1  100%    100%    100%    100%        93.4%   99.9%   100%
Scenario 2  99.5%   99.6%   99.4%   99.5%       91.3%   99.6%   99.9%
Scenario 3  96.4%   98.8%   96.4%   92.7%       90.5%   99.4%   99.9%
Figure 4. ECDF graph of latency in Scenario 1: (a) Stream Type A. (b) Stream Type B.
ILP results show that all frames for ST-A fall within the 0 to 1 ms delay range. The shape
of the graph resembles the cumulative distribution function of an exponential distribution,
with progressively fewer frames at larger delays. This is an ideal graph shape, since channel conditions
do not change. The results for the ST-B stream, shown in Figure 4b, are similar (due to fewer
streams, the cumulative distribution reaches 100% at a delay of 0.2 ms or less).
The WISE algorithm, particularly WISE with MCS, which includes channel condition
information in its state, produces a curve very similar to that of ILP. With WISE without
MCS, however, about 25% of frames experience delays at later time points. Because both
models are ML-based, they do not necessarily reproduce the ideal distribution seen with ILP:
they need not repeat the same scheduling result in every cycle, since any schedule that meets
the stream’s latency requirements incurs no negative reward. Nevertheless,
WISE with MCS, which can account for changes in each node’s channel condition every
TH, produces a curve similar to that of ILP. This similarity can be interpreted as the agent
recognizing from the state information that channel conditions have not changed. The same
is evident in the ST-B results, which closely follow the ILP curve, indicating that the agent
makes effective use of channel condition information.
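For readers unfamiliar with ECDF plots, the following sketch shows how such a curve and the share of frames within a delay bound can be computed from per-frame latencies. The helper names and the sample values are hypothetical and are not data from the experiments.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def ecdf(latencies: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (latency, cumulative probability) points of the empirical CDF."""
    xs = sorted(latencies)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def fraction_within(latencies: Sequence[float], bound: float) -> float:
    """Fraction of frames whose latency is less than or equal to bound."""
    xs = sorted(latencies)
    return bisect_right(xs, bound) / len(xs)


if __name__ == "__main__":
    # Made-up per-frame latencies, for illustration only.
    sample = [0.2, 0.4, 0.4, 0.9, 3.5]
    for point in ecdf(sample):
        print(point)
    print(fraction_within(sample, 1.0))
```

Reading a satisfaction rate off an ECDF amounts to evaluating `fraction_within` at the latency requirement of the stream type.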
Figure 5. ECDF graph of latency in Scenario 2: (a) Stream Type A. (b) Stream Type B.
Figure 6. ECDF graph of latency in Scenario 3: (a) Stream Type A. (b) Stream Type B.
Similarly, the ST-B graph shows more frames at later delay times than in Scenarios
1 and 2, for the same underlying reasons. ILP, with frame delays extending up
to 5.3 ms, reached a cumulative probability of only 96.4% for delays within 3, suggesting that
the algorithm, bound to its initial scheduling decisions, struggles to adapt dynamically to
changing channel conditions.
With WISE with MCS and WISE w/o MCS, 99.9% and 98.6% of frames, respectively, had
delays within 3. The curves show that both algorithms produce longer
transmission delays than in Scenarios 1 and 2. Notably, WISE w/o MCS in
Scenario 3 showed an improvement to 98.6% from 98.4% in Scenario 2 for delays within 3,
suggesting it can somewhat infer channel condition changes based on the number of frames
generated and the transmission history, even without specific channel condition information.
However, to achieve a high performance level of 99.9%, it is evident that explicit
information about channel conditions, as provided in WISE with MCS, is necessary. From
the results in the ST-B graph, it appears that WISE with MCS dynamically adjusted the
frequency of delays, possibly by reallocating transmission opportunities from nodes under
ST-B, which had more lenient delay requirements, to those under ST-A to meet the stricter
delay requirements.
The graph presents two views: one in seconds that shows all algorithms and
reflects the high processing time of ILP, and a zoomed-in view in
milliseconds for EDF, CBS, and WISE. According to the experimental results, unlike
ILP, whose processing time increases exponentially with the number of streams, WISE,
EDF, and CBS do not show a significant increase in processing time as the number of
streams increases. WISE, EDF, and CBS all complete processing within 100 ms even as
the number of streams increases (approximately 95 ms, 4 ms, and 5 ms, respectively, for
500 streams). Unlike EDF, the processing time for CBS slightly increases as the number
of streams grows. This is attributed to the additional time required to calculate credits as
the number of frames to be sent by the selected nodes increases. The fact that all three
algorithms can derive scheduling results within one hyperperiod cycle suggests that they can
adapt to changing channel conditions and support time-sensitive communication in sync with the
exclusive channel management cycle. However, ILP takes approximately 3.3 s (equivalent
to 33 cycles) for 500 streams, making immediate response to channel changes impractical.
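The relationship between processing time and the scheduling cycle can be checked with a small sketch. The 100 ms cycle length is inferred from the statement that 3.3 s corresponds to 33 cycles; the function names are hypothetical.

```python
def cycles_needed(processing_ms: int, hyperperiod_ms: int = 100) -> int:
    """Number of hyperperiod cycles that elapse before a schedule is ready.

    The 100 ms default is inferred from "3.3 s (equivalent to 33 cycles)".
    """
    if hyperperiod_ms <= 0:
        raise ValueError("hyperperiod_ms must be positive")
    # Ceiling division on integers; at least one cycle is always consumed.
    return max(1, -(-processing_ms // hyperperiod_ms))


def fits_in_one_cycle(processing_ms: int, hyperperiod_ms: int = 100) -> bool:
    """True if the schedule is available within a single cycle."""
    return processing_ms <= hyperperiod_ms


if __name__ == "__main__":
    # Approximate processing times for 500 streams reported in the text.
    for name, ms in [("WISE", 95), ("EDF", 4), ("CBS", 5), ("ILP", 3300)]:
        print(name, cycles_needed(ms), fits_in_one_cycle(ms))
```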
6. Conclusions
In this paper, we designed an IEEE 802.11-based WTSN network model to support
time-sensitive communication services in the wireless domain. In this model, communication
nodes are granted exclusive channel access to transmit registered
TS streams under the control of the AP. We highlighted the delay issues caused by reduced
resources per slot due to changes in the wireless channel environment and proposed WISE,
an intelligent scheduler based on deep reinforcement learning, to maintain time-sensitive
communication services despite these changes.
We first compared three variants of WISE (WISE-PPO, WISE-A2C, WISE-DQN) and
three non-ML models (EDF, rule-based; CBS, rule-based; ILP, optimal solution-based) in
terms of latency requirement satisfaction rates. According to our experiments, WISE-PPO
was the only algorithm that met the latency requirements across all three scenarios. The non-
ML models exhibited significant performance degradation as channel conditions worsened,
with EDF, CBS, and ILP achieving 96.4%, 92.7%, and 96.4% performance, respectively, in
the most challenging scenario. Although WISE-DQN was provided with state information
sufficient for learning the environment, it did not perform well because it failed to learn the
repetitive stream patterns of the WTSN. Although WISE-A2C showed high performance,
it could not reach the 99.9% threshold during inference because its training was less stable
than that of WISE-PPO.
To conduct a more detailed comparative evaluation of the algorithms, we examined
and compared the ECDF graphs of the delays for two types of streams across different
scenarios in the same environment. We analyzed both WISE with MCS, which directly re-
ceives channel state information, and WISE without MCS. The results indicated that WISE’s
superior performance resulted not only from receiving direct channel state information
as part of its state, but also from implicitly learning the relationship between the delay
requirements of ST-A and ST-B.
In the final experiment, we evaluated the processing time of the comparison algo-
rithms as the number of streams increased to determine if they could produce immediate
scheduling results in response to channel changes. The results showed that WISE main-
tained an acceptable processing time of approximately 95 ms even with an increase to
500 streams. In contrast, ILP’s processing time increased exponentially with the number of
streams, taking about 3.3 s.
In conclusion, the proposed WISE demonstrates its potential as an adaptive scheduler
for wireless TSN by meeting delay requirements and maintaining acceptable processing
times across various wireless communication scenarios. We anticipate that our WTSN
model and WISE algorithm will contribute to providing wireless time-sensitive communi-
cation services for time-critical applications in both industrial and consumer sectors.
Author Contributions: Conceptualization, H.K.; methodology, H.K.; software, H.K.; validation, H.K.
and Y.-J.K.; formal analysis, H.K.; investigation, H.K. and Y.-J.K.; resources, H.K.; data curation, H.K.
and Y.-J.K.; writing—original draft preparation, H.K. and Y.-J.K.; writing—review and editing, H.K.,
Y.-J.K. and W.-T.K.; visualization, H.K.; supervision, W.-T.K.; project administration, W.-T.K.; funding
acquisition, W.-T.K. All authors have read and agreed to the published version of the manuscript.
Funding: This work was partly supported by the MSIT (Ministry of Science and ICT), Korea,
under the ITRC (Information Technology Research Center) support program (IITP-2024-2021-0-
01816) supervised by the IITP (Institute for Information & Communications Technology Planning
& Evaluation) and the Star Professor Research Program of Korea University of Technology and
Education in 2024.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Sensors 2024, 24, 5281 20 of 21
References
1. Kang, Y.; Lee, S.; Gwak, S.; Kim, T.; An, D. Time-sensitive networking technologies for industrial automation in wireless
communication systems. Energies 2021, 14, 4497. [CrossRef]
2. Lu, Y.; Zhao, G.; Chakraborty, C.; Xu, C.; Yang, L.; Yu, K. Time sensitive networking-driven deterministic low-latency communica-
tion for real-time telemedicine and e-health services. IEEE Trans. Consum. Electron. 2023, 69, 734–744. [CrossRef]
3. Hazarika, A.; Rahmati, M. Towards an evolved immersive experience: Exploring 5G-and beyond-enabled ultra-low-latency
communications for augmented and virtual reality. Sensors 2023, 23, 3682. [CrossRef] [PubMed]
4. Peng, Y.; Shi, B.; Jiang, T.; Tu, X.; Xu, D.; Hua, K. A survey on in-vehicle time sensitive networking. IEEE Internet Things J. 2023,
10, 14375–14396. [CrossRef]
5. Fedullo, T.; Morato, A.; Tramarin, F.; Rovati, L.; Vitturi, S. A comprehensive review on time sensitive networks with a special
focus on its applicability to industrial smart and distributed measurement systems. Sensors 2022, 22, 1638. [CrossRef] [PubMed]
6. Silvestre-Blanes, J.; Almeida, L.; Marau, R.; Pedreiras, P. Online QoS management for multimedia real-time transmission in
industrial networks. IEEE Trans. Ind. Electron. 2010, 58, 1061–1071. [CrossRef]
7. Farkas, J.; Bello, L.L.; Gunther, C. Time-sensitive networking standards. IEEE Commun. Stand. Mag. 2018, 2, 20–21. [CrossRef]
8. Stüber, T.; Osswald, L.; Lindner, S.; Menth, M. A survey of scheduling algorithms for the time-aware shaper in time-sensitive
networking (TSN). IEEE Access 2023, 11, 61192–61233. [CrossRef]
9. Seijo, Ó.; Val, I.; Luvisotto, M.; Pang, Z. Clock synchronization for wireless time-sensitive networking: A march from microsecond
to nanosecond. IEEE Ind. Electron. Mag. 2021, 16, 35–43. [CrossRef]
10. Sudhakaran, S.; Montgomery, K.; Kashef, M.; Cavalcanti, D.; Candell, R. Wireless time sensitive networking impact on an
industrial collaborative robotic workcell. IEEE Trans. Ind. Inform. 2022, 18, 7351–7360. [CrossRef]
11. Bush, S.F.; Mantelet, G.; Thomsen, B.; Grossman, E. Industrial Wireless Time-Sensitive Networking: RFC on the Path Forward; Avnu
Alliance White Paper; Avnu Alliance: Beaverton, OR, USA, 2018.
12. Atiq, M.K.; Muzaffar, R.; Seijo, Ó.; Val, I.; Bernhard, H.P. When IEEE 802.11 and 5G meet time-sensitive networking. IEEE Open J.
Ind. Electron. Soc. 2021, 3, 14–36. [CrossRef]
13. Cavalcanti, D.; Cordeiro, C.; Smith, M.; Regev, A. WiFi TSN: Enabling deterministic wireless connectivity over 802.11. IEEE
Commun. Stand. Mag. 2022, 6, 22–29. [CrossRef]
14. Haxhibeqiri, J.; Jiao, X.; Aslam, M.; Moerman, I.; Hoebeke, J. Enabling TSN over IEEE 802.11: Low-overhead time synchronization
for Wi-Fi clients. In Proceedings of the 2021 22nd IEEE International Conference on Industrial Technology (ICIT), Virtual,
10–12 March 2021; Volume 1, pp. 1068–1073.
15. Alnazir, A.; Mokhtar, R.A.; Alhumyani, H.; Ali, E.S.; Saeed, R.A.; Abdel-Khalek, S. Quality of services based on intelligent IoT
WLAN MAC protocol dynamic real-time applications in smart cities. Comput. Intell. Neurosci. 2021, 2021, 2287531. [CrossRef]
16. Chen, C.; Chen, X.; Das, D.; Akhmetov, D.; Cordeiro, C. Overview and performance evaluation of Wi-Fi 7. IEEE Commun. Stand.
Mag. 2022, 6, 12–18. [CrossRef]
17. Yin, W.; Hu, P.; Indulska, J.; Portmann, M.; Mao, Y. MAC-layer rate control for 802.11 networks: A survey. Wirel. Netw. 2020,
26, 3793–3830. [CrossRef]
18. Jin, X.; Xia, C.; Guan, N.; Xu, C.; Li, D.; Yin, Y.; Zeng, P. Real-time scheduling of massive data in time sensitive networks with a
limited number of schedule entries. IEEE Access 2020, 8, 6751–6767. [CrossRef]
19. Stüber, T.; Eppler, M.; Osswald, L.; Menth, M. Performance Comparison of Offline Scheduling Algorithms for the Time-Aware
Shaper (TAS). IEEE Trans. Ind. Inform. 2024, 20, 9736–9748. [CrossRef]
20. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal
Process. Mag. 2017, 34, 26–38. [CrossRef]
21. Avila-Campos, P.; Haxhibeqiri, J.; Jiao, X.; Van Herbruggen, B.; Moerman, I.; Hoebeke, J. Unlocking Mobility for Wi-Fi-based
Wireless Time-Sensitive Networks. IEEE Access 2024, 12, 30687–30699. [CrossRef]
22. Jayabal, R.J.; Wong, D.T.C.; Goh, L.K.; Pang, C.M.; Sun, S.; Jin, B.; Ma, Y.; Goh, L.M.; Cheng, W.C. TGT-HC: A time-aware shaper
scheduled hyperchannel protocol for wireless time sensitive networks (TSNs). In Proceedings of the 2021 IEEE 94th Vehicular
Technology Conference (VTC2021-Fall), Virtual, 27 September–28 October 2021; pp. 1–6.
23. Fang, J.; Sudhakaran, S.; Cavalcanti, D.; Cordeiro, C.; Chen, C. Wireless TSN with multi-radio wi-fi. In Proceedings of the 2021
IEEE Conference on Standards for Communications and Networking (CSCN), Virtual, 15–17 December 2021; pp. 105–110.
24. Khoshnevisan, M.; Joseph, V.; Gupta, P.; Meshkati, F.; Prakash, R.; Tinnakornsrisuphap, P. 5G industrial networks with CoMP for
URLLC and time sensitive network architecture. IEEE J. Sel. Areas Commun. 2019, 37, 947–959. [CrossRef]
25. Shrestha, D.; Pang, Z.; Dzung, D. Precise clock synchronization in high performance wireless communication for time sensitive
networking. IEEE Access 2018, 6, 8944–8953. [CrossRef]
26. Avallone, S.; Imputato, P.; Magrin, D. Controlled channel access for IEEE 802.11-based wireless tsn networks. IEEE Internet Things
Mag. 2023, 6, 90–95. [CrossRef]
Sensors 2024, 24, 5281 21 of 21
27. Peón, P.G.; Karachatzis, P.; Steiner, W.; Uhlemann, E. Time-Sensitive Networking’s Scheduled Traffic Implementation on IEEE
802.11 COTS Devices. In Proceedings of the 2023 IEEE 29th International Conference on Embedded and Real-Time Computing
Systems and Applications (RTCSA), Niigata, Japan, 30 August–1 September 2023; pp. 167–175.
28. Schneider, B.; Sofia, R.C.; Kovatsch, M. A proposal for time-aware scheduling in wireless industrial iot environments. In
Proceedings of the NOMS 2022—2022 IEEE/IFIP Network Operations and Management Symposium, Budapest, Hungary,
25–29 April 2022; pp. 1–6.
29. Li, Z.; Yang, J.; Guo, C.; Xiao, J.; Tao, T.; Li, C. A Joint Scheduling Scheme for WiFi Access TSN. Sensors 2024, 24, 2554. [CrossRef]
[PubMed]
30. Tan, Q.; He, J.; Gao, Y. Deep Reinforcement Learning based OFDMA Scheduling for WiFi Networks with Coexisting Latency-
Sensitive and High-Throughput Services. In Proceedings of the 2024 5th Information Communication Technologies Conference
(ICTC), Nanjing, China, 10–12 May 2024; pp. 146–150. [CrossRef]
31. Adame, T.; Carrascosa-Zamacois, M.; Bellalta, B. Time-sensitive networking in IEEE 802.11 be: On the way to low-latency WiFi 7.
Sensors 2021, 21, 4954. [CrossRef] [PubMed]
32. Chen, Q.; Zhu, Y.H. Scheduling channel access based on target wake time mechanism in 802.11 ax WLANs. IEEE Trans. Wirel.
Commun. 2020, 20, 1529–1543. [CrossRef]
33. Sangdeh, P.K.; Zeng, H. DeepMux: Deep-learning-based channel sounding and resource allocation for IEEE 802.11 ax. IEEE J. Sel.
Areas Commun. 2021, 39, 2333–2346. [CrossRef]
34. Han, M.; Sun, X.; Zhan, W.; Gao, Y.; Jiang, Y. Multi-Agent Reinforcement Learning based Uplink OFDMA for IEEE 802.11ax
Networks. IEEE Trans. Wirel. Commun. 2024, 23, 8868–8882. [CrossRef]
35. Liu, X.; Li, X.; Zheng, K.; Liu, J. AoI minimization of ambient backscatter-assisted EH-CRN with cooperative spectrum sensing.
Comput. Netw. 2024, 245, 110389. : 10.1016/j.comnet.2024.110389 [CrossRef]
36. Kong, W.; Nabi, M.; Goossens, K. Run-time recovery and failure analysis of time-triggered traffic in time sensitive networks.
IEEE Access 2021, 9, 91710–91722. [CrossRef]
37. Akram, B.O.; Noordin, N.K.; Hashim, F.; Rasid, M.F.A.; Salman, M.I.; Abdulghani, A.M. Joint Scheduling and Routing
Optimization for Deterministic Hybrid Traffic in Time-Sensitive Networks using Constraint Programming. IEEE Access 2023, 11,
142764–142779. [CrossRef]
38. Bello, L.L.; Steiner, W. A perspective on IEEE time-sensitive networking for industrial communication and automation systems.
Proc. IEEE 2019, 107, 1094–1120. [CrossRef]
39. Craciunas, S.S.; Oliver, R.S.; Ag, T. An overview of scheduling mechanisms for time-sensitive networks. In Proceedings of the
Real-Time Summer School LÉcole dÉté Temps Réel (ETR), Paris, France, 28 August–1 September 2017; pp. 1551–3203.
40. Cavalcanti, D.; Bush, S.; Illouz, M.; Kronauer, G.; Regev, A.; Venkatesan, G. Wireless TSN–Definitions, Use Cases & Standards
Roadmap; Avnu Alliance: Beaverton, OR, USA, 2020; pp. 1–16.
41. Mahmood, A.; Exel, R.; Trsek, H.; Sauter, T. Clock synchronization over IEEE 802.11—A survey of methodologies and protocols.
IEEE Trans. Ind. Inform. 2016, 13, 907–922. [CrossRef]
42. Vales, V.B.; Fernández, O.C.; Domínguez-Bolaño, T.; Escudero, C.J.; García-Naya, J.A. Fine time measurement for the Internet of
Things: A practical approach using ESP32. IEEE Internet Things J. 2022, 9, 18305–18318. [CrossRef]
43. Gringoli, F.; Schulz, M.; Link, J.; Hollick, M. Free your CSI: A channel state information extraction platform for modern Wi-
Fi chipsets. In Proceedings of the 13th International Workshop on Wireless Network Testbeds, Experimental Evaluation &
Characterization, Los Cabos, Mexico, 25 October 2019; pp. 21–28.
44. Dulac-Arnold, G.; Evans, R.; van Hasselt, H.; Sunehag, P.; Lillicrap, T.; Hunt, J.; Mann, T.; Weber, T.; Degris, T.; Coppin, B. Deep
Reinforcement Learning in Large Discrete Action Spaces. arXiv 2016, arXiv:1512.07679
45. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017,
arXiv:1707.06347.
46. Maile, L.; Hielscher, K.S.J.; German, R. Delay-Guaranteeing Admission Control for Time-Sensitive Networking Using the
Credit-Based Shaper. IEEE Open J. Commun. Soc. 2022, 3, 1834–1852. [CrossRef]
47. IEEE 802.11ac-2013; IEEE Standard for Information Technology–Telecommunications and Information Exchange between Systems
Local and Metropolitan Area Networks–Specific Requirements Part 11: Wireless LAN Medium Access Control (MAC) and
Physical Layer (PHY) Specifications–Amendment 4: Enhancements for Very High Throughput for Operation in Bands below
6 GHz. IEEE Standard: Piscataway, NJ, USA, 2013. [CrossRef]
48. Towers, M.; Terry, J.K.; Kwiatkowski, A.; Balis, J.U.; Cola, G.d.; Deleu, T.; Goulão, M.; Kallinteris, A.; KG, A.; Krimmel, M.; et al.
Gymnasium. OpenAI Gym. 2023, arXiv:1606.01540. https://doi.org/10.5281/zenodo.8127026.
49. Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-baselines3: Reliable reinforcement learning
implementations. J. Mach. Learn. Res. 2021, 22, 1–8.